Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

Overview

A method to solve the Higgs boson challenge using Least Squares - Novae

This project is the Project 1 of EPFL CS-433 Machine Learning. The project is the same as the Higgs Boson Machine Learning Challenge posted on Kaggle. The dataset and the detailed description can also be found in the GitHub repository of the course.

Team name: Novae

Team members: Giacomo Orsi, Vittorio Rossi, Chun-Tso Tsai

About the Project

The task of this project is to train a model based on the provided train.csv to have the best prediction on the data given in test.csv or any other general case.

We built our model for the problem using regularized linear regression after applying some data cleaning and features engineering techniques. A report describing our approach and our results can be found in the file report.pdf. In the end, we obtained an accuracy of 0.836 and an F1 score of 0.751 on the test.csv dataset.

Instructions

  • The project runs under Python 3.8 and requires NumPy=1.19.
  • Please make sure to place train.csv and test.csv inside the data folder. Those files can be downloaded here.
  • Go to the script/ folder and execute run.py. A model will be trained with the given hyper-parameters and predictions for the test dataset will be outputed in the file out.csv.

Modules

implementations.py

Contains the implementations of different learning algorithms. Including

  • Least squares linear regression
    • least_squares: Direct computation from linear equations.
    • least_squares_GD: Gradient descent.
    • least_squares_SGD: Stochastic gradient descent.
    • ridge_regression: Regularized linear regression from direct computation.
  • Logistic regression
    • logistic_regression: Gradient descent
    • reg_logistic_regression: Gradient descent with regularization.

There are also some helper functions in this file to facilitate the above functions.

data_processing.py

Calls the following files to process the data.

  • data_cleaning.py: Contains functions used to
    1. Categorize data into subgroups.
    2. Replace missing values with the median.
    3. Standardize the features.
  • feature_engineering.py: Contains functions used to generate our interpretable features.

run.py

Generates the submission .csv file based on the data of test.csv stored in the folder data/. Our optimized model is also defined in this file.

Some helper Functions

  • models.py: Create the models for predicting the labels for new data points without true labels.
  • expansions.py: Contains a function to apply polynomial expansion to our features to add extra degrees of freedom for our models.
  • proj1_helpers.py: Contains functions which loads the .csv files as training or testing data, and create the .csv file for submission.
  • cross_validation.py: Contains a function to build the index for k-fold cross_validation.
  • disk_helper.py: Save/load the NumPy array to disk for further usage. Useful for saving hyper-parameters when trying a long training process.

Notebook

It is possible to use the Jupyter notebook project_notebook.ipynb located in the scripts folder to train the best hyper-parameters for the model. In the notebook it is possible to cross-validate a logistic and a least square regression model over given lambdas and degrees.

Owner
Giacomo Orsi
CS Student at EPFL. Previously at University of Bologna
Giacomo Orsi
Official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels".

WarPI The official PyTorch implemention of our paper "Learning to Rectify for Robust Learning with Noisy Labels". Run python main.py --corruption_type

Haoliang Sun 3 Sep 03, 2022
An official implementation of "SFNet: Learning Object-aware Semantic Correspondence" (CVPR 2019, TPAMI 2020) in PyTorch.

PyTorch implementation of SFNet This is the implementation of the paper "SFNet: Learning Object-aware Semantic Correspondence". For more information,

CV Lab @ Yonsei University 87 Dec 30, 2022
Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

CSF Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion Tips: For testing: CUDA_VISIBLE_DEVICES=0 python main.py For trai

Han Xu 14 Oct 31, 2022
Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

pytorch-AdaIN This is an unofficial pytorch implementation of a paper, Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Hua

Naoto Inoue 873 Jan 06, 2023
ICLR2021 (Under Review)

Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning This repository contains the official PyTorch implementation o

Haoyi Fan 58 Dec 30, 2022
Official repository for Jia, Raghunathan, Göksel, and Liang, "Certified Robustness to Adversarial Word Substitutions" (EMNLP 2019)

Certified Robustness to Adversarial Word Substitutions This is the official GitHub repository for the following paper: Certified Robustness to Adversa

Robin Jia 38 Oct 16, 2022
Official repository for ABC-GAN

ABC-GAN The work represented in this repository is the result of a 14 week semesterthesis on photo-realistic image generation using generative adversa

IgorSusmelj 10 Jun 23, 2022
My personal Home Assistant configuration.

About This is my personal Home Assistant configuration. My guiding princile is to have full local control of all my devices. I intend everything to ru

Chris Turra 13 Jun 07, 2022
Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs

Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs In this work, we propose an algorithm DP-SCAFFOLD(-warm), whic

19 Nov 10, 2022
Thermal Control of Laser Powder Bed Fusion using Deep Reinforcement Learning

This repository is the implementation of the paper "Thermal Control of Laser Powder Bed Fusion Using Deep Reinforcement Learning", linked here. The project makes use of the Deep Reinforcement Library

BaratiLab 11 Dec 27, 2022
Model Serving Made Easy

The easiest way to build Machine Learning APIs BentoML makes moving trained ML models to production easy: Package models trained with any ML framework

BentoML 4.4k Jan 08, 2023
Centroid-UNet is deep neural network model to detect centroids from satellite images.

Centroid UNet - Locating Object Centroids in Aerial/Serial Images Introduction Centroid-UNet is deep neural network model to detect centroids from Aer

GIC-AIT 19 Dec 08, 2022
Selective Wavelet Attention Learning for Single Image Deraining

SWAL Code for Paper "Selective Wavelet Attention Learning for Single Image Deraining" Prerequisites Python 3 PyTorch Models We provide the models trai

Bobo 9 Jun 17, 2022
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

CorrNet This project provides the code and results for 'Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation'

Gongyang Li 13 Nov 03, 2022
Place holder for HOPE: a human-centric and task-oriented MT evaluation framework using professional post-editing

HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation Place holder for dat

Lifeng Han 1 Apr 25, 2022
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Seokeon Choi 35 Oct 26, 2022
"SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image", Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [Paper] [Website] Pipeline Code Environment pip install -r requirements

VITA 250 Jan 05, 2023
A simple software for capturing human body movements using the Kinect camera.

KinectMotionCapture A simple software for capturing human body movements using the Kinect camera. The software can seamlessly save joints and bones po

Aleksander Palkowski 5 Aug 13, 2022
N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting Recent progress in neural forecasting instigated significant improvements in the

Cristian Challu 82 Jan 04, 2023