This project uses the provided dataset, which includes various book features, to predict the price of books using several methods and models.

Last update: Jan 13, 2022

Overview

Predict-The-Price-Of-Books

For this task, a large dataset of books from different genres and authors was used. The dataset includes various book features, such as Author, Edition, and Reviews. These features were used as regressors to predict the price of books with several methods and models.

Author: Nikolas Petrou, MSc in Data Science

Technical-Report and Code Availability

A complete guide to the files and folders is located in the folder-file guide folder.
The technical report and analysis of the work are available in the report.pdf file.
The implementation and code of the project are located in the code files folder.

Dataset Overview

The data for this work comes from an online MachineHack competition for this task, which has been running since 27/09/2019 and currently has 3579 participants in total. The data was downloaded directly from MachineHack as two files, one for the training set and one for the test set, containing 6237 and 1560 records respectively. The values of the target variable (Price) are not included in the test set, because test-set predictions are evaluated through the MachineHack website.

Methodology

Some of the key methods which were used throughout the work are:

Visualization
TF-IDF and LDA Topic Extraction
Text translation using the Google Translate Ajax API
Cyclical feature encoding for time-based feature extraction
Price Prediction using different conventional and advanced algorithms (e.g. GBM, RF, SVM, CatBoost, LightGBM)
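To make the TF-IDF step listed above more concrete, here is a minimal sketch using only the Python standard library. It is not the project's implementation: the function name `tf_idf`, the whitespace tokenization, the unsmoothed `tf * log(N / df)` weighting, and the example titles are all assumptions chosen for illustration. Library implementations typically add smoothing and normalization, so their values will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Illustrative variant only: whitespace tokenization, no smoothing,
    no normalization.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical book titles, not taken from the dataset.
titles = [
    "python data science handbook",
    "data science from scratch",
    "the art of cooking",
]
scores = tf_idf(titles)
```

In this sketch, a term that occurs in only one title (such as "python") receives a higher weight than a term shared by several titles (such as "data"), which is the property that makes TF-IDF useful for picking out distinctive words in text features.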

An abstract methodology scheme of the work is illustrated in the following Figure.

In summary, the work began with exploratory data understanding. Each feature was assessed to obtain a better understanding of what it represents and how it could affect book pricing. Next, each feature was brought into a format appropriate for model development. Then, through visualization, the correlation between the different features and the dependent (target) variable was examined. The processed data were then used to train the employed models, and the prediction-modelling phase was conducted with two different approaches. Finally, the whole procedure was iterative: the steps were repeated until the final prediction model was obtained.
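One way a time-based feature can be brought into a model-friendly format is cyclical encoding, which is listed among the key methods. The sketch below is an illustration under stated assumptions, not the project's code: the helper name `encode_cyclical` and the choice of a month number with period 12 are hypothetical. The idea is to map a periodic value onto the unit circle so that the end of a cycle sits next to its start.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. a month number) to a (sin, cos) pair."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def euclidean(a, b):
    """Straight-line distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Assumed example: months numbered 1..12, so the period is 12.
december = encode_cyclical(12, 12)
january = encode_cyclical(1, 12)
june = encode_cyclical(6, 12)

# December and January are adjacent in the calendar and end up close
# together after encoding; December and June end up far apart.
dec_to_jan = euclidean(december, january)
dec_to_jun = euclidean(december, june)
```

With a raw month number, December (12) and January (1) look eleven units apart. After this encoding, their distance is smaller than the distance between December and June, which better reflects the calendar.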


