This is a project for analysis and estimation of House Prices in King County USA The .csv file contains the data of the house and the .ipynb file contians the analysis and code This project is done on Jupyter notebook The project uses Linear Regression and Pipeline() to fit and predict the prices.
This is an analysis and prediction project for house prices in King County, USA based on certain features of the house
Overview
Python reader for Linked Data in HDF5 files
Linked Data are becoming more popular for user-created metadata in HDF5 files.
Pyspark Spotify ETL
This is my first Data Engineering project, it extracts data from the user's recently played tracks using Spotify's API, transforms data and then loads it into Postgresql using SQLAlchemy engine. Data
PATC: Introduction to Big Data Analytics. Practical Data Analytics for Solving Real World Problems
PATC: Introduction to Big Data Analytics. Practical Data Analytics for Solving Real World Problems
Unsub is a collection analysis tool that assists libraries in analyzing their journal subscriptions.
About Unsub is a collection analysis tool that assists libraries in analyzing their journal subscriptions. The tool provides rich data and a summary g
statDistros is a Python library for dealing with various statistical distributions
StatisticalDistributions statDistros statDistros is a Python library for dealing with various statistical distributions. Now it provides various stati
WaveFake: A Data Set to Facilitate Audio DeepFake Detection
WaveFake: A Data Set to Facilitate Audio DeepFake Detection This is the code repository for our NeurIPS 2021 (Track on Datasets and Benchmarks) paper
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly.
Please consider citing the manuscript if you use apricot in your academic work! You can find more thorough documentation here. apricot implements subm
An extension to pandas dataframes describe function.
pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie
Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.
Elicited Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations. Credit to Brett Hoove
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilis
Approximate Nearest Neighbor Search for Sparse Data in Python!
Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).
Functional Data Analysis, or FDA, is the field of Statistics that analyses data that depend on a continuous parameter.
Functional Data Analysis Python package
Data analysis and visualisation projects from a range of individual projects and applications
Python-Data-Analysis-and-Visualisation-Projects Data analysis and visualisation projects from a range of individual projects and applications. Python
peptides.py is a pure-Python package to compute common descriptors for protein sequences
peptides.py Physicochemical properties and indices for amino-acid sequences. 🗺️ Overview peptides.py is a pure-Python package to compute common descr
Analysis of a dataset of 10000 passwords to find common trends and mistakes people generally make while setting up a password.
Analysis of a dataset of 10000 passwords to find common trends and mistakes people generally make while setting up a password.
This project is the implementation template for HW 0 and HW 1 for both the programming and non-programming tracks
This project is the implementation template for HW 0 and HW 1 for both the programming and non-programming tracks
songplays datamart provide details about the musical taste of our customers and can help us to improve our recomendation system
Songplays User activity datamart The following document describes the model used to build the songplays datamart table and the respective ETL process.
Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.
pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit
scikit-survival is a Python module for survival analysis built on top of scikit-learn.
scikit-survival scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizi
Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.
ETL Pipeline with Airflow, Spark, s3, MongoDB and Amazon Redshift