Predicting diabetes over a five-year period using logistic regression and the Pima First Nations dataset

Last update: Mar 28, 2022

Diabetes

This script uses the Pima First Nations dataset to build a model that predicts whether an individual will develop type 2 diabetes mellitus within a five-year span.
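As a rough illustration of the kind of model described here, the sketch below fits a logistic regression by plain batch gradient descent using only the Python standard library. It is not the code from this repository: the function names (`sigmoid`, `fit_logistic`, `predict`, `accuracy`), the learning rate, the epoch count, and the tiny made-up feature values are all assumptions for illustration. A real run would load the Kaggle CSV and would more likely rely on a library implementation.

```python
import math


def sigmoid(z):
    """Map a real number to (0, 1); branches keep exp() from overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(rows, weights, bias, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return [
        1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= threshold else 0
        for x in rows
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: one standardized feature per person, label 1 = diabetic.
# These values are invented for illustration and are NOT from the Pima dataset.
X = [[-1.5], [-1.0], [-0.8], [-0.5], [0.5], [0.7], [1.1], [1.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
preds = predict(X, weights, bias)
print("training accuracy:", accuracy(y, preds))
```

The toy data is cleanly separable, so the training accuracy here says nothing about how such a model behaves on the real dataset; for that, the accuracy should be measured on rows held out from training.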

This is a quick project on regression analysis and diabetes. I created it to deepen my understanding of the material currently being covered in my anatomy and physiology course and to practice working with simple regression models and common libraries.

So far, the model predicts with ~75% accuracy (not bad given the limited data and the size of the model, but not great). There are several ways to improve it. Two that come to mind are gathering more data to train it on and cleaning the data in a different way (for example, not replacing 0 values with the mean value of their column).
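The cleaning choice discussed above can be sketched as follows. This is an assumption about how such a step might look, not the repository's code: `impute_zeros` is a hypothetical helper, the readings are invented, and computing the fill value from the non-zero entries only is a choice made for this sketch. It treats zeros as missing values and fills them with either the mean or the median of the observed entries. The median is one possible alternative that is less sensitive to outliers.

```python
from statistics import mean, median


def impute_zeros(column, strategy="mean"):
    """Replace zeros (treated as missing) with a statistic of the non-zero values."""
    observed = [v for v in column if v != 0]
    if not observed:
        return list(column)  # nothing to estimate a fill value from
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v == 0 else v for v in column]


# Hypothetical glucose-like readings; 0 marks a missing measurement.
# The values are invented and include one deliberate outlier (300).
glucose = [148, 85, 0, 89, 137, 0, 78, 300]

print(impute_zeros(glucose, "mean"))
print(impute_zeros(glucose, "median"))
```

With an outlier in the column, the two strategies give noticeably different fill values, which is one reason the cleaning step can affect the model's accuracy.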

Dataset found on Kaggle: https://www.kaggle.com/kumargh/pimaindiansdiabetescsv

