A naive Bayes model for cancer classification using a set of documents

Last update: Nov 24, 2021


Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files containing two columns:

                    Document | classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer

The program reads the data into classes that hold each document. One file is used as the training set and the other as the test set. Both sets go through the same tokenization; the model is then trained on the training set and evaluated on the test set.
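The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py or naivesbayes.py: the function names, the whitespace tokenizer, the add-one smoothing, and the sample rows are all assumptions made for the example, and the CSV is assumed to have no header row. It uses only the standard library (csv and math) so it runs on its own, whereas the project lists pandas and numpy for these jobs.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def read_documents(handle):
    """Read (document, label) pairs from a two-column CSV file object."""
    return [(row[0], row[1].strip()) for row in csv.reader(handle) if len(row) >= 2]


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(pairs):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in pairs:
        class_counts[label] += 1
        for token in tokenize(text):
            token_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, token_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior (add-one smoothing assumed)."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        denominator = sum(token_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore tokens never seen in training
                score += math.log((token_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up rows in the Document | classification layout, for illustration only.
training_csv = io.StringIO(
    "tumor biopsy malignant,cancer\n"
    "chemotherapy tumor treatment,cancer\n"
    "football match result,nocancer\n"
    "weather forecast sunny,nocancer\n"
)
model = train(read_documents(training_csv))
print(predict("malignant tumor found", model))
print(predict("sunny football weekend", model))
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long documents, which is presumably why the project needs a log function.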

2. Requirements/files used

* python3
* numpy library - for calculating logs
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scores for precision, recall, and F-score
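For readers unfamiliar with the three metrics reported in Scoring.docx, here is a small sketch of how precision, recall, and F-score are commonly computed for a two-class problem. The function name `score`, the choice of "cancer" as the positive class, and the sample labels are assumptions for illustration; the repository's own scoring code may differ.

```python
def score(gold, predicted, positive="cancer"):
    """Return (precision, recall, F-score) for the given positive class."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Made-up labels, for illustration only.
gold = ["cancer", "cancer", "nocancer", "nocancer"]
predicted = ["cancer", "nocancer", "nocancer", "cancer"]
print(score(gold, predicted))
```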

3. How to use

This program has 3 modes of operation for tokenizing your sets:

                $python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To use the other training set, change the 1 after -train to a 2:

                $python3 main.py -train 2 -test 1

NOTE: Do not change the test set number; leave it as 1. The option was intended for multiple test sets.

For binary:

                $python3 main.py -train # -test 1 -b

For stopwords:

                $python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $python3 main.py -train # -test 1 -b -s
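The sketch below shows one way the flags above could be parsed and applied. It is a guess at the behavior, not the code in main.py: it assumes -b means each distinct token is kept once per document and -s means tokens listed in stopwords.txt are dropped. The function names and the tiny inline stop-word set are hypothetical.

```python
import argparse


def build_parser():
    """Command-line parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Naive Bayes tokenization modes (sketch)")
    parser.add_argument("-train", type=int, default=1, help="training set number")
    parser.add_argument("-test", type=int, default=1, help="test set number")
    parser.add_argument("-b", action="store_true", help="binary tokenization")
    parser.add_argument("-s", action="store_true", help="remove stop words")
    return parser


def tokenize(text, binary=False, stopwords=frozenset()):
    """Lowercase, split on whitespace, then apply the optional modes."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    if binary:
        # Assumed meaning of -b: keep the first occurrence of each token only.
        tokens = list(dict.fromkeys(tokens))
    return tokens


# Equivalent to: python3 main.py -train 2 -test 1 -b -s
args = build_parser().parse_args(["-train", "2", "-test", "1", "-b", "-s"])

# Stand-in for the contents of stopwords.txt, for illustration only.
stopwords = {"the", "a", "of"} if args.s else frozenset()
print(tokenize("The tumor of the tumor cell", binary=args.b, stopwords=stopwords))
```

With both flags set, stop words are removed first and duplicates collapsed second, so the order of the two steps does not change the result.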


Owner

Alex W King
