Full-featured Decision Trees and Random Forests learner.

Overview

CID3

Latest Release License Github All Releases GitHub Follow Twitter Follow

This is a full-featured Decision Trees and Random Forests learner. It can save trees or forests to disk for later use. It is possible to query trees and Random Forests and to fill out an unlabeled file with the predicted classes. Documentation is not yet available, although the program options can be shown with command:

% java -jar cid3.jar -h

usage: java -jar cid3.jar
 -a,--analysis <name>    show causal analysis report
 -c,--criteria <name>    input criteria: c[Certainty], e[Entropy], g[Gini]
 -f,--file <name>        input file
 -h,--help               print this message
 -o,--output <name>      output file
 -p,--partition          partition train/test data
 -q,--query <type>       query model, enter: t[Tree] or r[Random forest]
 -r,--forest <amount>    create random forest, enter # of trees
 -s,--save               save tree/random forest
 -t,--threads <amount>   maximum number of threads (default is 500)
 -v,--validation         create 10-fold cross-validation
 -ver,--version          version

List of features

  • It uses a new Certainty formula as splitting criteria.
  • Provides causal analysis report, which shows how some attribute values cause a particular classification.
  • Creates full trees, showing error rates for train and test data, attribute importance, causes and false positives/negatives.
  • If no test data is provided, it can split the train dataset in 80% for training and 20% for testing.
  • Creates random forests, showing error rates for train and test data, attribute importance, causes and false positives/negatives. Random forests are created in parallel, so it is very fast.
  • Creates 10 Fold Cross-Validation for trees and random forests, showing error rates, mean and Standard Error and false positives/negatives. Cross-Validation folds are created in parallel.
  • Saves trees and random forests to disk in a compressed file. (E.g. model.tree, model.forest)
  • Query trees and random forest from saved files. Queries can contain missing values, just enter the character: “?”.
  • Make predictions and fill out cases files with those predictions, either from single trees or random forests.
  • Missing values imputation for train and test data is implemented. Continuous attributes are imputed as the mean value. Discrete attributes are imputed as MODE, which selects the value that is most frequent.
  • Ignoring attributes is implemented. In the .names file just set the attribute type as: ignore.
  • Three different splitting criteria can be used: Certainty, Entropy and Gini. If no criteria is invoked then Certainty will be used.

Example run with titanic dataset

[email protected] datasets % java -jar cid3.jar -f titanic

CID3 [Version 1.1]              Saturday October 30, 2021 06:34:11 AM
------------------
[ ✓ ] Read data: 891 cases for training. (10 attributes)
[ ✓ ] Decision tree created.

Rules: 276
Nodes: 514

Importance Cause   Attribute Name
---------- -----   --------------
      0.57   yes ············ Sex
      0.36   yes ········· Pclass
      0.30   yes ··········· Fare
      0.28   yes ······· Embarked
      0.27   yes ·········· SibSp
      0.26   yes ·········· Parch
      0.23    no ············ Age


[==== TRAIN DATA ====] 

Correct guesses:  875
Incorrect guesses: 16 (1.8%)

# Of Cases  False Pos  False Neg   Class
----------  ---------  ---------   -----
       549         14          2 ····· 0
       342          2         14 ····· 1

Time: 0:00:00

Requirements

CID3 requires JDK 15 or higher.

The data format is similar to that of C4.5 and C5.0. The data file format is CSV, and it could be split in two separated files, like: titanic.data and titanic.test. The class attribute column must be the last column of the file. The other necessary file is the "names" file, which should be named like: titanic.names, and it contains the names and types of the attributes. The first line is the class attribute possible values. This line could be left empty with just a dot(.) Below is an example of the titanic.names file:

0,1.  
PassengerId: ignore.  
Pclass: 1,2,3.  
Sex : male,female.  
Age: continuous.  
SibSp: discrete.  
Parch: discrete.  
Ticket: ignore.  
Fare: continuous.  
Cabin: ignore.  
Embarked: discrete.  

Example of causal analysis

% java -jar cid3.jar -f adult -a education

From this example we can see that attribute "education" is a cause, which is based on the certainty-raising inequality. Once we know that it is a cause we then compare the causal certainties of its values. When it's value is "Doctorate" it causes the earnings to be greater than $50,000, with a probability of 0.73. A paper will soon be published with all the formulas used to calculate the Certainty for splitting the nodes and the certainty-raising inequality, used for causal analysis.

Importance Cause   Attribute Name
---------- -----   --------------
      0.56   yes ······ education

Report of causal certainties
----------------------------

[ Attribute: education ]

    1st-4th --> <=50K  (0.97)

    5th-6th --> <=50K  (0.95)

    7th-8th --> <=50K  (0.94)

    9th --> <=50K  (0.95)

    10th --> <=50K  (0.94)

    11th --> <=50K  (0.95)

    12th --> <=50K  (0.93)

    Assoc-acdm --> <=50K  (0.74)

    Assoc-voc --> <=50K  (0.75)

    Bachelors --> Non cause.

    Doctorate --> >50K  (0.73)

    HS-grad --> <=50K  (0.84)

    Masters --> >50K  (0.55)

    Preschool --> <=50K  (0.99)

    Prof-school --> >50K  (0.74)

    Some-college --> <=50K  (0.81)
You might also like...
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs

Realistic Full-Body Anonymization with Surface-Guided GANs This is the official

Random-Afg - Afghanistan Random Old Idz Cloner Tools
Random-Afg - Afghanistan Random Old Idz Cloner Tools

AFGHANISTAN RANDOM OLD IDZ CLONER TOOLS Install $ apt update $ apt upgrade $ apt

ElegantRL is featured with lightweight, efficient and stable, for researchers and practitioners.
ElegantRL is featured with lightweight, efficient and stable, for researchers and practitioners.

Lightweight, efficient and stable implementations of deep reinforcement learning algorithms using PyTorch. 🔥

This program writes christmas wish programmatically. It is using turtle as a pen pointer draw christmas trees and stars.

Introduction This is a simple program is written in python and turtle library. The objective of this program is to wish merry Christmas programmatical

Simulate genealogical trees and genomic sequence data using population genetic models

msprime msprime is a population genetics simulator based on tskit. Msprime can simulate random ancestral histories for a sample of individuals (consis

TreeSubstitutionCipher - Encryption system based on trees and substitution

Tree Substitution Cipher Generation Algorithm: Generate random tree. Tree nodes

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

This is a Python implementation of cover trees, a data structure for finding nearest neighbors in a general metric space (e.g., a 3D box with periodic

Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. KDD 2019.

gHHC Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, D

A python library to build Model Trees with Linear Models at the leaves.
A python library to build Model Trees with Linear Models at the leaves.

A python library to build Model Trees with Linear Models at the leaves.

Releases(v1.2.4)
Owner
Alejandro Penate-Diaz
Machine learner, web developer, scientist and photo edition enthusiast.
Alejandro Penate-Diaz
Python library for science observations from the James Webb Space Telescope

JWST Calibration Pipeline JWST requires Python 3.7 or above and a C compiler for dependencies. Linux and MacOS platforms are tested and supported. Win

Space Telescope Science Institute 386 Dec 30, 2022
Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model. Designed sample dashboard with insights and recommendation for

Yash 2 Apr 07, 2022
FastCover: A Self-Supervised Learning Framework for Multi-Hop Influence Maximization in Social Networks by Anonymous.

FastCover: A Self-Supervised Learning Framework for Multi-Hop Influence Maximization in Social Networks by Anonymous.

0 Apr 02, 2021
Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)

DHF1K =========================================================================== Wenguan Wang, J. Shen, M.-M Cheng and A. Borji, Revisiting Video Sal

Wenguan Wang 126 Dec 03, 2022
Code for our paper "Multi-scale Guided Attention for Medical Image Segmentation"

Medical Image Segmentation with Guided Attention This repository contains the code of our paper: "'Multi-scale self-guided attention for medical image

Ashish Sinha 394 Dec 28, 2022
[ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets"

EarlyBERT This is the official implementation for the paper in ACL-IJCNLP 2021 "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by

VITA 13 May 11, 2022
Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.

Introduction 1. Usage (For MSS) 1.1 Prepare running environment 1.2 Use pretrained model 1.3 Train new MSS models from scratch 1.3.1 How to train 1.3.

Leo 100 Dec 25, 2022
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.

简体中文 | English News [2021-10-12] PaddleNLP 2.1版本已发布!新增开箱即用的NLP任务能力、Prompt Tuning应用示例与生成任务的高性能推理! 🎉 更多详细升级信息请查看Release Note。 [2021-08-22]《千言:面向事实一致性的生

6.9k Jan 01, 2023
Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations"

Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations" this repository is maintained by bo

Yuhan Liu 24 Nov 29, 2022
Tensorflow 2 implementations of the C-SimCLR and C-BYOL self-supervised visual representation methods from "Compressive Visual Representations" (NeurIPS 2021)

Compressive Visual Representations This repository contains the source code for our paper, Compressive Visual Representations. We developed informatio

Google Research 30 Nov 23, 2022
A tool to estimate time varying instantaneous reproduction number during epidemics

EpiEstim A tool to estimate time varying instantaneous reproduction number during epidemics. It is described in the following paper: @article{Cori2013

MRC Centre for Global Infectious Disease Analysis 78 Dec 19, 2022
(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry Official implementation of the paper Multi-View Depth Est

Bae, Gwangbin 138 Dec 28, 2022
Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation The reference code of Improving Factual Completeness and C

46 Dec 15, 2022
Simple streamlit app to demonstrate HERE Tour Planning

Table of Contents About the Project Built With Getting Started Prerequisites Installation Usage Roadmap Contributing License Acknowledgements About Th

Amol 8 Sep 05, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022
Official PyTorch implementation of SyntaSpeech (IJCAI 2022)

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech | | | | 中文文档 This repository is the official PyTorch implementation of our IJCAI-2022

Zhenhui YE 116 Nov 24, 2022
BC3407-Group-5-Project - BC3407 Group Project With Python

BC3407-Group-5-Project As the world struggles to contain the ever-changing varia

1 Jan 26, 2022
Pytorch-diffusion - A basic PyTorch implementation of 'Denoising Diffusion Probabilistic Models'

PyTorch implementation of 'Denoising Diffusion Probabilistic Models' This reposi

Arthur Juliani 76 Jan 07, 2023
Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

SSL_OSC Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

zaixizhang 2 May 14, 2022
A Marvelous ChatBot implement using PyTorch.

PyTorch Marvelous ChatBot [Update] it's 2019 now, previously model can not catch up state-of-art now. So we just move towards the future a transformer

JinTian 223 Oct 18, 2022