demir.ai Dataset Operations

Overview

With this application, you can delete or fill empty values (nan/null) before passing your dataset to machine learning algorithms. You can also access visual and numerical information about your dataset and learn more about its attributes.

The application is written in Python, with the Flask framework on the backend and HTML on the frontend. The Pandas library is used to navigate the dataset. All numerical operations on the dataset were written by me without ready-made functions, and the plots were created from scratch by me using OpenCV.

Before running the application, install the required packages with the following command.

pip3 install -r requirements.txt

You can launch the web application with the following command, and then you can use the application by going to http://localhost:5000/.

python3 main.py

With this web application, you can delete rows or columns with empty values (nan/null) on your dataset or fill these empty values in three different ways.

  • Null value (nan) operations you can do on your dataset with demir.ai Dataset Operations:

    • Column-based deletion of null data (nan/null)
    • Row-based deletion of null data (nan/null)
    • Filling in blank data by mean, median and mode

This web application also gives you visual and numerical results about your dataset, so you can examine it in detail.

  • Information you can learn about your dataset with demir.ai Dataset Operations:

    • Mean of columns
    • Median of columns
    • Mode of columns
    • Frequency of columns
    • Interquartile range value (IQR) of columns
    • Outliers of columns
    • Five number summary of columns
    • Box Chart of columns
    • Variance and standard deviation of columns

Null value (nan/null) operations

  • Column-based deletion of null data (nan/null): The number of nulls in each column is counted and converted to a percentage of the column's values. If this percentage is greater than the percentage entered by the user, the column is deleted.

  • Row-based deletion of null data (nan/null): The number of nulls in each row is counted. If this number is greater than the number entered by the user, the row is deleted.

  • Filling in blank data by mean, median and mode:

    • Mean: The non-blank values of the column are summed and divided by the number of non-blank values. The resulting mean is written in place of the empty values.

    • Median: The median is calculated from the non-blank values in the column, and this median is written in place of the empty values.

    • Mode: The mode is calculated from the non-blank values in the column, and this mode is written in place of the empty values.
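
The null value operations described above can be sketched in plain Python. This is only an illustration, not the application's actual code: the function names are hypothetical, columns are represented as plain lists instead of a Pandas DataFrame, and every column is assumed to contain at least one non-blank value.

```python
import math

def is_null(value):
    """Treat None and float NaN as empty values (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_null_columns(table, max_null_percent):
    """Keep only columns whose null percentage does not exceed the threshold.

    `table` maps a column name to a non-empty list of values.
    """
    kept = {}
    for name, values in table.items():
        null_percent = 100.0 * sum(is_null(v) for v in values) / len(values)
        if null_percent <= max_null_percent:
            kept[name] = values
    return kept

def drop_null_rows(rows, max_nulls):
    """Keep only rows whose null count does not exceed the threshold."""
    return [row for row in rows if sum(is_null(v) for v in row) <= max_nulls]

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # On ties, the value seen first wins (an arbitrary choice for this sketch).
    return max(counts, key=counts.get)

def fill_nulls(values, strategy):
    """Replace nulls with the mean, median, or mode of the non-null values."""
    present = [v for v in values if not is_null(v)]
    fill_value = {"mean": mean, "median": median, "mode": mode}[strategy](present)
    return [fill_value if is_null(v) else v for v in values]
```

For example, `fill_nulls([1.0, None, 3.0], "mean")` returns `[1.0, 2.0, 3.0]`, and `drop_null_rows([[1, None, None], [1, 2, 3]], 1)` keeps only the second row.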

Information you can learn about your dataset

  • Mean of columns: The mean is calculated for each column separately and the column mean information is presented to the user.

  • Median of columns: The median is calculated for each column separately and the column median information is presented to the user.

  • Mode of columns: The mode is calculated for each column separately and the column mode information is presented to the user.

  • Frequency of columns: The frequency of each value is calculated for each column and presented to the user. The frequencies are also visualized as a bar plot drawn from scratch with OpenCV.

  • Interquartile range value (IQR) of columns: Q1 and Q3 are found for each column, then the IQR is computed as Q3 - Q1 and presented to the user.

  • Outliers of columns: A value in a column is called an outlier if it is less than (Q1 - 1.5 * IQR) or greater than (Q3 + 1.5 * IQR). The outliers are presented to the user.

  • Five number summary of columns: Minimum, Q1, median, Q3 and Maximum values are calculated and presented to the user.

  • Box Chart of columns: After the minimum, Q1, median, Q3 and maximum values are found for each column, a box chart is drawn from scratch with OpenCV and presented to the user.

  • Variance and standard deviation of columns: The variance and standard deviation for each column are calculated and presented to the user.
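
The statistics listed above can be sketched without ready-made functions as follows. The function names are hypothetical, and two choices are assumptions because this README does not specify them: Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted column (excluding the middle element when the count is odd), and the variance divides by n (population variance). Each column is assumed to hold at least two numeric values.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the middle element when the number of values is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

def outliers(values):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def frequency(values):
    """Count how many times each distinct value occurs."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts

def variance_and_std(values):
    """Population variance and standard deviation (dividing by n, an assumption)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return var, var ** 0.5
```

For the column `[1, 2, 3, 4, 5, 6, 7, 100]`, this sketch gives the five number summary `(1, 2.5, 4.5, 6.5, 100)`, an IQR of 4.0, and `[100]` as the only outlier. Other quartile conventions would give slightly different Q1 and Q3 values.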

Application video

demirai.mp4
Owner
Ahmet Furkan DEMIR
Hi, my name is Ahmet Furkan DEMIR. I study computer engineering at Necmettin Erbakan University.