demir.ai Dataset Operations

Overview

With this application, you can delete or fill the empty values (nan/null) in your dataset before passing it to machine learning algorithms. You can also view visual and numerical information about the dataset and get more detail about its attributes.

The application is written in Python. Flask is used for the backend and HTML for the frontend. Pandas is used to navigate the dataset; all numerical operations on the dataset were written by me without ready-made functions, and the plots were created from scratch by me using OpenCV.

Before running the application, install the required packages with the following command.

pip3 install -r requirements.txt

Launch the web application with the following command, then open http://localhost:5000/ to use it.

python3 main.py

With this web application, you can delete rows or columns that contain empty values (nan/null) from your dataset, or fill these empty values in three different ways.

  • Null value (nan) operations you can do on your dataset with demir.ai Dataset Operations:

    • Column-based deletion of null data (nan/null)
    • Row-based deletion of null data (nan/null)
    • Filling in blank data by mean, median and mode

The web application also gives you visual and numerical results about your dataset, so you can examine it in detail.

  • Information you can learn about your dataset with demir.ai Dataset Operations:

    • Mean of columns
    • Median of columns
    • Mode of columns
    • Frequency of columns
    • Interquartile range value (IQR) of columns
    • Outliers of columns
    • Five number summary of columns
    • Box Chart of columns
    • Variance and standard deviation of columns

Null value (nan/null) operations

  • Column-based deletion of null data (nan/null): The number of nulls is counted for each column and converted to a percentage of the column's values. If this percentage is greater than the percentage the user enters, the column is deleted.

  • Row-based deletion of null data (nan/null): The number of nulls is counted for each row, and if this number is greater than the number entered by the user, the row is deleted.

  • Filling in blank data by mean, median and mode:

    • Mean: The non-blank values of a column are summed and divided by the number of non-blank values. The resulting mean is written in place of the empty values.

    • Median: The median is calculated from the non-blank values of a column, and this median is written in place of the empty values.

    • Mode: The mode is calculated from the non-blank values of a column, and this mode is written in place of the empty values.
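
The null-value operations above can be sketched in plain Python as follows. This is a minimal illustration, not the application's actual code: it assumes a table stored as a dict that maps each column name to a list of values, with None (or a float NaN) marking an empty cell, and every function name here is hypothetical. The mode sketch breaks ties in favor of the value seen first, which is also an assumption.

```python
# Illustrative sketch only. A "table" is a dict: column name -> list of values,
# where None (or float NaN) marks an empty cell. All names are hypothetical.

def is_missing(value):
    """True for None and for float NaN (NaN is the only value not equal to itself)."""
    return value is None or value != value

def drop_columns_by_null_percent(table, max_percent):
    """Delete every column whose percentage of empty cells exceeds max_percent."""
    kept = {}
    for name, values in table.items():
        null_count = sum(1 for v in values if is_missing(v))
        percent = 100.0 * null_count / len(values) if values else 0.0
        if percent <= max_percent:
            kept[name] = list(values)
    return kept

def drop_rows_by_null_count(table, max_nulls):
    """Delete every row that has more than max_nulls empty cells."""
    names = list(table)
    n_rows = len(table[names[0]]) if names else 0
    keep = [
        i for i in range(n_rows)
        if sum(1 for name in names if is_missing(table[name][i])) <= max_nulls
    ]
    return {name: [table[name][i] for i in keep] for name in names}

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # max() returns the first key with the highest count (first-seen value wins ties)
    return max(counts, key=counts.get)

def fill_missing(values, strategy):
    """Replace empty cells with the mean, median, or mode of the non-empty cells."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return list(values)  # nothing to compute a fill value from
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](present)
    return [fill if is_missing(v) else v for v in values]
```

For example, with `table = {"a": [1, None, 3, None], "b": [1, 2, 2, None]}`, calling `drop_columns_by_null_percent(table, 40)` keeps only column `b`, because half of the cells in column `a` are empty.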

Information you can learn about your dataset

  • Mean of columns: The mean is calculated for each column separately and the column mean information is presented to the user.

  • Median of columns: The median is calculated for each column separately and the column median information is presented to the user.

  • Mode of columns: The mode is calculated for each column separately and the column mode information is presented to the user.

  • Frequency of columns: The frequency of each value is calculated for each column and presented to the user. In this section, the frequencies are also visualized with a bar plot created from scratch with OpenCV.

  • Interquartile range value (IQR) of columns: Q1 and Q3 values are found for each column, then the IQR value of the columns is found with Q3-Q1 and presented to the user.

  • Outliers of columns: A value in the column is called an outlier if it is less than (Q1 - 1.5 * IQR) or greater than (Q3 + 1.5 * IQR). The outliers are presented to the user.

  • Five number summary of columns: Minimum, Q1, median, Q3 and Maximum values are calculated and presented to the user.

  • Box Chart of columns: After the minimum, Q1, median, Q3 and maximum values are found for each column, a box chart is created from scratch with OpenCV and presented to the user.

  • Variance and standard deviation of columns: The variance and standard deviation for each column are calculated and presented to the user.
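
The statistics described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the application's actual code. The function names are hypothetical, the quartiles are taken as the medians of the lower and upper halves of the sorted column (one of several common conventions), and the variance is the population variance (dividing by n). The source does not say which conventions the application uses. The input is assumed to be a list of at least two non-empty numeric values.

```python
import math

# Illustrative sketch only; all names and conventions here are assumptions.

def _median_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def frequency(values):
    """Count how many times each distinct value occurs."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd number of values the middle element belongs to neither half.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return _median_sorted(lower), _median_sorted(ordered), _median_sorted(upper)

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

def outliers(values):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1, _, q3 = quartiles(values)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quartiles(values)
    return min(values), q1, med, q3, max(values)

def variance(values):
    """Population variance: mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def std_dev(values):
    return math.sqrt(variance(values))
```

For the column `[1, 2, 3, 4, 5, 6, 7, 100]`, this sketch gives Q1 = 2.5 and Q3 = 6.5, so the IQR is 4.0, the outlier bounds are -3.5 and 12.5, and `outliers` returns `[100]`. The five values returned by `five_number_summary` are the same ones a box chart is drawn from.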

Application video

demirai.mp4
Owner
Ahmet Furkan DEMIR
Hi, my name is Ahmet Furkan DEMIR. I study computer engineering at Necmettin Erbakan University.