Data cleaning tools for Business analysis

Overview

Datacleaning

Datacleaning is a set of data cleaning tools for business analysis.

This program was written for Vicky's work. The code is public, so you are free to use it too.

Attention

Before you use the program, change your Excel file to the following format.

The first column is the target store name. If you want to use data cleaning ("wash"), rename cell A1 to '行标签' (Chinese for "row label"). If you only use pan category mapping, you can skip this step.

The second column is the number of outlets for the target store; the default is 1. The program merges stores that have the same name and sums their numbers.

The third column is the location of the target store. The program does not use it yet, but we plan to add statistics on store locations in the future.

Leave the fourth column blank, because the program writes the result of pan category mapping there. Of course, if you don't use that feature, you can ignore this.
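
As a rough illustration of the layout above, the sketch below builds a few rows in the four-column format and merges stores that share a name, summing their numbers. It uses only the standard library. The helper name `merge_stores`, the header names other than '行标签', and the sample rows are all hypothetical; the program's actual implementation may differ.

```python
# Hypothetical sample data in the layout described above:
# (store name, number of stores, location, blank column for the mapping result).
# Only the A1 header '行标签' comes from the README; the other headers are made up.
HEADER = ("行标签", "count", "location", "")
ROWS = [
    ("Store A", 1, "Shanghai", ""),
    ("Store B", 2, "Beijing", ""),
    ("Store A", 3, "Hangzhou", ""),
]


def merge_stores(rows):
    """Merge rows with the same store name and sum their counts."""
    totals = {}
    for name, count, _location, _mapping in rows:
        # A missing count falls back to the documented default of 1.
        totals[name] = totals.get(name, 0) + (1 if count is None else count)
    return list(totals.items())


merged = merge_stores(ROWS)
print(merged)  # [('Store A', 4), ('Store B', 2)]
```

The location and the blank fourth column are carried in each row but ignored by the merge, which matches the description that the third column is not used yet.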

Usage

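Before you build it, you need the right environment: Python 3 with numpy, pandas, tkinter, ahocorasick-python, and openpyxl. Note that tkinter is part of the Python standard library and cannot be installed with pip; on some Linux distributions it comes as a separate system package (for example, python3-tk on Debian and Ubuntu).

pip install numpy
pip install pandas
pip install ahocorasick-python
pip install openpyxl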
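
To confirm the environment before building, a small check like the one below may help. It is only a sketch: the import names are assumptions (in particular, ahocorasick-python is assumed to import as `ahocorasick`), and `find_missing` is a hypothetical helper, not part of this project.

```python
import importlib.util

# Import names are assumptions: a package's import name can differ from its
# pip name (ahocorasick-python is assumed to import as "ahocorasick").
REQUIRED_MODULES = ["numpy", "pandas", "tkinter", "ahocorasick", "openpyxl"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

The check only verifies that each module can be located; it does not import the modules, so a broken installation could still fail later.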

After that, you can build the executable:

pip install pyinstaller
pyinstaller -F datacleaning.py

The executable program is now in the dist directory and ready to use.

The first button selects the path of the Excel file to read.

The second button selects where the result file is saved. The result file is named result.xlsx.

In the text box, you can enter an exact number of characters; the program merges your store names based on it.

If you only want to use pan category mapping, you can leave the text box blank. If you want to use data cleaning, you must fill it in.
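
The README does not say exactly how the character count is applied. One possible reading, shown here purely as an assumption, is that store names sharing the same first N characters are treated as one store and their numbers are summed. The function `merge_by_prefix` and the sample names are hypothetical.

```python
def merge_by_prefix(rows, n_chars):
    """Group (name, count) pairs by the first n_chars characters of the name.

    This is an assumed interpretation of the "exact number of characters"
    setting, not the program's confirmed behaviour.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be a positive integer")
    totals = {}
    for name, count in rows:
        key = name[:n_chars]
        totals[key] = totals.get(key, 0) + count
    return totals


rows = [("KFC Nanjing Road", 1), ("KFC Peoples Square", 2), ("Starbucks Bund", 1)]
print(merge_by_prefix(rows, 3))  # {'KFC': 3, 'Sta': 1}
```

Under this reading, a smaller count merges more aggressively, so it is worth checking the result file for names that were grouped by accident.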

Owner
Lin Jian