Data cleaning tools for business analysis

Overview

Datacleaning

This program was written for Vicky's work. Now that the code is public, you are free to use it too.

Attention

Before using the program, arrange your Excel sheet in the following format:

The first column (A) holds the target store names. To use data cleaning, type '行标签' ("Row Labels") into cell A1. If you only use pan category mapping, you can skip this.
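As a rough illustration of the A1 requirement, the check below takes the first sheet row as a tuple of cell values (for example, one item from openpyxl's ws.iter_rows(values_only=True)) and reports whether data cleaning can proceed. The function name validate_header is hypothetical; the program's own check, if any, may differ.

```python
REQUIRED_A1 = "行标签"  # header the README requires in cell A1 for data cleaning


def validate_header(header_row, use_cleaning):
    """Hypothetical check on the first sheet row; returns a list of problems.

    header_row: tuple of cell values for row 1, e.g. one item from
    openpyxl's ws.iter_rows(values_only=True).
    """
    if not use_cleaning:
        return []  # pan category mapping alone does not need the A1 header
    first = header_row[0] if header_row else None
    if first != REQUIRED_A1:
        return [f"A1 is {first!r}; data cleaning expects {REQUIRED_A1!r}"]
    return []


print(validate_header(("行标签", "count", "address", None), use_cleaning=True))  # []
print(validate_header(("Store name",), use_cleaning=True))
```

An empty list means the header is acceptable for the chosen mode.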

The second column (B) holds the number of stores with that name; the default is 1. The program merges rows that share the same store name and sums their counts.
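A minimal sketch of the merge-and-sum step, assuming each data row arrives as a tuple whose first two values are the store name and the count, with blank counts treated as 1 as described above. merge_stores is a hypothetical name; the program itself relies on pandas, where the same idea is roughly df.fillna({count_col: 1}).groupby(name_col, sort=False)[count_col].sum().

```python
def merge_stores(rows):
    """Merge rows that share a store name and sum their store counts.

    rows: iterable of tuples whose first value is the store name and whose
    second value (optional) is a numeric count. Missing or blank counts are
    treated as 1. Returns a dict of name -> total, in first-seen order.
    """
    totals = {}
    for row in rows:
        name = row[0]
        count = row[1] if len(row) > 1 else None
        if count is None or count == "":
            count = 1  # the README's default store count
        totals[name] = totals.get(name, 0) + count
    return totals


rows = [("Store A", 2), ("Store B", None), ("Store A", 3)]
print(merge_stores(rows))  # {'Store A': 5, 'Store B': 1}
```

Keeping first-seen order means the merged list follows the order of the original sheet.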

The third column (C) holds the store's location. The program does not use it yet; counting stores by location is planned for a future version.

Leave the fourth column (D) blank: the program writes the pan category mapping results there. If you don't use pan category mapping, you can ignore this column.
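The README does not say how pan category mapping decides a store's category. Given the ahocorasick-python dependency, one plausible reading is keyword matching: each store name is scanned for known keywords, and the matching category is written to column D. The sketch below assumes that reading and implements a small Aho-Corasick automaton in pure Python rather than calling the ahocorasick-python package; the class, map_categories, and the keyword table are all hypothetical, and the keywords are made up.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton that reports every keyword found in a text."""

    def __init__(self, keywords):
        self.goto = [{}]  # goto[state][char] -> next state; state 0 is the root
        self.fail = [0]   # failure link for each state
        self.out = [[]]   # keywords recognized when reaching each state
        for keyword in keywords:
            if keyword:  # ignore empty keywords
                self._add(keyword)
        self._build_failure_links()

    def _add(self, keyword):
        state = 0
        for ch in keyword:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(keyword)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep their failure link to the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure state (suffix keywords).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Return matched keywords in the order they end in text."""
        state, found = 0, []
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found.extend(self.out[state])
        return found


def map_categories(store_names, keyword_to_category):
    """Hypothetical pan category mapping: the first keyword to end in a name wins."""
    automaton = AhoCorasick(k.lower() for k in keyword_to_category)
    lowered = {k.lower(): v for k, v in keyword_to_category.items()}
    categories = []
    for name in store_names:
        hits = automaton.find_all(str(name).lower())
        categories.append(lowered[hits[0]] if hits else "")  # "" means no match
    return categories


table = {"coffee": "Beverage", "tea": "Beverage", "hardware": "Retail"}  # made-up keywords
print(map_categories(["Sunrise Coffee", "Green Tea House", "City Hardware"], table))
# ['Beverage', 'Beverage', 'Retail']
```

Aho-Corasick scans each name once regardless of how many keywords there are, which is why it suits a large keyword table better than checking every keyword with a separate substring search.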

Usage

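Before building, make sure you have the right environment: Python 3 with numpy, pandas, tkinter, ahocorasick-python and openpyxl. tkinter ships with most Python installers and cannot be installed with pip; on Debian/Ubuntu, install the python3-tk system package if it is missing. Install the other libraries with:

pip install numpy
pip install pandas
pip install ahocorasick-python
pip install openpyxl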
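To confirm the environment before building, a small check like the following can help. It uses importlib.util.find_spec, which returns None for a top-level module that is not installed. The import name ahocorasick for the ahocorasick-python package is an assumption; adjust the list if your installed package exposes a different name.

```python
import importlib.util

# Import names for the required libraries; "ahocorasick" is an assumed
# import name for the ahocorasick-python package.
REQUIRED_MODULES = ["numpy", "pandas", "tkinter", "ahocorasick", "openpyxl"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it with the same interpreter you will use for PyInstaller, since each Python installation has its own set of packages.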

Then build the executable with PyInstaller:

pip install pyinstaller
pyinstaller -F datacleaning.py

The -F (--onefile) option bundles the program into a single executable, which PyInstaller writes to the dist folder under the current directory.

The first button, 读取EXCEL路径 ("read Excel path"), selects the Excel file you want to clean.

The second button, 结果存放 ("result location"), sets the folder where the result is saved. The result file is named result.xlsx.

In the 精确字符数 ("exact character count") box, enter the number of characters to match; the program uses it to decide which store names to merge.
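The README does not spell out how the character count drives merging. One plausible reading, assumed here, is that store names sharing their first N characters are treated as the same store (for example, branch names that start with the same brand name). merge_by_prefix is a hypothetical helper illustrating that reading, not the program's actual code.

```python
def merge_by_prefix(rows, n_chars):
    """Hypothetical: merge store names that share their first n_chars characters.

    rows: iterable of (name, count) pairs; a count of None is treated as 1.
    Returns a dict mapping each prefix to its summed count, in first-seen order.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be a positive integer")
    totals = {}
    for name, count in rows:
        key = str(name).strip()[:n_chars]  # the first n_chars characters decide the group
        totals[key] = totals.get(key, 0) + (1 if count is None else count)
    return totals


rows = [("Sunrise Cafe Branch 1", 1), ("Sunrise Cafe Branch 2", 2), ("Moonlight Bakery", None)]
print(merge_by_prefix(rows, 12))  # {'Sunrise Cafe': 3, 'Moonlight Ba': 1}
```

Python slices strings by character, not byte, so the same code works for Chinese store names, where a small count such as 3 often covers the brand name.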

If you only want pan category mapping, leave this box blank. Data cleaning requires a value in it.
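The blank-versus-filled rule can be expressed as a small parser for the box's text (for example, the string returned by a tkinter Entry widget's get() method). parse_precision and describe_mode are hypothetical names; the sketch treats a blank box as mapping-only and otherwise requires a positive whole number.

```python
def parse_precision(text):
    """Hypothetical parser for the exact-character-count box.

    Returns None for a blank box (pan category mapping only); otherwise returns
    the count as a positive int, which data cleaning requires.
    """
    text = (text or "").strip()
    if not text:
        return None
    value = int(text)  # raises ValueError for non-numeric input such as "abc"
    if value < 1:
        raise ValueError("the character count must be at least 1")
    return value


def describe_mode(text):
    """Summarize which mode the box contents select."""
    n = parse_precision(text)
    return "pan category mapping only" if n is None else f"data cleaning, merging on {n} characters"


print(describe_mode(""))   # pan category mapping only
print(describe_mode("4"))  # data cleaning, merging on 4 characters
```

Rejecting zero and non-numeric input up front gives the user a clear error instead of an empty or meaningless merge.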

Owner
Lin Jian