Data cleaning tools for business analysis

Overview

Datacleaning

Datacleaning is a set of data cleaning tools for business analysis.

This program was written for Vicky's work. The code is public, so you are free to use it too.

Attention

Before you use the program, change the layout of your Excel file as follows.

The first column holds the target store name. If you want to use data cleaning ("wash"), rename cell A1 to '行标签' (Chinese for "row labels"). If you only use pan category mapping, you can skip this step.

The second column holds the number of outlets for the target store; the default is 1. The program merges stores with the same name and sums their numbers.

The third column holds the location of the target store. It is not used yet, but we plan to add statistics on store locations.

Leave the fourth column blank, because the result of pan category mapping is written there. Of course, if you don't use that feature, you can ignore this.
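The merge described for the second column can be sketched in plain Python. This is only an illustration under assumptions, not the program's actual code: the function name `merge_stores` and the sample rows are hypothetical, and each row is assumed to be a (name, count) pair where a missing count falls back to the default of 1.

```python
def merge_stores(rows):
    """Merge rows that share a store name and sum their counts.

    rows: iterable of (name, count) pairs. A count of None is
    treated as the default of 1 (an assumption of this sketch).
    """
    totals = {}
    for name, count in rows:
        totals[name] = totals.get(name, 0) + (1 if count is None else count)
    return totals


# Hypothetical sample data: "Store A" appears twice and is merged.
sample_rows = [("Store A", 2), ("Store B", None), ("Store A", 3)]
merged = merge_stores(sample_rows)
print(merged)  # {'Store A': 5, 'Store B': 1}
```

A dictionary keyed by store name keeps the sketch short and preserves the order in which names first appear.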

Usage

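Before you build it, set up the right environment. You need Python 3 with numpy, pandas, tkinter, ahocorasick-python, and openpyxl. tkinter ships with most Python installers and cannot be installed from PyPI, so it has no pip command below; on some Linux distributions it comes from a system package such as python3-tk.

pip install numpy
pip install pandas
pip install ahocorasick-python
pip install openpyxl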
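To confirm the environment before building, a small check like the one below may help. It is a sketch, not part of the project: `missing_modules` and `REQUIRED` are hypothetical names, and the list uses import names, which can differ from pip package names. The import name for ahocorasick-python is not stated in this README, so it is left out of the list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    find_spec only checks that a module can be found; it does not
    import it, so a broken installation may still pass this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the dependencies listed above.
REQUIRED = ["numpy", "pandas", "tkinter", "openpyxl"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found.")
```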

After that, you can build it like this:

pip install pyinstaller
pyinstaller -F datacleaning.py

The program is now ready to use. The executable is in the dist directory.

The first button selects the Excel file to read.

The second button selects where to save the result file. The result file is named result.xlsx.

In the text box, you can enter the exact number of characters to match on. The program uses it to merge your store names.
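The README does not spell out how the character count drives the merge. One plausible reading, which is an assumption and not confirmed by the source, is that store names sharing the same first N characters are treated as one store. Under that assumption, a minimal sketch with the hypothetical helper `merge_by_prefix` looks like this:

```python
def merge_by_prefix(rows, n_chars):
    """Group store names by their first n_chars characters and sum counts.

    This prefix rule is an assumed interpretation of the
    "exact number of characters" setting, not the program's documented logic.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be at least 1")
    totals = {}
    for name, count in rows:
        key = name[:n_chars]  # names shorter than n_chars are kept whole
        totals[key] = totals.get(key, 0) + count
    return totals


# Hypothetical sample data: both "Acme" outlets collapse into one entry.
rows = [("Acme Downtown", 1), ("Acme Uptown", 2), ("Bolt Market", 1)]
print(merge_by_prefix(rows, 4))  # {'Acme': 3, 'Bolt': 1}
```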

If you only want to use pan category mapping, you can leave the text box empty. If you want to use data cleaning, you must fill it in.
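How pan category mapping works internally is not described here. The dependency on ahocorasick-python suggests keyword matching over store names, so the sketch below shows that idea with plain substring search as a stand-in for an Aho-Corasick matcher. The keyword table, the category names, and the function `map_category` are all hypothetical.

```python
# Hypothetical keyword-to-category table; the real mapping is not documented.
CATEGORY_KEYWORDS = {
    "coffee": "Beverages",
    "tea": "Beverages",
    "pharmacy": "Health",
}


def map_category(store_name, keywords=CATEGORY_KEYWORDS, default=""):
    """Return the category of the first keyword found in store_name.

    Matching is case-insensitive. If no keyword matches, `default`
    is returned, which would leave the fourth column blank.
    """
    lowered = store_name.lower()
    for keyword, category in keywords.items():
        if keyword in lowered:
            return category
    return default


# Each output row mirrors the four-column layout: name, count, location, category.
stores = [("Sunrise Coffee House", 1, "North"), ("Hardware Depot", 2, "South")]
for name, count, location in stores:
    print((name, count, location, map_category(name)))
```

A plain loop over keywords is enough for a handful of entries; an Aho-Corasick automaton becomes worthwhile when the keyword table is large, because it scans each name once regardless of how many keywords there are.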

Owner
Lin Jian