Data cleaning tools for business analysis

Overview

Datacleaning

Data cleaning tools for business analysis.

This program was written for Vicky's work, and you are welcome to use it too.

Attention

Before you use the program, arrange your Excel sheet in the following format.

The first column holds the target store names. If you want to use data cleaning, cell A1 must contain the header '行标签'. If you only use pan-category mapping, you can ignore this requirement.

The second column holds the number of stores for each name; the default is 1. The program merges rows with the same store name and sums their numbers.

The third column holds the location of the target store. The program does not use this column yet, but we plan to include it in the statistics later.

Leave the fourth column blank, because the program writes the result of pan-category mapping there. If you do not use pan-category mapping, you can ignore this requirement.
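
The merge step described for the second column can be pictured with a small sketch. It is not the program's own code: the rows, the store names, and the function name merge_same_name are made up for illustration, and it uses only the Python standard library rather than pandas.

```python
from collections import defaultdict

# Hypothetical rows mirroring the sheet layout described above:
# (store name, number of stores, location). The data is made up.
rows = [
    ("Store A", 1, "Shanghai"),
    ("Store B", 2, "Beijing"),
    ("Store A", 3, "Shanghai"),
]


def merge_same_name(rows):
    """Sum the store numbers of rows that share exactly the same name."""
    totals = defaultdict(int)
    for name, count, _location in rows:
        totals[name] += count
    return dict(totals)


merged = merge_same_name(rows)
print(merged)  # {'Store A': 4, 'Store B': 2}
```

The location column is read but ignored here, matching the note that the program does not use it yet.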

Usage

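Before you build the program, set up the right environment. You need Python 3 with numpy, pandas, tkinter, ahocorasick-python, and openpyxl. tkinter is part of the Python standard library and cannot be installed with pip; if it is missing, install it through your operating system's Python packages.

pip install numpy
pip install pandas
pip install ahocorasick-python
pip install openpyxl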
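
Before building, it can help to confirm that the dependencies are importable. The following is a minimal sketch, not part of the project: the function name missing_modules is made up, and the import name ahocorasick for the ahocorasick-python package is an assumption that you should adjust if your installation differs.

```python
import importlib.util

# Import names to check. "ahocorasick" is assumed to be the import name
# provided by the ahocorasick-python package.
REQUIRED = ["numpy", "pandas", "tkinter", "ahocorasick", "openpyxl"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Run it with the same Python interpreter you will use for the build.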

After that, build the executable with PyInstaller:

pip install pyinstaller
pyinstaller -F datacleaning.py

The executable is written to the dist directory, and you can run it from there.

The first button selects the path of the Excel file you want to clean.

The second button selects the folder where the result file is saved. The result file is named result.xlsx.

In the text box, enter the number of characters to match exactly. The program uses this number when it merges your store names.

If you only want to use pan-category mapping, you can leave the text box blank. If you want to use data cleaning, you must fill it in.
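
The README does not spell out how the character count is applied. One possible reading, shown here purely as an assumption, is that store names sharing the same first N characters are merged and their numbers summed. The function name merge_by_prefix and the sample data are hypothetical, and the real program may match names differently.

```python
from collections import defaultdict


def merge_by_prefix(counts, n_chars):
    """Merge store names that share their first n_chars characters.

    This is an assumed interpretation of the "exact number of characters"
    setting, not the program's confirmed behaviour.
    """
    merged = defaultdict(int)
    for name, count in counts.items():
        merged[name[:n_chars]] += count
    return dict(merged)


# Made-up example data: store name -> number of stores.
counts = {"Starbucks Coffee": 2, "Starbucks Reserve": 1, "Costa": 4}
result = merge_by_prefix(counts, 9)
print(result)  # {'Starbucks': 3, 'Costa': 4}
```

Under this reading, a smaller number merges more aggressively, while a larger number keeps more names separate.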

Owner
Lin Jian