BIGDATA SIMULATION ONE PIECE WORLD CENSUS

Overview


Solution Architecture

[Figure: solution architecture diagram]

Description


ONE PIECE is an internationally successful Japanese manga. Set in a fictional world, the story follows the adventures of a young man whose body gained the properties of rubber after he accidentally ate a devil fruit (AKUMA NO MI). In this universe there are three types of AKUMA NO MI: Logia, Zoan and Paramecia, each with its own characteristic. Logia fruits are elements that can modify the body, Zoan fruits are of the animal type (the animal may or may not be extinct), and Paramecia fruits are of the object type. These powers may or may not represent a danger to society; everyone who poses a danger to society is considered a criminal and, depending on the type of crime, is announced with a reward. The government always seeks to collect its taxes.

In this BigData project we explore the census of this population. Imagining a population of at least 100,000 inhabitants, we wrote a project that uses MONGODB as its final repository, a non-relational database that organizes its documents into collections. Below is a glossary of the data.

Glossary of Data


Field                      Type    Description
_id                        string  underscore ID (record identifier)
region_birth               string  region of birth
country_birth              string  country of birth
city_birth                 string  city of birth
current_region             string  current region
current_country            string  current country
current_city               string  current city
street                     string  current street
number                     string  house number
postalcode                 string  postal code
mailer                     string  mailer (e-mail address)
street                     string  street name as reported
number                     string  street number as reported
register_data              string  date the data was entered into the record
type_of_fruit              string  type of fruit
fruit_name                 string  fruit name
fruit_category             string  fruit hazard level
number_times_resurrected   string  number of times the fruit was resurrected
job                        string  occupation
current_job                string  current job
contracting_company        string  name of the contracting company
start_date                 string  start date at the company
year_working_time          string  time, in years, working at the company
initial_salary             string  initial salary
current_wage               string  current wage
first_name                 string  first name
last_name                  string  last name
gender                     string  gender
race                       string  race of the person
birthday                   string  date of birth
age                        string  age
has_disability             string  whether the person has a disability
security_social_number     string  social security number
phone                      string  phone
sketch                     string  sketch
has_tatoo                  string  whether the person has a tattoo
has_scar                   string  whether the person has a scar
has_rewards                string  whether the person has a reward
devil_fruit_user           string  whether the person is an akuma no mi user
color_hair                 string  hair color
color_skill                string  skin color
type_of_tatoo              string  type of tattoo
where_in_body              string  where on the body the tattoo is
color_of_tatoo             string  color of the tattoo
scar                       string  where on the body the scar is
color_eyes                 string  eye color
main_crime                 string  main crime, if the person is a criminal
code_crime                 string  crime code
tax_collected_government   string  tax collected by the government
debt_with_government       string  debt with the government
rewards                    string  rewards
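
The glossary lists every date as a string, and the sample payloads in this document use a different layout for each date field. As a minimal sketch, assuming each field always uses the layout seen in its sample, the dates could be normalized as follows. The mapping `DATE_FORMATS` and the function `parse_date_field` are hypothetical names, not part of the project.

```python
from datetime import date, datetime

# Layouts observed in the sample payloads. Treating them as fixed per field
# is an assumption; the project does not document its date formats.
DATE_FORMATS = {
    "register_data": "%Y%m%d",   # e.g. "20210423"
    "birthday": "%Y-%m-%d",      # e.g. "1967-03-26"
    "start_date": "%Y/%m/%d",    # e.g. "1981/11/02"
}


def parse_date_field(field: str, value: str) -> date:
    """Parse one date-like glossary field into a datetime.date.

    Raises KeyError for a field with no known layout and ValueError
    when the value does not match the expected layout.
    """
    return datetime.strptime(value, DATE_FORMATS[field]).date()
```

Calling `parse_date_field("register_data", "20210423")` yields the date 23 April 2021, which makes the three fields comparable regardless of how they were written.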

Description


For a better view of the world of ONE PIECE, with its regions, cities and islands, we include the map created for this world.

Map


[Figure: map of the ONE PIECE world]

Start the Project


To run the project, install the dependencies located in the "dependencies" folder, then run the shell script "run_script.sh" from the root of the project.

Sample of Payload in Staging


address

{
 "_id":"2W1159879A",
 "region_birth":"East Blue",
 "country_birth":"Warship Island",
 "city_birth":"North Wayne",
 "current_region":"East Blue",
 "current_country":"Warship Island",
 "current_city":"East Joshua",
 "street":"Christine Fields",
 "number":"4104",
 "postalcode":"04650",
 "mailer":"m[email protected]",
 "register_data":"20210423"
}

fruit

{
 "_id":"3Y6898825C",
 "type_of_fruit":"it does not have",
 "fruit_name":"it does not have",
 "fruit_category":"it does not have",
 "number_times_resurrected":"0",
 "register_data":"20210622"
}

job

{
 "_id":"2W1159879A",
 "job":"Freight forwarder",
 "current_job":"YES",
 "contracting_company":"Robinson, Simon and Hernandez",
 "start_date":"1981/11/02",
 "year_working_time":40,
 "initial_salary":4904.0,
 "current_wage":5345.36,
 "register_data":"20210423"
}

persona

{
 "_id":"7P1521176A",
 "first_name":"Kristin",
 "last_name":"Smith",
 "gender":"F",
 "race":"Minks",
 "birthday":"1967-03-26",
 "age":"54",
 "devil_fruit_user":"it does not have",
 "has_job":"has",
 "has_tatoo":"it does not have",
 "has_scar":"has",
 "has_disability":"no deficiency",
 "security_social_number":"575-40-5565",
 "phone":"001-985-833-8626x33224",
 "has_rewards":"has",
 "sketch":"https://www.lorempixel.com/350/215",
 "register_data":"20210816"
}

physical_characteristics

{
 "_id":"1S6151128X",
 "color_hair":"SeaShell",
 "color_skill":"BLUISH",
 "type_of_tatoo":"it does not have",
 "where_in_body":"it does not have",
 "color_of_tatoo":"it does not have",
 "scar":"Left arm",
 "color_eyes":"SeaShell",
 "register_data":"20210828"
}

rewards

{
 "_id":"2W1159879A",
 "ssn_people":"165-53-1723",
 "main_crime":"female violence",
 "code_crime":13,
 "tax_collected_government":37824.56,
 "debt_with_government":31503.56,
 "rewards":961679.94,
 "register_data":"20210423"
}

Sample of Payload in Datalake


one_piece

collection not_fruit_user

> db.not_fruit_user.findOne()
{
        "_id" : ObjectId("61a80938f9fae20940d6d7a9"),
        "payload" : {
                "personal_information" : {
                        "first_name" : "Kimberly",
                        "last_name" : "Thompson",
                        "gender" : "F",
                        "race" : "Dwarf",
                        "birthday" : "1996-11-11",
                        "age" : "25"
                },
                "physical_characteristics" : {
                        "has_disability" : "no deficiency",
                        "color_hair" : "Blue",
                        "color_skill" : "WHITE",
                        "scar" : "Back",
                        "color_eyes" : "Blue"
                },
                "social_characteristics" : {
                        "security_social_number" : "740-38-7150",
                        "phone" : "+1-705-306-4346x28383",
                        "sketch" : "https://dummyimage.com/716x261"
                }
        }
}
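
The document above groups the flat staging fields into named sections under "payload". A minimal sketch of that regrouping is shown here, assuming the "persona" and "physical_characteristics" staging records for one person have already been matched. The section layout is copied from the sample; the names `SECTIONS` and `build_payload` are hypothetical and this is not the project's own code.

```python
from typing import Any, Dict

# Section layout copied from the not_fruit_user sample. How the project
# actually builds this document is not shown, so this is an illustration.
SECTIONS = {
    "personal_information": [
        "first_name", "last_name", "gender", "race", "birthday", "age",
    ],
    "physical_characteristics": [
        "has_disability", "color_hair", "color_skill", "scar", "color_eyes",
    ],
    "social_characteristics": [
        "security_social_number", "phone", "sketch",
    ],
}


def build_payload(persona: Dict[str, Any], physical: Dict[str, Any]) -> Dict[str, Any]:
    """Regroup two flat staging records into one nested payload document."""
    # Merge both records so a field can move to a different section,
    # e.g. has_disability comes from "persona" but is stored under
    # physical_characteristics in the sample.
    flat = {**persona, **physical}
    payload = {
        section: {name: flat[name] for name in fields if name in flat}
        for section, fields in SECTIONS.items()
    }
    # Fields not listed in SECTIONS (such as the staging _id) are dropped.
    return {"payload": payload}
```

Fields missing from the input are simply left out of their section, so a partial record still produces a well-formed document.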

collection fruit_user

> db.fruit_user.findOne()
{
        "_id" : ObjectId("61a8143e22cbec6d05f38f4e"),
        "payload" : {
                "personal_characteristics" : {
                        "first_name" : "Kenneth",
                        "last_name" : "Brady",
                        "gender" : "M",
                        "race" : "Skypiea",
                        "birthday" : "2000-05-28",
                        "age" : "21"
                },
                "fruit_characteristics" : {
                        "type_of_fruit" : "Logia",
                        "fruit_name" : "Bismuth\t Bismuth\t no Mi",
                        "fruit_category" : "Dangerous",
                        "number_times_resurrected" : "2"
                },
                "job_characteristics" : {
                        "job" : "Swordsman",
                        "current_job" : "YES",
                        "contracting_company" : "Williams, Wilson and Patterson",
                        "start_date" : "1954/09/01",
                        "year_working_time" : "67",
                        "initial_salary" : "4058.0",
                        "current_wage" : "4423.22"
                },
                "physical_characteristics" : {
                        "type_of_tatoo" : "it does not have",
                        "where_in_body" : "it does not have",
                        "color_of_tatoo" : "it does not have",
                        "color_eyes" : "Red",
                        "color_hair" : "Red",
                        "has_disability" : "no deficiency"
                },
                "social_characteristics" : {
                        "security_social_number" : "151-48-5282",
                        "phone" : "+1-842-853-5857",
                        "sketch" : "https://dummyimage.com/428x136"
                },
                "rewards_informations" : {
                        "main_crime" : "Tax evasion",
                        "code_crime" : "9",
                        "tax_collected_government" : 29491.37,
                        "debt_with_government" : "25393.37",
                        "rewards" : "968090.23"
                }
        }
}
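
The two samples suggest that each person ends up in either the fruit_user or the not_fruit_user collection. A minimal sketch of that routing follows, assuming the decision is made from the `devil_fruit_user` field of the staging "persona" record and that any value other than "it does not have" marks a user. Both assumptions and the function name `target_collection` are illustrative only.

```python
from typing import Any, Dict

# Value seen in the staging sample for a person without a devil fruit.
# The value used for actual users is not shown in the samples, so every
# other value is treated as "is a user" (an assumption).
NO_FRUIT = "it does not have"


def target_collection(persona: Dict[str, Any]) -> str:
    """Return the name of the collection a persona record would go to."""
    if persona.get("devil_fruit_user", NO_FRUIT) == NO_FRUIT:
        return "not_fruit_user"
    return "fruit_user"
```

With a pymongo database handle, the returned name could be used as `db[target_collection(persona)].insert_one(document)`, where `db` and `document` are placeholders for the one_piece database and the nested payload.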

Owner
Maycon Cypriano
DATA ENGINEER | DATA SCIENCE | DATA PYTHON | DATA DRIVEN |