The repo for mlbtradetrees.com. Analyze any trade in baseball history!

Overview

MLB Trade Trees

2.0.0 Release: November 24, 2021

www.mlbtradetrees.com allows you to view the trade tree of any player in MLB history.

What is a trade tree?

A trade tree shows you the complete details of a trade made by a team, including what later became of every player the team received. Let's use Hall of Fame candidate Cliff Lee for some examples, as he was traded multiple times throughout his career.

Here is the simplest form of his tree: Cliff Lee Phils

Cliff Lee was traded to the Mariners in 2009, and the Phillies received 3 players in return. All players the Phillies received in return either retired or became free agents, ending the tree with them.

Let's take a look at a more complicated example:

Cliff Lee Phils

We can see the Mariners traded away Cliff Lee in 2010, receiving 4 players in return. Two players' lines end due to free agency and a waiver claim. The other two players' lines continue because they were traded away the next year. Some of the lines that follow end, but others continue as players are traded away again, so the tree grows. The tree finally ends in 2014, when the last remaining player reaches free agency.

Some of these trees can get pretty massive, spanning decades and dozens of trades. An example is Harry Simpson.
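
The idea above can be sketched as a small recursive function. This is only an illustration of the concept, not the code the site uses: the `Trade` and `Node` classes, the field names, and the sample players are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trade:
    date: str       # ISO date string, so plain string comparison sorts correctly
    team: str       # team that traded the players away
    sent: list      # players the team gave up
    received: list  # players the team got back


@dataclass
class Node:
    player: str
    outcome: str = "line ends (free agency, waivers, retirement, ...)"
    children: list = field(default_factory=list)


def build_tree(player, team, trades, after=""):
    """Follow `player` on `team`; branch whenever the team trades him away."""
    node = Node(player)
    # Trades made by this team, after the player arrived, that include him.
    candidates = [
        t for t in trades
        if t.team == team and player in t.sent and t.date > after
    ]
    if not candidates:
        return node  # no later trade, so this line of the tree ends here
    trade = min(candidates, key=lambda t: t.date)
    node.outcome = f"traded on {trade.date}"
    for got in trade.received:
        # Each player received starts a new line that is followed the same way.
        node.children.append(build_tree(got, team, trades, after=trade.date))
    return node


# Made-up sample data: one star traded for two players, one of whom is traded again.
sample_trades = [
    Trade("2010-07-09", "SEA", ["Star Pitcher"], ["Player A", "Player B"]),
    Trade("2011-07-30", "SEA", ["Player A"], ["Player C"]),
]
tree = build_tree("Star Pitcher", "SEA", sample_trades)
```

Here `tree.children` holds Player A and Player B; Player A's line continues to Player C, while Player B's line ends.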

The Database

The transaction, team, and player databases come from Retrosheet. I will only update transactions when Retrosheet updates its database.

I have made some adjustments to the database that allow the search to go more smoothly:

Transaction database (data/sorted_transactions_final.csv)

  • NaN players involved in trades were changed to "PTBNL/Cash" (player to be named later, or cash). Most of the time you see this in a tree, it is a cash transaction.
  • Transactions in which a player was released or granted free agency and then signed back with the same team as his next transaction were deleted, as they caused trees to end prematurely.
  • Franchise tags were added to the database to ensure that a team name change doesn't end a tree.
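
As a rough illustration of the first adjustment, the NaN replacement could be done with pandas along these lines. The column names and sample rows are assumptions made for the example and may not match data/sorted_transactions_final.csv.

```python
import pandas as pd

# Hypothetical layout: one row per player involved in a transaction.
tx = pd.DataFrame({
    "transaction_id": [1, 1, 2],
    "player": ["Player A", None, "Player B"],  # None marks an unnamed player
})


def fill_unnamed_players(df, player_col="player"):
    """Return a copy in which missing player names become "PTBNL/Cash"."""
    out = df.copy()
    out[player_col] = out[player_col].fillna("PTBNL/Cash")
    return out


cleaned = fill_unnamed_players(tx)
```

Working on a copy keeps the original frame available for comparison while checking the cleanup.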

Team database (data/teams.csv)

  • All teams in the database that are part of the same franchise received the same franchise tag. Independent teams received their own unique franchise code.
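
A minimal sketch of why the tag helps: comparing franchise tags instead of team codes lets a renamed club still count as the same team when following a tree. The codes and tags below are invented placeholders, not values from data/teams.csv.

```python
# Hypothetical team codes mapped to hypothetical franchise tags.
FRANCHISE = {
    "OLD": "FR1",  # a club under its earlier name
    "NEW": "FR1",  # the same club after a name change
    "IND": "IND",  # an independent team with its own unique code
}


def same_franchise(team_a, team_b, franchise=FRANCHISE):
    """True when both team codes resolve to the same franchise tag."""
    # Codes missing from the mapping fall back to themselves,
    # so they only match an identical code.
    return franchise.get(team_a, team_a) == franchise.get(team_b, team_b)
```

With this mapping, `same_franchise("OLD", "NEW")` is true even though the team codes differ, so a tree would not end at the rename.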

Player database (data/teams.csv)

  • Nothing changed. I just made a copy with each player's full name to make matching user input easier. (static/css/searchable_players.csv)

Installing Locally

If you want to run the website locally:

  • install flask
  • install pandas
  • install JSGlue (allows Jinja to work in a js file)

Run server.py

What am I working on?

Updated Nov. 24 2021

  • Some players don't display properly because they played for very old teams that are not listed in the teams database. These are usually players from before 1920. I just need to go through the transactions database to find all teams without a franchise tag.

  • Adding stat support with pybaseball. I'd like to add the total WAR contributed by the players in a trade on the tree.

  • Searching for and filtering trees based on team, year, players in a tree, length of trees, etc.

  • Various UI enhancements, like clickable nodes to get a player's tree and collapsible nodes for easier readability.
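
For the first item in this list, finding teams that appear in transactions but have no franchise tag might look something like the following. It is a sketch under assumed column names (`team`, `franchise`, `from_team`, `to_team`) with made-up rows, not the actual layout of the CSV files.

```python
import pandas as pd

# Hypothetical teams table and transactions table.
teams = pd.DataFrame({"team": ["AAA", "BBB"], "franchise": ["F1", "F2"]})
transactions = pd.DataFrame({
    "from_team": ["AAA", "ZZZ"],
    "to_team": ["BBB", "AAA"],
})


def untagged_teams(transactions, teams, team_cols=("from_team", "to_team")):
    """List team codes used in transactions that have no franchise tag."""
    # Collect every team code mentioned in any of the team columns.
    seen = pd.concat([transactions[c] for c in team_cols]).dropna().unique()
    # Teams count as tagged only if their franchise value is present.
    tagged = set(teams.loc[teams["franchise"].notna(), "team"])
    return sorted(t for t in seen if t not in tagged)


missing = untagged_teams(transactions, teams)
```

In this sample, `missing` contains only the code "ZZZ", which is the kind of team that would need a franchise tag added.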
