Python code for working with NFL play-by-play data.

Overview

nfl_data_py

nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout.

Includes import functions for play-by-play data, weekly data, seasonal data, rosters, win totals, scoring lines, officials, draft picks, draft pick values, schedules, team descriptive info, combine results, and ID mappings across various sites.

Installation

Use the package manager pip to install nfl_data_py.

pip install nfl_data_py

Usage

import nfl_data_py as nfl

Working with play-by-play data

nfl.import_pbp_data(years, columns, downcast=True, cache=False, alt_path=None)

Returns play-by-play data for the years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30% at the cost of ~50% slower initial load

cache : optional, determines whether to pull pbp data from the GitHub repo or from a local cache generated by nfl.cache_pbp()

alt_path : optional, required if nfl.cache_pbp() was called with an alternate path; tells the import where to find that cache

nfl.see_pbp_cols()

Returns list of columns available in the play-by-play dataset
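
For example, a minimal sketch of a typical pull; column names such as 'posteam' and 'epa' follow the nflfastR play-by-play schema:

    import nfl_data_py as nfl

    # Inspect the available play-by-play columns first.
    print(nfl.see_pbp_cols())

    # Pull two seasons, keeping only a few columns to limit memory usage.
    pbp = nfl.import_pbp_data([2020, 2021], ['season', 'week', 'posteam', 'epa'])

    # Mean EPA per play by offense, best first.
    print(pbp.groupby('posteam')['epa'].mean().sort_values(ascending=False))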

Working with weekly data

nfl.import_weekly_data(years, columns, downcast=True)

Returns weekly data for the years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30% at the cost of ~50% slower initial load

nfl.see_weekly_cols()

Returns list of columns available in the weekly dataset
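
A short sketch of pulling a weekly stat line; the 'player_name' and 'passing_yards' column names are assumptions based on the nflverse weekly stats schema:

    import nfl_data_py as nfl

    # Check what's available before filtering columns.
    print(nfl.see_weekly_cols())

    weekly = nfl.import_weekly_data([2021], ['player_name', 'week', 'passing_yards'])
    print(weekly.sort_values('passing_yards', ascending=False).head())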

Working with seasonal data

nfl.import_seasonal_data(years)

Returns seasonal data, including various calculated market share stats

years : required, list of years to pull data for (earliest available is 1999)
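
A minimal sketch; the exact set of calculated columns is easiest to discover by inspecting the returned frame:

    import nfl_data_py as nfl

    seasonal = nfl.import_seasonal_data([2019, 2020])
    print(seasonal.columns.tolist())
    print(seasonal.head())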

Additional data imports

nfl.import_rosters(years, columns)

Returns roster information for years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for
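
For example (the 'player_name', 'position', and 'team' columns are assumptions based on common nflverse roster fields):

    import nfl_data_py as nfl

    rosters = nfl.import_rosters([2021], ['player_name', 'position', 'team'])

    # All quarterbacks on 2021 rosters.
    print(rosters[rosters['position'] == 'QB'])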

nfl.import_win_totals(years)

Returns win total lines for years specified

years : optional, list of years to pull

nfl.import_sc_lines(years)

Returns scoring lines for years specified

years : optional, list of years to pull

nfl.import_officials(years)

Returns official information by game for the years specified

years : optional, list of years to pull

nfl.import_draft_picks(years)

Returns list of draft picks for the years specified

years : optional, list of years to pull

nfl.import_draft_values()

Returns relative values by generic draft pick according to various popular valuation methods
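
These single-argument imports all follow the same pattern; a sketch combining the two draft tables (the 'pick' join key is an assumption based on typical nflverse naming):

    import nfl_data_py as nfl

    picks = nfl.import_draft_picks([2020, 2021])   # years filters the result
    values = nfl.import_draft_values()             # no arguments needed

    # Attach generic pick values to actual selections.
    print(picks.merge(values, on='pick').head())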

nfl.import_team_desc()

Returns dataframe with color/logo/etc. information for all NFL teams

nfl.import_schedules(years)

Returns dataframe with schedule information for years specified

years : required, list of years to pull data for (earliest available is 1999)
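
A sketch joining schedules to team descriptions; the 'home_team' and 'team_abbr' join keys are assumptions based on typical nflverse naming:

    import nfl_data_py as nfl

    games = nfl.import_schedules([2021])
    teams = nfl.import_team_desc()

    # Attach team colors/logos to each game's home team.
    merged = games.merge(teams, left_on='home_team', right_on='team_abbr')
    print(merged.head())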

nfl.import_combine_data(years, positions)

Returns dataframe with combine results for years and positions specified

years : optional, list or range of years to pull data from

positions : optional, list of positions to be pulled (standard format - WR/QB/RB/etc.)
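
For example:

    import nfl_data_py as nfl

    # A range works as well as a list for years.
    combine = nfl.import_combine_data(range(2018, 2022), ['WR', 'RB'])
    print(combine.head())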

nfl.import_ids(columns, ids)

Returns dataframe with mapped ids for all players across most major NFL and fantasy football data platforms

columns : optional, list of columns to return

ids : optional, list of ids to return
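
For instance, to build a small crosswalk between two platforms (the specific column names shown, 'gsis_id' and 'pfr_id', are assumptions based on common nflverse naming):

    import nfl_data_py as nfl

    # Pull only the mapping columns needed for a merge.
    ids = nfl.import_ids(columns=['name', 'gsis_id', 'pfr_id'])
    print(ids.head())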

nfl.import_ngs_data(stat_type, years)

Returns dataframe with specified NGS data

stat_type : required, type of data (passing, rushing, receiving)

years : optional, list of years to return data for
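
For example:

    import nfl_data_py as nfl

    # All available seasons of passing NGS data...
    ngs = nfl.import_ngs_data('passing')

    # ...or just one season.
    ngs_2020 = nfl.import_ngs_data('passing', [2020])
    print(ngs_2020.head())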

nfl.import_depth_charts(years)

Returns dataframe with depth chart data

years : optional, list of years to return data for

nfl.import_injuries(years)

Returns dataframe of injury reports

years : optional, list of years to return data for

nfl.import_qbr(years, level, frequency)

Returns dataframe with QBR history

years : optional, years to return data for

level : optional, competition level to return data for, nfl or college, default nfl

frequency : optional, frequency to return data for, weekly or season, default season
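
For example:

    import nfl_data_py as nfl

    # Weekly NFL QBR for two seasons (defaults are level='nfl', frequency='season').
    qbr = nfl.import_qbr([2020, 2021], level='nfl', frequency='weekly')
    print(qbr.head())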

nfl.import_pfr_passing(years)

Returns dataframe of PFR passing data

years : optional, years to return data for

nfl.import_snap_counts(years)

Returns dataframe with snap count records

years : optional, list of years to return data for

Additional features

nfl.cache_pbp(years, downcast=True, alt_path=None)

Caches play-by-play data locally to speed up subsequent load times. If the years specified have already been cached they will be overwritten, so if using in-season, re-cache once per week to pick up the most recent data

years : required, list or range of years to cache

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30% at the cost of ~50% slower initial load

alt_path : optional, alternate path to store the pbp cache; default is a program-created folder in the user's Local directory
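
A sketch of the intended workflow, caching once and then reading from disk:

    import nfl_data_py as nfl

    # One-time (or weekly, in-season) local cache of recent seasons.
    nfl.cache_pbp([2020, 2021, 2022])

    # Later loads read from the local cache instead of downloading.
    pbp = nfl.import_pbp_data([2020, 2021, 2022], cache=True)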

nfl.clean_nfl_data(df)

Runs descriptive data (team name, player name, etc.) through various cleaning processes

df : required, dataframe to be cleaned
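
For example, cleaning a roster frame before merging it with another source:

    import nfl_data_py as nfl

    rosters = nfl.import_rosters([2021])

    # Standardize name/team fields to improve merge hit rates.
    rosters = nfl.clean_nfl_data(rosters)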

Recognition

I'd like to recognize Ben Baldwin, Sebastian Carl, and Lee Sharpe for making this data freely available and easy to access. I'd also like to thank Tan Ho, who has been an invaluable resource as I've worked through this project, and Josh Kazan for the resources and assistance he's provided.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Comments
  • Unable to import data


    I am trying to run:

        import nfl_data_py as nfl
        nfl.import_seasonal_data([2010])

    but I get the following error:

        OverflowError: value too large
        Exception ignored in: 'fastparquet.cencoding.read_bitpacked'

    I have tried uninstalling and reinstalling fastparquet, which did not work, and the same issue has arisen in my other function calls.

    More specifically, the traceback ends with:

        line 530, in read_col
            part[defi == max_defi] = dic[val]
        IndexError: index 8389024 is out of bounds for axis 0 with size 94518

    opened by samob917 6
  • Caching


    I think the library would benefit from having a caching strategy for downloaded or processed files. One option is to download the files outside of pandas and use an http cache. This tends to be temporary, no more than 7 days. Another option is to give the user the option to read from and save to a cache, which could just involve reading/writing parquet files from the existing data directory, which would be more permanent, or to the system tmp directory, which is more transient. Let me know if either option is of interest.

    opened by sansbacon 6
  • read_parquet engine parameter: 'fastparquet' vs. default 'auto' argument


    Pandas will try pyarrow and fall back to fastparquet if pyarrow is unavailable. Is there a specific reason you are specifying engine='fastparquet'? If not, I think it would be better to leave it as the default 'auto' argument.

    opened by sansbacon 3
  • ID Tables Small Issue?


    Hi, I'm a newb and this is my first issue post, so please delete if this is wrong. I suspect your code is fine and that the source data has some bugs, but I'm not sure who to alert to help them out.

    I believe there are some slight data issues on the player ID table.

    There are probably more than listed below, but I have to run for now. These are pretty small things, and doubtful they will be relevant to anything imminent for anyone... I ran into it on a merge I was doing on gsis_id that didn't like my many-to-one relationship because of it. I'm happy to share and/or try to help fix if manual edits are an option (I stink at coding though).

    gsis_id has four duplications: 00-0020270, 00-0019641, 00-0016098, and 00-0029435. On further research, each of these same cases also has a duplication of the pff_id. There are no other pff_id duplications.

    espn_ids have some duplication: 17257, 2578554, 2574009, 5774, 5730, 12771, 2516049, 13490, 2574010, 14660, 2582138, 16094, 17101. I'm not sure of the source of all of these; some seem to be extremely similar names but different people, and others seem to be typos (e.g. for 5774, one of them should simply be 15774).

    yahoo_ids have one dup: 33495. The one from Duke should be 33542.

    opened by robertryanharrison 2
  • .import_ngs_data() HTTPError when filtering for year(s)


    HTTPError: HTTP Error 404: Not Found

    I receive a 404: not found error when I try to import ngs data from a specific year:

    This works just fine and returns the dataframe:

    ngs_data = nfl.import_ngs_data('passing')
    
    ngs_data
    

    When I add a specific year argument, this breaks:

    ngs_data = nfl.import_ngs_data('passing', [2020])
    
    ngs_data
    

    Yes: I realize that nfl.import_ngs_data('passing') is comprehensive of all seasons that are available and I can filter with that.

    I don't know much about these different file formats, but when I look in nflverse-data/releases/nextgen_stats it looks like there are no .parquet files specific to a season (only .qs, .csv.gz, and .rds), just the larger files (sans season) called in the first block.

    So I think something just needs to be changed around here?

    Thanks for putting together this awesome library!! It's been a huge help as I get ready for the season.

    opened by jbf302 2
  • What version of Python is this for? Getting an error


    I have tried to install it for both Python 3.8 and 3.6 and I am getting this error:

        Collecting pandas>1 (from nfl_data_py)
          Could not find a version that satisfies the requirement pandas>1 (from nfl_data_py) (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3)

    opened by epmck 2
  • Reduce memory usage of multi-year pbp dataframe


    If you load 20 years of data into one dataframe, you could start pushing up against memory limits on certain users' computers. You can reduce memory usage about 30% by converting float64 values to float32 when loading the yearly play-by-play data.

    cols = df.select_dtypes(include=[np.float64]).columns
    df.loc[:, cols] = df.loc[:, cols].astype(np.float32)
    

    On my computer, this reduced the memory usage of a single year from 129.7 MB to 94.5 MB. I don't think the lost precision is going to matter for anything we are doing with football stats.

    If you are interested, I can submit a pull request that implements this change. You could also make it optional with the default being to downcast but allow the user to override if they want np.float64 as the dtype.

    opened by sansbacon 2
  • HTTP Error 404: Not Found


    There's a chance the links have changed in nflfastR. This link doesn't seem to work:

        data = pandas.read_parquet(r'https://github.com/nflverse/nflfastR-data/raw/master/data/player_stats.parquet', engine='auto')

    Using Python v3.10.7 with nfl_data_py v0.2.5.

    Call:

        nfl.import_weekly_data([2021, 2022])

    Error:

        HTTPError                                 Traceback (most recent call last)
        ~\AppData\Local\Temp/ipykernel_15024/1768614258.py in
        ----> 1 nfl.import_weekly_data([2021, 2022])

        c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\nfl_data_py\__init__.py in import_weekly_data(years, columns, downcast)
            215
            216     # read weekly data
        --> 217     data = pandas.read_parquet(r'https://github.com/nflverse/nflfastR-data/raw/master/data/player_stats.parquet', engine='auto')
            218     data = data[data['season'].isin(years)]
            219

        c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parquet.py in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
            493     impl = get_engine(engine)
            494
        --> 495     return impl.read(
            496         path,
            497         columns=columns,

        c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parquet.py in read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
            230         to_pandas_kwargs["split_blocks"] = True  # type: ignore[assignment]
            231
        --> 232     path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
            233         path,
            234         kwargs.pop("filesystem", None),

        ...

        --> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)
            644
            645 class HTTPRedirectHandler(BaseHandler):

        HTTPError: HTTP Error 404: Not Found

    opened by andre03051 1
  • Pandas version incompatibility


    Per discussion in nflverse discord.

    Release 0.2.7 updated Pandas to a version not compatible with Python 3.6. This requirement should be reverted if possible. Alternatively if functionality in the updated Pandas is found to be needed, then the package metadata should be updated to specify it requires Python >= 3.7.

    opened by alecglen 1
  • Pandas .append method deprecation


    nfl.import_pbp_data gives the warning below. The method should be updated to use pandas.concat rather than the deprecated append method.

    FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. plays = plays.append(raw)

    opened by martydertz 1
  • import_pbp_data function not working


    When trying to run import_pbp_data, I receive the following error message: No such file or directory: [path]

    Code:

        import nfl_data_py as nfl
        data = nfl.import_pbp_data([2021])

    opened by Josephhero 1
  • Missing EPA data


    In week 6 of 2021, the Chicago Bears lost 24-14 to the Green Bay Packers: https://www.nfl.com/games/packers-at-bears-2021-reg-6. When trying to load the EPA data for this game, I get an empty dataframe. Could someone let me know if I am doing something wrong?

    import nfl_data_py as nfl
    
    seasons = [2021]
    cols = ['epa', 'week', 'possession_team']
    
    nfl.cache_pbp(seasons, downcast=True, alt_path=None)
    data = nfl.import_pbp_data(seasons, cols, cache=True)
    
    print(data.loc[(data['possession_team'] == 'CHI') & (data['week'] == 5)])
    print(data.loc[(data['possession_team'] == 'CHI') & (data['week'] == 6)])
    
    
    opened by pphili 2
  • Changed downcast code to use df[cols] instead of df.loc[:, cols]


    pandas was giving me the following FutureWarning when importing data:

    /nfl_data_py/__init__.py:137: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt
    to set the values inplace instead of always setting a new array. To retain the old behavior, use either
    `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
        plays.loc[:, cols] = plays.loc[:, cols].astype(numpy.float32)
    

    This change seems to have eliminated the warning.

    opened by wstrausser 0
  • import_win_totals() years not optional


    The GitHub README and PyPI docs say years is optional, but years is required. This is a bit confusing because not all years are in the data, so it's hard to know why 2022 doesn't show.

    Suggest making the argument years=None and adding the following in the method/lambda:

        if years is None:
            years = []

    Let me know how I can contribute. --David

    opened by DavidAllyn68 0
  • Data Dictionary


    Hey guys - thanks for pulling this all together!!! Do you have a data dictionary explaining the columns? Most are self-explanatory, but a few are cryptic (e.g. in weekly_data: dakota, pacr, racr, wopr, and ..._epa).

    opened by DavidAllyn68 2
  • Missing Personnel Data Week 4 NFL


    It appears that the values for formation data (offense/defense), players on the play (offense/defense), men in the box, # of pass rushers, etc. are unavailable for 2022 Week 4 data (which is otherwise available). Is this a known issue, an intentional omission, or something else? Where is this data sourced from?

    Wondering if it will be available in the near future, or if that is the result of an availability issue.

    opened by jwald3 1