A repository with scraping code and a soccer dataset from understat.com.

Last update: Jan 03, 2023

UNDERSTAT - SHOTS DATASET

As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goals (xG) stats for every shot taken in the top 5 leagues in Europe, as well as the Russian league.

After watching an awesome tutorial by McKay Johns (great channel btw, loads of resources for beginners in soccer analytics), I decided to write some code to scrape all the shots data available at Understat. The result is this dataset, which contains shots data from the 2014/2015 season through every match played in the 2020/2021 season, for the top division in the following countries:

England - EPL

Spain - La Liga

Germany - Bundesliga

Italy - Serie A

France - Ligue 1

Russia - RFPL
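
For readers curious about what scraping shots data can look like, here is a minimal sketch. It is not the code shipped in "scraping". It assumes a page embeds its data in a script tag as var NAME = JSON.parse('...') with hex-escaped characters, which is how Understat pages have commonly been described; the variable name shotsData, the "h" key, and the sample page below are assumptions made up for illustration.

```python
import json
import re


def extract_embedded_json(html, var_name):
    """Return the data assigned to `var_name` via JSON.parse('...') in `html`."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, html, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found in page")
    # Turn escapes such as \x7B back into the characters they stand for.
    # Assumption: the payload is ASCII plus escapes; raw non-ASCII text
    # would need extra care with this decoding step.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)


# Made-up sample in the assumed page format; it encodes {"h":[{"xG":"0.5"}]}.
SAMPLE_HTML = (
    r"<script>var shotsData = JSON.parse('"
    r"\x7B\x22h\x22\x3A\x5B\x7B\x22xG\x22\x3A\x220.5\x22\x7D\x5D\x7D"
    r"')</script>"
)

shots = extract_embedded_json(SAMPLE_HTML, "shotsData")
print(shots["h"][0]["xG"])  # prints 0.5
```

On a real page the HTML would come from an HTTP request instead of a hard-coded string, and the variable names should be checked against the page source before relying on them.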

Besides shots data, I also managed to scrape very detailed season stats on every single player that took part in these matches.

The datasets are split into one folder per league, so every folder has 7 .csv files for shots data and 7 .csv files for players data (1 for every season since 14/15). The full dataset, with every league and season combined, is also available in the "datasets" folder.

I plan on updating the datasets every day, but I also uploaded the Python code that generates and updates them. Feel free to play with it and suggest improvements (hit me up on Twitter). To update the datasets yourself, save "scraping" and "datasets" in the same folder, start Python with that folder as the current working directory, and then run the update.py script located in "scraping".
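
If you only want some leagues or seasons, the per-league files can be stitched together by hand. The sketch below is one way to do that with the standard library. It assumes the files being combined share the same header row; the function name combine_csv_files and the file pattern in the usage comment are hypothetical, so check the real file names in "datasets" first.

```python
import csv
from pathlib import Path


def combine_csv_files(paths, output_path):
    """Concatenate CSV files that share a header; return the number of data rows."""
    rows_written = 0
    writer = None
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        for path in sorted(paths):
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.DictReader(in_file)
                if reader.fieldnames is None:
                    continue  # skip files with no header at all
                if writer is None:
                    # The first file's header defines the output columns.
                    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Example usage (hypothetical file naming, adjust to the actual folder contents):
# shot_files = Path("datasets").glob("*/shots_*.csv")
# combine_csv_files(shot_files, "my_shots.csv")
```

Writing the output outside the folders being read keeps the combined file from being picked up as an input on a later run.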

Most of the columns in the datasets are pretty straightforward, but some aren't. So I uploaded a couple of .pdf files in "documentation", explaining every column.

Owner

douglasbc
