Practical-statistics-for-data-scientists - Code repository for O'Reilly book

Overview

Code repository

Practical Statistics for Data Scientists:

50+ Essential Concepts Using R and Python

by Peter Bruce, Andrew Bruce, and Peter Gedeck

Online

View the notebooks online: nbviewer

Excecute the notebooks in Binder: Binder

This can take some time if the binder environment needs to be rebuilt.

Other language versions

English:
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python
2020: ISBN 149207294X
Google books, Amazon
Japanese:
データサイエンスのための統計学入門 第2版 ―予測、分類、統計モデリング、統計的機械学習とR/Pythonプログラミング
2020: ISBN 487311926X, Shinya Ohashi (supervised), Toshiaki Kurokawa (translated)
Google books, Amazon
German:
Praktische Statistik für Data Scientists: 50+ essenzielle Konzepte mit R und Python 
2021: ISBN 3960091532, Marcus Fraaß (Übersetzer)
Google books, Amazon
Korean:
Practical Statistics for Data Scientists: 데이터 과학을 위한 통계(2판) 2021: ISBN 9791162244180, Junyong Lee (translation)
Google books, Hanbit media
Polish:
Statystyka praktyczna w data science. 50 kluczowych zagadnien w jezykach R i Python 2021: ISBN 9788328374270
Google books, Amazon, Helion

See also

Setup R and Python environments

R

Run the following commands in R to install all required packages

if (!require(vioplot)) install.packages('vioplot')
if (!require(corrplot)) install.packages('corrplot')
if (!require(gmodels)) install.packages('gmodels')
if (!require(matrixStats)) install.packages('matrixStats')

if (!require(lmPerm)) install.packages('lmPerm')
if (!require(pwr)) install.packages('pwr')

if (!require(FNN)) install.packages('FNN')
if (!require(klaR)) install.packages('klaR')
if (!require(DMwR)) install.packages('DMwR')

if (!require(xgboost)) install.packages('xgboost')

if (!require(ellipse)) install.packages('ellipse')
if (!require(mclust)) install.packages('mclust')
if (!require(ca)) install.packages('ca')

Python

We recommend to use a conda environment to run the Python code.

conda create -n sfds python
conda activate sfds
conda env update -n sfds -f environment.yml
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas

CogniPy for Pandas - In-memory Graph Database and Knowledge Graph with Natural Language Interface Whats in the box Reasoning, exploration of RDF/OWL,

Cognitum Octopus 34 Dec 13, 2022
Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:

JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f

Leonardo Taccari 462 Jan 02, 2023
Fractals plotted on MatPlotLib in Python.

About The Project Learning more about fractals through the process of visualization. Built With Matplotlib Numpy License This project is licensed unde

Akeel Ather Medina 2 Aug 30, 2022
Small project to recursively calculate and plot each successive order of the Hilbert Curve

hilbert-curve Small project to recursively calculate and plot each successive order of the Hilbert Curve. After watching 3Blue1Brown's video on Hilber

Stefan Mejlgaard 2 Nov 15, 2021
Handout for the tutorial "Creating publication-quality figures with matplotlib"

Handout for the tutorial "Creating publication-quality figures with matplotlib"

JB Mouret 1.9k Jan 02, 2023
clock_plot provides a simple way to visualize timeseries data, mapping 24 hours onto the 360 degrees of a polar plot

clock_plot clock_plot provides a simple way to visualize timeseries data mapping 24 hours onto the 360 degrees of a polar plot. For usage, please see

12 Aug 24, 2022
flask extension for integration with the awesome pydantic package

Flask-Pydantic Flask extension for integration of the awesome pydantic package with Flask. Installation python3 -m pip install Flask-Pydantic Basics v

249 Jan 06, 2023
A research of IT labor market based especially on hh.ru. Salaries, rate of technologies and etc.

hh_ru_research Проект реализован в учебных целях анализа рынка труда, в особенности по hh.ru Input data В качестве входных данных используются сериали

3 Sep 07, 2022
Package managers visualization

Software Galaxies This repository combines visualizations of major software package managers. All visualizations are available here: http://anvaka.git

Andrei Kashcha 1.4k Dec 22, 2022
Visualization Data Drug in thailand during 2014 to 2020

Visualization Data Drug in thailand during 2014 to 2020 Data sorce from ข้อมูลเปิดภาครัฐ สำนักงาน ป.ป.ส Inttroducing program Using tkinter module for

Narongkorn 1 Jan 05, 2022
Create a table with row explanations, column headers, using matplotlib

Create a table with row explanations, column headers, using matplotlib. Intended usage was a small table containing a custom heatmap.

4 Aug 14, 2022
Fast scatter density plots for Matplotlib

About Plotting millions of points can be slow. Real slow... 😴 So why not use density maps? ⚡ The mpl-scatter-density mini-package provides functional

Thomas Robitaille 473 Dec 12, 2022
Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX

ForceAtlas2 for Python A port of Gephi's Force Atlas 2 layout algorithm to Python 2 and Python 3 (with a wrapper for NetworkX and igraph). This is the

Bhargav Chippada 227 Jan 05, 2023
web application for flight log analysis & review

Flight Review This is a web application for flight log analysis. It allows users to upload ULog flight logs, and analyze them through the browser. It

PX4 Drone Autopilot 145 Dec 20, 2022
Turn a STAC catalog into a dask-based xarray

StackSTAC Turn a list of STAC items into a 4D xarray DataArray (dims: time, band, y, x), including reprojection to a common grid. The array is a lazy

Gabe Joseph 148 Dec 19, 2022
A tool for creating Toontown-style nametags in Panda3D

Toontown-Nametag Toontown-Nametag is a tool for creating Toontown Online/Toontown Rewritten-style nametags in Panda3D. It contains a function, createN

BoggoTV 2 Dec 23, 2021
Insert SVGs into matplotlib

Insert SVGs into matplotlib

Andrew White 35 Dec 29, 2022
Graphing communities on Twitch.tv in a visually intuitive way

VisualizingTwitchCommunities This project maps communities of streamers on Twitch.tv based on shared viewership. The data is collected from the Twitch

Kiran Gershenfeld 312 Jan 07, 2023
Generate visualizations of GitHub user and repository statistics using GitHub Actions.

GitHub Stats Visualization Generate visualizations of GitHub user and repository statistics using GitHub Actions. This project is currently a work-in-

Aditya Thakekar 1 Jan 11, 2022
A Python package that provides evaluation and visualization tools for the DexYCB dataset

DexYCB Toolkit DexYCB Toolkit is a Python package that provides evaluation and visualization tools for the DexYCB dataset. The dataset and results wer

NVIDIA Research Projects 107 Dec 26, 2022