Python for Data Analysis, 2nd Edition

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

1st Edition Readers

If you are reading the 1st Edition (published in 2012), please find the reorganized book materials on the 1st-edition branch.

Translations

Chinese by Xu Liang
Polish by Michal Biesiada

IPython Notebooks:

Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
Chapter 3: Built-in Data Structures, Functions, and Files
Chapter 4: NumPy Basics: Arrays and Vectorized Computation
Chapter 5: Getting Started with pandas
Chapter 6: Data Loading, Storage, and File Formats
Chapter 7: Data Cleaning and Preparation
Chapter 8: Data Wrangling: Join, Combine, and Reshape
Chapter 9: Plotting and Visualization
Chapter 10: Data Aggregation and Group Operations
Chapter 11: Time Series
Chapter 12: Advanced pandas
Chapter 13: Introduction to Modeling Libraries in Python
Chapter 14: Data Analysis Examples
Appendix A: Advanced NumPy

License

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Owner

Wes McKinney

An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks

Convert tables stored as images to a usable .csv file

Datashredder is a simple data corruption engine written in Python. You can corrupt anything: text, images, and video.

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

A Python 3 library for time series data mining tasks using matrix profile algorithms

Recommendations from Cramer: On the show Mad Money (CNBC), Jim Cramer picks stocks that he recommends buying. We will use this data to build a portfolio.

Provide a market analysis (R)

Parses data out of your Google Takeout (History, Activity, YouTube, Locations, etc.)

Short notebooks summarizing the main functions of GeoPandas

Integrate bus data from a variety of sources (batch processing and real time processing).

An Indexer that works out-of-the-box when you have less than 100K stored Documents

Feature engineering and machine learning: together at last

Used for data processing in machine learning; helps construct ML models more easily from scratch

A Python package, installable with pip, for computing statistics and visualizing binomial and Gaussian distributions of a dataset

Python Kalman filtering and optimal estimation library. Implements Kalman filter, particle filter, Extended Kalman filter, Unscented Kalman filter, g-h (alpha-beta), least squares, H Infinity, smoothers, and more. Has companion book 'Kalman and Bayesian Filters in Python'.

Employee Turnover Analysis

Data processing with Pandas.

Automated exploratory data analysis on a financial dataset

Airflow ETL with EKS, EFS, and SageMaker

Titanic data analysis in Python