Create HTML profiling reports from pandas DataFrame objects


Pandas Profiling


Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram of missing values
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
  • File and Image analysis: extract file sizes, creation dates and dimensions and scan for truncated images or those containing EXIF information
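
To make the quantile and descriptive statistics above concrete, here is a minimal sketch that computes a few of them for one numeric column using only Python's standard library. It illustrates the quantities themselves and is not how pandas-profiling computes them internally; the function name `summarize` and the choice of the "inclusive" quantile method are assumptions made for this example.

```python
import statistics


def summarize(values):
    """Return a few of the per-column statistics listed above for numeric data."""
    data = sorted(values)
    # Quartiles with the "inclusive" method (an assumption; other quantile
    # conventions give slightly different cut points).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation
    return {
        "minimum": data[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "maximum": data[-1],
        "range": data[-1] - data[0],
        "interquartile range": q3 - q1,
        "mean": mean,
        "standard deviation": std,
        "coefficient of variation": std / mean,
        # Median absolute deviation: the median distance from the median.
        "median absolute deviation": statistics.median(
            abs(x - median) for x in data
        ),
    }


stats = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```

The real report derives these values from the DataFrame column and only shows those that are relevant for the inferred column type.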


Version v2.10.1 released: it contains stability fixes for the previous release, which included a major overhaul of the type system, now fully reliant on visions. See the changelog below for what has changed.

Spark backend in progress: We are happy to announce that we're nearing v1 for the Spark backend for generating profile reports. Stay tuned.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions. If you find value in the package, we welcome you to support the project directly through GitHub Sponsors! Please help me to continue to support this package. It's extra exciting that GitHub matches your contribution for the first year.

Find more information here:

February 7, 2021 💘

Contents: Examples | Installation | Documentation | Large datasets | Command line usage | Advanced usage | Support | Types | How to contribute | Editor Integration | Dependencies


Examples

The following examples can give you an impression of what the package can do:

Specific features:



Installation

Using pip

You can install using the pip package manager by running

pip install pandas-profiling[notebook]

Alternatively, you could install the latest version directly from GitHub:

pip install

Using conda

You can install using the conda package manager by running

conda install -c conda-forge pandas-profiling

From source

Download the source code by cloning the repository or by pressing 'Download ZIP' on this page.

Install by navigating to the proper directory and running:

python install


Documentation

The documentation for pandas_profiling can be found here. Previous documentation is still available here.

Getting started

Start by loading in your pandas DataFrame, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"],
)

To generate the report, run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Explore deeper

You can configure the profile report in any way you like. The example code below loads the explorative configuration file, which includes many features for text (length distribution, Unicode information), files (file size, creation time) and images (dimensions, EXIF information). If you are interested in the exact settings that were used, you can compare it with the default configuration file.

profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)

Learn more about configuring pandas-profiling on the Advanced usage page.

Jupyter Notebook

We recommend generating reports interactively by using the Jupyter notebook. There are two interfaces (see animations below): through widgets and through an HTML report.

Notebook Widgets

This is achieved by simply displaying the report. In the Jupyter Notebook, run:


The HTML report can be included in a Jupyter notebook:


Run the following code:
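
Both notebook interfaces can be sketched with one small helper. This assumes the `ProfileReport` methods `to_widgets()` and `to_notebook_iframe()` from the 2.x API; the helper name `display_report` and its `interface` argument are hypothetical and exist only for this illustration.

```python
def display_report(profile, interface="widgets"):
    """Show a report inside a notebook through one of the two interfaces.

    `profile` is expected to be a ProfileReport. The method names used here
    (to_widgets, to_notebook_iframe) are assumed from the 2.x API.
    """
    if interface == "widgets":
        # Interactive widget view.
        return profile.to_widgets()
    if interface == "html":
        # The HTML report embedded in the notebook output.
        return profile.to_notebook_iframe()
    raise ValueError(f"unknown interface: {interface!r}")
```

In a notebook cell, `display_report(profile)` would show the widgets and `display_report(profile, "html")` the embedded HTML report, where `profile` is the object created under "Getting started".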


Saving the report

If you want to generate an HTML report file, save the ProfileReport to an object and use the to_file() function:


Alternatively, you can obtain the data as JSON:

# As a string
json_data = profile.to_json()

# As a file
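
The saving steps can be sketched as follows. This is a hedged example, not the package's own code: it assumes that `to_file()` accepts a path and picks the output format (HTML or JSON) from the file extension. The helper name `save_report` and the default file stem are hypothetical.

```python
from pathlib import Path


def save_report(profile, out_dir, stem="your_report"):
    """Write one HTML and one JSON file via profile.to_file(); return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    targets = [out_dir / f"{stem}.html", out_dir / f"{stem}.json"]
    for target in targets:
        # Assumption: to_file() chooses the format from the file extension.
        profile.to_file(target)
    return targets
```

Calling `save_report(profile, "reports")` with the `profile` object from above would then leave `reports/your_report.html` and `reports/your_report.json` on disk.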

Large datasets

Version 2.4 introduces minimal mode.

This is a default configuration that disables expensive computations (such as correlations and dynamic binning).

Use the following syntax:

profile = ProfileReport(large_dataset, minimal=True)

Command line usage

For standard formatted CSV files that can be read immediately by pandas, you can use the pandas_profiling executable.

Run the following for information about options and arguments.

pandas_profiling -h

Advanced usage

A set of options is available in order to adapt the report generated.

  • title (str): Title for the report ('Pandas Profiling Report' by default).
  • pool_size (int): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
  • progress_bar (bool): If True, pandas-profiling will display a progress bar.
  • infer_dtypes (bool): When True (default), the dtypes of variables are inferred by visions using its typeset logic (for instance, a column that stores integers as strings is analyzed as numeric).
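
The "zero means all CPUs" convention described for pool_size can be illustrated with a short standard-library sketch. The function name `resolve_pool_size` is hypothetical, and the fallback to a single worker is an assumption of this example rather than documented pandas-profiling behaviour.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_pool_size(pool_size=0):
    """Map the pool_size option to a concrete worker count (0 means all CPUs)."""
    if pool_size < 0:
        raise ValueError("pool_size must be zero or positive")
    if pool_size == 0:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    return pool_size


# A thread pool sized this way, applied to a toy task.
with ThreadPoolExecutor(max_workers=resolve_pool_size(0)) as executor:
    lengths = list(executor.map(len, ["a", "bb", "ccc"]))
print(lengths)
```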

More settings can be found in the default configuration file, minimal configuration file and dark themed configuration file.

You can find the configuration docs on the advanced usage page.


profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
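
To illustrate what a fixed bin count such as `bins: 8` means for a histogram, here is a sketch of equal-width binning in plain Python. It shows the general idea only and is not pandas-profiling's plotting code; the function name `equal_width_counts` is hypothetical.

```python
def equal_width_counts(values, bins=8):
    """Count how many values fall into each of `bins` equal-width intervals."""
    values = list(values)
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if width == 0:
            index = 0  # all values are identical
        else:
            # The maximum belongs to the last bin, not to an extra one.
            index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts


print(equal_width_counts(range(16), bins=8))
```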

Supporting open source

Maintaining and developing the open-source code for pandas-profiling, with millions of downloads and thousands of users, would not be possible without support of our gracious sponsors.

Lambda Labs

Lambda workstations, servers, laptops, and cloud services power engineers and researchers at Fortune 500 companies and 94% of the top 50 universities. Lambda Cloud offers 4 & 8 GPU instances starting at $1.50 / hr. Pre-installed with TensorFlow, PyTorch, Ubuntu, CUDA, and cuDNN.

We would like to thank our generous Github Sponsors supporters who make pandas-profiling possible:

Martin Sotir, Joseph Yuen, Brian Lee, Stephanie Rivera, nscsekhar, abdulAziz

More info if you would like to appear here: Github Sponsor page


Types

Types are a powerful abstraction for effective data analysis that goes beyond the logical data types (integer, float, etc.). pandas-profiling currently recognizes the following types: Boolean, Numerical, Date, Categorical, URL, Path, File and Image.

We have developed a type system for Python, tailored for data analysis: visions. Selecting the right typeset drastically reduces the complexity of your analysis code. Future versions of pandas-profiling will have extended type support through visions!
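
As a rough illustration of why inferred types go beyond storage types, the sketch below guesses a coarse type for a column of raw values, so that integers stored as strings count as numerical. This is a deliberately simplified stand-in and not the visions typeset logic; the function name `infer_kind` is hypothetical and only three of the types listed above are covered.

```python
def infer_kind(values):
    """Guess a coarse type for a column of raw values (illustrative only)."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # nothing to infer from
    if all(isinstance(v, bool) for v in present):
        return "Boolean"

    def is_number(value):
        if isinstance(value, bool):
            return False
        if isinstance(value, (int, float)):
            return True
        try:
            float(value)  # accepts strings such as "42" or "3.5"
        except (TypeError, ValueError):
            return False
        return True

    if all(is_number(v) for v in present):
        return "Numerical"
    return "Categorical"


print(infer_kind(["1", "2", "3"]))
print(infer_kind([True, False, None]))
print(infer_kind(["red", "green", "1"]))
```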


How to contribute

Read about getting involved in the Contribution Guide.

A low-threshold place to ask questions or start contributing is the pandas-profiling Slack. Join the Slack community.

Editor integration

PyCharm integration

  1. Install pandas-profiling via the instructions above
  2. Locate your pandas-profiling executable.
    • On macOS / Linux / BSD:
      $ which pandas_profiling
      (example) /usr/local/bin/pandas_profiling
    • On Windows:
      $ where pandas_profiling
      (example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe
  3. In PyCharm, go to Settings (or Preferences on macOS) > Tools > External tools
  4. Click the + icon to add a new external tool
  5. Insert the following values
    • Name: Pandas Profiling
    • Program: The location obtained in step 2
    • Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
    • Working Directory: $ProjectFileDir$
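
The effect of the Arguments setting above can be sketched in Python: the first argument is the dataset file, the second is a report path next to it. This is an approximation of how the PyCharm macros expand, written with POSIX-style paths for simplicity; the function name `external_tool_command` is hypothetical.

```python
from pathlib import PurePosixPath


def external_tool_command(executable, dataset_path):
    """Build the command line that the external tool settings above expand to."""
    path = PurePosixPath(dataset_path)
    # $FileNameWithoutAllExtensions$: the file name with every suffix removed.
    suffix_length = len("".join(path.suffixes))
    stem = path.name[: len(path.name) - suffix_length]
    # $FileDir$/<stem>_report.html
    report = path.parent / f"{stem}_report.html"
    return [executable, str(path), str(report)]


command = external_tool_command(
    "/usr/local/bin/pandas_profiling", "/data/titanic.csv"
)
print(command)
```

The resulting list could be passed to `subprocess.run(command, check=True)` to run the same profiling step outside the editor.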


To use the PyCharm Integration, right click on any dataset file:

External Tools > Pandas Profiling.

Other integrations

Other editor integrations may be contributed via pull requests.


Dependencies

The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.

You need Python 3 to run this package. Other dependencies can be found in the requirements files:

Filename                Requirements
requirements.txt        Package requirements
requirements-dev.txt    Requirements for development
requirements-test.txt   Requirements for testing
                        Requirements for Widgets etc.
  • v3.6.2(Jan 2, 2023)

  • v3.6.1(Dec 23, 2022)

  • v3.6.0(Dec 21, 2022)

    3.6.0 (2022-12-21)

    Bug Fixes

    • add css to cope with large tables (7f42f87)
    • adjust categoricals layout (f0bb45a)
    • categorical data not being obscured in the common values plot (40236bc)
    • compare report ignoring config parameter (3d60556)
    • compare report warnings always showing the last alert type (6b3c13d)
    • comparison fails when duplicates are disabled (#1208) (6d19620)
    • do not raise exception for percentage formatter (3ea626d)
    • enforce recomputation of description sets (a9fd1c8)
    • error comparing only one precomputed profile (00646cd)
    • html: sensible cloud-platform notebook html rendering (b22ece2)
    • ignoring config of precomputed reports (6478c40)
    • only compute auto correlation when no config is specified (d5d4f58)
    • remove malfunctioning hook (e2593f5)
    • remove unused test (2170338)
    • return the proper type for widgets (4c0b358)
    • set compute default to false (c70e491)
    • solve mypy error (9c4266e)
    • solve mypy issue (e3e7788)
    • uses colors from the specified config (c0c556d)
    • utils: use 'urllib.request' instead of 'requests' (#1177) (e4d020b), closes #1168


    Features

    • add heatmap values as a table under correlations (fc5da9e)
    • allow to specify the configuration for the comparison report (ad725b0)
    • design improvements on the correlations section (e5cd8cf)
    • implement imbalanced warning (ce84c81)
    • update variables layout (#1207) (cf0e0a7)
  • v3.5.0(Nov 22, 2022)

    3.5.0 (2022-11-22)

    Bug Fixes


  • v3.4.0(Oct 20, 2022)

    3.4.0 (2022-10-20)

    Bug Fixes


  • v3.3.0(Sep 7, 2022)

  • v3.2.0(May 2, 2022)

  • v3.1.0(Sep 27, 2021)

  • v3.0.0(May 11, 2021)

  • v2.13.0(May 8, 2021)

  • v2.12.0(May 5, 2021)

  • v2.11.0(Feb 20, 2021)

  • v2.10.1(Feb 7, 2021)

  • v2.10.0rc1(Jan 5, 2021)

  • v2.9.0(Sep 3, 2020)

  • v2.9.0rc1(Jul 12, 2020)

    This release candidate improves handling of sensitive data and furthermore reduces technical debt with various fixes. The full changelog is available here:

    A warm thank you to everyone who has contributed to this release: @gauravkumar37 @Jooong @smaranjitghose @XavierBanos Tam Nguyen @andycraig @mgorsk1 @mbh86 @MHUNCHO @GaelVaroquaux @AmauryLepicard @baluyotraf @pvojnisek @abegong

  • v2.8.0(May 12, 2020)

    pandas-profiling now has built-in support for Files and Images, such as extracting file sizes, creation dates and dimensions and scanning for truncated images or those containing EXIF information. Moreover, the text analysis features have also been reworked, providing more informative statistics.

    Read the changelog v2.8.0 for more details.

    Contributors: @loopyme @Bradley-Butcher @willemhendriks, @IscaAy, @frellnick, @dataverz @ieaves

  • v2.7.1(May 11, 2020)

  • v2.7.0(May 7, 2020)

    Announcement and changelog are available in the documentation.

    We are grateful to @loopyme and @kyleYang for creating parts of the features in this release.

    Thanks to all contributors who made this release possible: @1313e @dataprofessor @neomatrix369 @jiangfangfangxm @WesleyTheGeolien @NickYi1990 @ricgu8086.

  • v2.6.0(Apr 13, 2020)

    Dependency policy

    The current dependency policy is suboptimal. Pinning the dependencies is great for reproducibility (high guarantee to work), but it requires frequent maintenance and introduces compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and will instead specify minimum versions.

    Pandas v1

    Early releases of pandas v1 demonstrated many regressions that broke functionality (as acknowledged by the authors here). At this point, pandas is more stable and we notice high demand for compatibility. We move on to support pandas' latest versions. To ensure compatibility with both versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.

    Python 3.6+ features

    Python 3.6 introduces ordered dicts and f-strings, which we now rely on. This means that from pandas-profiling 2.6 onwards, you need at least Python 3.6. Users who cannot upgrade for some reason can still use pandas-profiling 2.5.0, but unfortunately won't benefit from updates or maintenance.

    Extended continuous integration

    Starting from this release, we use GitHub Actions and Travis CI combined to increase maintainability. Travis CI handles the testing, while GitHub Actions automates part of the development process by running black and building the docs.

  • v2.5.0(Feb 14, 2020)

    • Progress bar added (#224)
    • Character analysis for Text/NLP (#278)
    • Themes: configuration and demos (Orange, Dark)
    • Tutorial on modifying the report's structure (#362; #281, #259, #253, #234). This Jupyter notebook also demonstrates how to use the Kaggle API together with pandas-profiling.
    • Toggle descriptions at correlations.


    • This is the last version to support Python 3.5.


    • The order of columns changed when sort="None" (#377, fixed).
    • Pandas v1.0.X is not yet supported (#367, #366, #363, #353, pinned pandas to < 1)
    • Improved mixed type detection (#351)
    • Refactor of report structures.
    • Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, #329).
    • Distinct counts exclude NaNs.
    • Fixed alerts in notebooks.

    Other improvements:

    • Warnings are now sorted.
    • Links to Binder and Google Colab are added for notebooks (#349)
    • The overview section is tabbed.
  • v2.4.0(Jan 8, 2020)

    The v2.4.0 release decouples the data structure of reports from the actual rendering. It's now much simpler to change the user interface, whether the user is in a Jupyter notebook, webpage, native application or just wants a JSON view of the data.

    We are also proud to announce that we have been accepted into the GitHub Sponsors programme. If you would like to see work on this project continue, you are cordially invited to support me through this programme. To boost community funding, GitHub will match your contribution!

    Other improvements:

    • extended configuration with better defaults, including minimal mode for big data (#258, #310)
    • more example datasets
    • rejection of highly correlated variables is generalized (#284, #299)
    • many structural and stability improvements (#254, #274, #239)

    Special thanks to @marco-cardoso @ajupton @lvwerra @gliptak @neomatrix369 for their contributions.

  • v2.3.0(Jul 27, 2019)

    • (Experimental) Support for "path" type
    • Fix numeric precision (#225)
    • Force labels in missing values diagram for large number of columns (#222)
    • Add pull request template
    • Add Census Dataset from the UCI ML Repository

    Thanks @bensdm and @huaiweicheng for your valuable contributions to this version!

  • v2.2.0(Jul 22, 2019)

    New release introducing variable size binning (via astropy), PyCharm integration and various fixes and optimizations.

    • Added variable bin sizing via Bayesian Blocks (feature request [#216])
    • PyCharm integration, console attempts to detect file type.
    • Fixed bug [#215].
    • Updated the missingno package to 0.4.2, fixing the font size in the bar diagram.
    • Various optimizations

    Thanks to: @Utsav37 @mansenfranzen @jakevdp

  • v2.1.2(Jul 11, 2019)

  • v2.1.1(Jul 11, 2019)

  • v2.1.0(Jul 6, 2019)

    The pandas-profiling release version 2.1.0 includes:

    • Correlations: correlation calculations are now more fault tolerant ([#51] and [#197]), correlation names in the report are clarified.
    • Jupyter Notebook: rendering a profiling report is done inside the srcdoc attribute (which fixes [#199]), a full-width option is added and the column layout is improved.
    • User experience: The table styling and sample section formatting are improved.
    • Warnings: detection added for categorical variables that are suspected to be of the datetime type.
    • Documentation and community:
      • The Contribution page helps users that want to contribute.
      • Typos fixed [#195]. Thank you @abhilashshakti
      • Added more examples.
    • Other bugfixes and improvements:
      • Add version information to console interface.
      • Fix: Remove one-time used logger [#202]
      • Fix: Dealing with string indices [#200]

    Contributors: @abhilashshakti @adamrossnelson @manycoding @InsciteAnalytics

  • v2.0.3(Jun 23, 2019)

  • v2.0.2(Jun 22, 2019)

    Revised version structure, fixed recursion preventing installation of dependencies ([#184]).

    The file used to include utils from the package prior to installation. This caused errors when the dependencies were not yet present.

  • v2.0.1(Jun 21, 2019)

