Changelog
3.6.2
[3.6.2](https://github.com/ydataai/pandas-profiling/compare/v3.6.1...v3.6.2) (2023-01-02)
Bug Fixes
* comparison alerts ([1229](https://github.com/ydataai/pandas-profiling/issues/1229)) ([bbca61b](https://github.com/ydataai/pandas-profiling/commit/bbca61b0b5e109563aee88a245d6b776d1b65d9b))
* comparison histogram ([1228](https://github.com/ydataai/pandas-profiling/issues/1228)) ([0081581](https://github.com/ydataai/pandas-profiling/commit/0081581c67d0667ad869677a5b9b29d276d5a461))
* comparison report style issues ([a465cdd](https://github.com/ydataai/pandas-profiling/commit/a465cddc88091ed99485b7e34d4c037faf37f6d3))
* update the link for the people-example.csv ([2bb5043](https://github.com/ydataai/pandas-profiling/commit/2bb5043fd147a289cfe54a9feddcb9693275e13d))
3.6.1
[3.6.1](https://github.com/ydataai/pandas-profiling/compare/v3.6.0...v3.6.1) (2022-12-23)
Bug Fixes
* categorical var frequency plot ([6cb391f](https://github.com/ydataai/pandas-profiling/commit/6cb391fd8d26c98792e14592b4d853f9a557eab0))
* remove ipywidgets import ([1b8b117](https://github.com/ydataai/pandas-profiling/commit/1b8b11719cd2a1dfcde9ecd7406aa0545bf46a8e))
3.6.0
[3.6.0](https://github.com/ydataai/pandas-profiling/compare/v3.5.0...v3.6.0) (2022-12-21)
Bug Fixes
* add css to cope with large tables ([7f42f87](https://github.com/ydataai/pandas-profiling/commit/7f42f87cedd06694fe83241416e1fa21327b8c97))
* adjust categoricals layout ([f0bb45a](https://github.com/ydataai/pandas-profiling/commit/f0bb45a2a2d89b5c6e77fd20939e069979b2b948))
* categorical data not being obscured in the common values plot ([40236bc](https://github.com/ydataai/pandas-profiling/commit/40236bc67619a8aadeae797920c6238616169641))
* compare report ignoring config parameter ([3d60556](https://github.com/ydataai/pandas-profiling/commit/3d6055675579d72a5ddf34c4c85e94befb403e72))
* compare report warnings always showing the last alert type ([6b3c13d](https://github.com/ydataai/pandas-profiling/commit/6b3c13dd33489c8a895b2db1854b23a7edd3b948))
* comparison fails when duplicates are disabled ([1208](https://github.com/ydataai/pandas-profiling/issues/1208)) ([6d19620](https://github.com/ydataai/pandas-profiling/commit/6d1962044d5bcf634266998551328bd3cdeb354c))
* do not raise exception for percentage formatter ([3ea626d](https://github.com/ydataai/pandas-profiling/commit/3ea626de3d839a55fb0fac9dc7a5fa1da18ba037))
* enforce recomputation of description sets ([a9fd1c8](https://github.com/ydataai/pandas-profiling/commit/a9fd1c845511679a18c87a9566d343ea945e9f16))
* error comparing only one precomputed profile ([00646cd](https://github.com/ydataai/pandas-profiling/commit/00646cde15e0fb0dad29e4bd3cc5747b3eff61e2))
* **html:** sensible cloud-platform notebook html rendering ([b22ece2](https://github.com/ydataai/pandas-profiling/commit/b22ece261c0e9a74254361b6b7e121ab94abe44d))
* ignoring config of precomputed reports ([6478c40](https://github.com/ydataai/pandas-profiling/commit/6478c4047ee871ede7f7aa76379818ee3217e7d7))
* only compute auto correlation when no config is specified ([d5d4f58](https://github.com/ydataai/pandas-profiling/commit/d5d4f58d3b0728bed021677ffb7be14cb7f04f27))
* remove malfunctioning hook ([e2593f5](https://github.com/ydataai/pandas-profiling/commit/e2593f5bb093117c7afb8914eafbda6e2e110782))
* remove unused test ([2170338](https://github.com/ydataai/pandas-profiling/commit/21703385a42bf38d4306511e0f99bed9e1092991))
* return the proper type for widgets ([4c0b358](https://github.com/ydataai/pandas-profiling/commit/4c0b358002d75139c23babc30cbc0c7b23534d92))
* set compute default to false ([c70e491](https://github.com/ydataai/pandas-profiling/commit/c70e49136fbdf1d3fe7e6ef5b23a8adbd0567ecf))
* solve mypy error ([9c4266e](https://github.com/ydataai/pandas-profiling/commit/9c4266eb1cb252d8008795080723598d2d151e26))
* solve mypy issue ([e3e7788](https://github.com/ydataai/pandas-profiling/commit/e3e7788907eebcf572423b48800f848d965f5969))
* uses colors from the specified config ([c0c556d](https://github.com/ydataai/pandas-profiling/commit/c0c556d29cc191d44fdb08fc813818709c1b0666))
* **utils:** use 'urllib.request' instead of 'requests' ([1177](https://github.com/ydataai/pandas-profiling/issues/1177)) ([e4d020b](https://github.com/ydataai/pandas-profiling/commit/e4d020b873b67845a329517e42620ed96545d60e)), closes [#1168](https://github.com/ydataai/pandas-profiling/issues/1168)
Features
* add heatmap values as a table under correlations ([fc5da9e](https://github.com/ydataai/pandas-profiling/commit/fc5da9eff07e7e18c5fd2d8caa698af7aee861e2))
* allow to specify the configuration for the comparison report ([ad725b0](https://github.com/ydataai/pandas-profiling/commit/ad725b0f7d3b61c2a4fafddbdbfc1451197e2c94))
* design improvements on the correlations section ([e5cd8cf](https://github.com/ydataai/pandas-profiling/commit/e5cd8cfb4b91f22b3435f9830f516e929c4e8d32))
* implement imbalanced warning ([ce84c81](https://github.com/ydataai/pandas-profiling/commit/ce84c81c9d2194237676a407fbe5d2461ed64eda))
* update variables layout ([1207](https://github.com/ydataai/pandas-profiling/issues/1207)) ([cf0e0a7](https://github.com/ydataai/pandas-profiling/commit/cf0e0a72477ce13941caf09887afe6a1c3073858))
3.5.0
[3.5.0](https://github.com/ydataai/pandas-profiling/compare/v3.4.0...v3.5.0) (2022-11-22)
Bug Fixes
* change context managed backend ([1149](https://github.com/ydataai/pandas-profiling/issues/1149)) ([11e1a8a](https://github.com/ydataai/pandas-profiling/commit/11e1a8a3fa8d13513fe926b731fb907a066af2a1))
* dataset names on comparison report ([1159](https://github.com/ydataai/pandas-profiling/issues/1159)) ([3c14d43](https://github.com/ydataai/pandas-profiling/commit/3c14d438d9a557ac85f5663cc3446c0fb3081e18))
* duplicate key in test dict ([1126](https://github.com/ydataai/pandas-profiling/issues/1126)) ([d19affe](https://github.com/ydataai/pandas-profiling/commit/d19affe15a4e3063af7187ca5fa81f1bf75ce648))
* improve description and correct plot for ‘auto’ correlation ([1119](https://github.com/ydataai/pandas-profiling/issues/1119)) ([2617b92](https://github.com/ydataai/pandas-profiling/commit/2617b92d08ed87546c80e0cc01cd475d1e60ec56))
* remove correlation calculation for constants ([1152](https://github.com/ydataai/pandas-profiling/issues/1152)) ([1ed2bc0](https://github.com/ydataai/pandas-profiling/commit/1ed2bc0702f504592ed211097469405a5061a857))
* time series render format ([1157](https://github.com/ydataai/pandas-profiling/issues/1157)) ([39ca8ce](https://github.com/ydataai/pandas-profiling/commit/39ca8ce7d4ed2ad0ebb78db5d5f26d3ace08753a))
* update config files to only calculate 'auto' correlation ([1158](https://github.com/ydataai/pandas-profiling/issues/1158)) ([34cf73d](https://github.com/ydataai/pandas-profiling/commit/34cf73dadaea08e44e741f99fa0a10c322c86109))
* update repository links ([1141](https://github.com/ydataai/pandas-profiling/issues/1141)) ([c742c5d](https://github.com/ydataai/pandas-profiling/commit/c742c5dbeb18fe2907a4c03792e8802993c46da5))
Features
* add typechecking to profile report ([1139](https://github.com/ydataai/pandas-profiling/issues/1139)) ([ec8ece0](https://github.com/ydataai/pandas-profiling/commit/ec8ece0de394eb4c2918bb6a74f0c5e5bb77ca61))
* report comparison example ([1160](https://github.com/ydataai/pandas-profiling/issues/1160)) ([5e75fd2](https://github.com/ydataai/pandas-profiling/commit/5e75fd275d14c8ce7ba49d0a15ec26810c4c0e73))
* report comparisons ([1069](https://github.com/ydataai/pandas-profiling/issues/1069)) ([70ee5c7](https://github.com/ydataai/pandas-profiling/commit/70ee5c776ad0c72d709631690a2df1cde5ca0424)), closes [#1137](https://github.com/ydataai/pandas-profiling/issues/1137) [#1136](https://github.com/ydataai/pandas-profiling/issues/1136) [#1143](https://github.com/ydataai/pandas-profiling/issues/1143) [#1148](https://github.com/ydataai/pandas-profiling/issues/1148) [#1150](https://github.com/ydataai/pandas-profiling/issues/1150)
3.4.0
[3.4.0](https://github.com/ydataai/pandas-profiling/compare/v3.3.0...v3.4.0) (2022-10-20)
Bug Fixes
* correlation `auto` passing extra parameters ([1114](https://github.com/ydataai/pandas-profiling/issues/1114)) ([21f4fe6](https://github.com/ydataai/pandas-profiling/commit/21f4fe68b3febe359ea60f7b9790a39db28c222a))
* cramer's correlation fails with missings vals ([1109](https://github.com/ydataai/pandas-profiling/issues/1109)) ([8e7f8b2](https://github.com/ydataai/pandas-profiling/commit/8e7f8b2147886e1d01e3a5c5fa8423cf8e781b76))
* drop joblib dependency ([1090](https://github.com/ydataai/pandas-profiling/issues/1090)) ([586cef3](https://github.com/ydataai/pandas-profiling/commit/586cef360d6b8ed926953298ed3a9772b8369052)), closes [#1056](https://github.com/ydataai/pandas-profiling/issues/1056)
* fix linter errors ([1117](https://github.com/ydataai/pandas-profiling/issues/1117)) ([5f17cfd](https://github.com/ydataai/pandas-profiling/commit/5f17cfdb3c7c07f981fb200a1f12a73bf40690f3))
* make tangled-up-in-unicode an optional dependency ([1070](https://github.com/ydataai/pandas-profiling/issues/1070)) ([e6b2a00](https://github.com/ydataai/pandas-profiling/commit/e6b2a0018a007bef8029ca1c69b6123d0a8e5cda))
* remove unused imports ([56beed4](https://github.com/ydataai/pandas-profiling/commit/56beed456c4fab13a45fd77d93ca12fc38053bb0))
* remove unused imports ([66864c1](https://github.com/ydataai/pandas-profiling/commit/66864c15cfa9b80cb426957e17410c579425d450))
* Remove unused imports. ([985fbd1](https://github.com/ydataai/pandas-profiling/commit/985fbd1fc0e826bda3ac1b725fa8842013743ab3))
Features
* add support for Pandas 1.5 ([1076](https://github.com/ydataai/pandas-profiling/issues/1076)) ([5c5a710](https://github.com/ydataai/pandas-profiling/commit/5c5a710f23d83ba5ff1dc9ab6fc23b28094560fb))
* added filter to locate columns ([1115](https://github.com/ydataai/pandas-profiling/issues/1115)) ([c2f817d](https://github.com/ydataai/pandas-profiling/commit/c2f817d09a38094dcf83b0e49d86e3c87d822c7b))
* introduce auto parameter for correlations ([1095](https://github.com/ydataai/pandas-profiling/issues/1095)) ([4d2e415](https://github.com/ydataai/pandas-profiling/commit/4d2e415601afce2c997298cdedc69e6e04ac6689))
3.3.0
The full changelog is available here: https://pandas-profiling.ydata.ai/docs/master/pages/reference/changelog.html?highlight=change+log
3.2.0
The full changelog is available here: https://pandas-profiling.ydata.ai/docs/master/pages/reference/changelog.html?highlight=change+log
3.1.0
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
3.0.0
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.13.0
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.12.0
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.11.0
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.10.1
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.10.0rc1
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.9.0
The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
2.9.0rc1
This release candidate improves handling of sensitive data and reduces technical debt with various fixes. The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.
A warm thank you to everyone who has contributed to this release: gauravkumar37 Jooong smaranjitghose XavierBanos Tam Nguyen andycraig mgorsk1 mbh86 MHUNCHO GaelVaroquaux AmauryLepicard baluyotraf pvojnisek abegong
2.8.0
`pandas-profiling` now has built-in support for Files and Images, such as extracting file sizes, creation dates and dimensions, and scanning for truncated images or those containing EXIF information. The text analysis features have also been reworked to provide more informative statistics.
Read the [changelog v2.8.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-8-0) for more details.
Contributors: loopyme, Bradley-Butcher, willemhendriks, IscaAy, frellnick, dataverz, ieaves
2.7.0
Announcement and changelog are available in the documentation.
We are grateful to loopyme and kyleYang for creating parts of the features in this release.
Thanks to all contributors who made this release possible: 1313e dataprofessor neomatrix369 jiangfangfangxm WesleyTheGeolien NickYi1990 ricgu8086.
2.6.0
Dependency policy
The current dependency policy is suboptimal. Pinning the dependencies is great for reproducibility (high guarantee to work), but it requires frequent maintenance and introduces compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and instead specifying minimum versions.
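To illustrate the difference, here is a minimal sketch of the two styles as they might appear in a `setup.py` requirements list. The version numbers are made up for illustration and are not the project's actual requirements, and `satisfies` is a hypothetical helper that only understands `==` and `>=`.

```python
# Illustrative only: version numbers are invented, not the project's real requirements.
pinned = ["pandas==0.25.3", "matplotlib==3.1.2"]
minimum = ["pandas>=0.25.3", "matplotlib>=3.0.3"]


def parse_version(text):
    """Turn '1.0.3' into (1, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(requirement, installed):
    """Check an installed version against a '==' or '>=' requirement (hypothetical helper)."""
    for operator in ("==", ">="):
        if operator in requirement:
            _, wanted = requirement.split(operator)
            if operator == "==":
                return parse_version(installed) == parse_version(wanted)
            return parse_version(installed) >= parse_version(wanted)
    raise ValueError(f"unsupported requirement: {requirement}")
```

With a pinned requirement, a newer release of a dependency is rejected even if it would work; with a minimum version, it is accepted.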
Pandas v1
Early releases of pandas v1 had many regressions that broke functionality (as acknowledged by the authors [here](https://github.com/pandas-dev/pandas/issues/31523)). At this point, pandas is more stable and we notice high demand for compatibility, so we now support pandas' latest versions. To ensure compatibility with both major versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.
Python 3.6+ features
Python 3.6 introduces ordered dicts and f-strings, which we now rely on. This means that pandas-profiling 2.6 and later require Python 3.6 or newer. Users who cannot upgrade can stay on pandas-profiling 2.5.0, but unfortunately won't benefit from updates or maintenance.
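A short sketch of the two features mentioned above. The variable names are illustrative. Note that dict insertion order is an implementation detail in CPython 3.6 and became a language guarantee in Python 3.7.

```python
# Dicts keep insertion order (CPython 3.6+, guaranteed by the language from 3.7).
summary = {}
summary["n"] = 100
summary["missing"] = 4
summary["distinct"] = 37

# f-strings format values inline, without str.format() or % interpolation.
lines = [f"{name}: {value}" for name, value in summary.items()]
missing_share = f"{summary['missing'] / summary['n']:.1%}"
```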
Extended continuous integration
Starting from this release, we use GitHub Actions and Travis CI combined to increase maintainability.
Travis CI handles the testing, while GitHub Actions automates part of the development process by running black and building the docs.
2.5.0
- Progress bar added (224)
- Character analysis for Text/NLP (278)
- Themes: configuration and demos (Orange, Dark)
- Tutorial on modifying the report's structure (362; 281, 259, 253, 234). This Jupyter notebook also demonstrates how to use the Kaggle API together with pandas-profiling.
- Toggle descriptions at correlations.
Deprecation:
- This is the last version to support Python 3.5.
Stability:
- The order of columns changed when sort="None" (377, fixed).
- Pandas v1.0.X is not yet supported (367, 366, 363, 353, pinned pandas to < 1)
- Improved mixed type detection (351)
- Refactor of report structures.
- Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, 329).
- Distinct counts exclude NaNs.
- Fixed alerts in notebooks.
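The "distinct counts exclude NaNs" change can be illustrated with plain Python. This is a sketch of the idea only, not the library's implementation, and `distinct_count` is a hypothetical helper.

```python
import math


def distinct_count(values):
    """Count distinct values, ignoring None and float NaN (hypothetical helper)."""
    seen = set()
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        seen.add(value)
    return len(seen)


# Two real values (1.0 and 2.0) plus missing entries that should not be counted.
column = [1.0, 2.0, 2.0, float("nan"), None, float("nan")]
n_distinct = distinct_count(column)
```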
Other improvements:
- Warnings are now sorted.
- Links to Binder and Google Colab are added for notebooks (349)
- The overview section is tabbed.
2.4.0
The v2.4.0 release decouples the data structure of reports from the actual rendering. It's now much simpler to change the user interface, whether the user is in a Jupyter notebook, webpage, native application or just wants a JSON view of the data.
We are also proud to announce that we have been accepted into the GitHub Sponsors programme. If you want to see this project continue, you are cordially invited to support me [through this programme](https://github.com/sponsors/sbrugman). To boost community funding, GitHub will match your contribution!
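The idea of separating a report's structure from its rendering can be sketched as follows. The class and function names are hypothetical and do not reflect the library's internal API; the point is only that one structure can feed several renderers.

```python
import json
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Item:
    """One renderer-agnostic piece of a report: a named value."""
    name: str
    value: str


@dataclass
class Section:
    """A titled group of items; knows nothing about HTML or JSON."""
    title: str
    items: List[Item] = field(default_factory=list)


def render_html(section: Section) -> str:
    """Render the structure as an HTML fragment."""
    rows = "".join(
        f"<tr><th>{escape(i.name)}</th><td>{escape(i.value)}</td></tr>"
        for i in section.items
    )
    return f"<h2>{escape(section.title)}</h2><table>{rows}</table>"


def render_json(section: Section) -> str:
    """Render the same structure as JSON."""
    return json.dumps(
        {"title": section.title, "items": {i.name: i.value for i in section.items}}
    )


overview = Section("Overview", [Item("Variables", "12"), Item("Missing cells", "3.1%")])
```

Adding another output format in this sketch means writing one more `render_*` function, without touching `Section` or `Item`.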
Other improvements:
- extended configuration with better defaults, including minimal mode for big data (258, 310)
- more example datasets
- rejection of highly correlated variables is generalized (284, 299)
- many structural and stability improvements (254, 274, 239)
Special thanks to marco-cardoso ajupton lvwerra gliptak neomatrix369 for their contributions.
2.3.0
- (Experimental) Support for "path" type
- Fix numeric precision (225)
- Force labels in missing values diagram for large number of columns (222)
- Add pull request template
- Add [Census Dataset](https://archive.ics.uci.edu/ml/datasets/census+income) from the UCI ML Repository
Thanks bensdm and huaiweicheng for your valuable contributions to this version!
2.2.0
New release introducing variable size binning (via astropy), PyCharm integration and various fixes and optimizations.
- Added variable bin sizing via Bayesian Blocks (feature request [216])
- PyCharm integration, console attempts to detect file type.
- Fixed bug [215].
- Updated the `missingno` package to 0.4.2, fixing the font size in the `bar` diagram.
- Various optimizations
Thanks to:
Utsav37 mansenfranzen jakevdp
2.1.2
Fix [211] and README
2.1.1
- Fix of [206]
- Improve code maintainability of the view (HTML templates, notebook)
- Fix bug in dendrogram sizing
2.1.0
The `pandas-profiling` release version 2.1.0 includes:
- **Correlations**: correlation calculations are now more fault tolerant ([51] and [197]), correlation names in the report are clarified.
- **Jupyter Notebook**: rendering a profiling report is done inside the `srcdoc` attribute (which fixes [199]), a full-width option is added and the column layout is improved.
- **User experience**: The table styling and sample section formatting are improved.
- **Warnings**: detection added for categorical variables that are suspected to be of the datetime type.
- **Documentation and community**:
  - The [Contribution page](CONTRIBUTING.md) helps users that want to contribute.
  - Typos fixed [195]. Thank you, abhilashshakti.
  - Added more examples.
- **Other bugfixes and improvements**:
  - Add version information to console interface.
  - Fix: Remove one-time used logger [202]
  - Fix: Dealing with string indices [200]
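
Rendering a report inside an iframe's `srcdoc` attribute, as described for the Jupyter Notebook change above, can be sketched like this. `wrap_in_iframe` is a hypothetical helper, not the library's function. The key step is escaping the report HTML so that it can be stored inside an attribute value.

```python
from html import escape


def wrap_in_iframe(report_html, width="100%", height="800px"):
    """Embed a full HTML document in an iframe via the srcdoc attribute (hypothetical helper)."""
    # escape() converts &, <, > and quotes so the document survives as an attribute value.
    return (
        f'<iframe width="{width}" height="{height}" frameborder="0" '
        f'srcdoc="{escape(report_html, quote=True)}"></iframe>'
    )


snippet = wrap_in_iframe('<html><body><p class="x">Report</p></body></html>')
```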
Contributors:
abhilashshakti adamrossnelson manycoding InsciteAnalytics
2.0.3
Bugfix on version structure for 2.0.2.
2.0.2
Revised version structure, fixed recursion preventing installation of dependencies ([184]).
The setup.py file used to include utils from the package prior to installation.
This caused errors when the dependencies were not yet present.
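One common way to avoid importing the package from `setup.py` is to read the version string from a source file as plain text. The sketch below shows that approach; the `read_version` helper and the example file contents are assumptions for illustration and not necessarily what the project adopted.

```python
import re


def read_version(source_text):
    """Extract __version__ from module source without importing the module."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', source_text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("no __version__ assignment found")
    return match.group(1)


# In a real setup.py the text would be read from a version file inside the package.
# The import line below is never executed, so a missing dependency cannot break it.
example_source = 'import heavy_dependency\n__version__ = "2.0.3"\n'
version = read_version(example_source)
```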
2.0.1
- Add offline support [177], [179] and [180]
2.0.0
With 23 commits, 123 changed files and 20+ issues resolved, Pandas Profiling v2.0.0 is a big leap forward.
Thanks to the great contributions from everyone involved! Special thanks to JosPolfliet conradoqg eyaltra.
1.4.3
- Fix the correlation images (160).
Contributors:
kazetof
1.4.2
* Multiple Bugfixes
* Enable Travis CI builds
Contributors:
Aylr LeonardAukea kevanshea endremborza romainx drkarthi
1.4.1
Enhancements
- Performance enhancement. It is now possible to disable some resource-heavy operations to improve performance (see also 76):
  - Correlation checking, by setting `check_correlation` to `False` (43)
  - Recoded checking, by setting `check_recoded` to `False`.
- Possibility to install using conda
- Implementation of a new Boolean variable type (25)
- Add new badges for zeros and highly skewed (63)
- Code refactoring (internal improvement) to split the main code into 4 modules (65)
- Improve types handling
  - types like `list`, `tuple` and `dict` are now officially unsupported until we improve them
  - mixed columns are also correctly handled
- New Binary variable type supporting native `boolean` type and also binary numeric values (77)
- Column names in warnings now link to the corresponding detail in the variables section to ease navigation (66)
- Spearman and Pearson correlation matrix diagrams added to the report (83)
Bug fixes
- Incorrect calculation of % unique for variables with missing values (56)
- Avoid throwing an error when calling `get_rejected_variables` while correlation has not been computed (11)
- Avoid setting the matplotlib backend if not necessary (68)
1.4.0
Bug fixes and new check for recoded categorical variables. Thanks to all who contributed!
1.3.0
New additions include frequency counts and extreme values for numeric variables.
Pandas-profiling now does all 1d-calculations in a multiprocessing fashion, _vastly_ speeding up runtime.
1.2.0
What's new:
- histograms for date variables
- bug fixes
1.0.0a1
Initial release.