Visualize and compare datasets, target values and associations, with one line of code.

Last update: Jan 05, 2023

Overview

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code!

Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application.

The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.

Usage and parameters are described below. You can also find an article describing its features in depth and see examples in action HERE.

Sweetviz development is still ongoing! Please let me know if you run into any data, compatibility or install issues! Thank you for reporting any BUGS in the issue tracking system here, and I welcome your feedback and questions on usage/features in the brand-new GitHub "Discussions" tab right here!

Examples

Example report using the Titanic dataset

Article describing its features in depth

Features

Target analysis
- Shows how a target value (e.g. "Survived" in the Titanic dataset) relates to other features
Visualize and compare
- Distinct datasets (e.g. training vs test data)
- Intra-set characteristics (e.g. male versus female)
Mixed-type associations
- Sweetviz integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient) and categorical-numerical (correlation ratio) datatypes seamlessly, to provide maximum information for all data types.
Type inference
- Automatically detects numerical, categorical and text features, with optional manual overrides
Summary information
- Type, unique values, missing values, duplicate rows, most frequent values
- Numerical analysis:
  - min/max/range, quartiles, mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness

Upgrading

Some people have experienced mixed results when upgrading through pip. To update to the latest version from an existing install, it is recommended to pip uninstall sweetviz first, then simply install it again.

Installation

Sweetviz currently supports Python 3.6+ and Pandas 0.25.3+. Reports are output using the base "os" module, so custom environments such as Google Colab which require custom file operations are not yet supported, although I am looking into a solution.

Using pip

The best way to install sweetviz (other than from source) is to use pip:

pip install sweetviz

Installation issues & fixes

In some rare cases, users have reported errors such as ModuleNotFoundError: No module named 'sweetviz' and AttributeError: module 'sweetviz' has no attribute 'analyze'. In those cases, we suggest the following:

Make sure none of your scripts are named sweetviz.py, as that interferes with the library itself. Delete or rename that script (and any associated .pyc files), and try again.
Try uninstalling the library using pip uninstall sweetviz, then reinstalling
The issue may stem from using multiple versions of Python, or from OS permissions. The following Stack Overflow articles have resolved many of the reported issues: Article 1, Article 2, Article 3
If all else fails, post a bug issue here on GitHub. Thank you for taking the time, it may help resolve the issue for you and everyone else!
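
To check the first suggestion above (a local script shadowing the library), a small standard-library sketch such as the following can show which file Python would actually load. The helper name find_module_origin is hypothetical and not part of Sweetviz.

```python
import importlib.util
from typing import Optional


def find_module_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


origin = find_module_origin("sweetviz")
if origin is None:
    print("sweetviz is not importable from this interpreter")
else:
    # If this path points at your own sweetviz.py, rename or delete that script.
    print("sweetviz would be imported from:", origin)
```

If the printed path is inside your project folder rather than site-packages, a local file is most likely hiding the installed library.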

Basic Usage

Creating a report is a quick 2-line process:

Create a DataframeReport object using one of: analyze(), compare() or compare_intra()
Use a show_xxx() function to render the report. You can now use either html or notebook report options, as well as scaling: (more info on these options below)

Step 1: Create the report

There are 3 main functions for creating reports:

analyze(...)
compare(...)
compare_intra(...)

Analyzing a single dataframe (and its optional target feature)

To analyze a single dataframe, simply use the analyze(...) function, then the show_html(...) function:

import sweetviz as sv

my_report = sv.analyze(my_dataframe)
my_report.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

When run, this will output a 1080p widescreen HTML app in your default browser.

Optional arguments

The analyze() function can take multiple other arguments:

analyze(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto'):

source: Either the data frame (as in the example) or a tuple containing the data frame and a name to show in the report. e.g. my_df or [my_df, "Training"]
target_feat: A string representing the name of the feature to be marked as "target". Only BOOLEAN and NUMERICAL features can be targets for now.
feat_cfg: A FeatureConfig object representing features to be skipped, or to be forced a certain type in the analysis. The arguments can either be a single string or list of strings. Parameters are skip, force_cat, force_num and force_text. The "force_" arguments override the built-in type detection. They can be constructed as follows:

feature_config = sv.FeatureConfig(skip="PassengerId", force_text=["Age"])

pairwise_analysis: Correlations and other associations can take quadratic time (n^2) to complete. The default setting ("auto") will run without warning until a data set contains "association_auto_threshold" features. Past that threshold, you need to explicitly pass the parameter pairwise_analysis="on" (or ="off") since processing that many features would take a long time. This parameter also covers the generation of the association graphs (based on Drazen Zaric's concept).
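
To get a feel for the quadratic growth mentioned for pairwise_analysis, the sketch below counts unordered feature pairs. The helper pairwise_count is hypothetical; Sweetviz's actual workload may differ (asymmetric associations, for example, could require both directions of each pair).

```python
def pairwise_count(num_features: int) -> int:
    """Number of unordered feature pairs: n * (n - 1) / 2."""
    return num_features * (num_features - 1) // 2


for n in (10, 50, 200):
    print(f"{n} features -> {pairwise_count(n)} pairs")
```

Going from 10 to 200 features multiplies the number of pairs by several hundred, which is why very wide dataframes require an explicit pairwise_analysis="on" or "off".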

Comparing two dataframes (e.g. Test vs Training sets)

To compare two data sets, simply use the compare() function. Its parameters are the same as analyze(), except with an inserted second parameter to cover the comparison dataframe. It is recommended to use the [dataframe, "name"] format of parameters to better differentiate between the base and compared dataframes. (e.g. [my_df, "Train"] vs my_df)

my_report = sv.compare([my_dataframe, "Training Data"], [test_df, "Test Data"], "Survived", feature_config)

Comparing two subsets of the same dataframe (e.g. Male vs Female)

Another way to get great insights is to use the comparison functionality to split your dataset into 2 sub-populations.

Support for this is built in through the compare_intra() function. This function takes a boolean series as one of the arguments, as well as an explicit "name" tuple for naming the (true, false) resulting datasets. Note that internally, this creates 2 separate dataframes to represent each resulting group. As such, it is more of a shorthand function of doing such processing manually.

my_report = sv.compare_intra(my_dataframe, my_dataframe["Sex"] == "male", ["Male", "Female"], feature_config)
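
The manual processing that compare_intra() is described as a shorthand for can be sketched with plain pandas. This is only an illustration of the split on a boolean series, using a made-up four-row dataframe; it is not Sweetviz's internal code.

```python
import pandas as pd

# Tiny made-up dataframe, for illustration only.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
})
condition = df["Sex"] == "male"

# Rows where the condition is True form the first group; the rest form the second.
group_true = df[condition]
group_false = df[~condition]

print(len(group_true), len(group_false))
```

The two resulting dataframes could then be passed to compare() as [group_true, "Male"] and [group_false, "Female"].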

Step 2: Show the report

Once you have created your report object (e.g. my_report in the examples above), simply call one of its two "show" functions:

show_html()

show_html(  filepath='SWEETVIZ_REPORT.html', 
            open_browser=True, 
            layout='widescreen', 
            scale=None)

show_html(...) will create and save an HTML report at the given file path. There are options for:

layout: Either 'widescreen' or 'vertical'. The widescreen layout displays details on the right side of the screen, as the mouse goes over each feature. The new (as of 2.0) vertical layout is more compact horizontally and enables expanding each detail area upon clicking.
scale: Use a floating-point number (scale= 0.8 or None) to scale the entire report. This is very useful to fit reports to any output.
open_browser: Enables the automatic opening of a web browser to show the report. Since under some circumstances this is not desired (or causes issues with some IDEs), you can disable it here.
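
As a sketch of how these options combine, the hypothetical helper below (save_report is not part of Sweetviz) writes a report to disk without opening a browser, switching to the vertical layout and a smaller scale when a compact report is wanted. It uses only the show_html() parameters listed above.

```python
def save_report(report, path, compact=False):
    """Write an HTML report without opening a browser (handy for scripts and IDEs)."""
    report.show_html(
        filepath=path,
        open_browser=False,
        layout="vertical" if compact else "widescreen",
        scale=0.8 if compact else None,
    )


# Typical use, assuming `my_report` came from sv.analyze(...):
# save_report(my_report, "titanic_report.html", compact=True)
```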

show_notebook()

show_notebook(  w=None, 
                h=None, 
                scale=None,
                layout='widescreen',
                filepath=None)

show_notebook(...) is new as of 2.0 and will embed an IFRAME element showing the report right inside a notebook (e.g. Jupyter, Google Colab, etc.).

Note that since notebooks are generally a more constrained visual environment, it is probably a good idea to use custom width/height/scale values (w, h, scale) and even set custom default values in an INI override (see below). The options are:

w (width): Sets the width of the output window for the report (the full report may not fit; use layout and/or scale for the report itself). Can be as a percentage string (w="100%") or number of pixels (w=900).
h (height): Sets the height of the output window for the report. Can be as a number of pixels (h=700) or "Full" to stretch the window to be as tall as all the features (h="Full").
scale: Same as for show_html, above.
layout: Same as for show_html, above.
filepath: An optional file path; when given, the report is also saved there as an HTML file.

Customizing defaults: the Config file

The package contains an INI file for configuration. You can override any setting by providing your own then calling this before creating a report:

sv.config_parser.read("Override.ini")

IMPORTANT #1: it is best to load overrides before any other command, as many of the INI options are used in the report generation.

IMPORTANT #2: always set the header (e.g. [General]) before the value, otherwise there will be an error.

Most useful config overrides

You can look into the file sweetviz_defaults.ini for what can be overridden (warning: much of it is a work in progress and not well documented), but the most useful overrides are as follows.

Default report layout, size

Override any of these (by putting them in your own INI, again do not forget the header), to avoid having to set them every time you do a "show" command:

Important: note the double '%' if specifying a percentage

[Output_Defaults]
html_layout = widescreen
html_scale = 1.0
notebook_layout = vertical
notebook_scale = 0.9
notebook_width = 100%%
notebook_height = 700
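
The double '%' and the required header can both be seen with Python's standard configparser module. The sketch below assumes sv.config_parser behaves like a regular configparser.ConfigParser; it uses a stand-in parser and does not touch Sweetviz itself.

```python
import configparser

# Stand-in for Sweetviz's parser (assumption: it behaves like ConfigParser).
parser = configparser.ConfigParser()

override_text = """
[Output_Defaults]
notebook_layout = vertical
notebook_width = 100%%
notebook_height = 700
"""
parser.read_string(override_text)

# "%%" in the file is read back as a single literal "%".
notebook_width = parser.get("Output_Defaults", "notebook_width")
notebook_height = parser.getint("Output_Defaults", "notebook_height")
print(notebook_width, notebook_height)

# A value placed before any [Section] header is rejected at parse time.
try:
    configparser.ConfigParser().read_string("show_logo = 0\n")
    header_error = None
except configparser.MissingSectionHeaderError as exc:
    header_error = exc
print("missing header raises:", type(header_error).__name__)
```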

New: Chinese, Japanese, Korean (CJK) character support

[General]
use_cjk_font = 1

Will switch the font in the graphs to use a CJK-compatible font. Although this font is not as compact, it will get rid of any warnings and "unknown character" symbols for these languages.

Remove Sweetviz logo

[Layout]
show_logo = 0

Will remove the Sweetviz logo from the top of the page.

Correlation/Association analysis

A major source of insight and unique feature of Sweetviz' associations graph and analysis is that it unifies in a single graph (and detail views):

Numerical correlation (between numerical features)
Uncertainty coefficient (for categorical-categorical)
Correlation ratio (for categorical-numerical)
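
The correlation ratio from the list above can be sketched in a few lines of standard-library Python. This follows the textbook definition (square root of between-group variance over total variance) and is not taken from Sweetviz's source; the function name and toy data are illustrative.

```python
import math
from collections import defaultdict


def correlation_ratio(categories, values):
    """Textbook correlation ratio (eta) between a categorical and a numerical feature."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    overall_mean = sum(values) / len(values)
    # Spread of the group means around the overall mean, weighted by group size.
    between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    # Total spread of all values around the overall mean.
    total = sum((v - overall_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # constant values: no variance to explain
    return math.sqrt(between / total)


cabin_class = ["a", "a", "b", "b"]
eta_separated = correlation_ratio(cabin_class, [1, 1, 3, 3])  # category fixes the value
eta_mixed = correlation_ratio(cabin_class, [1, 3, 1, 3])      # category says nothing
print(eta_separated, eta_mixed)
```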

Squares represent categorical-featured-related variables and circles represent numerical-numerical correlations. Note that the trivial diagonal is left empty, for clarity.

IMPORTANT: categorical-categorical associations (provided by the SQUARES showing the uncertainty coefficient) are ASYMMETRICAL, meaning that each row represents how much the row title (on the left) gives information on each column. For example, "Sex", "Pclass" and "Fare" are the elements that give the most information on "Survived".

For the Titanic dataset, this information is rather symmetrical but it is not always the case!
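
To make the asymmetry concrete, here is a minimal standard-library sketch of the uncertainty coefficient (Theil's U) using its textbook definition, U(X|Y) = (H(X) - H(X|Y)) / H(X). The function names and toy data are illustrative and not taken from Sweetviz's code.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) of a sequence of categorical values."""
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): uncertainty left in x once y is known."""
    total = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    return -sum(
        (c / total) * math.log(c / y_counts[y_val])
        for (_, y_val), c in xy_counts.items()
    )


def uncertainty_coefficient(x, y):
    """U(X | Y): fraction of x's entropy explained by knowing y (0 to 1)."""
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # x is constant, so nothing is left to explain
    return (h_x - conditional_entropy(x, y)) / h_x


deck = ["A", "A", "B", "B"]
cabin = ["1", "2", "3", "4"]

u_deck_given_cabin = uncertainty_coefficient(deck, cabin)  # cabin fully determines deck
u_cabin_given_deck = uncertainty_coefficient(cabin, deck)  # deck only narrows down cabin
print(u_deck_given_cabin, u_cabin_given_deck)
```

In this toy data, knowing the cabin fully determines the deck (coefficient of 1), while knowing the deck only halves the uncertainty about the cabin (coefficient of 0.5), so swapping rows and columns changes the value.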

Correlations are also displayed in the detail section of each feature, with the target value highlighted when applicable.

Finally, it is worth noting these correlation/association methods shouldn’t be taken as gospel as they make some assumptions on the underlying distribution of data and relationships. However they can be a very useful starting point.

Troubleshooting / FAQ

Installation issues

Please see the "Installation issues & fixes" section at the top of this document

Asian characters, "RuntimeWarning: Glyph ### missing from current font"

See section above regarding CJK characters support. If you find the need for additional character types, definitely post a request in the issue tracking system.

...any other issues

Development is ongoing, so absolutely feel free to report any issues and/or suggestions in the issue tracking system here or in our forum (you should be able to log in with your GitHub account!).

Contribute

This is my first open-source project! I built it to be the most useful tool possible and help as many people as possible with their data science work. If it is useful to you, your contribution is more than welcome and can take many forms:

1. Spread the word!

A STAR here on GitHub, and a Twitter or Instagram post are the easiest contribution and can potentially help grow this project tremendously! If you find this project useful, these quick actions from you would mean a lot and could go a long way.

Kaggle notebooks/posts, Medium articles, YouTube video tutorials and other content take more time but will help all the more!

2. Report bugs & issues

I expect there to be many quirks once the project is used by more and more people with a variety of new (& "unclean") data. If you found a bug, please open a new issue here.

3. Suggest and discuss usage/features

To make Sweetviz as useful as possible we need to hear what you would like it to do, or what it could do better! Head on to our Discourse server and post your suggestions there; no login required!

4. Contribute to the development

I definitely welcome the help I can get on this project; simply get in touch on the issue tracker and/or our Discourse forum.

Please note that after a hectic development period, the code itself right now needs a bit of cleanup. :)

Special thanks & related materials

I want Sweetviz to be a hub of the best of what's out there, a way to get the most valuable information and visualization, without reinventing the wheel.

As such, I want to point out some of the great resources that inspired Sweetviz and were integrated into it:

Pandas-Profiling was the original inspiration for this project. Some of its type-detection code was included in Sweetviz.
Shaked Zychlinski: The Search for Categorical Correlation is a great article about different types of variable interactions that was the basis of that analysis in Sweetviz.
Drazen Zaric: Better Heatmaps and Correlation Matrix Plots in Python was the basis for our association graphs.

And of course, very special thanks to everyone who has contributed on GitHub, through reports, feedback and commits!

Comments

ValueError: index must be monotonic increasing or decreasing

I am able to generate the same report on Titanic data as in the Medium articles. However, when I try to test the Boston housing data, I get the errors as below:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in get_slice_bound(self, label, side, kind)
   5166             try:
-> 5167                 return self._searchsorted_monotonic(label, side)
   5168             except ValueError:

~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
   5127
-> 5128         raise ValueError("index must be monotonic increasing or decreasing")
   5129

ValueError: index must be monotonic increasing or decreasing

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
in
----> 1 my_report = sv.analyze(dfx)

Any ideas on the error?

Thanks.
bug

opened by phillip1029 18
FloatingPointError: divide by zero encountered in true_divide

I ran into a "FloatingPointError: divide by zero encountered in true_divide" in the pairwise feature portion of the code. Apparently there was a divide by zero issue in the cov part of the underlying code.

The trace of the error is as follows:
file: sv_public.py, line 13, in analyze, pairwise_analysis, feat_cfg)
file: dataframe_report.py, line 243, in __init__, self.process_associations(features_to_process, source_target_series, compare_target series
file: dataframe_report.py, line 423, in process_associations, feature.source.corr(other.source, method='pearson')
file: series.py line 2254, in corr, this.values, other.values, method=method, min_periods=min_periods
file: nanops.py, line 69, in _f, return f(*args,*kwargs)
file: nanops.py, line 1240, in nancorr, return f(a,b)
file: nanops.py, line 1256, in _pearson, return np.corrcoef(a,b)[0,1]
file: <array_function internals>, line 6, in corrcoef
file: function_base.py,line 2526 in corrcoef, c=cov(x,y,rowvar)
file: <array_function internals>, line 6, in cov
file: function_base.py, line 2455, in cov, c=np.true_divide(1,fact)

My dataframe had some empty strings where nulls should have been, but there were other columns that had similar features, but they never threw this error.
bug

opened by jmcneal84 17

Integer feature with values 1 and 2 cannot be handled as categorical?

Hey guys, I'm getting an error when handling integer columns but the error message is not very clear for me to understand what is going on. So far it looks like a bug to me. Here it goes.

We start by importing basic stuff and generate a pandas dataframe with 4 columns containing random real numbers, plus an integer column named 'target' with values 1 and 2.

import sweetviz as sv
import pandas as pd
import numpy as np

np.random.seed(42)
np_data = np.random.randn(10, 4)
df = pd.DataFrame(np_data, columns=['col1', 'col2', 'col3', 'col4'])
df['target'] = 1.0
df['target'].iloc[5:] = 2.
df['target'] = df['target'].astype(int)

Taking a look at the original types of the dataframe (df.dtypes), we have as a result:

col1      float64
col2      float64
col3      float64
col4      float64
target      int32
dtype: object

Error: TypeError

compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])
compareReport.show_html()

gives this message:

TypeError                                 Traceback (most recent call last)
<ipython-input-54-8e3e89553904> in <module>
      1 #feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
----> 2 compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])#, feat_cfg=feature_config, target_feat='target')
      3 compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\sv_public.py in compare_intra(source_df, condition_series, names, target_feat, feat_cfg, pairwise_analysis)
     42     report = sweetviz.DataframeReport([data_true, names[0]], target_feat,
     43                                       [data_false, names[1]],
---> 44                                       pairwise_analysis, feat_cfg)
     45     return report
     46 

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    215             # start = time.perf_counter()
    216             self.progress_bar.set_description(':' + f.source.name + '')
--> 217             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    218             self.progress_bar.update(1)
    219             # print(f"DONE FEATURE------> {f.source.name}"

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\series_analyzer.py in analyze_feature_to_dictionary(to_process)
     92         compare_type = determine_feature_type(to_process.compare,
     93                                               to_process.compare_counts,
---> 94                                               returned_feature_dict["type"], "COMPARED")
     95         if compare_type != FeatureType.TYPE_ALL_NAN and \
     96             source_type != FeatureType.TYPE_ALL_NAN:

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\type_detection.py in determine_feature_type(series, counts, must_be_this_type, which_dataframe)
     73             var_type = FeatureType.TYPE_TEXT
     74         else:
---> 75             raise TypeError(f"Cannot force series '{series.name}' in {which_dataframe} to be from its type {var_type} to\n"
     76                             f"DESIRED type {must_be_this_type}. Check documentation for the possible coercion possibilities.\n"
     77                             f"This can be solved by changing the source data or is sometimes caused by\n"

TypeError: Cannot force series 'target' in COMPARED to be from its type FeatureType.TYPE_CAT to
DESIRED type FeatureType.TYPE_BOOL. Check documentation for the possible coercion possibilities.
This can be solved by changing the source data or is sometimes caused by
a feature type mismatch between source and compare dataframes.

If I explicitly supply the feat_cfg argument the result is the same.

feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"], feat_cfg=feature_config)
compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

However, if I add 10 to the 'target' column (it will now have 11 and 12 as values), the report is generated without errors. Am I missing something or it is indeed a bug?

bug

opened by shgo 11

cast key to string

In some cases the key is a boolean value not a string. A keyerror is produced when a boolean value appears in key. Reference #42

I was able to recreate the issue as user described and was able to fix by casting key as string. It seems like the key should always be a string.

opened by a246530 10

TypeError: DatetimeIndex cannot perform the operation sum

I've a dataset which has date_time column of the format: 2020-07-12 11:37:25

I get the following error:

:date_time:                        |███                  | [ 14%]   00:00  -> (00:03 left)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-77-cbd387f7f43e> in <module>()
      1 #analyzing the dataset
----> 2 techglares_report = sv.analyze(df)

6 frames
/usr/local/lib/python3.6/dist-packages/sweetviz/sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
     11             pairwise_analysis: str = 'auto'):
     12     report = sweetviz.DataframeReport(source, target_feat, None,
---> 13                                       pairwise_analysis, feat_cfg)
     14     return report
     15 

/usr/local/lib/python3.6/dist-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    214             # start = time.perf_counter()
    215             self.progress_bar.set_description(':' + f.source.name + '')
--> 216             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    217             self.progress_bar.update(1)
    218             # print(f"DONE FEATURE------> {f.source.name}"

/usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
     90 
     91     # Establish base stats
---> 92     add_series_base_stats_to_dict(to_process.source, to_process.source_counts, returned_feature_dict)
     93     if to_process.compare is not None:
     94         add_series_base_stats_to_dict(to_process.compare, to_process.compare_counts, compare_dict)

/usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in add_series_base_stats_to_dict(series, counts, updated_dict)
     42     base_stats = updated_dict["base_stats"]
     43     num_total = counts["num_rows_total"]
---> 44     num_zeros = series[series == 0].sum()
     45     non_nan = counts["num_rows_with_data"]
     46     base_stats["total_rows"] = num_total

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, min_count, **kwargs)
  11180             skipna=skipna,
  11181             numeric_only=numeric_only,
> 11182             min_count=min_count,
  11183         )
  11184 

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   3901             numeric_only=numeric_only,
   3902             filter_type=filter_type,
-> 3903             **kwds,
   3904         )
   3905 

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   1058         if func is None:
   1059             raise TypeError(
-> 1060                 f"{type(self).__name__} cannot perform the operation {name}"
   1061             )
   1062         return func(skipna=skipna, **kwds)

TypeError: DatetimeIndex cannot perform the operation sum

I'm running sweetviz on Google Colab.

Is there any way to solve this error?

opened by vidyap-xgboost 9

Charset utf-8

First of all it’s awesome! Many thanks for your effort on data visualization! There is a small issue maybe, the html report lacks a meta tag showing the charset as “utf-8”; by adding it, the report can correctly show the MBCS characters and will catch eyes of more global analysts. Thanks again! Hope this project goes better!

opened by 95Key 9
show_html() doesn't show the output in Jupyter notebook / lab
Hi there,

I try to use sweetviz in local:

Ubuntu 20.04

And in anaconda enterprise:

K8s with centOS

Both lead to the same issue. The display of the output in jupyter lab and notebook isn't visible.

Local:

Jupyter_lab=2.0

AE:

jupyter=1.0.0
jupyter_client=5.3.3
jupyter_console=6.0.0
jupyter_core=4.5.0
Jupyter_lab=1.1.3
ipython=7.8.0

The report has been generated but is not displayed.

How to fix it?

Best
report output
opened by Christophe-pere 8
show_html Generate a Strange Layout of Analysis

Hey there, this is a great package and it is pretty handy. However, I run into a strange layout issue when generating the plot.

sweet = stz.analyze(data)
sweet.show_html()

Above is the code I used, and I attached the result's layout as a png file below.

1: Would you kindly inform me of the options for layout in show_html()?
2: How should I solve this issue?

Thank you so much!

bug report output can't repro issue closing as cannot repro and no more reports

opened by HaoVJiang 7
Error "Font family ['STIXGeneral'] not found. Falling back to DejaVu Sans." occurs

hi,

sweetviz does not work for a special table

i get the following error AAfindfont: Font family ['STIXGeneral'] not found. Falling back to DejaVu Sans. findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans. findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans. findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans. findfont: Font family ['DejaVu Sans'] not found. Falling back to DejaVu Sans. ... RecursionError: maximum recursion depth exceeded in comparison

how can i resolve this ?

thanks for any hint
bug

opened by fleschgordon 6
Error message in pip install sweetviz

Hi, I am attempting to install sweetviz using pip install sweetviz, but keep encountering the following error message (reproduced below). I am using pandas version 1.0.1. Kindly advise, thanks.

Installing collected packages: importlib-resources, pandas, tqdm, sweetviz
  Attempting uninstall: pandas
    Found existing installation: pandas 1.0.1
    Uninstalling pandas-1.0.1:
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\users\\65943\\anaconda3\\lib\\site-packages\\~andas\\_libs\\algos.cp37-win_amd64.pyd'
Consider using the `--user` option or check the permissions.

opened by AngShengJun 6
error in graph_associations.py line 210, ValueError: cannot convert float NaN to integer

Error thrown up during analyze(dataframe), right after :PAIRWISE DONE: and Creating Associations graph...

Traceback (most recent call last):

File "", line 1, in myreport = sv.analyze(df)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\sv_public.py", line 13, in analyze pairwise_analysis, feat_cfg)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\dataframe_report.py", line 246, in init self._association_graphs["all"] = GraphAssoc(self, "all", self._associations)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 165, in init f = corrplot(graph_data, dataframe_report)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 410, in corrplot dataframe_report = dataframe_report

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 318, in heatmap cur_size[1] / 2, facecolor=value_to_color(color[index]),

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 210, in value_to_color ind = int(val_position * (n_colors - 1)) # target index in the color palette

ValueError: cannot convert float NaN to integer
bug

opened by cnblevins 6
json files.

I am a fan of this library. I have used it and would like to use it for all my EDA; however, it gives an error with a dataframe created from a JSON file. Once the data was in a dataframe, I didn't think this would be a problem.

opened by gozdeydd 0
iteritems & mad deprecated

My code:

import sweetviz as sv
my_report = sv.analyze(df)
my_report.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

Warnings:

C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\dataframe_report.py:74: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  all_source_names = [cur_name for cur_name, cur_series in source_df.iteritems()]
C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\dataframe_report.py:109: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  filtered_series_names_in_source = [cur_name for cur_name, cur_series in source_df.iteritems()

C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_cat.py:28: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in category_counts.iteritems():
C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_text.py:19: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in to_process.source_counts["value_counts_without_nan"].iteritems():
C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_text.py:19: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in to_process.source_counts["value_counts_without_nan"].iteritems():
C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_cat.py:28: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for item in category_counts.iteritems():
C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_numeric.py:25: FutureWarning: The 'mad' method is deprecated and will be removed in a future version. To compute the same result, you may do (df - df.mean()).abs().mean().
  stats["mad"] = series.mad()
C:\Users\thomp\AppData\Local\Programs\Python\Python311\Lib\site-packages\sweetviz\series_analyzer_numeric.py:25: FutureWarning: The 'mad' method is deprecated and will be removed in a future version. To compute the same result, you may do (df - df.mean()).abs().mean().
  stats["mad"] = series.mad()

opened by tallgaijin 0
sweetviz shows wrong target rate for numerical variable
I am trying to plot the distribution of a variable and target rate in each of its value, sweetviz shows wrong target rate. Below is the reproducible code.

import pandas as pd
import sweetviz as sv

var1 = [0.]*10 + [1.]*10 + [2]*10 + [3]*10
target = [0]*2 + [1]*8 + [0]*4 +[1]*6 + [0]*8 + [1]*2 + [0]*10
df = pd.DataFrame({'var1':var1, 'target':target})
fc = sv.FeatureConfig(force_num=['var1'])
report = sv.analyze([df, 'Train'], target_feat='target', feat_cfg=fc, pairwise_analysis='off')
report.show_html('report.html')
report.show_notebook('report.html')

I know that if var1 is forced to categorical, the output is correct. That does not help me, though, because Sweetviz sorts the charts for categorical variables by category size rather than by axis label.

How can I make this work while keeping the variable numerical?
opened by shreeprasadbhat 0
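
As a cross-check for the report above, the per-value target rate can be computed directly with pandas. This is only a sketch of what the expected numbers look like for the data in the issue, not a fix for the chart; the name expected_rate is illustrative.

```python
import pandas as pd

# Same data as in the report above.
var1 = [0.0] * 10 + [1.0] * 10 + [2] * 10 + [3] * 10
target = [0] * 2 + [1] * 8 + [0] * 4 + [1] * 6 + [0] * 8 + [1] * 2 + [0] * 10
df = pd.DataFrame({"var1": var1, "target": target})

# The mean of a 0/1 target within each value of var1 is the target rate
# for that value.
expected_rate = df.groupby("var1")["target"].mean()
print(expected_rate)
```

For this data the rates are 0.8, 0.6, 0.2 and 0.0 for var1 values 0, 1, 2 and 3, which gives a baseline to compare the chart against.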
Add an argument to silence the progress bar

Is it possible to add an argument to silence the progress bar?

We want to use Sweetviz in an automated pipeline and store the report in a database. Our process already produces a lot of logs, so we would love to get rid of the progress bar output. We could deactivate tqdm before loading Sweetviz, but that would also affect other parts of our process.

One solution might be to add an argument in DataframeReport.__init__ and set self.progress_bar to a fake logger.

opened by LexABzH 0
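
Until such an argument exists, one possible workaround is to redirect stderr only around the call that draws the bar. The sketch below uses just the standard library and assumes the progress bar looks up sys.stderr at the moment it is created (stderr is tqdm's default output stream); quiet_stderr and noisy_step are hypothetical names, and noisy_step stands in for the report-building call.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def quiet_stderr():
    """Temporarily send anything written to sys.stderr into a buffer."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        yield buffer


def noisy_step():
    # Hypothetical stand-in for a call that draws a progress bar,
    # for example building a report.
    print("progress: 100%", file=sys.stderr)
    return "done"


with quiet_stderr() as captured:
    result = noisy_step()

print(result)
```

Genuine error messages written to stderr inside the block are swallowed as well, so the redirected region should stay as small as possible. Progress bars rendered as notebook widgets would not be affected by this approach.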
Not reading override.ini - to remove logo

Hi,

I set "show_logo = 0" under the layout section of a new override.ini (duplicated from the existing default file before making my change) and inserted the line sv.config_parser.read("override.ini") into my code right after the import. However, Sweetviz still reads from the default .ini file instead of the new one.

If I set "show_logo = 0" in the default .ini file instead, it works (the logo no longer shows). Any advice?

Thanks.

opened by thongfam 0
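
One thing worth checking in a situation like the one above: the standard library's ConfigParser.read() silently skips files it cannot open and returns the list of files it actually parsed. The sketch below assumes sv.config_parser is an ordinary configparser.ConfigParser; the section name Layout and the file names are illustrative stand-ins.

```python
import configparser
import os
import tempfile

parser = configparser.ConfigParser()
parser.read_string("[Layout]\nshow_logo = 1\n")  # stand-in for the defaults

# read() skips files it cannot open and returns only the ones it parsed,
# so an empty list means the override was never loaded.
loaded = parser.read("no_such_override.ini")
missing_was_ignored = loaded == []

# An absolute path removes any dependence on the current working directory.
with tempfile.TemporaryDirectory() as tmp:
    override_path = os.path.join(tmp, "override.ini")
    with open(override_path, "w") as fh:
        fh.write("[Layout]\nshow_logo = 0\n")
    loaded = parser.read(override_path)

logo_setting = parser.get("Layout", "show_logo")
print(loaded, logo_setting)
```

Section names are case-sensitive in configparser, so the section header in the override file has to match the default file exactly; option names are not case-sensitive.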
Use html correlation heatmap (Associations) instead of picture.

With more than 100 features, none of the labels in the current correlation map is legible.

But if the heatmap is created with seaborn or just pandas, the user can zoom the HTML to read the text clearly.

Furthermore, HTML plus JavaScript can provide hover information on heatmap cells.

opened by PaleNeutron 0
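
To make the request above concrete, here is a minimal sketch of a text-based heatmap: a plain HTML table whose cells carry a title attribute, which browsers show as a hover tooltip without any JavaScript. It is not how Sweetviz renders associations; heatmap_html and the sample labels and values are hypothetical.

```python
import html


def heatmap_html(labels, matrix):
    """Render a square matrix of values in [-1, 1] as an HTML table."""
    header = "".join(f"<th>{html.escape(name)}</th>" for name in labels)
    rows = ["<table>", f"<tr><th></th>{header}</tr>"]
    for row_name, row in zip(labels, matrix):
        cells = []
        for col_name, value in zip(labels, row):
            # Blue for positive, red for negative; opacity tracks magnitude.
            rgb = "0,0,255" if value >= 0 else "255,0,0"
            tip = html.escape(f"{row_name} vs {col_name}: {value:.2f}")
            cells.append(
                f'<td title="{tip}" '
                f'style="background: rgba({rgb},{abs(value):.2f})">'
                f"{value:.2f}</td>"
            )
        rows.append(f"<tr><th>{html.escape(row_name)}</th>{''.join(cells)}</tr>")
    rows.append("</table>")
    return "\n".join(rows)


page = heatmap_html(["age", "fare"], [[1.0, 0.5], [0.5, 1.0]])
print(page)
```

Because the labels are real text rather than pixels, they stay sharp when the page is zoomed, which is the point the issue raises.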

Releases(v2.1.4)

v2.1.4(Jun 14, 2022)

This version fixes deprecation warnings with the latest packages.
v2.1.2(May 28, 2021)

Fixed an issue with comet.ml in some cases
2.1.1(May 27, 2021)

Contains Comet.ml integration, as well as a few fixes.
v2.0.9(Feb 26, 2021)

Added display of the number of "zeroes" in the summary of numerical features.
v2.0.7(Feb 20, 2021)

New release, featuring Jupyter/Colab/etc. notebook integration, as well as lots of quality-of-life improvements and fixes.
v1.1.2(Nov 24, 2020)

This is the first post-beta version of Sweetviz! Lots of stability fixes and a few additions. See changelog for full details.

Thank you to everyone who has contributed fixes and bug reports. This has been invaluable in improving the library, please keep them coming!

Owner

Francois Bertrand

GitHub Repository

Simple spectra visualization tool for astronomers

SpecViewer A simple visualization tool for astronomers. Dependencies Python = 3.7.4 PyQt5 = 5.15.4 pyqtgraph == 0.10.0 numpy = 1.19.4 How to use py

5 Oct 07, 2021

Regress.me is an easy to use data visualization tool powered by Dash/Plotly.

Regress.me Regress.me is an easy to use data visualization tool powered by Dash/Plotly. Regress.me.-.Google.Chrome.2022-05-10.15-58-59.mp4 Get Started

14 Aug 14, 2022

nvitop, an interactive NVIDIA-GPU process viewer, the one-stop solution for GPU process management

An interactive NVIDIA-GPU process viewer, the one-stop solution for GPU process management.

1.3k Jan 02, 2023

Editor and Presenter for Manim Generated Content.

Editor and Presenter for Manim Generated Content. Take a look at the Working Example. More information can be found on the documentation. These Browse

149 Dec 29, 2022

The visual framework is designed on the idea of module and implemented by mixin method

Visual Framework The visual framework is designed on the idea of module and implemented by mixin method. Its biggest feature is the mixins module whic

9 Sep 19, 2022

Generate visualizations of GitHub user and repository statistics using GitHub Actions.

GitHub Stats Visualization Generate visualizations of GitHub user and repository statistics using GitHub Actions. This project is currently a work-in-

3 Dec 14, 2022

Compute and visualise incidence (reworking of the original incidence package)

incidence2 incidence2 is an R package that implements functions and classes to compute, handle and visualise incidence from linelist data. It refocuss

15 Nov 22, 2022

Graphical display tools, to help students debug their class implementations in the Carcassonne family of projects

carcassonne_tools Graphical display tools, to help students debug their class implementations in the Carcassonne family of projects NOTE NOTE NOTE The

1 Nov 08, 2021

A declarative (epi)genomics visualization library for Python

gos is a declarative (epi)genomics visualization library for Python. It is built on top of the Gosling JSON specification, providing a simplified interface for authoring interactive genomic visualiza

107 Dec 14, 2022

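Uses a Python crawler to scrape global COVID-19 epidemic data from the start of the outbreak to the present, then analyzes and visualizes the data in a variety of ways with Echarts.

COVID-19-Epidemic-Map Uses a Python crawler to scrape global COVID-19 epidemic data from the start of the outbreak to the present, then analyzes and visualizes the data in a variety of ways with Echarts. If you like the project, feel free to give it a star! The project's source code runs correctly; the library versions, the database table-creation statements, the pitfalls encountered while running it and their solutions are all in 笔记.md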

31 Dec 15, 2022

CompleX Group Interactions (XGI) provides an ecosystem for the analysis and representation of complex systems with group interactions.

XGI CompleX Group Interactions (XGI) is a Python package for the representation, manipulation, and study of the structure, dynamics, and functions of

67 Dec 28, 2022