Debugging, monitoring and visualization for Python Machine Learning and Data Science

Overview

Welcome to TensorWatch

TensorWatch is a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data.

TensorWatch is designed to be flexible and extensible so you can also build your own custom visualizations, UIs, and dashboards. Besides traditional "what-you-see-is-what-you-log" approach, it also has a unique capability to execute arbitrary queries against your live ML training process, return a stream as a result of the query and view this stream using your choice of a visualizer (we call this Lazy Logging Mode).

TensorWatch is under heavy development with a goal of providing a platform for debugging machine learning in one easy to use, extensible, and hackable package.

TensorWatch in Jupyter Notebook

How to Get It

pip install tensorwatch

TensorWatch supports Python 3.x and is tested with PyTorch 0.4-1.x. Most features should also work with TensorFlow eager tensors. TensorWatch uses graphviz to create network diagrams and depending on your platform sometime you might need to manually install it.

How to Use It

Quick Start

Here's simple code that logs an integer and its square as a tuple every second to TensorWatch:

import tensorwatch as tw
import time

# streams will be stored in test.log file
w = tw.Watcher(filename='test.log')

# create a stream for logging
s = w.create_stream(name='metric1')

# generate Jupyter Notebook to view real-time streams
w.make_notebook()

for i in range(1000):
    # write x,y pair we want to log
    s.write((i, i*i))

    time.sleep(1)

When you run this code, you will notice a Jupyter Notebook file test.ipynb gets created in your script folder. From a command prompt type jupyter notebook and select test.ipynb. Choose Cell > Run all in the menu to see the real-time line graph as values get written in your script.

Here's the output you will see in Jupyter Notebook:

TensorWatch in Jupyter Notebook

To dive deeper into the various other features, please see Tutorials and notebooks.

How does this work?

When you write to a TensorWatch stream, the values get serialized and sent to a TCP/IP socket as well as the file you specified. From Jupyter Notebook, we load the previously logged values from the file and then listen to that TCP/IP socket for any future values. The visualizer listens to the stream and renders the values as they arrive.

Ok, so that's a very simplified description. The TensorWatch architecture is actually much more powerful. Almost everything in TensorWatch is a stream. Files, sockets, consoles and even visualizers are streams themselves. A cool thing about TensorWatch streams is that they can listen to any other streams. This allows TensorWatch to create a data flow graph. This means that a visualizer can listen to many streams simultaneously, each of which could be a file, a socket or some other stream. You can recursively extend this to build arbitrary data flow graphs. TensorWatch decouples streams from how they get stored and how they get visualized.

Visualizations

In the above example, the line graph is used as the default visualization. However, TensorWatch supports many other diagram types including histograms, pie charts, scatter charts, bar charts and 3D versions of many of these plots. You can log your data, specify the chart type you want and let TensorWatch take care of the rest.

One of the significant strengths of TensorWatch is the ability to combine, compose, and create custom visualizations effortlessly. For example, you can choose to visualize an arbitrary number of streams in the same plot. Or you can visualize the same stream in many different plots simultaneously. Or you can place an arbitrary set of visualizations side-by-side. You can even create your own custom visualization widget simply by creating a new Python class, implementing a few methods.

Comparing Results of Multiple Runs

Each TensorWatch stream may contain a metric of your choice. By default, TensorWatch saves all streams in a single file, but you could also choose to save each stream in separate files or not to save them at all (for example, sending streams over sockets or into the console directly, zero hit to disk!). Later you can open these streams and direct them to one or more visualizations. This design allows you to quickly compare the results from your different experiments in your choice of visualizations easily.

Training within Jupyter Notebook

Often you might prefer to do data analysis, ML training, and testing - all from within Jupyter Notebook instead of from a separate script. TensorWatch can help you do sophisticated, real-time visualizations effortlessly from code that is run within a Jupyter Notebook end-to-end.

Lazy Logging Mode

A unique feature in TensorWatch is the ability to query the live running process, retrieve the result of this query as a stream and direct this stream to your preferred visualization(s). You don't need to log any data beforehand. We call this new way of debugging and visualization a lazy logging mode.

For example, as seen below, we visualize input and output image pairs, sampled randomly during the training of an autoencoder on a fruits dataset. These images were not logged beforehand in the script. Instead, the user sends query as a Python lambda expression which results in a stream of images that gets displayed in the Jupyter Notebook:

TensorWatch in Jupyter Notebook

See Lazy Logging Tutorial.

Pre-Training and Post-Training Tasks

TensorWatch leverages several excellent libraries including hiddenlayer, torchstat, Visual Attribution to allow performing the usual debugging and analysis activities in one consistent package and interface.

For example, you can view the model graph with tensor shapes with a one-liner:

Model graph for Alexnet

You can view statistics for different layers such as flops, number of parameters, etc:

Model statistics for Alexnet

See notebook.

You can view the dataset in a lower dimensional space using techniques such as t-SNE:

t-SNE visualization for MNIST

See notebook.

Prediction Explanations

We wish to provide various tools for explaining predictions to help debugging models. Currently, we offer several explainers for convolutional networks, including Lime. For example, the following highlights the areas that cause the Resnet50 model to make a prediction for class 240 for the Imagenet dataset:

CNN prediction explanation

See notebook.

Tutorials

Paper

More technical details are available in TensorWatch paper (EICS 2019 Conference). Please cite this as:

@inproceedings{tensorwatch2019eics,
  author    = {Shital Shah and Roland Fernandez and Steven M. Drucker},
  title     = {A system for real-time interactive analysis of deep learning training},
  booktitle = {Proceedings of the {ACM} {SIGCHI} Symposium on Engineering Interactive
               Computing Systems, {EICS} 2019, Valencia, Spain, June 18-21, 2019},
  pages     = {16:1--16:6},
  year      = {2019},
  crossref  = {DBLP:conf/eics/2019},
  url       = {https://arxiv.org/abs/2001.01215},
  doi       = {10.1145/3319499.3328231},
  timestamp = {Fri, 31 May 2019 08:40:31 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/eics/ShahFD19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contribute

We would love your contributions, feedback, questions, and feature requests! Please file a Github issue or send us a pull request. Please review the Microsoft Code of Conduct and learn more.

Contact

Join the TensorWatch group on Facebook to stay up to date or ask any questions.

Credits

TensorWatch utilizes several open source libraries for many of its features. These include: hiddenlayer, torchstat, Visual-Attribution, pyzmq, receptivefield, nbformat. Please see install_requires section in setup.py for upto date list.

License

This project is released under the MIT License. Please review the License file for more details.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
A command line tool for visualizing CSV/spreadsheet-like data

PerfPlotter Read data from CSV files using pandas and generate interactive plots using bokeh, which can then be embedded into HTML pages and served by

Gino Mempin 0 Jun 25, 2022
Generate the report for OCULTest.

Sample report generated in this function Usage example from utils.gen_report import generate_report if __name__ == '__main__': # def generate_rep

Philip Guo 1 Mar 10, 2022
A python script editor for napari based on PyQode.

napari-script-editor A python script editor for napari based on PyQode. This napari plugin was generated with Cookiecutter using with @napari's cookie

Robert Haase 9 Sep 20, 2022
股票行情实时数据接口-A股,完全免费的沪深证券股票数据-中国股市,python最简封装的API接口

股票行情实时数据接口-A股,完全免费的沪深证券股票数据-中国股市,python最简封装的API接口,包含日线,历史K线,分时线,分钟线,全部实时采集,系统包括新浪腾讯双数据核心采集获取,自动故障切换,STOCK数据格式成DataFrame格式,可用来查询研究量化分析,股票程序自动化交易系统.为量化研究者在数据获取方面极大地减轻工作量,更加专注于策略和模型的研究与实现。

dev 572 Jan 08, 2023
A research of IT labor market based especially on hh.ru. Salaries, rate of technologies and etc.

hh_ru_research Проект реализован в учебных целях анализа рынка труда, в особенности по hh.ru Input data В качестве входных данных используются сериали

3 Sep 07, 2022
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

AutoViz Automatically Visualize any dataset, any size with a single line of code. AutoViz performs automatic visualization of any dataset with one lin

AutoViz and Auto_ViML 1k Jan 02, 2023
Learn Basic to advanced level Data visualisation techniques from this Repository

Data visualisation Hey, You can learn Basic to advanced level Data visualisation techniques from this Repository. Data visualization is the graphic re

Shashank dwivedi 16 Jan 03, 2023
coordinate to draw the nimbus logo on the graffitiwall

This is a community effort to draw the nimbus logo on beaconcha.in's graffitiwall. get started clone repo with git clone https://github.com/tennisbowl

4 Apr 04, 2022
Statistics and Visualization of acceptance rate, main keyword of CVPR 2021 accepted papers for the main Computer Vision conference (CVPR)

Statistics and Visualization of acceptance rate, main keyword of CVPR 2021 accepted papers for the main Computer Vision conference (CVPR)

Hoseong Lee 78 Aug 23, 2022
Here are my graphs for hw_02

Let's Have A Look At Some Graphs! Graph 1: State Mentions in Congressperson's Tweets on 10/01/2017 The graph below uses this data set to demonstrate h

7 Sep 02, 2022
UNMAINTAINED! Renders beautiful SVG maps in Python.

Kartograph is not maintained anymore As you probably already guessed from the commit history in this repo, Kartograph.py is not maintained, which mean

1k Dec 09, 2022
A Python wrapper of Neighbor Retrieval Visualizer (NeRV)

PyNeRV A Python wrapper of the dimensionality reduction algorithm Neighbor Retrieval Visualizer (NeRV) Compile Set up the paths in Makefile then make.

2 Aug 29, 2021
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?

HoloViz 694 Jan 04, 2023
Browse Dash docsets inside emacs

Helm Dash What's it This package uses Dash docsets inside emacs to browse documentation. Here's an article explaining the basic usage of it. It doesn'

504 Dec 15, 2022
Visualize your pandas data with one-line code

PandasEcharts 简介 基于pandas和pyecharts的可视化工具 安装 pip 安装 $ pip install pandasecharts 源码安装 $ git clone https://github.com/gamersover/pandasecharts $ cd pand

陈华杰 2 Apr 13, 2022
Sky attention heatmap of submissions to astrometry.net

astroheat Installation Requires Python 3.6+, Tested with Python 3.9.5 Install library dependencies pip install -r requirements.txt The program require

4 Jun 20, 2022
🗾 Streamlit Component for rendering kepler.gl maps

streamlit-keplergl 🗾 Streamlit Component for rendering kepler.gl maps in a streamlit app. 🎈 Live Demo 🎈 Installation pip install streamlit-keplergl

Christoph Rieke 39 Dec 14, 2022
flask extension for integration with the awesome pydantic package

Flask-Pydantic Flask extension for integration of the awesome pydantic package with Flask. Installation python3 -m pip install Flask-Pydantic Basics v

249 Jan 06, 2023
An application that allows you to design and test your own stock trading algorithms in an attempt to beat the market.

StockBot is a Python application for designing and testing your own daily stock trading algorithms. Installation Use the

Ryan Cullen 280 Dec 19, 2022
Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Somoclu Somoclu is a massively parallel implementation of self-organizing maps. It exploits multicore CPUs, it is able to rely on MPI for distributing

Peter Wittek 239 Nov 10, 2022