A diff tool for language models

Overview

LMdiff

Qualitative comparison of large language models.

Demo & Paper: http://lmdiff.net

LMdiff is a MIT-IBM Watson AI Lab collaboration between:
Hendrik Strobelt (IBM, MIT) , Benjamin Hoover (IBM, GeorgiaTech), Arvind Satyanarayan (MIT), and Sebastian Gehrmann (HarvardNLP, Google).

Setting up / Quick start

From the root directory install Conda dependencies:

conda env create -f environment.yml
conda activate LMdiff
pip install -e .

Run the backend in development mode, deploying default models and configurations:

uvicorn backend.server:app --reload

Check the output for the right port (something like http://localhost:8000) and open in Browser.

Rebuild frontend

This is optional, because we have a compiled version checked into this repo.

cd client
npm install
npm run build:backend
cd ..

Using your own models

To use your own models:

  1. Create a TextDataset of phrases to analyze

    You can create the dataset file in several ways:

    From a text file So you have already collected all the phrases you want into a text file separated by newlines. Simply run:
    python scripts/make_dataset.py path/to/my_dataset.txt my_dataset -o folder/i/want/to/save/in
    
    From a python object (list of strings) Want to only work within python?
    from analysis.create_dataset import create_text_dataset_from_object
    
    my_collection = ["Phrase 1", "My second phrase"]
    create_text_dataset_from_object(my_collection, "easy-first-dataset", "human_created", "folder/i/want/to/save/in")
    From [Huggingface Datasets](https://huggingface.co/docs/datasets/) It can be created from one of Huggingface's provided datasets with:
    from analysis.create_dataset import create_text_dataset_from_hf_datasets
    import datasets
    import path_fixes as pf
    
    glue_mrpc = datasets.load_dataset("glue", "mrpc", split="train")
    name = "glue_mrpc_train"
    
    def ds2str(glue):
        """(e.g.,) Turn the first 50 sentences of the dataset into sentence information"""
        sentences = glue['sentence1'][:50]
        return "\n".join(sentences)
    
    create_text_dataset_from_hf_datasets(glue_mrpc, name, ds2str, ds_type="human_created", outfpath=pf.DATASETS)

    The dataset is a simple .txt file, with a new phrase on every line, and with a bit of required metadata header at the top. E.g.,

    ---
    checksum: 92247a369d5da32a44497be822d4a90879807a8751f5db3ff1926adbeca7ba28
    name: dataset-dummy
    type: human_created
    ---
    
    This is sentence 1, please analyze this.
    Every line is a new phrase to pass to the model.
    I can keep adding phrases, so long as they are short enough to pass to the model. They don't even need to be one sentence long.
    

    The required fields in the header:

    • checksum :: A unique identifier for the state of that file. It can be calculated however you wish, but it should change if anything at all changes in the contents below (e.g., two phrases are transposed, a new phase added, or a period is added after a sentence)
    • name :: The name of the dataset.
    • type :: Either human_created or machine_generated if you want to compare on a dataset that was spit out by another model

    Each line in the contents is a new phrase to compare in the language model. A few warnings:

    • Make sure the phrases are short enough that they can be passed to the model given your memory constraints
    • The dataset is fully loaded into memory to serve to the front end, so avoid creating a text file that is too large to fit in memory.
  2. Choose two comparable models

    Two models are comparable if they:

    1. Have the exact same tokenization scheme
    2. Have the exact same vocabulary

    This allows us to do tokenwise comparisons on the model. For example, this could be:

    • A pretrained model and a finetuned version of it (e.g., distilbert-base-cased and distilbert-base-uncased-finetuned-sst-2-english)
    • A distilled version mimicking the original model (e.g., bert-base-cased and distilbert-base-cased)
    • Different sizes of the same model architecture (e.g., gpt2 and gpt2-large)
  3. Preprocess the models on the chosen dataset

    python scripts/preprocess.py all gpt2-medium distilgpt2 data/datasets/glue_mrpc_1+2.csv --output-dir data/sample/gpt2-glue-comparisons
    
  4. Start the app

    python backend/server/main.py --config data/sample/gpt2-glue-comparisons
    

    Note that if you use a different tokenization scheme than the default gpt, you will need to tell the frontend how to visualize the tokens. For example, a bert based tokenization scheme:

    python backend/server/main.py --config data/sample/bert-glue-comparisons -t bert
    

Architecture

LMdiff Architecture

(Admin) Getting the Data

Models and datasets for the deployed app are stored on the cloud and require a private .dvc/config file.

With the correct config:

dvc pull

will populate the data directories correctly for the deployed version.

Testing
make test

or

python -m pytest tests

All tests are stored in tests.

Frontend

We like pnpm but npm works just as well. We also like Vite for its rapid hot module reloading and pleasant dev experience. This repository uses Vue as a reactive framework.

From the root directory:

cd client
pnpm install --save-dev
pnpm run dev

If you want to hit the backend routes, make sure to also run the uvicorn backend.server:app command from the project root.

For production (serve with Vite)
pnpm run serve
For production (serve with this repo's FastAPI server)
cd client
pnpm run build:backend
cd ..
uvicorn backend.server:app

Or the gunicorn command from above.

All artifacts are stored in the client/dist directory with the appropriate basepath.

For production (serve with external tooling like NGINX)
pnpm run build

All artifacts are stored in the client/dist directory.

Notes

  • Check the endpoints by visiting <localhost>:<port>/docs
Owner
Hendrik Strobelt
IBM Research // MIT-IBM AI Lab Updates on Twitter: @hen_str
Hendrik Strobelt
[CoRL 2021] A robotics benchmark for cross-embodiment imitation.

x-magical x-magical is a benchmark extension of MAGICAL specifically geared towards cross-embodiment imitation. The tasks still provide the Demo/Test

Kevin Zakka 36 Nov 26, 2022
Unifying Global-Local Representations in Salient Object Detection with Transformer

GLSTR (Global-Local Saliency Transformer) This is the official implementation of paper "Unifying Global-Local Representations in Salient Object Detect

11 Aug 24, 2022
It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

CLIP-ONNX It is a simple library to speed up CLIP inference up to 3x (K80 GPU) Usage Install clip-onnx module and requirements first. Use this trick !

Gerasimov Maxim 93 Dec 20, 2022
Code for the paper "On the Power of Edge Independent Graph Models"

Edge Independent Graph Models Code for the paper: "On the Power of Edge Independent Graph Models" Sudhanshu Chanpuriya, Cameron Musco, Konstantinos So

Konstantinos Sotiropoulos 0 Oct 26, 2021
Keras documentation, hosted live at keras.io

Keras.io documentation generator This repository hosts the code used to generate the keras.io website. Generating a local copy of the website pip inst

Keras 2k Jan 08, 2023
Audio Visual Emotion Recognition using TDA

Audio Visual Emotion Recognition using TDA RAVDESS database with two datasets analyzed: Video and Audio dataset: Audio-Dataset: https://www.kaggle.com

Combinatorial Image Analysis research group 3 May 11, 2022
Learning 3D Part Assembly from a Single Image

Learning 3D Part Assembly from a Single Image This repository contains a PyTorch implementation of the paper: Learning 3D Part Assembly from A Single

18 Dec 21, 2022
[SIGGRAPH Asia 2019] Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning

AGIS-Net Introduction This is the official PyTorch implementation of the Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning. paper | suppl

Yue Gao 102 Jan 02, 2023
Code for "Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation" ICCV'21

Skeletal-GNN Code for "Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation" ICCV'21 Various deep learning techniques have been propose

37 Oct 23, 2022
A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

Justin 1.1k Dec 24, 2022
Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train format

ttopt Description Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train (TT) format and maximu

5 May 23, 2022
Camera ready code repo for the NeuRIPS 2021 paper: "Impression learning: Online representation learning with synaptic plasticity".

Impression-Learning-Camera-Ready Camera ready code repo for the NeuRIPS 2021 paper: "Impression learning: Online representation learning with synaptic

2 Feb 09, 2022
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)

Yihui He 1k Jan 03, 2023
PyTorch implementation of UPFlow (unsupervised optical flow learning)

UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning By Kunming Luo, Chuan Wang, Shuaicheng Liu, Haoqiang Fan, Jue Wang, Jian Sun Megvii

kunming luo 87 Dec 20, 2022
Re-TACRED: Addressing Shortcomings of the TACRED Dataset

Re-TACRED Re-TACRED: Addressing Shortcomings of the TACRED Dataset

George Stoica 40 Dec 10, 2022
Motion and Shape Capture from Sparse Markers

MoSh++ This repository contains the official chumpy implementation of mocap body solver used for AMASS: AMASS: Archive of Motion Capture as Surface Sh

Nima Ghorbani 135 Dec 23, 2022
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling Transformer-based models are widely used in natural language processi

Zhanpeng Zeng 12 Jan 01, 2023
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

BossNAS This repository contains PyTorch evaluation code, retraining code and pretrained models of our paper: BossNAS: Exploring Hybrid CNN-transforme

Changlin Li 127 Dec 26, 2022
A web application that provides real time temperature and humidity readings of a house.

About A web application which provides real time temperature and humidity readings of a house. If you're interested in the data collected so far click

Ben Thompson 3 Jan 28, 2022
DetCo: Unsupervised Contrastive Learning for Object Detection

DetCo: Unsupervised Contrastive Learning for Object Detection arxiv link News Sparse RCNN+DetCo improves from 45.0 AP to 46.5 AP(+1.5) with 3x+ms trai

Enze Xie 234 Dec 18, 2022