A diff tool for language models

Overview

LMdiff

Qualitative comparison of large language models.

Demo & Paper: http://lmdiff.net

LMdiff is an MIT-IBM Watson AI Lab collaboration between:
Hendrik Strobelt (IBM, MIT), Benjamin Hoover (IBM, GeorgiaTech), Arvind Satyanarayan (MIT), and Sebastian Gehrmann (HarvardNLP, Google).

Setting up / Quick start

From the root directory, install the Conda dependencies:

conda env create -f environment.yml
conda activate LMdiff
pip install -e .

Run the backend in development mode with the default models and configurations:

uvicorn backend.server:app --reload

Check the output for the correct port (something like http://localhost:8000) and open that URL in your browser.

Rebuild frontend

This step is optional because a compiled version of the frontend is checked into this repo.

cd client
npm install
npm run build:backend
cd ..

Using your own models

To use your own models:

  1. Create a TextDataset of phrases to analyze

    You can create the dataset file in several ways:

    From a text file: If you have already collected all the phrases you want in a text file, one phrase per line, simply run:
    python scripts/make_dataset.py path/to/my_dataset.txt my_dataset -o folder/i/want/to/save/in
    
    From a Python object (list of strings): Want to work only within Python?
    from analysis.create_dataset import create_text_dataset_from_object
    
    my_collection = ["Phrase 1", "My second phrase"]
    create_text_dataset_from_object(my_collection, "easy-first-dataset", "human_created", "folder/i/want/to/save/in")

    From Huggingface Datasets (https://huggingface.co/docs/datasets/): The dataset can also be created from one of Huggingface's provided datasets:
    from analysis.create_dataset import create_text_dataset_from_hf_datasets
    import datasets
    import path_fixes as pf
    
    glue_mrpc = datasets.load_dataset("glue", "mrpc", split="train")
    name = "glue_mrpc_train"
    
    def ds2str(glue):
        """Example: join the first 50 'sentence1' entries into newline-separated phrases"""
        sentences = glue['sentence1'][:50]
        return "\n".join(sentences)
    
    create_text_dataset_from_hf_datasets(glue_mrpc, name, ds2str, ds_type="human_created", outfpath=pf.DATASETS)

    The dataset is a simple .txt file with one phrase per line and a short, required metadata header at the top. For example:

    ---
    checksum: 92247a369d5da32a44497be822d4a90879807a8751f5db3ff1926adbeca7ba28
    name: dataset-dummy
    type: human_created
    ---
    
    This is sentence 1, please analyze this.
    Every line is a new phrase to pass to the model.
    I can keep adding phrases, so long as they are short enough to pass to the model. They don't even need to be one sentence long.
    

    The required fields in the header:

    • checksum :: A unique identifier for the current state of the file. You can calculate it however you wish, but it must change whenever anything in the contents below the header changes (e.g., two phrases are transposed, a new phrase is added, or a period is added after a sentence).
    • name :: The name of the dataset.
    • type :: Either human_created, or machine_generated if the dataset was produced by another model.

    Each line of the contents is a phrase to compare across the language models. A few warnings:

    • Make sure the phrases are short enough that they can be passed to the model given your memory constraints
    • The dataset is fully loaded into memory to serve to the front end, so avoid creating a text file that is too large to fit in memory.
  2. Choose two comparable models

    Two models are comparable if they:

    1. Have the exact same tokenization scheme
    2. Have the exact same vocabulary

    This allows token-wise comparisons between the two models. For example, comparable pairs include:

    • A pretrained model and a fine-tuned version of it (e.g., distilbert-base-uncased and distilbert-base-uncased-finetuned-sst-2-english)
    • A distilled version mimicking the original model (e.g., bert-base-cased and distilbert-base-cased)
    • Different sizes of the same model architecture (e.g., gpt2 and gpt2-large)
  3. Preprocess the models on the chosen dataset

    python scripts/preprocess.py all gpt2-medium distilgpt2 data/datasets/glue_mrpc_1+2.csv --output-dir data/sample/gpt2-glue-comparisons
    
  4. Start the app

    python backend/server/main.py --config data/sample/gpt2-glue-comparisons
    

    Note that if your models use a tokenization scheme other than the default GPT scheme, you need to tell the frontend how to visualize the tokens. For example, for a BERT-based tokenization scheme:

    python backend/server/main.py --config data/sample/bert-glue-comparisons -t bert
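
The dataset format from step 1 can be illustrated with a small round-trip sketch. The helpers make_checksum, write_dataset, and read_dataset are hypothetical and not part of LMdiff's API; scripts/make_dataset.py and analysis.create_dataset remain the supported ways to build a dataset. The checksum is an assumed choice: a SHA-256 hex digest of the newline-joined phrases. The 64-hex-digit example header is consistent with SHA-256, but the format only requires that the checksum change whenever the contents change.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Hypothetical helpers (not part of LMdiff): write and read a dataset file
# in the "metadata header + one phrase per line" format from step 1.

ALLOWED_TYPES = ("human_created", "machine_generated")


def make_checksum(phrases: list[str]) -> str:
    """Assumed scheme: SHA-256 of the newline-joined phrases.

    Any hash works as long as it changes when the contents change.
    """
    body = "\n".join(phrases)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def write_dataset(path: Path, name: str, phrases: list[str],
                  ds_type: str = "human_created") -> None:
    """Write the header (checksum, name, type) followed by the phrases."""
    if ds_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {ds_type}")
    header = [
        "---",
        f"checksum: {make_checksum(phrases)}",
        f"name: {name}",
        f"type: {ds_type}",
        "---",
        "",  # blank line between header and phrases, as in the example
    ]
    path.write_text("\n".join(header + phrases) + "\n", encoding="utf-8")


def read_dataset(path: Path) -> tuple[dict[str, str], list[str]]:
    """Return (metadata dict, list of non-empty phrases)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("missing metadata header")
    end = lines.index("---", 1)  # closing delimiter of the header
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    phrases = [line for line in lines[end + 1:] if line.strip()]
    return meta, phrases
```

Because the phrases are hashed in order, transposing two phrases or adding a period changes the checksum, which is exactly the property the header requires.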
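
The two comparability conditions from step 2 (same tokenization scheme, same vocabulary) can be spot-checked before preprocessing. This is a heuristic sketch, not a check LMdiff is known to perform: are_comparable is a hypothetical helper that relies only on get_vocab() and tokenize(), both of which Hugging Face transformers tokenizers provide. The vocabulary (including token ids) is compared exactly; the tokenization is only sampled on the probe phrases you pass in.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class TokenizerLike(Protocol):
    # The only two methods the check relies on.
    def get_vocab(self) -> dict[str, int]: ...
    def tokenize(self, text: str) -> list[str]: ...


def are_comparable(tok_a: TokenizerLike, tok_b: TokenizerLike,
                   probes: Iterable[str]) -> bool:
    """Hypothetical heuristic for the two comparability conditions."""
    # Condition 2: exactly the same vocabulary, token ids included.
    if tok_a.get_vocab() != tok_b.get_vocab():
        return False
    # Condition 1: the same tokenization, spot-checked on sample phrases.
    return all(tok_a.tokenize(p) == tok_b.tokenize(p) for p in probes)
```

For example, with tokenizers loaded via AutoTokenizer.from_pretrained("gpt2") and AutoTokenizer.from_pretrained("gpt2-large"), pass a few phrases from your dataset as probes. A False result means the pair violates at least one condition; a True result is evidence, not proof, that the tokenization agrees on every input.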
    

Architecture

LMdiff Architecture

(Admin) Getting the Data

Models and datasets for the deployed app are stored in the cloud, and accessing them requires a private .dvc/config file.

With the correct config:

dvc pull

will populate the data directories correctly for the deployed version.

Testing

make test

or

python -m pytest tests

All tests are stored in the tests directory.

Frontend

We like pnpm, but npm works just as well. We also like Vite for its fast hot module reloading and pleasant developer experience. This repository uses Vue as its reactive framework.

From the root directory:

cd client
pnpm install --save-dev
pnpm run dev

If you want to hit the backend routes, make sure to also run the uvicorn backend.server:app command from the project root.

For production (serve with Vite)

pnpm run serve

For production (serve with this repo's FastAPI server)

cd client
pnpm run build:backend
cd ..
uvicorn backend.server:app

Or the gunicorn command from above.

All artifacts are stored in the client/dist directory with the appropriate basepath.

For production (serve with external tooling like NGINX)

pnpm run build

All artifacts are stored in the client/dist directory.

Notes

  • Browse the available endpoints in FastAPI's interactive docs at <localhost>:<port>/docs
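
Besides the browser UI at /docs, the endpoint list can also be read programmatically. A minimal sketch, assuming the backend runs on http://localhost:8000 (as in the quick start) and that the FastAPI app keeps its default schema URL, /openapi.json; fetch_openapi and list_endpoints are hypothetical helpers.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# OpenAPI 3 operation keys; other keys under a path (e.g. "parameters") are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema (assumed default FastAPI path /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json") as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in the schema."""
    return [
        f"{method.upper()} {path}"
        for path, ops in sorted(schema.get("paths", {}).items())
        for method in ops
        if method in HTTP_METHODS
    ]


if __name__ == "__main__":
    # Requires the backend to be running (uvicorn backend.server:app).
    for line in list_endpoints(fetch_openapi()):
        print(line)
```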