OSLO: Open Source framework for Large-scale transformer Optimization

Overview



What's New:
December 21, 2021: Released OSLO 1.0.

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization features for large-scale modeling. As of 2021, Hugging Face Transformers is considered the de facto standard library for transformer models, but it is not yet well suited to large-scale modeling. This is where OSLO comes in: OSLO is designed to make it easier to train large models with Transformers. For example, you can fine-tune GPTJ from the Hugging Face Model Hub with little extra effort using OSLO. Currently GPT2, GPTNeo, and GPTJ are supported, but we plan to support more models soon.

Installation

OSLO can be easily installed using the pip package manager. All dependencies, such as torch, transformers, dacite, ninja, and pybind11, are installed automatically with the following command. Note the 'core' in the PyPI project name.

pip install oslo-core

Some features rely on C++ extensions, so we provide an option, CPP_AVAILABLE, to decide whether or not to install them.

  • If C++ is available:
CPP_AVAILABLE=1 pip install oslo-core
  • If C++ is not available:
CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 on Windows and 1 on Linux.

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_parameters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

  • 3D Parallelism: The state-of-the-art technique for training a large-scale model on multiple GPUs by combining data, tensor, and pipeline parallelism (see the sketch after this list).
  • Kernel Fusion: A GPU optimization method to increase training and inference speed.
  • DeepSpeed Support: We support DeepSpeed, which provides ZeRO data parallelism.
  • Data Processing: Various utilities for efficient large-scale data processing.
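
As a quick illustration of how the three axes of 3D parallelism compose, here is a minimal sketch. The variable names are ours for illustration, not OSLO's API; it only shows the standard convention that the parallel degrees must multiply to the total GPU count.

# Minimal sketch (illustrative names, not OSLO's API): how the three
# parallelism degrees compose in 3D parallelism.
tensor_parallel_size = 2    # each layer's weights are split across 2 GPUs
pipeline_parallel_size = 2  # the layer stack is split into 2 sequential stages
world_size = 8              # total GPUs in the job

# The remaining factor is the data-parallel degree: identical model
# replicas fed different mini-batches.
assert world_size % (tensor_parallel_size * pipeline_parallel_size) == 0
data_parallel_size = world_size // (tensor_parallel_size * pipeline_parallel_size)
print(data_parallel_size)  # -> 2 replicas, each spanning 4 GPUs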

See USAGE.md to learn how to use them.

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The code of the OSLO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments
  • [WIP] Implement ZeRO Stage 3 (FSDP)


    Title

    • Implement ZeRO Stage 3 (FullyShardedDataParallel)

    Description

    • [x] Add reduce_scatter_bucketer.py
      • [x] Add test_reduce_scatter_bucketer.py
    • [x] Add flatten_params_wrapper.py
      • [x] Add test_flatten_params_wrapper.py
    • [x] Add containers.py
      • [x] Add test_containers.py
    • [x] Add parallel.py
      • [x] Add test_parallel.py
    • [x] Add fsdp_optim_utils.py
    • [x] Update fsdp.py
    • [x] Add auto_wrap.py
      • [x] Add test_wrap.py
    opened by jinok2im 9
  • FusedAdam & CPUAdam


    Title

    • FusedAdam & CPUAdam

    Description

    • Implement FusedAdam & CPUAdam

    Tasks

    • [x] Implement FusedAdam
    • [x] Implement CPUAdam
    • [x] Test FusedAdam
    • [x] Test CPUAdam
    • [x] Test FusedScaleMaskSoftmax (name changed)
    opened by cozytk 6
  • [WIP] Add data processing modules referring to the lassl


    Title

    • Add data processing modules referring to lassl

    Description

    • Brought in data processing functions suited to GPT2, referring to lassl

    Linked Issues

    • None
    opened by gimmaru 6
  • Implementation of Sequential Parallelism


    SP with DP implementation

    • Implemented SP wrapper with DP

    Description

    • SequenceDataParallel works like native torch DDP with SP
    • You can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
    opened by ohwi 5
  • Update data collators and Add models


    Title

    • Update data collators and Add models

    Description

    • Updated data collators to utilize sequence parallelism in the OSLO trainer
    • Added models by referring to the transformers library
    opened by gimmaru 3
  • Implement Expert Parallel and Test for Initialization and Forward Pass


    Title

    • Implement Expert Parallel and Test for Initialization and Forward Pass

    Description

    • Implement Wrapper, Modules and Features for Expert Parallel
    • Implement mapping_utils._ParallelMappingForHuggingFace as a superclass of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace
    • Test initialization and forward pass for expert parallel
    opened by scsc0511 3
  • Integrate Sequence Parallelism branches


    Title

    • Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

    Description

    • This PR is an integration of the current SP version, but something is still wrong.
    • We will fix the bugs in the coming week and write test modules according to the SP design.
    • It does not include the contents of the branch used for testing.
    opened by l-yohai 3
  • Implement tp-3d layers, wrapper, and test codes; refactor all tp test codes and layers


    • Implement the tp-3d wrapper
    • Fix the rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing a rank transpose function
    • Revise tp-3d layers for Hugging Face compatibility
    • Implement tp-3d test codes
    • Refactor all tp test codes
    • Unify the format across all tensor parallel modules
    opened by bzantium 2
  • Refactoring MultiheadAttention with todo anchors


    Title

    • Refactoring MultiheadAttention with todo anchors

    Description

    • Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.
    • Remove unnecessary or unintended code and clean up annotations.
    • Unify the return format and variable names with those of native torch.

    Additionally, I need to test attention_mask. However, it seems this part can proceed after FusedScaleMaskSoftmax is integrated.

    cc. @hyunwoongko @ohwi

    opened by l-yohai 2
  • Add tp-1d layers testing


    • Add tests for tp-1d layers: col_linear, row_linear, vocab_embedding_1d
    • Replace literal numbers with integer variables like summa_dim and world_size. cc: @hyunwoongko
    opened by bzantium 2
  • [WIP] Add test code for SP training


    Title

    • SP Model Test Code

    Description

    Writing test code to verify that the model's gradient and loss values are the same when sequence parallelism is applied.

    • WIP: merging @ohwi's test code comparing ColossalAI's SP with a simple training model.
    opened by l-yohai 2
Releases(v2.0.2)
  • v2.0.2(Aug 25, 2022)

  • v2.0.1(Feb 20, 2022)

  • v2.0.0(Feb 14, 2022)

    Official release of OSLO 2.0.0 🎉🎉

    This version of OSLO provides the following features:

    • Tensor model parallelism
    • Efficient activation checkpointing
    • Kernel fusion

    We plan to add pipeline model parallelism and ZeRO optimization in upcoming versions.


    New feature: Kernel Fusion

    {
      "kernel_fusion": {
        "enable": "bool",
        "memory_efficient_fusion": "bool",
        "custom_cuda_kernels": "list"
      }
    }
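
    As a rough sketch of how this schema plugs into the oslo.initialize entry point introduced in v2.0.0a0 (below); the values are placeholders, not recommendations:

    import oslo

    # Hypothetical usage sketch: pass a kernel-fusion config matching the
    # schema above to oslo.initialize. `model` is assumed to be a Hugging
    # Face transformer model created beforehand.
    model = oslo.initialize(
        model,
        config={
            "kernel_fusion": {
                "enable": True,
                "memory_efficient_fusion": False,
                "custom_cuda_kernels": [],
            }
        },
    )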
    

    For more information, please check the kernel fusion tutorial.

  • v2.0.0a2(Feb 2, 2022)

  • v2.0.0a1(Feb 2, 2022)

    Add activation checkpointing

    You can enable efficient activation checkpointing in OSLO with the following configuration.

    model = oslo.initialize(
        model,
        config={
            "model_parallelism": {
                "enable": True,
                "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
            },
            "activation_checkpointing": {
                "enable": True,
                "cpu_checkpointing": True,
                "partitioned_checkpointing": True,
                "contiguous_checkpointing": True,
            },
        },
    )
    

    Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html

  • v2.0.0a0(Jan 30, 2022)

    New API

    • We paid homage to DeepSpeed. OSLO is now simpler to use.
    import oslo
    
    model = oslo.initialize(model, config="oslo-config.json")
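
    For illustration, an oslo-config.json could combine the configuration sections shown across these release notes (model_parallelism, activation_checkpointing, kernel_fusion); this is a sketch with placeholder values, not a recommended configuration:

    {
      "model_parallelism": {
        "enable": true,
        "tensor_parallel_size": 4
      },
      "activation_checkpointing": {
        "enable": true,
        "cpu_checkpointing": false,
        "partitioned_checkpointing": false,
        "contiguous_checkpointing": false
      },
      "kernel_fusion": {
        "enable": true
      }
    }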
    

    Add new models

    • Albert
    • Bert
    • Bart
    • T5
    • GPT2
    • GPTNeo
    • GPTJ
    • Electra
    • Roberta

    Add documentation

    • https://tunib-ai.github.io/oslo

    Remove old pipeline parallelism and kernel fusion code

    • We'll refurbish them using the latest methods:
      • Kernel fusion: AOTAutograd
      • Pipeline parallelism: SageMaker PP
  • v1.1.2(Jan 15, 2022)

    Updates

    • [#7] Selective Kernel Fusion
    • [#9] Fix argument bug

    New Feature: Selective Kernel Fusion

    Since version 1.1.2, you can fuse only a subset of kernels rather than all of them. Currently, only the Attention and MLP classes are supported.

    from oslo import GPT2MLP, GPT2Attention
    
    # MLP only fusion
    model.fuse([GPT2MLP])
    
    # Attention only fusion
    model.fuse([GPT2Attention])
    
    # MLP + Attention fusion
    model.fuse([GPT2MLP, GPT2Attention])
    
  • v1.1(Dec 29, 2021)

    [#3] Add the deployment launcher of Parallelformers to OSLO.

    from oslo import GPTNeoForCausalLM
    
    model = GPTNeoForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-neo-2.7B",
        tensor_parallel_size=2,
        pipeline_parallel_size=2,
        deployment=True  # <-- new feature !
    )
    

    You can enable the deployment launcher by setting deployment=True. Please refer to USAGE.md for more details.

  • v1.0.1(Dec 22, 2021)

  • v1.0(Dec 21, 2021)

