OSLO: Open Source framework for Large-scale transformer Optimization

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization features for large-scale modeling. As of 2021, Hugging Face Transformers is considered the de facto standard, but it is not yet well suited to large-scale modeling. This is where OSLO comes in: it is designed to make it easier to train large models with Transformers. For example, you can fine-tune GPTJ from the Hugging Face Model Hub with little extra effort using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, and we plan to support more models soon.

Installation

OSLO can be installed with the pip package manager. All dependencies, such as torch, transformers, dacite, ninja and pybind11, should be installed automatically by the following command. Note the 'core' suffix in the PyPI project name.

pip install oslo-core

Some features rely on C++, so we provide an option, CPP_AVAILABLE, that controls whether they are installed.

  • If C++ is available:
CPP_AVAILABLE=1 pip install oslo-core
  • If C++ is not available:
CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 on Windows and 1 on Linux.
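
The platform-dependent default can be pictured with the following sketch. It is not OSLO's actual setup code: the helper name `cpp_available` and its arguments are hypothetical, and treating every non-Windows platform like Linux is an assumption. It only illustrates the rule stated above, where an explicit CPP_AVAILABLE value wins and the default otherwise depends on the operating system.

```python
import os
import platform


def cpp_available(environ=None, system=None):
    """Hypothetical helper: resolve CPP_AVAILABLE to a bool."""
    environ = os.environ if environ is None else environ
    system = platform.system() if system is None else system
    # Assumption: any non-Windows platform is treated like Linux.
    default = "0" if system == "Windows" else "1"
    # An explicitly set CPP_AVAILABLE overrides the platform default.
    return environ.get("CPP_AVAILABLE", default) == "1"


if __name__ == "__main__":
    print("build C++ extensions:", cpp_available())
```

Passing `environ` and `system` explicitly makes the rule easy to check without changing the real environment.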

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_parameters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

  • 3D Parallelism: A state-of-the-art technique for training a large-scale model on multiple GPUs.
  • Kernel Fusion: A GPU optimization method that increases training and inference speed.
  • DeepSpeed Support: Integration with DeepSpeed, which provides ZeRO data parallelism.
  • Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.
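
In 3D parallelism, GPUs are organised along tensor, pipeline, and data parallel dimensions, so the total number of GPUs equals the product of the three sizes. The sketch below is illustrative only: the function `data_parallel_size` is hypothetical and not part of OSLO. It works out how many data parallel replicas remain for a given number of GPUs.

```python
def data_parallel_size(world_size, tensor_parallel_size, pipeline_parallel_size):
    """Hypothetical helper: number of data parallel replicas that fit."""
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor x pipeline size={model_parallel_size}"
        )
    return world_size // model_parallel_size


if __name__ == "__main__":
    # Sizes taken from the example above, on an assumed 8-GPU machine.
    print(data_parallel_size(8, tensor_parallel_size=2, pipeline_parallel_size=2))
```

With the sizes from the example above (tensor 2, pipeline 2), an 8-GPU machine leaves room for two data parallel replicas.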

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The code of the OSLO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments
  • [WIP] Implement ZeRO Stage 3 (FSDP)

    Title

    • Implement ZeRO Stage 3 (FullyShardedDataParallel)

    Description

    • [x] Add reduce_scatter_bucketer.py
      • [x] Add test_reduce_scatter_bucketer.py
    • [x] Add flatten_params_wrapper.py
      • [x] Add test_flatten_params_wrapper.py
    • [x] Add containers.py
      • [x] Add test_containers.py
    • [x] Add parallel.py
      • [x] Add test_parallel.py
    • [x] Add fsdp_optim_utils.py
    • [x] Update fsdp.py
    • [x] Add auto_wrap.py
      • [x] Add test_wrap.py
    opened by jinok2im 9
  • FusedAdam & CPUAdam

    Title

    • FusedAdam & CPUAdam

    Description

    • Implement FusedAdam & CPUAdam

    Tasks

    • [x] Implement FusedAdam
    • [x] Implement CPUAdam
    • [x] Test FusedAdam
    • [x] Test CPUAdam
    • [x] Test FusedScaleMaskSoftmax (name changed)
    opened by cozytk 6
  • [WIP] Add data processing modules referring to the lassl

    Title

    • Add data processing modules based on lassl

    Description

    • Ported data processing functions suited to GPT2, using lassl as a reference

    Linked Issues

    • None
    opened by gimmaru 6
  • Implementation of Sequential Parallelism

    SP with DP implementation

    • Implemented SP wrapper with DP

    Description

    • SequenceDataParallel works like native torch DDP, with SP added
    • Details can be found in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
    opened by ohwi 5
  • Update data collators and Add models

    Title

    • Update data collators and Add models

    Description

    • Updated data collators to use sequence parallelism in the OSLO trainer
    • Added models based on the transformers library
    opened by gimmaru 3
  • Implement Expert Parallel and Test for Initialization and Forward Pass

    Title

    • Implement Expert Parallel and Test for Initialization and Forward Pass

    Description

    • Implement Wrapper, Modules and Features for Expert Parallel
    • Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace
    • Test initialization and forward pass for expert parallel
    opened by scsc0511 3
  • Integrate Sequence Parallelism branches

    Title

    • Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

    Description

    • This PR integrates the current version of SP, but it still contains bugs.
    • We will fix the bugs over the coming week and write test modules according to the SP design.
    • It does not include the contents of the branch that was used for testing.
    opened by l-yohai 3
  • implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    • Implement tp-3d wrapper
    • Fix the rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing a rank transpose function
    • Revise tp-3d layers for Hugging Face compatibility
    • Implement tp-3d test codes
    • Refactor all tp test codes
    • Unify format across all tensor parallel modules
    opened by bzantium 2
  • Refactoring MultiheadAttention with todo anchors

    Title

    • Refactoring MultiheadAttention with todo anchors

    Description

    • Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.
    • Remove unnecessary or unintended code and clean up annotations.
    • Unify return format and the variable name with native torch.

    Additionally, attention_mask still needs to be tested. However, it seems that this part can only proceed after FusedScaleMaskSoftmax is integrated.

    cc. @hyunwoongko @ohwi

    opened by l-yohai 2
  • Add tp-1d layers testing

    • Add tests for tp-1d layers: col_linear, row_linear, vocab_embedding_1d
    • Replace hard-coded numbers with integer variables such as summa_dim and world_size

    cc: @hyunwoongko
    opened by bzantium 2
  • [WIP] add test code of sp training

    Title

    • SP Model Test Code

    Description

    This PR adds test code that verifies the model's gradient and loss values are the same when sequence parallelism is applied.

    • WIP: merging @ohwi's test code, which compares ColossalAI's SP with a simple training model.
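
The kind of equivalence check described here can be illustrated with a toy example. This is only a sketch and not the PR's test code: it uses a one-parameter model with a position-wise loss, simulates ranks in a single process, and sums partial results as a stand-in for an all-reduce. All function names are hypothetical. A real transformer also needs cross-chunk communication in attention, which this sketch does not cover.

```python
import math


def loss_and_grad(w, xs, ys):
    """Sum-of-squares loss over a sequence and its gradient w.r.t. w."""
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys))
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    return loss, grad


def sequence_parallel_loss_and_grad(w, xs, ys, num_ranks):
    """Split the sequence into contiguous chunks, one per simulated rank,
    then sum the partial results (a stand-in for an all-reduce)."""
    chunk = math.ceil(len(xs) / num_ranks)
    partials = [
        loss_and_grad(w, xs[i:i + chunk], ys[i:i + chunk])
        for i in range(0, len(xs), chunk)
    ]
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


xs = [0.5, -1.0, 2.0, 3.5, 1.25, -0.75, 4.0, 0.1]
ys = [1.0, 0.0, -2.0, 3.0, 0.5, 1.5, -1.0, 2.0]

ref_loss, ref_grad = loss_and_grad(0.3, xs, ys)
sp_loss, sp_grad = sequence_parallel_loss_and_grad(0.3, xs, ys, num_ranks=4)

# Loss and gradient must agree up to floating-point rounding.
assert math.isclose(ref_loss, sp_loss)
assert math.isclose(ref_grad, sp_grad)
```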
    opened by l-yohai 2
Releases(v2.0.2)
  • v2.0.2(Aug 25, 2022)

  • v2.0.1(Feb 20, 2022)

  • v2.0.0(Feb 14, 2022)

    Official release of OSLO 2.0.0 🎉🎉

    This version of OSLO provides the following features:

    • Tensor model parallelism
    • Efficient activation checkpointing
    • Kernel fusion

    We plan to add pipeline model parallelism and ZeRO optimization in upcoming versions.


    New feature: Kernel Fusion

    {
      "kernel_fusion": {
        "enable": "bool",
        "memory_efficient_fusion": "bool",
        "custom_cuda_kernels": "list"
      }
    }
    

    For more information, see the kernel fusion tutorial.
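
To make the schema above concrete, the following sketch loads a filled-in `kernel_fusion` section and checks each value against the type the schema names. The validator `check_kernel_fusion` is hypothetical and not an OSLO function, and the values are arbitrary placeholders. The kernel list is left empty because the valid kernel names are not listed here.

```python
import json

# Schema copied from the release note: option name -> expected type.
KERNEL_FUSION_SCHEMA = {
    "enable": bool,
    "memory_efficient_fusion": bool,
    "custom_cuda_kernels": list,
}


def check_kernel_fusion(config):
    """Hypothetical validator: raise if an option is unknown or mistyped."""
    for key, value in config["kernel_fusion"].items():
        if key not in KERNEL_FUSION_SCHEMA:
            raise KeyError(f"unknown option: {key}")
        expected = KERNEL_FUSION_SCHEMA[key]
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
    return True


# Placeholder values for illustration only.
config = json.loads("""
{
  "kernel_fusion": {
    "enable": true,
    "memory_efficient_fusion": false,
    "custom_cuda_kernels": []
  }
}
""")

check_kernel_fusion(config)
```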

  • v2.0.0a2(Feb 2, 2022)

  • v2.0.0a1(Feb 2, 2022)

    Add activation checkpointing

    OSLO provides efficient activation checkpointing, which you can enable with the following configuration.

    import oslo

    # `model` is a model you have already created.
    model = oslo.initialize(
        model,
        config={
            "model_parallelism": {
                "enable": True,
                "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
            },
            "activation_checkpointing": {
                "enable": True,
                "cpu_checkpointing": True,
                "partitioned_checkpointing": True,
                "contiguous_checkpointing": True,
            },
        },
    )
    

    Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html

  • v2.0.0a0(Jan 30, 2022)

    New API

    • The new API pays homage to DeepSpeed and is simpler to use.
    import oslo
    
    model = oslo.initialize(model, config="oslo-config.json")
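
A config file such as `oslo-config.json` can be produced with the standard `json` module. The sketch below assumes the file holds the same keys as the dictionary form shown in the v2.0.0a1 release note (`model_parallelism`, `enable`, `tensor_parallel_size`). That equivalence, the temporary directory, and the value 2 are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Assumed to mirror the dictionary-style config; values are placeholders.
config = {
    "model_parallelism": {
        "enable": True,
        "tensor_parallel_size": 2,
    },
}

# Write the file into a temporary directory to avoid touching the project tree.
config_path = Path(tempfile.mkdtemp()) / "oslo-config.json"
config_path.write_text(json.dumps(config, indent=2))

# The path would then be passed as: oslo.initialize(model, config=str(config_path))
print(config_path.read_text())
```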
    

    Add new models

    • Albert
    • Bert
    • Bart
    • T5
    • GPT2
    • GPTNeo
    • GPTJ
    • Electra
    • Roberta

    Add documentation

    • https://tunib-ai.github.io/oslo

    Remove old pipeline parallelism and kernel fusion code

    • We'll rebuild them using the latest methods
      • Kernel fusion: AOTAutograd
      • Pipeline parallelism: Sagemaker PP
  • v1.1.2(Jan 15, 2022)

    Updates

    • [#7] Selective Kernel Fusion
    • [#9] Fix argument bug

    New Feature: Selective Kernel Fusion

    Since version 1.1.2, you can fuse a subset of kernels instead of all of them. Currently, only the Attention and MLP classes are supported.

    from oslo import GPT2MLP, GPT2Attention
    
    # MLP only fusion
    model.fuse([GPT2MLP])
    
    # Attention only fusion
    model.fuse([GPT2Attention])
    
    # MLP + Attention fusion
    model.fuse([GPT2MLP, GPT2Attention])
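
One way to picture selective fusion is a filter over the model's submodules that keeps only instances of the requested classes. The sketch below is not OSLO's implementation. The classes `Attention`, `MLP`, and `LayerNorm` are plain stand-ins, `select_for_fusion` is a hypothetical helper, and the behaviour with no arguments (select every supported class) is an assumption.

```python
class Attention:
    """Stand-in for an attention block (hypothetical)."""


class MLP:
    """Stand-in for a feed-forward block (hypothetical)."""


class LayerNorm:
    """Stand-in for a module type that is not selected here."""


def select_for_fusion(modules, targets=None):
    """Return the modules whose class is listed in targets.

    Assumption: with targets=None, all supported classes are selected.
    """
    if targets is None:
        targets = [Attention, MLP]
    return [m for m in modules if isinstance(m, tuple(targets))]


layers = [Attention(), MLP(), LayerNorm(), Attention(), MLP(), LayerNorm()]

mlp_only = select_for_fusion(layers, [MLP])
mlp_and_attention = select_for_fusion(layers, [MLP, Attention])
print(len(mlp_only), len(mlp_and_attention))
```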
    
  • v1.1(Dec 29, 2021)

    [#3] Add deployment launcher of Parallelformers into OSLO.

    from oslo import GPTNeoForCausalLM
    
    model = GPTNeoForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-neo-2.7B",
        tensor_parallel_size=2,
        pipeline_parallel_size=2,
        deployment=True  # <-- new feature !
    )
    

    To use the deployment launcher, pass deployment=True. Please refer to USAGE.md for more details.

  • v1.0.1(Dec 22, 2021)

  • v1.0(Dec 21, 2021)


Owner
TUNiB Inc.