OSLO: Open Source framework for Large-scale transformer Optimization

Related tags

Deep Learningoslo
Overview


O S L O

Open Source framework for Large-scale transformer Optimization

GitHub release Apache 2.0 Docs Issues



What's New:

What is OSLO about?

OSLO is a framework that provides various GPU based optimization features for large-scale modeling. As of 2021, the Hugging Face Transformers is being considered de facto standard. However, it does not best fit the purposes of large-scale modeling yet. This is where OSLO comes in. OSLO is designed to make it easier to train large models with the Transformers. For example, you can fine-tune GPTJ on the Hugging Face Model Hub without many extra efforts using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, but we plan to support more soon.

Installation

OSLO can be easily installed using the pip package manager. All the dependencies such as torch, transformers, dacite, ninja and pybind11 should be installed automatically with the following command. Be careful that the 'core' in the PyPI project name.

pip install oslo-core

Some of features rely on the C++ language. So we provide an option, CPP_AVAILABLE, to decide whether or not you install them.

  • If the C++ is available:
CPP_AVAILABLE=1 pip install oslo-core
  • If the C++ is not available:
CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 in Windows and 1 in Linux.

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_paramters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

  • 3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
  • Kernel Fusion: A GPU optimization method to increase training and inference speed.
  • DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
  • Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The Code of the OSLO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments
  • [WIP] Implement ZeRO Stage 3 (FSDP)

    [WIP] Implement ZeRO Stage 3 (FSDP)

    Title

    • Implement ZeRO Stage 3 (FullyShardedDataParallel)

    Description

    • [x] Add reduce_scatter_bucketer.py
      • [x] Add test_reduce_scatter_bucketer.py
    • [x] Add flatten_params_wrapper.py
      • [x] Add test_flatten_params_wrapper.py
    • [x] Add containers.py
      • [x] Add test_containers.py
    • [x] Add parallel.py
      • [x] Add test_parallel.py
    • [x] Add fsdp_optim_utils.py
    • [x] Update fsdp.py
    • [x] Add auto_wrap.py
      • [x] Add test_wrap.py
    opened by jinok2im 9
  • FusedAdam & CPUAdam

    FusedAdam & CPUAdam

    Title

    -FusedAdam & CPUAdam

    Description

    • Implement FusedAdam & CPUAdam

    Tasks

    • [x] Implement FusedAdam
    • [x] implement CPUAdam
    • [x] Test FusedAdam
    • [x] Test CPUAdam
    • [x] Test FusedSclaeMaskSoftmax (Name changed)
    opened by cozytk 6
  • [WIP] Add data processing modules referring to the lassl

    [WIP] Add data processing modules referring to the lassl

    Title

    • add data processing modules referring to the lassl

    Description

    • brought data processing functions that fit gpt2 with reference to lassl

    Linked Issues

    • None
    opened by gimmaru 6
  • Implementation of Sequential Parallelism

    Implementation of Sequential Parallelism

    SP with DP implementation

    • Implemented SP wrapper with DP

    Description

    • SequenceDataParallel works like native torch DDP with SP
    • you can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
    opened by ohwi 5
  • Update data collators and Add models

    Update data collators and Add models

    Title

    • Update data collators and Add models

    Description

    • Updated data collators to utilize sequence parallel in Oslo trainer
    • Add models by referring to the transformers library
    opened by gimmaru 3
  • Implement Expert Parallel and Test for Initialization and Forward Pass

    Implement Expert Parallel and Test for Initialization and Forward Pass

    Title

    • Implement Expert Parallel and Test for Initialization and Forward Pass

    Description

    • Implement Wrapper, Modules and Features for Expert Parallel
    • Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace
    • Test initialization and forward pass for expert parallel
    opened by scsc0511 3
  • Integrate Sequence Parallelism branches

    Integrate Sequence Parallelism branches

    Title

    • Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

    Description

    • This PR is Integration of SP current version. But there is something wrong.
    • We will fix the bugs for the coming week and write test modules according to the SP design.
    • It did not include the contents of the branch that worked for the test.
    opened by l-yohai 3
  • implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers

    • implement tp-3d wrapper
    • rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing ranking transpose function.
    • revise tp-3d layers for huggingface compatibility
    • implement tp-3d test codes
    • refactor all tp test codes
    • unify format across all tensor parallel modules.
    opened by bzantium 2
  • Refactoring MultiheadAttention with todo anchors

    Refactoring MultiheadAttention with todo anchors

    Title

    • Refactoring MultiheadAttention with todo anchors

    Description

    • Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.
    • Remove unnecessary or unintended code and clean up annotations.
    • Unify return format and the variable name with native torch.

    Additionally, I need to test attention_mask. However, it seems that it can proceed with this part after FusedScaleMaskSoftmax is integrated.

    cc. @hyunwoongko @ohwi

    opened by l-yohai 2
  • Add tp-1d layers testing

    Add tp-1d layers testing

    • Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d
    • modify number to integer variable like summa_dim, world_size cc: @hyunwoongko
    opened by bzantium 2
  • [WIP] add test code of sp training

    [WIP] add test code of sp training

    Title

    • SP Model Test Code

    Description

    Writing a test code to verify that the gradient and loss values of the model are the same when the sequence parallelism is applied.

    • WIP - merging @ohwi 's test code comparing SP of ColossalAI and simple learning model.
    opened by l-yohai 2
Releases(v2.0.2)
  • v2.0.2(Aug 25, 2022)

  • v2.0.1(Feb 20, 2022)

  • v2.0.0(Feb 14, 2022)

    Official release of OSLO 2.0.0 🎉🎉

    This version of OSLO provides the following features:

    • Tensor model parallelism
    • Efficient activation checkpointing
    • Kernel fusion

    We plan to add the pipeline model parallelism and the ZeRO optimization in the next versions.


    New feature: Kernel Fusion

    {
      "kernel_fusion": {
        "enable": "bool",
        "memory_efficient_fusion": "bool",
        "custom_cuda_kernels": "list"
      }
    }
    

    For more information, please check the kernel fusion tutorial

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0a2(Feb 2, 2022)

  • v2.0.0a1(Feb 2, 2022)

    Add activation checkpointing

    You can use efficient activation checkpointing using OSLO with the following configuration.

    model = oslo.initialize(
        model,
        config={
            "model_parallelism": {
                "enable": True,
                "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
            },
            "activation_checkpointing": {
                "enable": True,
                "cpu_checkpointing": True,
                "partitioned_checkpointing": True,
                "contiguous_checkpointing": True,
            },
        },
    )
    

    Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0a0(Jan 30, 2022)

    New API

    • We paid homage to DeepSpeed. Now it's easier and simpler to use.
    import oslo
    
    model = oslo.initialize(model, config="oslo-config.json")
    

    Add new models

    • Albert
    • Bert
    • Bart
    • T5
    • GPT2
    • GPTNeo
    • GPTJ
    • Electra
    • Roberta

    Add document

    • https://tunib-ai.github.io/oslo

    Remove old pipeline parallelism, kernel fusion code

    • We'll refurbish them using the latest methods
      • Kernel fusion: AOTAutograd
      • Pipeline parallelism: Sagemaker PP
    Source code(tar.gz)
    Source code(zip)
  • v.1.1.2(Jan 15, 2022)

    Updates

    [#7] Selective Kernel Fusion [#9] Fix argument bug

    New Feature: Selective Kernel Fusion

    Since version 1.1.2, you can fuse only partial kernels, not all kernels. Currently, only Attention class and MLP class are supported.

    from oslo import GPT2MLP, GPT2Attention
    
    # MLP only fusion
    model.fuse([GPT2MLP])
    
    # Attention only fusion
    model.fuse([GPT2Attention])
    
    # MLP + Attention fusion
    model.fuse([GPT2MLP, GPT2Attention])
    
    Source code(tar.gz)
    Source code(zip)
  • v1.1(Dec 29, 2021)

    [#3] Add deployment launcher of Parallelformers into OSLO.

    from oslo import GPTNeoForCausalLM
    
    model = GPTNeoForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-neo-2.7B",
        tensor_parallel_size=2,
        pipeline_parallel_size=2,
        deployment=True  # <-- new feature !
    )
    

    You can easily use deployment launcher by deployment=True. Please refer to USAGE.md for more details.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Dec 22, 2021)

  • v1.0(Dec 21, 2021)


    O S L O

    Open Source framework for Large-scale transformer Optimization

    GitHub release Apache 2.0 Docs Issues



    What's New:

    What is OSLO about?

    OSLO is a framework that provides various GPU based optimization features for large-scale modeling. As of 2021, the Hugging Face Transformers is being considered de facto standard. However, it does not best fit the purposes of large-scale modeling yet. This is where OSLO comes in. OSLO is designed to make it easier to train large models with the Transformers. For example, you can fine-tune GPTJ on the Hugging Face Model Hub without many extra efforts using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, but we plan to support more soon.

    Installation

    OSLO can be easily installed using the pip package manager. All the dependencies such as torch, transformers, dacite, ninja and pybind11 should be installed automatically with the following command. Be careful that the 'core' in the PyPI project name.

    pip install oslo-core
    

    Some of features rely on the C++ language. So we provide an option, CPP_AVAILABLE, to decide whether or not you install them.

    • If the C++ is available:
    CPP_AVAILABLE=1 pip install oslo-core
    
    • If the C++ is not available:
    CPP_AVAILABLE=0 pip install oslo-core
    

    Note that the default value of CPP_AVAILABLE is 0 in Windows and 1 in Linux.

    Key Features

    import deepspeed 
    from oslo import GPTJForCausalLM
    
    # 1. 3D Parallelism
    model = GPTJForCausalLM.from_pretrained_with_parallel(
        "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
    )
    
    # 2. Kernel Fusion
    model = model.fuse()
    
    # 3. DeepSpeed Support
    engines = deepspeed.initialize(
        model=model.gpu_modules(), model_parameters=model.gpu_paramters(), ...,
    )
    
    # 4. Data Processing
    from oslo import (
        DatasetPreprocessor, 
        DatasetBlender, 
        DatasetForCausalLM, 
        ...    
    )
    

    OSLO offers the following features.

    • 3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
    • Kernel Fusion: A GPU optimization method to increase training and inference speed.
    • DeepSpeed Support: We support DeepSpeed which provides ZeRO data parallelism.
    • Data Processing: Various utilities for efficient large-scale data processing.

    See USAGE.md to learn how to use them.

    Administrative Notes

    Citing OSLO

    If you find our work useful, please consider citing:

    @misc{oslo,
      author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
      title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
      howpublished = {\url{https://github.com/tunib-ai/oslo}},
      year         = {2021},
    }
    

    Licensing

    The Code of the OSLO project is licensed under the terms of the Apache License 2.0.

    Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

    Acknowledgements

    The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

    Source code(tar.gz)
    Source code(zip)
Owner
TUNiB
TUNiB Inc.
TUNiB
Streamlit tool to explore coco datasets

What is this This tool given a COCO annotations file and COCO predictions file will let you explore your dataset, visualize results and calculate impo

Jakub Cieslik 75 Dec 16, 2022
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation E2EC: An End-to-End Contour-based Method for High-Quality H

zhangtao 146 Dec 29, 2022
Direct design of biquad filter cascades with deep learning by sampling random polynomials.

IIRNet Direct design of biquad filter cascades with deep learning by sampling random polynomials. Usage git clone https://github.com/csteinmetz1/IIRNe

Christian J. Steinmetz 55 Nov 02, 2022
This is the pytorch code for the paper Curious Representation Learning for Embodied Intelligence.

Curious Representation Learning for Embodied Intelligence This is the pytorch code for the paper Curious Representation Learning for Embodied Intellig

19 Oct 19, 2022
​TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

TextWorld A text-based game generator and extensible sandbox learning environment for training and testing reinforcement learning (RL) agents. Also ch

Microsoft 983 Dec 23, 2022
A lightweight python AUTOmatic-arRAY library.

A lightweight python AUTOmatic-arRAY library. Write numeric code that works for: numpy cupy dask autograd jax mars tensorflow pytorch ... and indeed a

Johnnie Gray 62 Dec 27, 2022
DIT is a DTLS MitM proxy implemented in Python 3. It can intercept, manipulate and suppress datagrams between two DTLS endpoints and supports psk-based and certificate-based authentication schemes (RSA + ECC).

DIT - DTLS Interception Tool DIT is a MitM proxy tool to intercept DTLS traffic. It can intercept, manipulate and/or suppress DTLS datagrams between t

52 Nov 30, 2022
李云龙二次元风格化!打滚卖萌,使用了animeGANv2进行了视频的风格迁移

李云龙二次元风格化!一键star、fork,你也可以生成这样的团长! 打滚卖萌求star求fork! 0.效果展示 视频效果前往B站观看效果最佳:李云龙二次元风格化: github开源repo:李云龙二次元风格化 百度AIstudio开源地址,一键fork即可运行: 李云龙二次元风格化!一键fork

oukohou 44 Dec 04, 2022
Framework for training options with different attention mechanism and using them to solve downstream tasks.

Using Attention in HRL Framework for training options with different attention mechanism and using them to solve downstream tasks. Requirements GPU re

5 Nov 03, 2022
Code for Universal Semi-Supervised Semantic Segmentation models paper accepted in ICCV 2019

USSS_ICCV19 Code for Universal Semi Supervised Semantic Segmentation accepted to ICCV 2019. Full Paper available at https://arxiv.org/abs/1811.10323.

Tarun K 68 Nov 24, 2022
LETR: Line Segment Detection Using Transformers without Edges

LETR: Line Segment Detection Using Transformers without Edges Introduction This repository contains the official code and pretrained models for Line S

mlpc-ucsd 157 Jan 06, 2023
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data arXiv This is the code base for weakly supervised NER. We provide a

Amazon 92 Jan 04, 2023
DilatedNet in Keras for image segmentation

Keras implementation of DilatedNet for semantic segmentation A native Keras implementation of semantic segmentation according to Multi-Scale Context A

303 Mar 15, 2022
A robust pointcloud registration pipeline based on correlation.

PHASER: A Robust and Correspondence-Free Global Pointcloud Registration Ubuntu 18.04+ROS Melodic: Overview Pointcloud registration using correspondenc

ETHZ ASL 101 Dec 01, 2022
Automatic labeling, conversion of different data set formats, sample size statistics, model cascade

Simple Gadget Collection for Object Detection Tasks Automatic image annotation Conversion between different annotation formats Obtain statistical info

llt 4 Aug 24, 2022
Official Pytorch implementation for video neural representation (NeRV)

NeRV: Neural Representations for Videos (NeurIPS 2021) Project Page | Paper | UVG Data Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav S

hao 214 Dec 28, 2022
A script depending on VASP output for calculating Fermi-Softness.

Fermi softness calculation for Vienna Ab initio Simulation Package (VASP) Update 1.1.0: Big update: Rewrote the code. Use Bader atomic division instea

qslin 11 Nov 08, 2022
Robot Hacking Manual (RHM). From robotics to cybersecurity. Papers, notes and writeups from a journey into robot cybersecurity.

RHM: Robot Hacking Manual Download in PDF RHM v0.4 ┃ Read online The Robot Hacking Manual (RHM) is an introductory series about cybersecurity for robo

Víctor Mayoral Vilches 233 Dec 30, 2022
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems

Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems This is our experimental code for RecSys 2021 paper "Learning

11 Jul 28, 2022
INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing Existing studies on semantic parsing focus primarily on mapping a natural-la

7 Aug 22, 2022