Pylomin

Pylomin (PYtorch LOw-Memory INference) is a deep learning optimization library for low-memory inference in PyTorch.

Motivation

The scale of deep learning models has grown exponentially in recent years, which has greatly increased the difficulty of product deployment.

Image source: Microsoft Research Blog

The goal of this library is to enable low-cost deployment of deep learning models:

- Extremely low memory requirement
  - For example, we can reduce the peak memory requirement for inference with a BERT-like model (with 1.6 GiB of parameters) to 46 MiB.
- Minimal memory requirement while maintaining model throughput
  - Eliminate the time spent waiting for parameters to load by prefetching (under development)
  - TODO: add a number here after development

Peak memory is the maximum amount of memory needed to store model parameters and hidden states at any point during model inference.
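To make this definition concrete, here is a rough back-of-the-envelope sketch. It is a simplified model, not a measurement of Pylomin: the layer sizes, the hidden-state size, and both helper functions are hypothetical, and it assumes that with lazy loading only one layer's parameters are resident at a time.

```python
MIB = 2**20


def peak_memory_eager(layer_param_bytes, hidden_state_bytes):
    # All parameters stay in memory for the whole forward pass.
    return sum(layer_param_bytes) + hidden_state_bytes


def peak_memory_lazy(layer_param_bytes, hidden_state_bytes):
    # Assumption: only the largest single layer is ever resident.
    return max(layer_param_bytes) + hidden_state_bytes


# Hypothetical model: 12 layers of 32 MiB each, 4 MiB of hidden states.
layers = [32 * MIB] * 12
hidden = 4 * MIB

eager = peak_memory_eager(layers, hidden)
lazy = peak_memory_lazy(layers, hidden)
print(f"eager: {eager // MIB} MiB, lazy: {lazy // MIB} MiB")
```

Under these assumptions, the peak is bounded by the largest layer rather than by the sum of all layers.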

Installation

pylomin$ python3 -m pip install -e .

Getting Started

1. Lazy-loading

Load model parameters only when needed and release them immediately after use.

import pylomin
model = pylomin.lazy_loading(model)

Or provide a list of target_classes or target_modules to be converted to lazy-loading mode. In addition, when using target_classes, you can also provide a list of modules to be skipped.

# Use target_classes
model = pylomin.lazy_loading(model, target_classes=[nn.Linear, nn.Embedding],
                             skip_modules=[model.embeddings.word_embeddings])

# Use target_modules
target_modules = [module for module in model.modules() if some_condition]
model = pylomin.lazy_loading(model, target_modules=target_modules)
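The load-on-use, release-after-use idea can be sketched with plain PyTorch forward hooks. This is only an illustration of the general technique and may differ from how Pylomin implements it; `make_lazy` and the temporary file path are hypothetical names introduced for this example.

```python
import os
import tempfile

import torch
from torch import nn


def make_lazy(module: nn.Module, path: str) -> None:
    """Hypothetical helper: keep `module`'s own parameters on disk between calls."""
    torch.save(module.state_dict(), path)

    def load(mod, args):
        # Runs just before forward(): read the weights back from disk.
        state = torch.load(path)
        for name, param in mod.named_parameters(recurse=False):
            param.data = state[name]

    def release(mod, *unused):
        # Runs right after forward(): swap in empty tensors so the
        # old storage can be freed.
        for param in mod.parameters(recurse=False):
            param.data = torch.empty(0)

    module.register_forward_pre_hook(load)
    module.register_forward_hook(release)
    release(module)  # start in the released state


layer = nn.Linear(4, 3)
x = torch.randn(2, 4)
with torch.no_grad():
    expected = layer(x).clone()  # reference output before conversion
    make_lazy(layer, os.path.join(tempfile.mkdtemp(), "linear.pt"))
    released_before = layer.weight.numel()
    out = layer(x)  # weights are loaded, used, then released
released_after = layer.weight.numel()
```

In this sketch the layer holds no parameter data before or after the call, yet produces the same output as the original layer. The cost is one disk read per forward pass.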

2. Chunked-embedding

Attempts to split a torch.nn.Embedding layer into multiple chunks, each with num_embeddings equal to chunk_size except the last one, which may be smaller.

model = pylomin.chunked_embedding(model,
                                  target_module_name='embeddings.word_embeddings',
                                  chunk_size=2048)
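The chunk layout described above comes down to simple integer arithmetic. The sketch below is not Pylomin code: `chunk_sizes` and `locate` are hypothetical helpers, and the vocabulary size of 30522 is only an illustrative value.

```python
from typing import List, Tuple


def chunk_sizes(num_embeddings: int, chunk_size: int) -> List[int]:
    """Sizes of consecutive chunks; all equal chunk_size except possibly the last."""
    full, rest = divmod(num_embeddings, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def locate(token_id: int, chunk_size: int) -> Tuple[int, int]:
    """Map a row of the original table to (chunk index, row within that chunk)."""
    return divmod(token_id, chunk_size)


sizes = chunk_sizes(num_embeddings=30522, chunk_size=2048)
print(len(sizes), sizes[-1])  # 15 1850
print(locate(5000, 2048))     # (2, 904)
```

Because each token id maps to exactly one chunk, a lookup only has to touch the chunks that the input ids actually fall into.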

Examples

See examples/.

