TorchAcc

TorchAcc is an AI training acceleration framework developed by Alibaba Cloud’s PAI team.

TorchAcc is built on PyTorch/XLA and provides an easy-to-use interface for accelerating the training of PyTorch models. It also implements extensive GPU-specific optimizations for distributed training, memory management, and computation, which improve ease of use, GPU training performance, and the scalability of distributed training.

Documentation

Highlighted Features

- Rich distributed parallelism strategies
  - Data Parallelism
  - Fully Sharded Data Parallelism
  - Tensor Parallelism
  - Pipeline Parallelism
  - Context Parallelism
    - Ulysses
    - Ring Attention
    - FlashSequence (2D Sequence Parallelism)
- Memory efficient
- High performance
- Easy-to-use API
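
To make the memory benefit of Fully Sharded Data Parallelism concrete, here is a rough back-of-the-envelope sketch. It is not part of TorchAcc: the function name and the byte counts are assumptions, following the common mixed-precision Adam accounting of 2 bytes for the parameter, 2 bytes for the gradient, and 12 bytes of fp32 optimizer state per parameter. It also assumes all three are sharded evenly across the FSDP group, and it ignores activations and temporary buffers.

```python
def per_gpu_model_state_gib(num_params: int,
                            fsdp_size: int,
                            bytes_param: int = 2,    # assumed: bf16 weights
                            bytes_grad: int = 2,     # assumed: bf16 gradients
                            bytes_optim: int = 12):  # assumed: fp32 copy + Adam moments
    """Estimate per-GPU memory (GiB) for parameters, gradients, and optimizer state."""
    if fsdp_size < 1:
        raise ValueError("fsdp_size must be at least 1")
    total_bytes = num_params * (bytes_param + bytes_grad + bytes_optim)
    # Full sharding splits the model state evenly across the FSDP group.
    return total_bytes / fsdp_size / 2**30


# Hypothetical 7B-parameter model, unsharded versus sharded over 4 GPUs.
for size in (1, 4):
    print(f"fsdp_size={size}: {per_gpu_model_state_gib(7_000_000_000, size):.1f} GiB")
```

Under these assumptions the model state per GPU shrinks linearly with the FSDP group size. Real usage is higher because of activations, communication buffers, and allocator fragmentation.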

You can accelerate your transformer models with just a few lines of code using TorchAcc.

Architecture Overview

The main goal of TorchAcc is to provide a high-performance AI training framework. It uses IR abstractions at different layers and combines static graph compilation (such as XLA), dynamic graph compilation (such as BladeDISC), and distributed optimization techniques to offer an end-to-end optimization solution, from the underlying operators up to the model level.
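
The idea of recording a computation as an IR and compiling it once can be illustrated with a toy sketch. This is not TorchAcc, XLA, or BladeDISC code: the `Node` class, `relu`, and `compile_graph` are hypothetical names invented for illustration, and the "compiler" only turns a recorded scalar expression graph into a single reusable callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Node:
    """One node of a toy IR; building expressions records ops instead of running them."""
    op: str                          # "input", "add", "mul", or "relu"
    args: Tuple["Node", ...] = ()    # operand nodes
    name: str = ""                   # only meaningful for "input" nodes

    def __add__(self, other: "Node") -> "Node":
        return Node("add", (self, other))

    def __mul__(self, other: "Node") -> "Node":
        return Node("mul", (self, other))


def relu(x: Node) -> Node:
    return Node("relu", (x,))


def compile_graph(root: Node) -> Callable[[Dict[str, float]], float]:
    """Walk the recorded graph once and return a single callable for it."""
    def lower(n: Node) -> Callable[[Dict[str, float]], float]:
        if n.op == "input":
            return lambda env: env[n.name]
        fs = [lower(a) for a in n.args]
        if n.op == "add":
            return lambda env: fs[0](env) + fs[1](env)
        if n.op == "mul":
            return lambda env: fs[0](env) * fs[1](env)
        if n.op == "relu":
            return lambda env: max(fs[0](env), 0.0)
        raise ValueError(f"unknown op: {n.op}")
    return lower(root)


x, w, b = (Node("input", name=s) for s in "xwb")
y = relu(x * w + b)       # nothing is computed here; only the graph is built
step = compile_graph(y)   # "compile" once, then reuse for every input
print(step({"x": 2.0, "w": 3.0, "b": -1.0}))  # 5.0
```

A real compiler would apply optimizations such as operator fusion and memory planning to the recorded graph before generating device code. The sketch only shows the separation between tracing and execution.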

Installation

Docker

sudo docker run --gpus all --net host --ipc host --shm-size 10G -it --rm --cap-add=SYS_PTRACE registry.cn-hangzhou.aliyuncs.com/pai-dlc/acc:r2.3.0-cuda12.1.0-py3.10 bash

Build from source

See the contribution guide.

Getting Started

The following example trains a Transformer model with TorchAcc and illustrates the usage of the TorchAcc API. To start training, run the following command:

torchrun --nproc_per_node=4 benchmarks/transformer.py --bf16 --acc --disable_loss_print --fsdp_size=4 --gc
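
To clarify how the flags in the command above fit together, here is a hedged sketch of a parser that accepts the same flags. It is not the actual `benchmarks/transformer.py`: the help texts are a plausible reading of the flag names rather than confirmed behavior, and `data_parallel_size` is a hypothetical helper that assumes the processes not used for FSDP sharding form data-parallel replicas.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the flags in the command above.
    p = argparse.ArgumentParser(description="Sketch of the benchmark flags")
    p.add_argument("--bf16", action="store_true",
                   help="assumed: train in bfloat16")
    p.add_argument("--acc", action="store_true",
                   help="assumed: enable TorchAcc acceleration")
    p.add_argument("--disable_loss_print", action="store_true",
                   help="assumed: skip per-step loss printing")
    p.add_argument("--fsdp_size", type=int, default=1,
                   help="assumed: number of processes in one FSDP group")
    p.add_argument("--gc", action="store_true",
                   help="assumed: enable gradient checkpointing")
    return p


def data_parallel_size(world_size: int, fsdp_size: int) -> int:
    """Assumed layout: world_size = data-parallel replicas * fsdp_size."""
    if fsdp_size < 1 or world_size % fsdp_size != 0:
        raise ValueError("fsdp_size must be a positive divisor of world_size")
    return world_size // fsdp_size


args = build_parser().parse_args(
    "--bf16 --acc --disable_loss_print --fsdp_size=4 --gc".split())
# With --nproc_per_node=4 on one node, the world size is 4.
print(data_parallel_size(world_size=4, fsdp_size=args.fsdp_size))  # 1
```

Under this assumed layout, the command shards the model across all four processes and leaves no additional data-parallel replicas.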

LLM training examples

Utilizing HuggingFace Transformers

If you are familiar with the HuggingFace Transformers Trainer, you can easily accelerate a Transformer model using TorchAcc; see the huggingface transformers

LLM training acceleration with FlashModels

If you want to try the latest features of TorchAcc, or use the TorchAcc interface more flexibly for model acceleration, you can use our LLM acceleration library, FlashModels. FlashModels integrates distributed implementations of commonly used open-source LLMs and provides many examples and benchmarks.

https://github.com/AlibabaPAI/FlashModels

SFT using modelscope/swift

Coming soon.

Contributing

See the contribution guide.

Contact Us

You can contact us by adding our DingTalk group:

License

Apache License 2.0
