

Megatron (1, 2, and 3) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This repository is for ongoing research on training large transformer language models at scale. We developed efficient, model-parallel (tensor, sequence, and pipeline), and multi-node pre-training of transformer-based models such as GPT, BERT, and T5 using mixed precision. Below are some of the projects where we have directly used Megatron: BioMegatron: Larger Biomedical Domain Language Model.
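
To make the tensor model parallelism mentioned above concrete, the snippet below sketches a column-parallel linear layer, the core idea behind Megatron-style tensor parallelism. It is only a single-process illustration with made-up shapes, not the repository's implementation; the real code shards weights across GPUs and combines partial results with NCCL collectives.

```python
import torch

# Conceptual sketch of tensor (column) parallelism for one linear layer.
# Shapes and the number of "ranks" are illustrative placeholders.
torch.manual_seed(0)
hidden, out_features, world_size = 8, 16, 2

x = torch.randn(4, hidden)                 # a batch of activations
full_weight = torch.randn(out_features, hidden)

# Each "rank" holds one column shard of the weight matrix.
shards = torch.chunk(full_weight, world_size, dim=0)

# Each rank computes its partial output independently...
partial_outputs = [x @ w.t() for w in shards]

# ...and the partial outputs are concatenated (in practice via an all-gather).
parallel_out = torch.cat(partial_outputs, dim=-1)
reference_out = x @ full_weight.t()

assert torch.allclose(parallel_out, reference_out, atol=1e-5)
print("column-parallel result matches the single-device result")
```

Pipeline parallelism instead assigns consecutive layers to different devices, and both forms can be combined with data parallelism across nodes for multi-node pre-training.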

In July 2023, we had a sync with the NVIDIA/Megatron-LM repo (from which this repo is forked) by git-merging 1100+ commits. Details can be found in the examples_deepspeed/rebase folder. Given the number of merged commits, bugs can occur in cases we haven't tested, and your contributions (bug reports, bug-fix pull requests) are highly welcome. We also created a backup branch, which is the version before this sync. This backup branch is only for comparison tests and for temporary use when you need to debug the main branch; we do not plan to continue supporting the pre-sync version.

To try out DeepSpeed on Azure, this fork of Megatron offers easy-to-use recipes and bash scripts. We strongly recommend starting with the AzureML recipe in the examples_deepspeed/azureml folder. If you have a custom infrastructure (e.g. HPC clusters) or an Azure VM based environment, please refer to the bash scripts in the examples_deepspeed/azure folder.

Note that the examples mentioned below are from the original NVIDIA/Megatron-LM repo. None of them have DeepSpeed technologies integrated, and some of them may not work due to changes in this Megatron-DeepSpeed repo. We therefore recommend the examples_deepspeed/ folder, which includes examples that have DeepSpeed technologies integrated and are tested by the DeepSpeed team.

The examples_deepspeed/ folder includes example scripts for the features supported by DeepSpeed.
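
As a rough sketch of what such scripts do at their core, the following shows the general shape of a DeepSpeed training setup: a config dict enabling mixed precision and ZeRO, and deepspeed.initialize wrapping a model into a training engine. The model, batch size, and hyperparameters are placeholders and are not taken from the scripts in examples_deepspeed/; see that folder for the tested recipes.

```python
# Minimal sketch of a DeepSpeed training setup; the model, batch size, and
# hyperparameters below are illustrative placeholders, not values from
# examples_deepspeed/. Typically launched with: deepspeed this_script.py
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": torch.cuda.is_available()},  # mixed precision on GPU
    "zero_optimization": {"stage": 1},               # shard optimizer states across ranks
}

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# deepspeed.initialize wraps the model and optimizer in an engine that handles
# mixed precision, loss scaling, ZeRO partitioning, and gradient reduction.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, optimizer=optimizer, config=ds_config
)

dtype = next(engine.module.parameters()).dtype
x = torch.randn(4, 1024, device=engine.device, dtype=dtype)
loss = engine(x).float().pow(2).mean()  # dummy loss for illustration
engine.backward(loss)
engine.step()
```

The tested recipes in examples_deepspeed/ combine this kind of DeepSpeed configuration with Megatron's model-parallel pre-training scripts.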
