EleutherAI/monkfish

 
 

Monkfish: Distributed latent video model training on TPUs (and other stuff maybe)

This is the training code for a two-stage autoregressive video model.

TODO:

  • Chunked scatter/gather/init functions
  • Parallel model save/load
  • Dtype conversions at scatter/gather/init functions
  • Distributed data loading
  • Distributed model training
  • Multi-platform file backend via PyFilesystem2
  • GPU Support
  • SLURM Support
  • Kubernetes Support
  • Text conditional diffusion Transformer
  • (5/6)-D parallelism
    • FSDP
    • Ring attention
    • Pipeline parallelism
    • Async swarm
  • Llama 3 support
  • Sophisticated logging (Logfire/SQL database)
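
To make the scatter/gather and dtype-conversion items above concrete, here is a minimal sketch using JAX's public sharding API. The names `make_mesh`, `scatter`, and `gather`, the `"fsdp"` axis name, and the bfloat16 default are illustrative assumptions, not Monkfish's actual functions, and the sketch does no chunking.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def make_mesh(axis_name="fsdp"):
    # Hypothetical helper: a one-dimensional mesh over every visible device
    # (a single device on a plain CPU host).
    return Mesh(np.array(jax.devices()), (axis_name,))


def scatter(host_array, mesh, spec, dtype=jnp.bfloat16):
    # Hypothetical helper: convert the dtype, then place the array on the
    # mesh with the requested partitioning.
    array = jnp.asarray(host_array, dtype=dtype)
    return jax.device_put(array, NamedSharding(mesh, spec))


def gather(device_array, dtype=np.float32):
    # Hypothetical helper: fetch the full array back to host memory and
    # convert it to the requested dtype.
    return np.asarray(jax.device_get(device_array)).astype(dtype)


if __name__ == "__main__":
    mesh = make_mesh()
    n_devices = mesh.devices.size
    # The sharded (first) dimension must be divisible by the device count.
    weights = np.ones((2 * n_devices, 4), dtype=np.float32)
    sharded = scatter(weights, mesh, P("fsdp", None))
    print(sharded.dtype, sharded.sharding)
    print(gather(sharded).dtype)
```

Sharding only the first dimension over a single mesh axis is the simplest FSDP-style layout; a real implementation would likely pick a partition spec per parameter and move data in chunks to bound host memory.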

References For Developers

Parameter scaling:

JAX sharding:

Data loader design:
