Deprecating ray-ml images #46378

aslonnie commented Jul 2, 2024

Hi Ray community,

We are deprecating rayproject/ray-ml container images:
https://hub.docker.com/r/rayproject/ray-ml/tags

Starting from Ray version 2.31.0:

  • The images will be tagged with a .deprecated marker (such as 2.31.0.deprecated-).
  • We will stop updating the latest tags.
  • We will not build ray-ml container images for Python 3.12.
  • We might remove Python packages from future ray-ml (deprecated) images without notice.
  • We will try to keep building them as much as we can, but we might stop publishing ray-ml (deprecated) images without notice. If you still depend on ray-ml, pin an explicit deprecated tag; see the sketch after this list.
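
If you still need ray-ml while you migrate, a minimal pinning sketch, assuming the deprecated tags keep the same -py310-gpu style suffixes as the ray images (check Docker Hub for the exact tags that get published):

# Hypothetical tag shape; pin an explicit tag instead of relying on latest.
FROM rayproject/ray-ml:2.31.0.deprecated-py310-gpu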

But why?

In the past, we built and released ray-ml images as a convenient way for people to run Machine Learning-related Python packages in a Ray environment. These images bundle more than 200 additional Python packages, including PyTorch, TensorFlow, JAX, XGBoost, Dask, and many others that Ray can work with. Installing all these packages in one image has several drawbacks:

  • Most Machine Learning applications only use a small subset of the packages. For example, a Ray app rarely uses torch and tensorflow at the same time.
  • They increase the size of the container image by around 5GiB. For context, a typical GPU ray-ml image is around 10GiB (compressed layers): Ray and its system dependencies are no more than 1GiB, Nvidia CUDA (devel version) SDK is around 4GiB, and all the ML packages are around 5GiB.
  • As a result of the image size, the image takes longer to pull and becomes harder to load. When used in a cluster, it makes the cluster take longer to launch and slower to scale up.
  • More importantly, as time goes by and new versions of these libraries get released, resolving all of these packages together without dependency conflicts becomes nearly impossible.
  • And over time, most serious users learn the limitations of ray-ml images, stop using them, and build on top of the ray images directly.

Therefore, to make ray images load faster, and to allow Ray to work better with newer versions of the Python interpreter and other machine learning libraries, we will stop recommending or supporting ray-ml as an "all-in-one" solution.

What should we use then?

The release and publishing of rayproject/ray container images will remain unchanged.
https://hub.docker.com/r/rayproject/ray/tags

You can pip install Python packages on top of rayproject/ray. A simple Dockerfile example:

FROM rayproject/ray:2.31.0-py310-gpu
RUN pip install torch

This will install the latest PyTorch on top of rayproject/ray:2.31.0-py310-gpu.
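
As a usage sketch (my-ray-app is a placeholder image name; prefix it with your registry and push it wherever your cluster pulls images from):

# Build the derived image from the Dockerfile above, then publish it.
docker build -t my-ray-app:2.31.0-py310-gpu .
docker push my-ray-app:2.31.0-py310-gpu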

If you are installing a package that Ray supports working with, you can also use the constraints file that ships inside the image to install the exact library versions that we tested against during Ray's release process:

FROM rayproject/ray:2.31.0-py310-gpu
RUN pip install torch -c /home/ray/requirements_compiled.txt

This will result in much smaller images that are much faster to load than the ray-ml images.
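
The same pattern extends to however many packages your application actually needs. For example, a sketch that recreates just the slice of what ray-ml used to bundle (the package list here is illustrative):

FROM rayproject/ray:2.31.0-py310-gpu
# Install only the ML libraries this app uses; -c pins any of them that
# appear in the compiled requirements to the versions Ray was tested against.
RUN pip install torch xgboost -c /home/ray/requirements_compiled.txt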


Please comment and let me know if you have any questions.
