Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible image builds for Fornax images #10

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

zoghbi-a
Copy link

This is an attempt to migrate the Fornax images from gitlab. For now these images are for testing only and are not deployed in production. We will ultimately be using these public images.

The docker images are restructured from what we have on gitlab to make the builds reproducible, so building images at different times should produce images with the exact same packages (as much as possible). This is done with the conda-lock package. Reproducible builds will allow us to ultimately to create DOIs for them and cite them.

I am putting this out here to get feedback. This PR contains most of the needed parts, and the content of the images mirrors what we have on fornax (as much as possible). A description of how the builds are done is in the readme file. Part of this structure was borrowed from PANGEO

@zoghbi-a zoghbi-a changed the title Reproducible image builds for Fornax imagex Reproducible image builds for Fornax images Jul 16, 2024
@bsipocz
Copy link
Member

bsipocz commented Jul 16, 2024

It's not exactly related to this PR, but I wonder if you experimented with pixi (https://github.com/prefix-dev/pixi)? It looked compelling during demos/talks at scipy and the high-energy physics folks have good reproducibility experiences with it (and it finally makes conda and pip on parity with each other).

dependencies:
- jupyterlab
- dask
- jupytext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have jupyterlab-myst in here, too?

nest-asyncio==1.6.0
notebook==7.2.0
notebook-shim==0.2.4
numpy==1.26.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point we need to bump this to 2.0, and this migration feels to be a good point for it.

Copy link
Author

@zoghbi-a zoghbi-a Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I think the hold back is lsdb notebooks, and lightkurve, which has a some dependency that requires numpy<2. There may be others that I am not aware of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they rely on healpy atm, and also one of the ML notebooks also has a dependency that requires np <2. So I would suggest having a reminder issue about numpy 2, and just roll with the old one for now.

- netpbm
- acstools
- astropy
- astroquery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyvo is pulled in by astroquery, but it would good to spell it out separately as there is direct use of it.

Maybe the policy could be to list everything that is directly imported in the demo notebooks?

@bsipocz
Copy link
Member

bsipocz commented Jul 16, 2024

I know that we call this tractor as it was one of the complicated packages to install, but I wonder, if the base image doesn't really have the astro packages, that we should call this something more descriptive. Or have a bit more of the standard packages available in the base image, too (e.g. pyvo, astroquery, maybe scikit-learn, or some of the other astropy affiliated packages).

@zoghbi-a
Copy link
Author

I know that we call this tractor as it was one of the complicated packages to install, but I wonder, if the base image doesn't really have the astro packages, that we should call this something more descriptive. Or have a bit more of the standard packages available in the base image, too (e.g. pyvo, astroquery, maybe scikit-learn, or some of the other astropy affiliated packages).

The idea of the base image is that it contains the system and jupyter environment. This is not meant to be made available as an image to the end user.

@zoghbi-a
Copy link
Author

zoghbi-a commented Jul 16, 2024

@bsipocz, does github CI supports building these images and say pushing them to dockerhub (assuming we have an account)?

@bsipocz
Copy link
Member

bsipocz commented Jul 17, 2024

I haven't yet done it myself, but it seems to be possible. Happy to help figuring it out if we plan to do it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be more useful for the users to call this image something else rather than tractor. E.g. keep the forced photometry name and drop tractor?
(After all we just called it tractor informally as that was the most complicated component to install, but it has zero descriptive information content for a generic user, especially for one who is not familiar with that library)

@zoghbi-a
Copy link
Author

zoghbi-a commented Aug 1, 2024

It's not exactly related to this PR, but I wonder if you experimented with pixi (https://github.com/prefix-dev/pixi)? It looked compelling during demos/talks at scipy and the high-energy physics folks have good reproducibility experiences with it (and it finally makes conda and pip on parity with each other).

This looks like a cool tool. Are we willing to force users to use this instead of conda directly? From the documentation, pixi uses conda under the hood, but you are discouraged from interacting with it directly.

@bsipocz
Copy link
Member

bsipocz commented Aug 1, 2024

Are we willing to force users to use this instead of conda directly

It promises better user experience, and the usage should not be that different, so I don't think it should be a big issue. (Being said, I'm not a conda user myself so it may not be the best input here. I'll ping the scientists on the team fin case they have strong opinions against trying this)

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2024

Until I figure out how to deal with nasa-fornax/fornax-demo-notebooks#303, can we have the demo notebook kernel named super genericly? E.g. python3? So it will work out of the box on CI as well as for FSC users.

(See Troy's comment here: nasa-fornax/fornax-demo-notebooks#321)

@zoghbi-a
Copy link
Author

The names conda-root-py comes from nb_conda_kernels, which is used to allow custom environments created by the user to be automatically converted to kernels. That naming is built into it and cannot be changed. I don't like it.
With this PR, the plant is have one environment called notebook. Would that work for what you need?

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2024

I don't think that would work out of the box either, so far I only saw python3 in practice.

Here is what I got on CI (and of course locally, too): https://app.circleci.com/pipelines/github/nasa-fornax/fornax-demo-notebooks/184/workflows/3b48be23-3bc0-41cc-8e46-95914baa0f3c/jobs/139?invite=true#step-103-273

I have changed some of these back to python3, but in practice, people keep committing back unrelated metadata changes with their content changes, so the situation is not great.
(I've opened a feature request upstream to make this all configurable in the build system, but I don't think it will be addressed any time soon).

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2024

Re dependencies: I have cleaned up the dependency files for the currently existing notebooks. They are not yet conflicting, so if possible, it would be great to include them in the image.

they are stored with their notebooks, so */requirements*.txt should get them. As well as some site-building dependencies in site-requirements.txt, but I don't think those are necessary on the platform.

@zoghbi-a
Copy link
Author

zoghbi-a commented Sep 13, 2024

I don't think that would work out of the box either, so far I only saw python3 in practice.

The reason there are named kernels is that in the high energy image, there are multiple ones, and different notebooks will use different kernels.
Can we use something like nbsphinx to select the kernel. We can say that a default kernel is notebook unless something else is selected, which what we'll need when the high energy notebooks are ready.

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2024

Sure, but what about keeping the default python3 instead of notebook?

(FYI: we don't use nbsphinx, but I'll check if our tools are checking nbsphinx config or if I can easily add this into MyST-NB myself).

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2024

(Btw, this would come up to be a problem for anyone downloading a notebook and try to run it on their own machine, thus my preference to be as generic as possible for most things.)

@bsipocz
Copy link
Member

bsipocz commented Sep 14, 2024

OK, I've found a config option for the kernel name, so that question should not block this PR, and I'm OK with calling it notebook (though maybe calling it demo_notebook would be a better description, but that's bikeshedding and I'm happy with anything that will stay relatively stable (so we won't get them regularly overwritten).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants