Reproducible image builds for Fornax images #10

zoghbi-a · 2024-07-16T16:14:44Z

This is an attempt to migrate the Fornax images from gitlab. For now these images are for testing only and are not deployed in production. We will ultimately be using these public images.

The docker images are restructured from what we have on gitlab to make the builds reproducible, so building images at different times should produce images with the exact same packages (as much as possible). This is done with the conda-lock package. Reproducible builds will allow us to ultimately to create DOIs for them and cite them.

I am putting this out here to get feedback. This PR contains most of the needed parts, and the content of the images mirrors what we have on fornax (as much as possible). A description of how the builds are done is in the readme file. Part of this structure was borrowed from PANGEO

bsipocz · 2024-07-16T17:28:56Z

It's not exactly related to this PR, but I wonder if you experimented with pixi (https://github.com/prefix-dev/pixi)? It looked compelling during demos/talks at scipy and the high-energy physics folks have good reproducibility experiences with it (and it finally makes conda and pip on parity with each other).

bsipocz · 2024-07-16T17:31:08Z

base-image/conda-notebook.yml

+dependencies:
+  - jupyterlab
+  - dask
+  - jupytext


can we have jupyterlab-myst in here, too?

bsipocz · 2024-07-16T17:32:02Z

base-image/packages.txt

+nest-asyncio==1.6.0
+notebook==7.2.0
+notebook-shim==0.2.4
+numpy==1.26.4


at some point we need to bump this to 2.0, and this migration feels to be a good point for it.

For now, I think the hold back is lsdb notebooks, and lightkurve, which has a some dependency that requires numpy<2. There may be others that I am not aware of.

yes, they rely on healpy atm, and also one of the ML notebooks also has a dependency that requires np <2. So I would suggest having a reminder issue about numpy 2, and just roll with the old one for now.

bsipocz · 2024-07-16T17:34:41Z

tractor/conda-science_demo.yml

+  - netpbm
+  - acstools
+  - astropy
+  - astroquery


pyvo is pulled in by astroquery, but it would good to spell it out separately as there is direct use of it.

Maybe the policy could be to list everything that is directly imported in the demo notebooks?

bsipocz · 2024-07-16T17:39:48Z

I know that we call this tractor as it was one of the complicated packages to install, but I wonder, if the base image doesn't really have the astro packages, that we should call this something more descriptive. Or have a bit more of the standard packages available in the base image, too (e.g. pyvo, astroquery, maybe scikit-learn, or some of the other astropy affiliated packages).

zoghbi-a · 2024-07-16T19:22:46Z

I know that we call this tractor as it was one of the complicated packages to install, but I wonder, if the base image doesn't really have the astro packages, that we should call this something more descriptive. Or have a bit more of the standard packages available in the base image, too (e.g. pyvo, astroquery, maybe scikit-learn, or some of the other astropy affiliated packages).

The idea of the base image is that it contains the system and jupyter environment. This is not meant to be made available as an image to the end user.

zoghbi-a · 2024-07-16T19:23:48Z

@bsipocz, does github CI supports building these images and say pushing them to dockerhub (assuming we have an account)?

bsipocz · 2024-07-17T03:45:00Z

I haven't yet done it myself, but it seems to be possible. Happy to help figuring it out if we plan to do it here.

bsipocz · 2024-07-26T22:29:20Z

tractor/Dockerfile

I wonder if it would be more useful for the users to call this image something else rather than tractor. E.g. keep the forced photometry name and drop tractor?
(After all we just called it tractor informally as that was the most complicated component to install, but it has zero descriptive information content for a generic user, especially for one who is not familiar with that library)

zoghbi-a · 2024-08-01T20:22:17Z

It's not exactly related to this PR, but I wonder if you experimented with pixi (https://github.com/prefix-dev/pixi)? It looked compelling during demos/talks at scipy and the high-energy physics folks have good reproducibility experiences with it (and it finally makes conda and pip on parity with each other).

This looks like a cool tool. Are we willing to force users to use this instead of conda directly? From the documentation, pixi uses conda under the hood, but you are discouraged from interacting with it directly.

bsipocz · 2024-08-01T22:52:18Z

Are we willing to force users to use this instead of conda directly

It promises better user experience, and the usage should not be that different, so I don't think it should be a big issue. (Being said, I'm not a conda user myself so it may not be the best input here. I'll ping the scientists on the team fin case they have strong opinions against trying this)

…ult kernel; add config scripts

bsipocz · 2024-09-13T00:38:07Z

Until I figure out how to deal with nasa-fornax/fornax-demo-notebooks#303, can we have the demo notebook kernel named super genericly? E.g. python3? So it will work out of the box on CI as well as for FSC users.

(See Troy's comment here: nasa-fornax/fornax-demo-notebooks#321)

zoghbi-a · 2024-09-13T01:29:21Z

The names conda-root-py comes from nb_conda_kernels, which is used to allow custom environments created by the user to be automatically converted to kernels. That naming is built into it and cannot be changed. I don't like it.
With this PR, the plant is have one environment called notebook. Would that work for what you need?

bsipocz · 2024-09-13T01:41:35Z

I don't think that would work out of the box either, so far I only saw python3 in practice.

Here is what I got on CI (and of course locally, too): https://app.circleci.com/pipelines/github/nasa-fornax/fornax-demo-notebooks/184/workflows/3b48be23-3bc0-41cc-8e46-95914baa0f3c/jobs/139?invite=true#step-103-273

I have changed some of these back to python3, but in practice, people keep committing back unrelated metadata changes with their content changes, so the situation is not great.
(I've opened a feature request upstream to make this all configurable in the build system, but I don't think it will be addressed any time soon).

bsipocz · 2024-09-13T01:46:56Z

Re dependencies: I have cleaned up the dependency files for the currently existing notebooks. They are not yet conflicting, so if possible, it would be great to include them in the image.

they are stored with their notebooks, so */requirements*.txt should get them. As well as some site-building dependencies in site-requirements.txt, but I don't think those are necessary on the platform.

zoghbi-a · 2024-09-13T16:05:59Z

I don't think that would work out of the box either, so far I only saw python3 in practice.

The reason there are named kernels is that in the high energy image, there are multiple ones, and different notebooks will use different kernels.
Can we use something like nbsphinx to select the kernel. We can say that a default kernel is notebook unless something else is selected, which what we'll need when the high energy notebooks are ready.

bsipocz · 2024-09-13T16:41:48Z

Sure, but what about keeping the default python3 instead of notebook?

(FYI: we don't use nbsphinx, but I'll check if our tools are checking nbsphinx config or if I can easily add this into MyST-NB myself).

bsipocz · 2024-09-13T16:43:35Z

(Btw, this would come up to be a problem for anyone downloading a notebook and try to run it on their own machine, thus my preference to be as generic as possible for most things.)

bsipocz · 2024-09-14T00:43:45Z

OK, I've found a config option for the kernel name, so that question should not block this PR, and I'm OK with calling it notebook (though maybe calling it demo_notebook would be a better description, but that's bikeshedding and I'm happy with anything that will stay relatively stable (so we won't get them regularly overwritten).

zoghbi-a added 6 commits May 9, 2024 12:04

starting with a new structure

3a33cac

add environment-lock.yml to base-image

f504707

switched to conda-lock

5bcb36c

add image infrastructure and tractor image to test it

70d8d19

update the readme file

da62680

fix typos in readme

46f3136

zoghbi-a changed the title ~~Reproducible image builds for Fornax imagex~~ Reproducible image builds for Fornax images Jul 16, 2024

bsipocz reviewed Jul 16, 2024

View reviewed changes

bsipocz reviewed Jul 26, 2024

View reviewed changes

zoghbi-a added 10 commits August 12, 2024 22:09

add more explicit astro-dependencies to base-image

f4a7106

switch to updating the main conda env rather than creating new ones

4335e76

update jupyter base image; use default entrypoint; make notebook defa…

4fe4d39

…ult kernel; add config scripts

update image labels

e5b2565

fix names in tractor/conda-notebook.yml

d5905dc

move conda activate to /etc/bash.bashrc

34e4a49

undo label change as it does not work

1dcfeeb

changes how the default env is set so it does not rely on ~/.bashrc

ee631f7

fix new kernel installations

ece889e

add heasoft image

87029b4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reproducible image builds for Fornax images #10

Reproducible image builds for Fornax images #10

zoghbi-a commented Jul 16, 2024

bsipocz commented Jul 16, 2024

bsipocz Jul 16, 2024

bsipocz Jul 16, 2024

zoghbi-a Sep 13, 2024 •

edited

Loading

bsipocz Sep 13, 2024

bsipocz Jul 16, 2024

bsipocz commented Jul 16, 2024

zoghbi-a commented Jul 16, 2024

zoghbi-a commented Jul 16, 2024 •

edited

Loading

bsipocz commented Jul 17, 2024

bsipocz Jul 26, 2024

zoghbi-a commented Aug 1, 2024

bsipocz commented Aug 1, 2024

bsipocz commented Sep 13, 2024

zoghbi-a commented Sep 13, 2024

bsipocz commented Sep 13, 2024 •

edited

Loading

bsipocz commented Sep 13, 2024 •

edited

Loading

zoghbi-a commented Sep 13, 2024 •

edited

Loading

bsipocz commented Sep 13, 2024

bsipocz commented Sep 13, 2024

bsipocz commented Sep 14, 2024

Reproducible image builds for Fornax images #10

Are you sure you want to change the base?

Reproducible image builds for Fornax images #10

Conversation

zoghbi-a commented Jul 16, 2024

bsipocz commented Jul 16, 2024

bsipocz Jul 16, 2024

Choose a reason for hiding this comment

bsipocz Jul 16, 2024

Choose a reason for hiding this comment

zoghbi-a Sep 13, 2024 • edited Loading

Choose a reason for hiding this comment

bsipocz Sep 13, 2024

Choose a reason for hiding this comment

bsipocz Jul 16, 2024

Choose a reason for hiding this comment

bsipocz commented Jul 16, 2024

zoghbi-a commented Jul 16, 2024

zoghbi-a commented Jul 16, 2024 • edited Loading

bsipocz commented Jul 17, 2024

bsipocz Jul 26, 2024

Choose a reason for hiding this comment

zoghbi-a commented Aug 1, 2024

bsipocz commented Aug 1, 2024

bsipocz commented Sep 13, 2024

zoghbi-a commented Sep 13, 2024

bsipocz commented Sep 13, 2024 • edited Loading

bsipocz commented Sep 13, 2024 • edited Loading

zoghbi-a commented Sep 13, 2024 • edited Loading

bsipocz commented Sep 13, 2024

bsipocz commented Sep 13, 2024

bsipocz commented Sep 14, 2024

zoghbi-a Sep 13, 2024 •

edited

Loading

zoghbi-a commented Jul 16, 2024 •

edited

Loading

bsipocz commented Sep 13, 2024 •

edited

Loading

bsipocz commented Sep 13, 2024 •

edited

Loading

zoghbi-a commented Sep 13, 2024 •

edited

Loading