Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wheels cli #20

Merged
merged 24 commits into from
Nov 12, 2023
Merged

Add wheels cli #20

merged 24 commits into from
Nov 12, 2023

Conversation

joemarshall
Copy link
Contributor

This adds a very basic cli so in a you can do

pyodide lockfile add-wheels *.whl and it will add wheels to a pyodide-lock.json in the current folder and write it out to pyodide-lock-new.json. There's options for input and output json.

What this means is that if you do:
pyodide build --with-dependencies to create a load of wheel files, you can then do pyodide lockfile add-wheels dist/*.whl and it will create you a lockfile including your new wheels, with dependencies between these wheels correctly marked.

No tests or anything yet, but thought it was worth pushing so you can see it.

This was referenced Sep 22, 2023
@joemarshall joemarshall marked this pull request as ready for review September 22, 2023 14:42
@rth
Copy link
Member

rth commented Sep 22, 2023

Thanks @joemarshall for making this workflow more usable! I'll try to review it soon.

pyproject.toml Outdated Show resolved Hide resolved
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joemarshall a few comments below but overall it looks quite nice!

) -> None:
"""Add a list of wheel files to this pyodide-lock.json

Args:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally use numpy style docstrings

version = split_name[1]
python_tag = split_name[-3]
split_name[-2]
platform_tag = split_name[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requirements = get_wheel_dependencies(wheel_file, package.name)
for r in requirements:
req_marker = r.marker
req_name = canonicalize_name(r.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as in https://github.com/pyodide/pyodide-lock/pull/18/files#r1332985361 Are you sure the name should be normalized? Last I checked it wasn't in pyodide-lock.json and one install by package name usually not the normalized name. But I could be mistaked.

else:
raise RuntimeError(
f"Requirement {req_name} is not in this distribution."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very long method. Some of it should probably be split into a separate helper function as @bollwyvl did in #18

@@ -74,3 +87,54 @@ def _generate_package_hash(full_path: Path) -> str:
while chunk := f.read(4096):
sha256_hash.update(chunk)
return sha256_hash.hexdigest()


def get_wheel_dependencies(wheel_path: Path, pkg_name: str) -> list[Requirement]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can parse them with importlib.metadata so we don't have to bother checking it's correct for all edge cases.

@rth
Copy link
Member

rth commented Sep 25, 2023

I realized the PRs you did @joemarshall and @bollwyvl are on the same topic with a very similar API. So it would be nice to take some parts of each. I don't know how you prefer to proceed, I'm tempted to to merge #18 first (since it's smaller and does fewer things), and then sync it with this PR and use methods/functions added there here. Assuming you are OK with the proposed API. But as you prefer, I'm open to suggestions. Also if you have some review comments, since you were thinking about similar use cases :)

The one thing that is a blocker for both IMO, is that currently, both normalize the package name (and the name of the dependencies). At some level, it might not matter since we likely normalize the user-provided name when installing. But I feel like it should be the same in pyodide, micropip and here (whichever convention is chosen).

@rth
Copy link
Member

rth commented Sep 25, 2023

both normalize the package name (and the name of the dependencies).

On that topic, there was also pyodide/pyodide#4005 as not using project names, makes it not possible to install such packages via pip (if pyodide-lock.json is re-exported as a Simple JSON index) in a venv under node. However now that I look at the PEP 691 it does say that name should be normalized.

Now I'm confused, maybe we should be normalizing package names in pyodide-lock.json. Any thoughts on this @ryanking13 ?

@ryanking13
Copy link
Member

Now I'm confused, maybe we should be normalizing package names in pyodide-lock.json. Any thoughts on this @ryanking13 ?

Hmm, yes I think normalizing package names in pyodide-lock.json would be a good idea to make things simple. micropip is currently saving the normalized names of packages because micropip doesn't know the real project name before downloading the wheel and see the metadata in it, while the convention in pyodide-lock.json is to use the original package name as a key.

I think using normalized names everywhere will fix the old problem in micropip: it cannot handle packages that are included in Pyodide but have non-normalized names (e.g. pyodide-lock.json has ruamel.yaml package but micropip.install("ruamel.yaml") fails because micropip searches the normalized name: ruamel-yaml).

So I am +1 for using a normalized name in pyodide-lock.json.

@rth rth mentioned this pull request Sep 26, 2023
5 tasks
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I megered #18. @joemarshall would you mind syncing with upstream/main and seeing if you could use the methods/functions added there? Thanks!

pyodide_lock/spec.py Outdated Show resolved Hide resolved
@joemarshall
Copy link
Contributor Author

@rth @bollwyvl So, this PR now uses much of the code from #18.

I did some refactoring which probably requires an opinion from @bollwyvl also, because it makes the dependency handling more robust, by messing with some of the code from #18.

In particular:

I've made PackageSpec.from_wheel internal only (i.e. PackageSpec._from_wheel). This was because in order to handle dependencies nicely, one needs to know the full set of packages available to the package. Otherwise two things break -

  1. You can't check if your dependencies exist
  2. If dependencies are included with extras, we need to make sure importing the dependency also imports things for that extra; if that isn't done, then you end up with different behaviour in pyodide depending on import order of packages which is bad. This way, installing a package with extras does the same thing as on a standard python package setup, the extras are available for any user of that dependency. Fixing this requires messing with the PackageSpec objects for different packages when building the pyodide-lock.

Because of this, rather than create a PackageSpec for each wheel and add them all into the PyodideLockSpec, it makes sense to add a bundle of wheels to a PyodideLockSpec. In this PR, this is done using PyodideLockSpec.add_wheels.

I made PackageSpec.from_wheel internal, because it seems to me quite dangerous, in that the moment your wheel has extras or you have a missing dependency, it is very easy to create a broken lockfile.

There is also some handling done in PyodideLockSpec.add_wheels to re-write package filenames to URLs, for example if you're hosting your self-built packages at a different URL to those of your core pyodide packages.

Oh and I improved the handling of marker environment to make it use the version numbers etc. from the PyodideLockSpec object, and it now checks wheel abi etc. against that also.

I think I ported pretty much all the tests from #18 or equivalent into the new add_wheels format, probably worth a look to see if I missed anything important.

@bollwyvl
Copy link
Contributor

I made PackageSpec.from_wheel internal

Welp, that's the one I really want to be able to use downstream.

But frankly, even after adding that initial method, I feel like the data models should probably be "dumb", and only busy themselves with validation... and potentially removing the IO-based methods altogether.

Going further, having non-pydantic types at the borders e.g. accept dict, return a dataclass or TypedDict would ensure the internal logic can be upgraded to e.g. pydantic >=2 or 3 (rewritten in zig) or 4 (rewritten in formally verified spark) without breaking all downstreams.

check if your dependencies exist

The thing would be a valid package spec, but it's the lockfile as a whole that might be invalid... and that's not even being validated right now in main. From a portable JSON schema point of view, this can't really be fully validated, as the only standards-based approach for that would be... rather complex.

However, with pyodide/micropip#83, this isn't even really the case, as dependencies could be sourced from an index, really getting to the use case I want: As a project, be able to ship one locally-built wheel and make it autoimportable. If this means having to fake up a PyPI index for its non-pydodide-distributed dependencies, I can live with that.

At any rate: having the wheel spec generation be as simple and accurate as possible, and having the lock spec handle the higher-order invariants seems like it will keep code simpler and more easily tested.

package filenames to URLs

Yep, while changing the field name would be bad, this should have a format="uri", and description to that effect, so that it is explicit in the schema.

with extras

Extras are frankly pretty broken, upstream, as they don't constrain future solves, aren't validated by pip check, etc. And at least one PEP 517 backend generates straight up wrong extras, silently, if the (custom) config is not exactly right.

If this is gonna do the thing, then the spec needs to add extras, and all the existing in-distribution packages need them populated.

tests/test_wheel.py Outdated Show resolved Hide resolved
@joemarshall
Copy link
Contributor Author

I made PackageSpec.from_wheel internal

Welp, that's the one I really want to be able to use downstream.

But frankly, even after adding that initial method, I feel like the data models should probably be "dumb", and only busy themselves with validation... and potentially removing the IO-based methods altogether.

Going further, having non-pydantic types at the borders e.g. accept dict, return a dataclass or TypedDict would ensure the internal logic can be upgraded to e.g. pydantic >=2 or 3 (rewritten in zig) or 4 (rewritten in formally verified spark) without breaking all downstreams.

check if your dependencies exist

The thing would be a valid package spec, but it's the lockfile as a whole that might be invalid... and that's not even being validated right now in main. From a portable JSON schema point of view, this can't really be fully validated, as the only standards-based approach for that would be... rather complex.

However, with pyodide/micropip#83, this isn't even really the case, as dependencies could be sourced from an index, really getting to the use case I want: As a project, be able to ship one locally-built wheel and make it autoimportable. If this means having to fake up a PyPI index for its non-pydodide-distributed dependencies, I can live with that.

At any rate: having the wheel spec generation be as simple and accurate as possible, and having the lock spec handle the higher-order invariants seems like it will keep code simpler and more easily tested.

package filenames to URLs

Yep, while changing the field name would be bad, this should have a format="uri", and description to that effect, so that it is explicit in the schema.

with extras

Extras are frankly pretty broken, upstream, as they don't constrain future solves, aren't validated by pip check, etc. And at least one PEP 517 backend generates straight up wrong extras, silently, if the (custom) config is not exactly right.

If this is gonna do the thing, then the spec needs to add extras, and all the existing in-distribution packages need them populated.

Ah - I think we have slightly different but related use cases here -

My use case is to build a wheel along with all dependencies (so I have a folder of wheels), and make all those wheels import nicely in pyodide. So for me, I make the assumption that:

  1. The base wheels in pyodide-lock are fine.
  2. I have all the dependencies in the folder of wheels I'm giving it, so I don't need to look elsewhere for wheels.

Assuming I built with pyodide build --with-dependencies these should be true.

This means that all I need here is to add the correct wheels to the pyodide-lock so they import correctly. No need to create a dummy pypi or anything.

The reason for doing the propagation of extras into the dependencies here is to mirror what happens when you call pip install.

As an example I have a package (lets call it toplevel) which depends on 'eth-hash[pycryptodome]'.

If I install it with pip, it installs eth-hash, and pycryptodome as well as toplevel. This means if I import eth-hash, it works using pycryptodome, because pycryptodome is installed. Whatever order I import things in, the same behaviour persists.

In pyodide-lock, if I ignore extras as it would if I used the original from_wheel method, then I first import all my wheels into the pyodide-lock. toplevel has a dependency on eth-hash, which is fine, but eth-hash doesn't depend on pycryptodome. This means that if I import toplevel, I get eth-hash without the (required) functionality from pycryptodome. Whereas if I import pycryptodome manually and then import toplevel, everything works as it should.

I agree that extras are pretty rubbish, and I can see why pyodide-lock doesn't support them, which is why I think the hack to resolve installed extras at the time of making the pyodide-lock file makes sense. After all, it does mirror the behaviour on a normal python, that if you ever install a package with a dependency that has an extra, all packages that use that dependency also have the extra included.

I don't think there's anything particularly wrong with making from_wheel public again, but I do think that one should be really careful about just dropping dependencies in this way, and unless we were to build extras support into pyodide-lock, I would be wary of using it, as I've already come across one real world example where it doesn't work (which typically is the exact one that is paying me to do this work!)

My preferred option actually is to keep _from_wheel private, but just note that you can easily use add_wheels with ignore_missing_dependencies==True to achieve the behaviour you are wanting here.

@joemarshall
Copy link
Contributor Author

But frankly, even after adding that initial method, I feel like the data models should probably be "dumb", and only busy themselves with validation... and potentially removing the IO-based methods altogether.

Good point. I think you're absolutely right. I've moved the add_wheels to be utils.add_wheels_to_spec and from_wheel is now utils.package_spec_from_wheel.

Oh and done those other review changes above. I think hopefully we're at a point that this provides the functionality both of us want?

@joemarshall
Copy link
Contributor Author

@rth Hopefully ready for merging now

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joemarshall!

I've left several comments about code details. Overall, I think it looks good, but regarding the high-level design, I wasn't involved into the discussion deeply, so it would be nice to get an additional review from @rth.

pyodide_lock/cli/lockfile.py Outdated Show resolved Hide resolved
pyodide_lock/cli.py Outdated Show resolved Hide resolved
pyodide_lock/spec.py Outdated Show resolved Hide resolved
pyodide_lock/cli.py Outdated Show resolved Hide resolved
pyodide_lock/utils.py Outdated Show resolved Hide resolved
pyodide_lock/utils.py Show resolved Hide resolved
pyodide_lock/utils.py Show resolved Hide resolved
pyodide_lock/utils.py Outdated Show resolved Hide resolved
pyodide_lock/utils.py Outdated Show resolved Hide resolved
pyodide_lock/utils.py Outdated Show resolved Hide resolved
@joemarshall
Copy link
Contributor Author

@rth Done all the review changes above now.

@joemarshall
Copy link
Contributor Author

@bollwyvl Any chance we could get this merged so it can drop off my todo list?

@ryanking13
Copy link
Member

Sorry for the long wait, if no one else review this by the end of the week, I'll check back over the weekend and merge this.

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @joemarshall!

@ryanking13 ryanking13 merged commit d825167 into pyodide:main Nov 12, 2023
3 checks passed
@ryanking13
Copy link
Member

Released v0.1.0a4 including this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants