Generate JSON Schema #3

Open
rth opened this issue Jun 23, 2023 · 3 comments

Comments

@rth
Member

rth commented Jun 23, 2023

In pyodide/pyodide#3573 @bollwyvl suggested that it would be useful to generate a JSON Schema for pyodide-lock.json.

We can do that from the current Pydantic spec, though the question is where to put it (keeping in mind that this is a Python package).

@bollwyvl could you please explain more what would work for you?

@rth
Member Author

rth commented Jun 23, 2023

Here is the current JSON schema:

>>> from pyodide_lock import PyodideLockSpec
>>> print(PyodideLockSpec.schema_json(indent=2))
{
  "title": "PyodideLockSpec",
  "description": "A specification for the pyodide-lock.json file.",
  "type": "object",
  "properties": {
    "info": {
      "$ref": "#/definitions/InfoSpec"
    },
    "packages": {
      "title": "Packages",
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/PackageSpec"
      }
    }
  },
  "required": [
    "info",
    "packages"
  ],
  "additionalProperties": false,
  "definitions": {
    "InfoSpec": {
      "title": "InfoSpec",
      "type": "object",
      "properties": {
        "arch": {
          "title": "Arch",
          "default": "wasm32",
          "enum": [
            "wasm32",
            "wasm64"
          ],
          "type": "string"
        },
        "platform": {
          "title": "Platform",
          "type": "string"
        },
        "version": {
          "title": "Version",
          "type": "string"
        },
        "python": {
          "title": "Python",
          "type": "string"
        }
      },
      "required": [
        "platform",
        "version",
        "python"
      ],
      "additionalProperties": false
    },
    "PackageSpec": {
      "title": "PackageSpec",
      "type": "object",
      "properties": {
        "name": {
          "title": "Name",
          "type": "string"
        },
        "version": {
          "title": "Version",
          "type": "string"
        },
        "file_name": {
          "title": "File Name",
          "type": "string"
        },
        "install_dir": {
          "title": "Install Dir",
          "type": "string"
        },
        "sha256": {
          "title": "Sha256",
          "default": "",
          "type": "string"
        },
        "package_type": {
          "title": "Package Type",
          "default": "package",
          "enum": [
            "package",
            "cpython_module",
            "shared_library",
            "static_library"
          ],
          "type": "string"
        },
        "imports": {
          "title": "Imports",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "depends": {
          "title": "Depends",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "unvendored_tests": {
          "title": "Unvendored Tests",
          "default": false,
          "type": "boolean"
        },
        "shared_library": {
          "title": "Shared Library",
          "default": false,
          "type": "boolean"
        }
      },
      "required": [
        "name",
        "version",
        "file_name",
        "install_dir"
      ],
      "additionalProperties": false
    }
  }
}

@bollwyvl
Contributor

Right: so with that schema in hand, if it were checked in at, e.g., pyodide_lock/schema/v1.json, tools which aren't tied to a specific major version of pydantic could use it at rest, if only via a raw GitHub URL (but that could at least carry a tag). Adding a $schema and $id would make it portable to editors, or it could get into the de facto schema store.
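A minimal sketch of what adding `$schema` and `$id` before checking the file in could look like, using only the stdlib. The `$id` URL is a hypothetical placeholder (the thread does not name one), and draft-07 is an assumption chosen because the generated schema uses the `definitions` keyword. The `generated` dict is an abbreviated stand-in for the real `PyodideLockSpec.schema_json()` output.

```python
import json

# Abbreviated stand-in for the generated schema; only top-level keys are kept.
generated = {
    "title": "PyodideLockSpec",
    "description": "A specification for the pyodide-lock.json file.",
    "type": "object",
    "required": ["info", "packages"],
    "additionalProperties": False,
}

# Assumption: draft-07, since the generated schema uses "definitions".
SCHEMA_DIALECT = "http://json-schema.org/draft-07/schema#"
# Hypothetical identifier; a real one would point at the tagged file.
SCHEMA_ID = "https://example.org/pyodide-lock/schema/v1.json"


def with_schema_metadata(schema: dict, schema_id: str) -> dict:
    """Return a copy of `schema` with $schema and $id as the first keys."""
    return {"$schema": SCHEMA_DIALECT, "$id": schema_id, **schema}


portable = with_schema_metadata(generated, SCHEMA_ID)
# This text is what would be written to pyodide_lock/schema/v1.json.
text = json.dumps(portable, indent=2)
```

Putting the two keys first is only cosmetic; editors and validators look them up by name.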

From a JSON schema, stdlib-compatible static types can be generated with:

... which could also be checked in, documented, etc. But these won't catch things like "a negative number," without a lot of jumping jacks.

Alternatively, the relationship with pydantic can be inverted: define the spec with stdlib dataclasses and enum, and inherit from those for validating objects, at the cost of losing some specificity (pattern, etc.) or re-implementing chunks of pydantic. I haven't yet seen a good pattern for TypedTuple subclasses, but those would no doubt be the most performant non-sub-field-validating option.
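As a rough illustration of the stdlib-only direction, here is a sketch of `PackageSpec` as a dataclass with an `Enum` for `package_type`. Field names and defaults are taken from the schema posted above; the `PackageType` class name and the checks in `__post_init__` are assumptions for illustration, not how pyodide-lock is implemented.

```python
from dataclasses import dataclass, field
from enum import Enum


class PackageType(str, Enum):
    """Allowed values of package_type, as listed in the schema's enum."""
    PACKAGE = "package"
    CPYTHON_MODULE = "cpython_module"
    SHARED_LIBRARY = "shared_library"
    STATIC_LIBRARY = "static_library"


@dataclass
class PackageSpec:
    name: str
    version: str
    file_name: str
    install_dir: str
    sha256: str = ""
    package_type: PackageType = PackageType.PACKAGE
    imports: list[str] = field(default_factory=list)
    depends: list[str] = field(default_factory=list)
    unvendored_tests: bool = False
    shared_library: bool = False

    def __post_init__(self) -> None:
        # Coerce raw strings to the enum; unknown values raise ValueError.
        self.package_type = PackageType(self.package_type)
        # Hand-written type checks: this is the part pydantic does for free.
        for attr in ("name", "version", "file_name", "install_dir", "sha256"):
            if not isinstance(getattr(self, attr), str):
                raise TypeError(f"{attr} must be a string")


# Hypothetical values, for illustration only.
spec = PackageSpec(
    name="example",
    version="1.0.0",
    file_name="example-1.0.0.whl",
    install_dir="site",
    package_type="shared_library",
)
```

Anything beyond these basic checks (nested models, patterns, coercion rules) would have to be written by hand, which is the "re-implementing chunks of pydantic" trade-off.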

Or, libraries can be used to validate instances against the schema at runtime:

The value of having these as dependencies is then debatable vs a one-stop-shop like pydantic... except for the portability. And indeed, on the TypeScript side, ajv can also fulfill this role with jtd.
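To make concrete what validating an instance against the at-rest schema involves, here is a dependency-free sketch. It is a deliberately tiny stand-in for a real JSON Schema library and covers only the keywords that appear in the schema posted above (`$ref` into `definitions`, `type`, `enum`, `required`, `properties`, `additionalProperties`, `items`). `SCHEMA` is an abbreviated copy with `PackageSpec` trimmed to two fields, and the function name `validate` is an assumption.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "info": {"$ref": "#/definitions/InfoSpec"},
        "packages": {
            "type": "object",
            "additionalProperties": {"$ref": "#/definitions/PackageSpec"},
        },
    },
    "required": ["info", "packages"],
    "additionalProperties": False,
    "definitions": {
        "InfoSpec": {
            "type": "object",
            "properties": {
                "arch": {"enum": ["wasm32", "wasm64"], "type": "string"},
                "platform": {"type": "string"},
                "version": {"type": "string"},
                "python": {"type": "string"},
            },
            "required": ["platform", "version", "python"],
            "additionalProperties": False,
        },
        "PackageSpec": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "depends": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name"],
            "additionalProperties": False,
        },
    },
}

# Only the JSON types used by this schema; unknown type names are not checked.
TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}


def validate(instance, schema, root=None, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    root = schema if root is None else root
    if "$ref" in schema:
        # "#/definitions/InfoSpec" -> root["definitions"]["InfoSpec"]
        name = schema["$ref"].rsplit("/", 1)[-1]
        return validate(instance, root["definitions"][name], root, path)
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES.get(expected, object)):
        return [f"{path}: expected {expected}"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        extra = schema.get("additionalProperties", True)
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key {key!r}")
        for key, value in instance.items():
            if key in props:
                errors += validate(value, props[key], root, f"{path}.{key}")
            elif extra is False:
                errors.append(f"{path}: unexpected key {key!r}")
            elif isinstance(extra, dict):
                errors += validate(value, extra, root, f"{path}.{key}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], root, f"{path}[{i}]")
    return errors


# Hypothetical lock contents, for illustration only.
good = {"info": {"platform": "p", "version": "v", "python": "3"}, "packages": {}}
bad = {"info": {"arch": "x86", "platform": "p", "version": "v"}, "packages": {}, "extra": 1}
good_errors = validate(good, SCHEMA)
bad_errors = validate(bad, SCHEMA)
```

Here `good_errors` is empty, while `bad_errors` reports the missing `python` key, the invalid `arch` value, and the unexpected top-level key. A real library would additionally handle the full keyword set, formats, and remote references.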

At-rest schema can plug into existing documentation toolchains:

As well as property-based testing tools:

@rth
Member Author

rth commented Jun 28, 2023

Thanks for the feedback!

> But these won't catch things like "a negative number," without a lot of jumping jacks.

It feels to me like it depends on the perspective. If we are talking about Python, then pydantic would do all the validation needed, but JS validation would be weaker. If we are talking about JS, then starting from the schema would indeed probably be easier, but in Python one would need to re-implement parts of what pydantic does.

Personally, for now, I would be +1 to go with the first option: committing pyodide_lock/schema/v1.json and keeping the spec in pydantic, as it's just more straightforward for this repo. This would already allow reasonable spec validation in Python via pydantic for pyodide-build, for this package once we add a CLI, and for any CLI you may want to add or already have on the jupyterlite side. Perfect validation on the JS side is, I would say, slightly less critical, as currently the way to make such a lock file is via some Python script using this library, which should produce a valid file.

That's something we can have now, which would already be an improvement over the current situation.

If it turns out that this is not enough, and that the produced schema doesn't allow sufficient validation on the JS side, I'm open to redesigning it, but that would likely need more work in this repo.
