Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: Python wrapper package, Python API #634

Open
SKalt opened this issue Jun 11, 2024 · 14 comments · May be fixed by #672
Open

feature request: Python wrapper package, Python API #634

SKalt opened this issue Jun 11, 2024 · 14 comments · May be fixed by #672

Comments

@SKalt
Copy link
Contributor

SKalt commented Jun 11, 2024

I'm building a documentation site using mkdocs, a python-based static site generator. Adding a python wrapper package would let me install the pagefind CLI without a node.js toolchain or a script to download and validate the pre-compiled binary. A Python API would also be useful for writing plugins for mkdocs and other existing python static site generators.

I'd be happy to contribute the python wrapper package and API bindings to this repo. Alternately, I'd also be happy to create a separate repo for managing the python wrapper package if you don't want to increase the maintenance burden here.

@bglw
Copy link
Contributor

bglw commented Jun 13, 2024

👋

There's precedence for both approaches. The Node wrapper is maintained in this repo, while there exists Nix and Juliet wrappers maintained externally.

I'm not a heavy Python user myself, so I'm slightly hesitant to take on maintenance — however I think maintaining it in this repo would be the best result for users, so we should go with that. I've been wanting to offer a distribution channel for users without Node, and Python seems like a good avenue (plus I'd love to have it available in mkdocs, which I'm a big fan of).

One thing I would like to avoid is having the wrapper package download binaries from GitHub actions, since that has caused issues/slowdowns in the past. Is it possible to bundle a platform-specific binary into a python wrapper in a similar manner to the current npm optionalDependencies?

If you're happy contributing the code, and helping me get automated publishing set up, then it would be much appreciated! 🙂 Let me know what I can do to help you.

@SKalt
Copy link
Contributor Author

SKalt commented Jun 13, 2024

Looks like I should be able to crib off of https://github.com/astral-sh/ruff/blob/main/.github/workflows/release.yaml, which uses maturin (part of the Py03 project) to shove all the relevant binaries and files into a single python package.

This is a long shot, but would you be open to using goreleaser for non-node packaging? On one hand, with a few lines of hacks GoReleaser can invoke cargo to build the binary and then repackage the binary for many non-node distribution formats including deb, rpm, HomeBrew, Scoop, etc. On the other hand, GoReleaser probably won't have first-class support for Rust for the foreseeable future. There's also a rust-releaser project that seems interesting, but it's not as mature as GoReleaser.

@SKalt
Copy link
Contributor Author

SKalt commented Jun 19, 2024

Update: my OSS time at work has dried up for the moment; I'm not sure when I'll get back to this.

@bglw
Copy link
Contributor

bglw commented Jun 20, 2024

No worries! Some notes for when you do, or for anyone else who comes across this issue and wants to take it on:

I don't mind what tool is used to actually release the Python package, but it doesn't need to worry about actually building Pagefind, so it doesn't need any level or support for (or knowledge of) Rust.

The publishing will slot into the existing action alongside the publish-binary-npm-packages step, which get given all built assets to download:

- uses: actions/download-artifact@v3
with:
name: release
path: build-artifacts

So the starting point will have the built executables for every platform, and the checksums for each executable :) All it needs to do it package the files up and publish it.

@SKalt
Copy link
Contributor Author

SKalt commented Jun 22, 2024

Hey, I had some time and took a look at the node.js wrapper. Would you mind passing pages/other content by-referneve over FFI rather than the service rpc to avoid serialization/deserialization overhead?

@bglw
Copy link
Contributor

bglw commented Jun 23, 2024

I'd hesitate to move things to FFI; I don't want it to be a requirement for creating a wrapper package, as the current "external" communication means you can wrap it with anything that can read/write to stdio, or indeed it could talk over a network.

If the overhead does become a performance bottleneck though, that should be solved.

I based a lot of the concept on esbuild's Go service API, which avoids FFI. Notably, over a certain message size they pass content via temporary files which is quicker and has a lower de-serialization overhead compared to the stdio messaging. Pagefind doesn't do this yet, so trying out a similar idea would be the first port of call if we need to improve things. I haven't seen any discussion around their wrapper performance being a problem so I have some confidence this will be good enough for us too.

But, if there was still too much overhead with that approach and it was impacting an actual use-case, I'd be open to looking into FFI 🙂 (I would want to keep the current workflow though, so it would mean maintaining two service APIs)

@SKalt
Copy link
Contributor Author

SKalt commented Jun 25, 2024

That makes sense, and avoiding FFI will certainly speed up implementation.

On a different topic, have you considered putting together an ARCHITECTURE.md for this repo? I'd like to hear more about the trade-offs/decisions that got pagefind to its current state.

@bglw
Copy link
Contributor

bglw commented Jun 25, 2024

It's a good idea! (and I do swoon over matklads posts 😄)

If I can find the perfect task to procrastinate from by writing one, it just might happen 😉

@SKalt
Copy link
Contributor Author

SKalt commented Jul 5, 2024

Hi @bglw, I've got a working draft of the Python API over in my fork of this repo (not yet PR'd here since I didn't see an drop-down with the option to create a draft PR). Could you take a look at the integration test and tell me what you think?

I'm planning on following the packaging pattern laid out in https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/ to include the built pagefind binary at the root of the python package.

@SKalt
Copy link
Contributor Author

SKalt commented Jul 24, 2024

Hey, since I haven't got any feedback on my proposed wrapper API, I'm going to move the python-related code to a separate repo in order to deploy independent of this repo's release cycle. I'm thinking of deploying the experimental python API in a package named pagefind-python, and the binary as pagefind-python-bin in order to reserve the pagefind package name for official use.

@bglw
Copy link
Contributor

bglw commented Jul 24, 2024

Hey @SKalt — apologies, I missed that message 🙈

I won't have time this week, but I can give it a proper review next week. Happy for you to open a standard PR and mention that it's a draft, equally happy for you to continue in a separate repo and publish yourself :)

If you want to publish separately for now and later merge into an official package release once it's less experimental, we can do that too

@SKalt SKalt linked a pull request Jul 27, 2024 that will close this issue
@SKalt
Copy link
Contributor Author

SKalt commented Jul 27, 2024

Hi @bglw, WIP PR sent! Let me know what you think of the API.

@SKalt
Copy link
Contributor Author

SKalt commented Aug 18, 2024

https://github.com/sKalt/pagefind_python <- you can find the python API here, now with CI and a rough--but-passable build system.

@bglw
Copy link
Contributor

bglw commented Aug 18, 2024

CI scripts looking good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants