[SPEC] Automated testnet deployment #664

Open
tbraun96 opened this issue Jul 5, 2023 · 0 comments
Labels
optimization ⚙️ Tasks that are refactor, optimize, or are considered chores. p3 🔵 Issues should be resolved eventually task ✔️

tbraun96 commented Jul 5, 2023

Overview

As we know, a passing pipeline is never enough to determine whether a branch is stable enough to merge into master. A green pipeline is a good indicator that, in well-behaved environments, the code will run for at least a limited number of sessions, but we must use the live testnet to prove that we can run hundreds to thousands of sessions without failure. If we want long-term stability to keep improving, running on the testnet before merging into master should be codified in our development process so that we don't undo our hard work.

Currently, we have no enforced policies in GitHub Actions that require a PR to pass a testnet run before it is merged into master. Testnet deployment is manual: @1xstj deploys the latest commit from the PR branch onto the testnet by running commands over an SSH session.

Some PRs need a testnet run and others do not. To handle both cases, we can use environments and automatically selected deployment targets. One target will be the testnet deployment, and the other will be the null deployment. For the testnet deployment, the job passes once, e.g., 500 sessions complete successfully; we can use websockets or polling to determine when to terminate. For the null deployment, the job passes instantly. So long as one of these passes (along with the normal PR checks), we can merge into master with confidence in the stability of the PR.

Task List

  • Create SSH credentials for AWS testnet access that are used only by GitHub Actions, then store those credentials in GitHub secrets
  • Define criteria for when a testnet deployment is required (e.g., based on files changed, a hash of Cargo.toml changes, etc.)
  • Create the testnet deployment environment with the accompanying logic for connecting over SSH and executing the deployment commands. Require manual approval before this environment starts.
  • Create the GitHub Actions workflow file that detects which environment is required, then either deploys to the testnet or finishes instantly (i.e., the null deployment). We can call this the testnet-deployer
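
The detection criteria in the second task could be sketched roughly as follows. This is only an illustration under stated assumptions: the patterns in `CORE_PATTERNS` and the function names `needs_testnet` and `choose_target` are hypothetical, and the real list of "core logic" paths would still need to be agreed on.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files that count as "core logic".
# These are placeholders, not the project's actual criteria.
CORE_PATTERNS = ["Cargo.toml", "Cargo.lock", "*/Cargo.toml", "*.rs"]


def needs_testnet(changed_files):
    """Return True if any changed path matches a core-logic pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in CORE_PATTERNS
    )


def choose_target(changed_files):
    """Pick the deployment target name for a list of changed paths."""
    return "testnet" if needs_testnet(changed_files) else "null"
```

A workflow step could feed this the list of files changed in the PR and use the returned name to select the environment.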

The ideal workflow:

Case A: User makes a PR that does not affect core logic

The testnet-deployer runs, detects that this PR does not need a testnet, and instantly returns true.

Case B: User makes a PR that affects core logic

The testnet-deployer runs, detects that this PR needs a testnet, and submits a testnet deployment request. A manual approval must then be given in the GitHub interface. Next, the testnet-deployer executes the relevant SSH commands, thus starting a testnet. Then, the testnet-deployer uses websockets (or polling) to monitor the testnet until it either notices stalling or the 500-session target is reached. Finally, the testnet-deployer returns success or failure depending on that result.
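
The monitoring step in Case B could look something like the polling loop below. It is a minimal sketch, not a definitive implementation: `get_session_count` is a hypothetical callable standing in for however the session number is actually read from the testnet, and the `stall_timeout` and `poll_interval` defaults are assumed values, not ones taken from this spec.

```python
import time


def wait_for_sessions(get_session_count, target=500, stall_timeout=600.0,
                      poll_interval=10.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until `target` sessions are reached or progress stalls.

    Returns True when the target is reached, and False when the session
    count has not increased for at least `stall_timeout` seconds.
    """
    best = get_session_count()
    last_progress = clock()
    while best < target:
        # Stalled: no new session observed within the allowed window.
        if clock() - last_progress >= stall_timeout:
            return False
        sleep(poll_interval)
        current = get_session_count()
        if current > best:
            best = current
            last_progress = clock()
    return True
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with a fake clock; in a real job the defaults would be used and the boolean result mapped to the job's exit code.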

Further discussion

We may not want auto-detection when choosing a deployment target. If we prefer manual selection, the testnet-deployer can instead send two simultaneous requests, one to the testnet deployment target and one to the null deployment target, and wait for approval. If we decide we need a testnet, we deny the null deployment and approve the testnet deployment; if we decide we do not need one, we deny the testnet request and approve the null deployment. In either case, the testnet-deployer succeeds once either request returns with a success. The approved testnet deployment does not succeed until the 500 sessions are reached, whereas the null deployment succeeds immediately.
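
The "succeed once either request returns with a success" rule can be written down as a small decision function. This is a sketch under assumptions: the `Decision` enum and `deployer_outcome` are hypothetical names, and treating "both requests resolved without a success" as a failure is an assumption, since the text above only describes the success path.

```python
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


def deployer_outcome(null_decision, testnet_decision, testnet_passed=None):
    """Return True or False once the outcome is known, or None while waiting.

    `testnet_passed` is None while an approved testnet run is still in
    progress, True if it reached the session target, False if it failed.
    """
    # The null deployment succeeds as soon as it is approved.
    null_ok = null_decision is Decision.APPROVED
    testnet_ok = testnet_decision is Decision.APPROVED and testnet_passed is True
    if null_ok or testnet_ok:
        return True

    # Assumption: fail only when neither request can still succeed.
    null_done = null_decision is Decision.DENIED
    testnet_done = testnet_decision is Decision.DENIED or (
        testnet_decision is Decision.APPROVED and testnet_passed is False
    )
    if null_done and testnet_done:
        return False
    return None
```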

1xstj added p3 🔵 Issues should be resolved eventually optimization ⚙️ Tasks that are refactor, optimize, or are considered chores. task ✔️ labels Jul 6, 2023
Projects
Status: Not Started 🕧
Development

No branches or pull requests

2 participants