Add qemu style nix checks for hydra-cluster, hydra-node, hydra-tui #1647

Open
locallycompact wants to merge 1 commit into master from lc/qemu-nix-tests

Conversation

@locallycompact (Contributor) commented Sep 19, 2024

Allows running the node, cluster and tui tests in an isolated environment, with the sandbox turned off for internet access.

nix build --option sandbox false .#checks.x86_64-linux.hydra-node -L

  • CHANGELOG updated or not needed
  • Documentation updated or not needed
  • Haddocks updated or not needed
  • No new TODOs introduced or explained hereafter


github-actions bot commented Sep 19, 2024

Transaction costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters currently use arbitrary values, so results are not fully deterministic nor directly comparable to previous runs.

Metadata
| Parameter | Value |
| --- | --- |
| Generated at | 2024-09-21 10:20:49.413659256 UTC |
| Max. memory units | 14000000 |
| Max. CPU units | 10000000000 |
| Max. tx size (bytes) | 16384 |

Script summary

| Name | Hash | Size (Bytes) |
| --- | --- | --- |
| νInitial | 2fac819a1f4f14e29639d1414220d2a18b6abd6b8e444d88d0dda8ff | 3799 |
| νCommit | 2043a9f1a685bcf491413a5f139ee42e335157c8c6bc8d9e4018669d | 1743 |
| νHead | 2ee477c60839936be49a50030690865b5bed4db8cd2f05bf255ac680 | 10068 |
| μHead | a1610f6e64843161f4a88229c0286176f5325de3e2f773eec2b1d818* | 4508 |
| νDeposit | c2117fd9ebdee3e96b81fd67ff7092d638926415c10f1f7e5a267ad0 | 2791 |
  • The minting policy hash is only usable for comparison. As the script is parameterized, the actual script is unique per head.

Init transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 5094 | 5.75 | 2.27 | 0.44 |
| 2 | 5297 | 7.09 | 2.80 | 0.46 |
| 3 | 5496 | 8.56 | 3.39 | 0.49 |
| 5 | 5902 | 11.12 | 4.39 | 0.53 |
| 10 | 6906 | 18.11 | 7.16 | 0.65 |
| 57 | 16355 | 82.81 | 32.75 | 1.77 |

Commit transaction costs

This uses ada-only outputs for better comparability.

| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 567 | 10.52 | 4.15 | 0.29 |
| 2 | 759 | 13.86 | 5.65 | 0.34 |
| 3 | 944 | 17.33 | 7.20 | 0.38 |
| 5 | 1322 | 24.65 | 10.44 | 0.48 |
| 10 | 2253 | 45.22 | 19.36 | 0.75 |
| 20 | 4123 | 95.99 | 40.76 | 1.40 |

CollectCom transaction costs

| Parties | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- | --- |
| 1 | 57 | 560 | 22.14 | 8.66 | 0.42 |
| 2 | 114 | 671 | 33.89 | 13.40 | 0.55 |
| 3 | 170 | 786 | 46.27 | 18.50 | 0.69 |
| 4 | 227 | 893 | 62.56 | 25.17 | 0.88 |
| 5 | 284 | 1004 | 78.06 | 31.64 | 1.05 |
| 6 | 337 | 1116 | 93.57 | 38.25 | 1.23 |

Decrement transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 632 | 17.95 | 7.88 | 0.38 |
| 2 | 785 | 19.09 | 9.07 | 0.40 |
| 3 | 982 | 20.81 | 10.38 | 0.44 |
| 5 | 1350 | 26.05 | 13.92 | 0.52 |
| 10 | 2105 | 32.64 | 20.08 | 0.65 |
| 50 | 8036 | 99.81 | 75.38 | 1.85 |

Close transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 644 | 20.03 | 8.99 | 0.41 |
| 2 | 809 | 21.53 | 10.43 | 0.44 |
| 3 | 946 | 23.03 | 11.85 | 0.46 |
| 5 | 1373 | 26.98 | 15.60 | 0.54 |
| 10 | 1888 | 33.48 | 21.90 | 0.66 |
| 50 | 8021 | 96.90 | 82.84 | 1.88 |

Contest transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 683 | 25.86 | 11.12 | 0.47 |
| 2 | 798 | 27.69 | 12.65 | 0.50 |
| 3 | 996 | 29.72 | 14.45 | 0.54 |
| 5 | 1401 | 34.20 | 18.23 | 0.62 |
| 10 | 2082 | 43.45 | 26.07 | 0.78 |
| 40 | 6607 | 98.74 | 74.17 | 1.77 |

Abort transaction costs

There is some variation due to the random mixture of initial and already committed outputs.

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 4971 | 17.47 | 7.59 | 0.56 |
| 2 | 5061 | 24.98 | 10.81 | 0.65 |
| 3 | 5166 | 37.83 | 16.52 | 0.80 |
| 4 | 5285 | 55.83 | 24.61 | 1.01 |
| 5 | 5444 | 74.54 | 33.04 | 1.23 |
| 6 | 5620 | 91.81 | 40.80 | 1.43 |

FanOut transaction costs

Involves spending head output and burning head tokens. Uses ada-only UTxO for better comparability.

| Parties | UTxO | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- | --- | --- |
| 5 | 0 | 0 | 4934 | 7.89 | 3.34 | 0.46 |
| 5 | 1 | 57 | 4968 | 9.02 | 4.05 | 0.47 |
| 5 | 5 | 284 | 5103 | 13.15 | 6.73 | 0.53 |
| 5 | 10 | 570 | 5275 | 19.01 | 10.37 | 0.61 |
| 5 | 20 | 1137 | 5611 | 30.52 | 17.57 | 0.77 |
| 5 | 30 | 1707 | 5954 | 41.25 | 24.44 | 0.92 |
| 5 | 40 | 2278 | 6295 | 53.17 | 31.82 | 1.09 |
| 5 | 50 | 2844 | 6631 | 64.51 | 38.94 | 1.24 |
| 5 | 81 | 4612 | 7685 | 99.47 | 60.99 | 1.73 |

End-to-end benchmark results

This page is intended to collect the latest end-to-end benchmark results produced by Hydra's continuous integration (CI) system from the latest master code.

Please note that these results are approximate as they are currently produced from limited cloud VMs and not controlled hardware. Rather than focusing on the absolute results, the emphasis should be on relative results, such as how the timings for a scenario evolve as the code changes.

Generated at 2024-09-21 10:23:02.463197046 UTC

Baseline Scenario

| Metric | Value |
| --- | --- |
| Number of nodes | 1 |
| Number of txs | 3000 |
| Avg. Confirmation Time (ms) | 4.184044471 |
| P99 | 6.971361249999921ms |
| P95 | 4.57936845ms |
| P50 | 3.7388875ms |
| Number of Invalid txs | 0 |

Three local nodes

| Metric | Value |
| --- | --- |
| Number of nodes | 3 |
| Number of txs | 9000 |
| Avg. Confirmation Time (ms) | 21.969437453 |
| P99 | 107.73078102000024ms |
| P95 | 28.43345175ms |
| P50 | 19.708637000000003ms |
| Number of Invalid txs | 0 |

flake.nix: 3 outdated review threads (resolved)

github-actions bot commented Sep 19, 2024

Test Results

503 tests  ±0   497 ✅ ±0   20m 14s ⏱️ -6s
160 suites ±0     6 💤 ±0 
  7 files   ±0     0 ❌ ±0 

Results for commit a3fa77a. ± Comparison against base commit d5729d3.

♻️ This comment has been updated with latest results.

 validateJSON "does-not-matter.json" id Null
-  `shouldThrow` exceptionContaining @IOException "installed"
+  `shouldThrow` exceptionContaining @IOException ""
Contributor

why is this "installed" removed?

Contributor Author

Actually no idea. It throws, but it's a different message in the VM apparently.

Collaborator

What's the different message / common denominator to assert on then? Asserting that it contains "" degenerates this assertion.

Contributor Author

vm-test-run-hydra-node>   1) Hydra.JSONSchema, validateJSON withJsonSpecifications, fails with missing tool
vm-test-run-hydra-node>        predicate failed on expected exception: IOException
vm-test-run-hydra-node>        does-not-matter.json: withBinaryFile: does not exist (No such file or directory)

Collaborator

Ah so this is indeed not testing what it should (it "fails with missing tool").

So before, this would fail with a "tool missing / needs to be installed" error, but it seems like withClearedPATH behaves differently in the nixos vm test?

Not saying this is the best test ever and maybe we should rather not test this at all, but setting this to "" is certainly not the solution here.
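
If the message really does differ per environment, one option (a rough, untested sketch, not from the Hydra codebase; the two substrings are just the failures observed locally and in the VM so far) would be to accept either wording instead of the empty string:

{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (IOException)
import Data.List (isInfixOf)
import Test.Hspec (Expectation, shouldThrow)

-- Accept either of the messages seen so far: the "installed" wording from a
-- local run with a cleared PATH, or the "does not exist" wording from the VM.
failsWithoutTool :: IO () -> Expectation
failsWithoutTool action =
  action `shouldThrow` \(e :: IOException) ->
    "installed" `isInfixOf` show e || "does not exist" `isInfixOf` show e

-- usage, with validateJSON as in the diff above:
--   failsWithoutTool (validateJSON "does-not-matter.json" id Null)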

@noonio (Contributor) commented Sep 19, 2024

I see this error locally:

       >   test/Hydra/API/ServerSpec.hs:150:5:
       >   1) Hydra.API.Server echoes history (past outputs) to client upon reconnection
       >        uncaught exception: RunServerException
       >        RunServerException {ioException = Network.Socket.bind: resource busy (Address already in use), host = 127.0.0.1, port = 42303}
       >        (after 67 tests)
       >
       >   To rerun use: --match "/Hydra.API.Server/echoes history (past outputs) to client upon reconnection/"
       >
       > Randomized with seed 1445254403

and, with extreme sadness, I note that this is not the error I see in the CI here.

@noonio (Contributor) commented Sep 19, 2024

I think, at least, if we're going to go down this path we need to hard-code the seeds into the tests; otherwise we're just going to be in a very strange world of non-reproducible error hell.
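
Something along these lines would pin it (a sketch only; the hspec runner config is assumed to be available as below, the Spec module is whatever hspec-discover generates, and 1445254403 is just the seed from the failure above):

import Test.Hspec.Runner (Config (..), defaultConfig, hspecWith)
import qualified Spec -- generated by hspec-discover; module name assumed

-- Pin the QuickCheck seed so a failure such as "Randomized with seed 1445254403"
-- replays identically locally and inside the VM test.
main :: IO ()
main = hspecWith defaultConfig{configQuickCheckSeed = Just 1445254403} Spec.spec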

That said, getting this error makes me feel a bit mixed about even this approach, given that the tests are now run somewhat non-idiomatically compared to how development is done: if I want to replicate that exact error I either have to adjust the nix derivation (terrible) or build the cabal project and run it that way (but, of course, I expect I won't get that error again in that case).

locallycompact force-pushed the lc/qemu-nix-tests branch 3 times, most recently from 0d4497d to 6c363c1 on September 19, 2024 at 17:57
@locallycompact (Contributor Author)

> I see this error locally:
>
>        >   test/Hydra/API/ServerSpec.hs:150:5:
>        >   1) Hydra.API.Server echoes history (past outputs) to client upon reconnection
>        >        uncaught exception: RunServerException
>        >        RunServerException {ioException = Network.Socket.bind: resource busy (Address already in use), host = 127.0.0.1, port = 42303}
>        >        (after 67 tests)
>        >
>        >   To rerun use: --match "/Hydra.API.Server/echoes history (past outputs) to client upon reconnection/"
>        >
>        > Randomized with seed 1445254403
>
> and, with extreme sadness, I note that this is not the error I see in the CI here.

This is a normal flakey test.
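
(For what it's worth, this class of flakiness usually disappears if the test asks the kernel for a free port instead of reusing a fixed one - a rough sketch, with a helper name that is not from the Hydra codebase:)

import Network.Socket

-- Bind to port 0 so the OS picks a free port, read the assigned port back,
-- then hand it to the test. Avoids "bind: resource busy (Address already in
-- use)" races, modulo the small window before the server re-binds the port.
withFreePort :: (PortNumber -> IO a) -> IO a
withFreePort action = do
  sock <- socket AF_INET Stream defaultProtocol
  bind sock (SockAddrInet 0 (tupleToHostAddress (127, 0, 0, 1)))
  port <- socketPort sock
  close sock
  action port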

locallycompact force-pushed the lc/qemu-nix-tests branch 7 times, most recently from 0e1885e to e2bdc27 on September 19, 2024 at 20:08
flake.nix: 1 review thread (resolved)
@noonio (Contributor) commented Sep 20, 2024

Thanks for putting this up as a talking point :) It's good to see.

I think ultimately I'm against this style of testing here. Here's my reasoning:

  1. Flakeyness/Reproducibility: As discussed, the randomness is hidden a bit. We need to hard-code seeds if we did adopt this, just so it's reproducible.
  2. Development incompatibility: I think we've decided on using cabal-style building in the devShell; not least because it allows non-nix people to participate (if they wish); but it also is just better for incremental work (LSP, etc, etc). The test failures we get here can't be reproduced directly without re-invoking the vm tests locally, which I think is unacceptably slow. Relatedly, it makes the CI-kind of workflow quite different to the local workflow. I think this is an antipattern, honestly, because it means we need to maintain two styles always.
  3. Resourcey-quirks: As pointed out, it's a bit weird that we must hardcode RAM and CPU concerns into the tests. This results in poor allocation in different build environments; which if people are using the tests in CI and in dev, is very unfortunate. There seems to be no escaping this here in this style.
  4. Sandbox hacking/test "isolation" awareness: It seems very very unfortunate to me that we need to remember facts about the sandbox; and think about if our tests use up resources or not. On the one hand, I find this maybe even useful - It's a good reminder about which tests might do this. But on the other hand, it feels very annoying, in that if I accidentally make a test do that, it'll fail with some weird error that requires arcane knowledge to resolve ("oh yeah, you needed to update this thing way over here to say --sandbox false"). And, this isn't something you'll notice locally; you'll have to wait a whole CI cycle for it, at least.

Overall, I feel like committing to a "CI as the developer runs it" style of work seems very reasonable; and moreover just a bit easier to think about day-to-day.

Some avenues I can think of for exploring this:

  • IOGX looks very interesting; I saw it in action over here: plutus-accumulator
  • Use the ./ci folder of scripts approach (I can't find a good article, but I've seen a few projects do this). The technique here is that the GitHub action workflows are very minimal, and just call out to some (well-defined) scripts of this kind. Then, it's much easier to run it locally if you wish.
  • Perhaps look at Nickel; this is just a project that, I think, has a pretty nice/complex approach to Nixification; for example here's the nix infra stuff for the self-hosted runners. I haven't looked too closely; but maybe there's something to learn.
  • Garnix - what does it do? Do we need vmTests for it?
  • nixbuild - you mentioned this one; is it any good?
  • Just try and simplify our GitHub actions generally. I think @ch1bo is investigating this, partially. Maybe this will make things a bit more easy to reason about.
  • Use Hydra?! I think this is related to IOGX; I believe we used to use it, but it was too slow (or something?); maybe it's improved?

Overall, though, I think the pain of vmTests just doesn't make sense here, for our kind of "day-to-day" workflow. I think there could definitely be a place for the vmTests that would, say, set up a full hydra node environment and test some set of transactions (i.e. it would serve as a bit of a demo of how to build and run a hydra node from scratch, or something? not sure exactly, but could be fun to think about); but that should be additional, not replacing what we have. I mean, I can see that this PR only adds tests, so in some sense it's fine, but I just don't want it to replace our existing testing approach, merely add something that is additionally useful.

What are your thoughts?

@locallycompact (Contributor Author)

> 1. Flakeyness/Reproducibility: As discussed, the randomness is hidden a bit. We need to hard-code seeds if we did adopt this, just so it's reproducible.

This doesn't bother me. Haskell derivations aren't reproducible either. The flakeyness of test depth hasn't shown up here either. The flakeyness we saw on this and the other branch were due to race conditions which are randomly present and we should just drop those tests.

> 2. Development incompatibility: I think we've decided on using cabal-style building in the devShell; not least because it allows non-nix people to participate (if they wish); but it also is just better for incremental work (LSP, etc, etc). The test failures we get here can't be reproduced directly without re-invoking the vm tests locally, which I think is unacceptably slow. Relatedly, it makes the CI-kind of workflow quite different to the local workflow. I think this is an antipattern, honestly, because it means we need to maintain two styles always.

This is an extremely normal practice that I am used to. Nobody develops by building haskell applications as a derivation, everyone uses cabal in a shell, but we use derivations in CI because they finish instantly in the case of a no-op.

> 3. Resourcey-quirks: As pointed out, it's a bit weird that we must hardcode RAM and CPU concerns into the tests. This results in poor allocation in different build environments; which if people are using the tests in CI and in dev, is very unfortunate. There seems to be no escaping this here in this style.

This is odd but doesn't bother me either. It's OK to allocate one core per spun up service within the machine, more or less.

> 4. Sandbox hacking/test "isolation" awareness: It seems very very unfortunate to me that we need to remember facts about the sandbox; and think about if our tests use up resources or not. On the one hand, I find this maybe even useful - It's a good reminder about which tests might do this. But on the other hand, it feels very annoying, in that if I accidentally make a test do that, it'll fail with some weird error that requires arcane knowledge to resolve ("oh yeah, you needed to update this thing way over here to say --sandbox false"). And, this isn't something you'll notice locally; you'll have to wait a whole CI cycle for it, at least.

This is a good thing. It highlights to us that we should put tests that require the internet in a separate test entirely, and put tests that don't in a non-sandboxed VM so we get instant finality on those derivations.

> Overall, I feel like committing to a "CI as the developer runs it" style of work seems very reasonable; and moreover just a bit easier to think about day-to-day.

I think this is an anti-pattern. CI is a massive sledgehammer that you should not be running on a 2-second feedback loop. It's something I would use as a checkpoint every 30 minutes to say "What does CI have to say?", or if I want to link a specific error.

> Some avenues I can think of for exploring this:

Expect hydraJobs, which requires derivations.

> • Use the ./ci folder of scripts approach (I can't find a good article, but I've seen a few projects do this). The technique here is that the GitHub action workflows are very minimal, and just call out to some (well-defined) scripts of this kind. Then, it's much easier to run it locally if you wish.

I do this with Haskell scripts sometimes as well, but derivations are better.

Not sure how this helps us.

> • Garnix - what does it do? Do we need vmTests for it?

Anything nix will expect derivations.

> • nixbuild - you mentioned this one; is it any good?

Requires derivations.

Ideally I would have one GitHub pipeline that consumes the flake and renders it as individual jobs, like a GitLab dynamic pipeline.

> • Use Hydra?! I think this is related to IOGX; I believe we used to use it, but it was too slow (or something?); maybe it's improved?

Requires derivations.

> What are your thoughts?

My final thought is that I would actually use this locally. Running cabal test all jams up ports so I don't use it. I test specific packages with cabal. But when finalising the branch, I want to use nix flake check so that anything that is done doesn't even show up in the terminal a second time - I just get left with the derivations that are still a problem. VM tests are annoyingly un-granular, but that can be improved if we commit to the style.

@noonio (Contributor) commented Sep 20, 2024

> This doesn't bother me. Haskell derivations aren't reproducible either. The flakeyness of test depth hasn't shown up here either. The flakeyness we saw on this and the other branch were due to race conditions which are randomly present and we should just drop those tests.

It has shown up as I myself was testing out this very commit; see my comment earlier in the thread! :)

> This is odd but doesn't bother me either. It's OK to allocate one core per spun up service within the machine, more or less.

I don't see that it is, in general, because of the poor use of resources. Recalling that the central idea of this issue was speeding things up, this doesn't accomplish that in general; i.e. it requires care and curation, which is difficult and time-consuming.

> This is a good thing. It highlights to us that we should put tests that require the internet in a separate test entirely, and put tests that don't in a non-sandboxed VM so we get instant finality on those derivations.

I can be convinced; but how do we make it easy to make this consistent and fast between CI and local dev, so we're not maintaining two different ways of doing the same tests?

re: "Some avenues I can think of for exploring this: ..."

What I was hoping to see is an exploration of how some of these approaches would help our central goal - faster CI.

> My final thought is that I would actually use this locally.

I think if it's useful for you it's fine to add these extra derivations; but I just don't want to switch our CI to it without doing some more investigation into other ways to speed up the CI (see above comment) and then, if we do decide this is best, carefully resolving the problems with the seed/randomness, and the dual-maintenance/dev problems.

Maybe to help make some progress here; do you have some example projects out there that use VM tests really nicely? Would be great to see/learn from!

@locallycompact (Contributor Author)

There are lots of examples in nixpkgs itself.

https://github.com/search?q=repo%3ANixOS%2Fnixpkgs+makeTest&type=code

@ch1bo (Collaborator) commented Sep 20, 2024

> Development incompatibility: I think we've decided on using cabal-style building in the devShell; not least because it allows non-nix people to participate (if they wish); but it also is just better for incremental work (LSP, etc, etc).

I would like to second this. Allowing contributions without needing nix for typical development workflows is a great trait to retain for the project (is it currently true?)

> (Noon) Overall, I feel like committing to a "CI as the developer runs it" style of work seems very reasonable; and moreover just a bit easier to think about day-to-day.

> (Daniel) I think this is an anti-pattern. CI is a massive sledgehammer that you should not be running on a 2-second feedback loop. It's something I would use as a checkpoint every 30 minutes to say "What does CI have to say?", or if I want to link a specific error.

I think your points are orthogonal to each other. From my viewpoint, I would like the continuous integration workflow to

  1. be similar to what I do in development, but only on a longer feedback loop. That means,
    • low maintenance
    • reproducible
    • complete (e.g. in development I might not run all tests)
  2. ensure things do work also in production where our development workflow differs to what we do in production. Examples:
    • build static binaries / for other platforms / package things into docker
    • assemble code docs with the website
    • benchmarks on specific hardware
    • multi-machine integration tests
    • ...

For 1, IMO the current state is too much nix already (see above)
For 2, IMO we are doing too much in GitHub Actions YAML (could be easier with nix?)

> Overall, though, I think the pain of vmTests just doesn't make sense here, for our kind of "day-to-day" workflow. I think there could definitely be a place for the vmTests

I think any kind of fault testing that requires full machines (instead of containers), e.g. using jepsen, would be a nice use case for orchestrated virtual machines?

locallycompact force-pushed the lc/qemu-nix-tests branch 2 times, most recently from a0a57ee to 611e4a6 on September 20, 2024 at 15:06
3 participants