Add qemu style nix checks for hydra-cluster, hydra-node, hydra-tui #1647

Open
locallycompact wants to merge 1 commit into master from lc/qemu-nix-tests

Conversation

@locallycompact (Contributor) commented Sep 19, 2024

Allows running the node, cluster and tui tests in an isolated environment, with the sandbox turned off for internet access.

nix build --option sandbox false .#checks.x86_64-linux.hydra-node -L

  • CHANGELOG updated or not needed
  • Documentation updated or not needed
  • Haddocks updated or not needed
  • No new TODOs introduced or explained hereafter


github-actions bot commented Sep 19, 2024

Transaction costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters currently use arbitrary values, so results are not fully deterministic nor directly comparable to previous runs.

Metadata
| Parameter | Value |
| --- | --- |
| Generated at | 2024-09-21 10:20:49.413659256 UTC |
| Max. memory units | 14000000 |
| Max. CPU units | 10000000000 |
| Max. tx size (bytes) | 16384 |

Script summary

| Name | Hash | Size (Bytes) |
| --- | --- | --- |
| νInitial | 2fac819a1f4f14e29639d1414220d2a18b6abd6b8e444d88d0dda8ff | 3799 |
| νCommit | 2043a9f1a685bcf491413a5f139ee42e335157c8c6bc8d9e4018669d | 1743 |
| νHead | 2ee477c60839936be49a50030690865b5bed4db8cd2f05bf255ac680 | 10068 |
| μHead | a1610f6e64843161f4a88229c0286176f5325de3e2f773eec2b1d818* | 4508 |
| νDeposit | c2117fd9ebdee3e96b81fd67ff7092d638926415c10f1f7e5a267ad0 | 2791 |
  • The minting policy hash is only usable for comparison. As the script is parameterized, the actual script is unique per head.

Init transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 5094 | 5.75 | 2.27 | 0.44 |
| 2 | 5297 | 7.09 | 2.80 | 0.46 |
| 3 | 5496 | 8.56 | 3.39 | 0.49 |
| 5 | 5902 | 11.12 | 4.39 | 0.53 |
| 10 | 6906 | 18.11 | 7.16 | 0.65 |
| 57 | 16355 | 82.81 | 32.75 | 1.77 |

Commit transaction costs

This uses ada-only outputs for better comparability.

| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 567 | 10.52 | 4.15 | 0.29 |
| 2 | 759 | 13.86 | 5.65 | 0.34 |
| 3 | 944 | 17.33 | 7.20 | 0.38 |
| 5 | 1322 | 24.65 | 10.44 | 0.48 |
| 10 | 2253 | 45.22 | 19.36 | 0.75 |
| 20 | 4123 | 95.99 | 40.76 | 1.40 |

CollectCom transaction costs

| Parties | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- | --- |
| 1 | 57 | 560 | 22.14 | 8.66 | 0.42 |
| 2 | 114 | 671 | 33.89 | 13.40 | 0.55 |
| 3 | 170 | 786 | 46.27 | 18.50 | 0.69 |
| 4 | 227 | 893 | 62.56 | 25.17 | 0.88 |
| 5 | 284 | 1004 | 78.06 | 31.64 | 1.05 |
| 6 | 337 | 1116 | 93.57 | 38.25 | 1.23 |

Decrement transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 632 | 17.95 | 7.88 | 0.38 |
| 2 | 785 | 19.09 | 9.07 | 0.40 |
| 3 | 982 | 20.81 | 10.38 | 0.44 |
| 5 | 1350 | 26.05 | 13.92 | 0.52 |
| 10 | 2105 | 32.64 | 20.08 | 0.65 |
| 50 | 8036 | 99.81 | 75.38 | 1.85 |

Close transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 644 | 20.03 | 8.99 | 0.41 |
| 2 | 809 | 21.53 | 10.43 | 0.44 |
| 3 | 946 | 23.03 | 11.85 | 0.46 |
| 5 | 1373 | 26.98 | 15.60 | 0.54 |
| 10 | 1888 | 33.48 | 21.90 | 0.66 |
| 50 | 8021 | 96.90 | 82.84 | 1.88 |

Contest transaction costs

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 683 | 25.86 | 11.12 | 0.47 |
| 2 | 798 | 27.69 | 12.65 | 0.50 |
| 3 | 996 | 29.72 | 14.45 | 0.54 |
| 5 | 1401 | 34.20 | 18.23 | 0.62 |
| 10 | 2082 | 43.45 | 26.07 | 0.78 |
| 40 | 6607 | 98.74 | 74.17 | 1.77 |

Abort transaction costs

There is some variation due to the random mixture of initial and already committed outputs.

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- |
| 1 | 4971 | 17.47 | 7.59 | 0.56 |
| 2 | 5061 | 24.98 | 10.81 | 0.65 |
| 3 | 5166 | 37.83 | 16.52 | 0.80 |
| 4 | 5285 | 55.83 | 24.61 | 1.01 |
| 5 | 5444 | 74.54 | 33.04 | 1.23 |
| 6 | 5620 | 91.81 | 40.80 | 1.43 |

FanOut transaction costs

Involves spending head output and burning head tokens. Uses ada-only UTxO for better comparability.

| Parties | UTxO | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
| --- | --- | --- | --- | --- | --- | --- |
| 5 | 0 | 0 | 4934 | 7.89 | 3.34 | 0.46 |
| 5 | 1 | 57 | 4968 | 9.02 | 4.05 | 0.47 |
| 5 | 5 | 284 | 5103 | 13.15 | 6.73 | 0.53 |
| 5 | 10 | 570 | 5275 | 19.01 | 10.37 | 0.61 |
| 5 | 20 | 1137 | 5611 | 30.52 | 17.57 | 0.77 |
| 5 | 30 | 1707 | 5954 | 41.25 | 24.44 | 0.92 |
| 5 | 40 | 2278 | 6295 | 53.17 | 31.82 | 1.09 |
| 5 | 50 | 2844 | 6631 | 64.51 | 38.94 | 1.24 |
| 5 | 81 | 4612 | 7685 | 99.47 | 60.99 | 1.73 |

End-to-end benchmark results

This page is intended to collect the latest end-to-end benchmark results produced by Hydra's continuous integration (CI) system from the latest master code.

Please note that these results are approximate as they are currently produced from limited cloud VMs and not controlled hardware. Rather than focusing on the absolute results, the emphasis should be on relative results, such as how the timings for a scenario evolve as the code changes.

Generated at 2024-09-21 10:23:02.463197046 UTC

Baseline Scenario

| Metric | Value |
| --- | --- |
| Number of nodes | 1 |
| Number of txs | 3000 |
| Avg. Confirmation Time (ms) | 4.184044471 |
| P99 | 6.971361249999921ms |
| P95 | 4.57936845ms |
| P50 | 3.7388875ms |
| Number of Invalid txs | 0 |

Three local nodes

| Metric | Value |
| --- | --- |
| Number of nodes | 3 |
| Number of txs | 9000 |
| Avg. Confirmation Time (ms) | 21.969437453 |
| P99 | 107.73078102000024ms |
| P95 | 28.43345175ms |
| P50 | 19.708637000000003ms |
| Number of Invalid txs | 0 |

flake.nix: 3 outdated review threads (resolved)

github-actions bot commented Sep 19, 2024

Test Results

503 tests  ±0   497 ✅ ±0   20m 14s ⏱️ -6s
160 suites ±0     6 💤 ±0 
  7 files   ±0     0 ❌ ±0 

Results for commit a3fa77a. ± Comparison against base commit d5729d3.

♻️ This comment has been updated with latest results.

 validateJSON "does-not-matter.json" id Null
-  `shouldThrow` exceptionContaining @IOException "installed"
+  `shouldThrow` exceptionContaining @IOException ""
Contributor

why is this "installed" removed?

Contributor Author

Actually no idea. It throws, but it's a different message in the VM apparently.

Collaborator

What's the different message / common denominator to assert on then? Asserting that it contains "" degenerates this assertion.

Contributor Author

vm-test-run-hydra-node>   1) Hydra.JSONSchema, validateJSON withJsonSpecifications, fails with missing tool
vm-test-run-hydra-node>        predicate failed on expected exception: IOException
vm-test-run-hydra-node>        does-not-matter.json: withBinaryFile: does not exist (No such file or directory)

Collaborator

Ah so this is indeed not testing what it should (it "fails with missing tool").

So before, this would fail with a "tool missing / needs to be installed" error, but it seems like withClearedPATH behaves differently in the nixos vm test?

Not saying this is the best test ever and maybe we should rather not test this at all, but setting this to "" is certainly not the solution here.
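
If the message really does differ per environment, one option (a rough, untested sketch, not from the Hydra codebase; the two substrings are just the failures observed locally and in the VM so far) would be to accept either wording instead of the empty string:

{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (IOException)
import Data.List (isInfixOf)
import Test.Hspec (Expectation, shouldThrow)

-- Accept either of the messages seen so far: the "installed" wording from a
-- local run with a cleared PATH, or the "does not exist" wording from the VM.
failsWithoutTool :: IO () -> Expectation
failsWithoutTool action =
  action `shouldThrow` \(e :: IOException) ->
    "installed" `isInfixOf` show e || "does not exist" `isInfixOf` show e

-- usage, with validateJSON as in the diff above:
--   failsWithoutTool (validateJSON "does-not-matter.json" id Null)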

@noonio (Contributor) commented Sep 19, 2024

I see this error locally:

       >   test/Hydra/API/ServerSpec.hs:150:5:
       >   1) Hydra.API.Server echoes history (past outputs) to client upon reconnection
       >        uncaught exception: RunServerException
       >        RunServerException {ioException = Network.Socket.bind: resource busy (Address already in use), host = 127.0.0.1, port = 42303}
       >        (after 67 tests)
       >
       >   To rerun use: --match "/Hydra.API.Server/echoes history (past outputs) to client upon reconnection/"
       >
       > Randomized with seed 1445254403

and, with extreme sadness, I note that this is not the error I see in the CI here.

@noonio (Contributor) commented Sep 19, 2024

I think, at least, if we're going to go down this path we need to hard-code the seeds into the tests; otherwise we're just going to be in a very strange world of non-reproducible error hell.
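
Something along these lines would pin it (a sketch only; the hspec runner config is assumed to be available as below, the Spec module is whatever hspec-discover generates, and 1445254403 is just the seed from the failure above):

import Test.Hspec.Runner (Config (..), defaultConfig, hspecWith)
import qualified Spec -- generated by hspec-discover; module name assumed

-- Pin the QuickCheck seed so a failure such as "Randomized with seed 1445254403"
-- replays identically locally and inside the VM test.
main :: IO ()
main = hspecWith defaultConfig{configQuickCheckSeed = Just 1445254403} Spec.spec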

That said, getting this error makes me feel a bit mixed about even this approach, given that the tests are now run somewhat non-idiomatically compared to how development is done: if I want to replicate that exact error I either have to adjust the nix derivation (terrible) or build the cabal project and run it that way (but, of course, I expect I won't get that error again in that case).

locallycompact force-pushed the lc/qemu-nix-tests branch 3 times, most recently from 0d4497d to 6c363c1 on September 19, 2024 at 17:57
@locallycompact (Contributor Author)

> I see this error locally:
>
>        >   test/Hydra/API/ServerSpec.hs:150:5:
>        >   1) Hydra.API.Server echoes history (past outputs) to client upon reconnection
>        >        uncaught exception: RunServerException
>        >        RunServerException {ioException = Network.Socket.bind: resource busy (Address already in use), host = 127.0.0.1, port = 42303}
>        >        (after 67 tests)
>        >
>        >   To rerun use: --match "/Hydra.API.Server/echoes history (past outputs) to client upon reconnection/"
>        >
>        > Randomized with seed 1445254403
>
> and, with extreme sadness, I note that this is not the error I see in the CI here.

This is a normal flakey test.
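
(For what it's worth, this class of flakiness usually disappears if the test asks the kernel for a free port instead of reusing a fixed one - a rough sketch, with a helper name that is not from the Hydra codebase:)

import Network.Socket

-- Bind to port 0 so the OS picks a free port, read the assigned port back,
-- then hand it to the test. Avoids "bind: resource busy (Address already in
-- use)" races, modulo the small window before the server re-binds the port.
withFreePort :: (PortNumber -> IO a) -> IO a
withFreePort action = do
  sock <- socket AF_INET Stream defaultProtocol
  bind sock (SockAddrInet 0 (tupleToHostAddress (127, 0, 0, 1)))
  port <- socketPort sock
  close sock
  action port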

locallycompact force-pushed the lc/qemu-nix-tests branch 7 times, most recently from 0e1885e to e2bdc27 on September 19, 2024 at 20:08
flake.nix: 1 review thread (resolved)
@noonio (Contributor) commented Sep 20, 2024

Thanks for putting this up as a talking point :) It's good to see.

I think ultimately I'm against this style of testing here. Here's my reasoning:

  1. Flakeyness/Reproducibility: As discussed, the randomness is hidden a bit. We need to hard-code seeds if we did adopt this, just so it's reproducible.
  2. Development incompatibility: I think we've decided on using cabal-style building in the devShell; not least because it allows non-nix people to participate (if they wish); but it also is just better for incremental work (LSP, etc, etc). The test failures we get here can't be reproduced directly without re-invoking the vm tests locally, which I think is unacceptably slow. Relatedly, it makes the CI-kind of workflow quite different to the local workflow. I think this is an antipattern, honestly, because it means we need to maintain two styles always.
  3. Resourcey-quirks: As pointed out, it's a bit weird that we must hardcode RAM and CPU concerns into the tests. This results in poor allocation in different build environments; which if people are using the tests in CI and in dev, is very unfortunate. There seems to be no escaping this here in this style.
  4. Sandbox hacking/test "isolation" awareness: It seems very very unfortunate to me that we need to remember facts about the sandbox; and think about if our tests use up resources or not. On the one hand, I find this maybe even useful - It's a good reminder about which tests might do this. But on the other hand, it feels very annoying, in that if I accidentally make a test do that, it'll fail with some weird error that requires arcane knowledge to resolve ("oh yeah, you needed to update this thing way over here to say --sandbox false"). And, this isn't something you'll notice locally; you'll have to wait a whole CI cycle for it, at least.

Overall, I feel like committing to a "CI as the developer runs it" style of work seems very reasonable; and moreover just a bit easier to think about day-to-day.

Some avenues I can think of for exploring this:

  • IOGX looks very interesting; I saw it in action over here: plutus-accumulator
  • Use the ./ci folder of scripts approach (I can't find a good article, but I've seen a few projects do this). The technique here is that the GitHub action workflows are very minimal, and just call out to some (well-defined) scripts of this kind. Then, it's much easier to run it locally if you wish.
  • Perhaps look at Nickel; this is just a project that, I think, has a pretty nice/complex approach to Nixification; for example here's the nix infra stuff for the self-hosted runners. I haven't looked too closely; but maybe there's something to learn.
  • Garnix - what does it do? Do we need vmTests for it?
  • nixbuild - you mentioned this one; is it any good?
  • Just try and simplify our GitHub actions generally. I think @ch1bo is investigating this, partially. Maybe this will make things a bit more easy to reason about.
  • Use Hydra?! I think this is related to IOGX; I believe we used to use it, but it was too slow (or something?); maybe it's improved?

Overall, though, I think the pain of vmTests just doesn't make sense here, for our kind of "day-to-day" workflow. I think there could definitely be a place for the vmTests that would, say, set up a full hydra node environment and test some set of transactions (i.e. it would serve as a bit of a demo of how to build and run a hydra node from scratch, or something? not sure exactly, but could be fun to think about); but that should be additional, not replacing what we have. I mean, I can see that this PR only adds tests, so in some sense it's fine, but I just don't want it to replace our existing testing approach, merely add something that is additionally useful.

What are your thoughts?

@locallycompact (Contributor Author)

> 1. Flakeyness/Reproducibility: As discussed, the randomness is hidden a bit. We need to hard-code seeds if we did adopt this, just so it's reproducible.

This doesn't bother me. Haskell derivations aren't reproducible either. The flakeyness of test depth hasn't shown up here either. The flakeyness we saw on this and the other branch were due to race conditions which are randomly present and we should just drop those tests.

> 2. Development incompatibility: I think we've decided on using cabal-style building in the devShell; not least because it allows non-nix people to participate (if they wish); but it also is just better for incremental work (LSP, etc, etc). The test failures we get here can't be reproduced directly without re-invoking the vm tests locally, which I think is unacceptably slow. Relatedly, it makes the CI-kind of workflow quite different to the local workflow. I think this is an antipattern, honestly, because it means we need to maintain two styles always.

This is an extremely normal practice that I am used to. Nobody develops by building haskell applications as a derivation, everyone uses cabal in a shell, but we use derivations in CI because they finish instantly in the case of a no-op.

> 3. Resourcey-quirks: As pointed out, it's a bit weird that we must hardcode RAM and CPU concerns into the tests. This results in poor allocation in different build environments; which if people are using the tests in CI and in dev, is very unfortunate. There seems to be no escaping this here in this style.

This is odd but doesn't bother me either. It's OK to allocate one core per spun up service within the machine, more or less.

> 4. Sandbox hacking/test "isolation" awareness: It seems very very unfortunate to me that we need to remember facts about the sandbox; and think about if our tests use up resources or not. On the one hand, I find this maybe even useful - It's a good reminder about which tests might do this. But on the other hand, it feels very annoying, in that if I accidentally make a test do that, it'll fail with some weird error that requires arcane knowledge to resolve ("oh yeah, you needed to update this thing way over here to say --sandbox false"). And, this isn't something you'll notice locally; you'll have to wait a whole CI cycle for it, at least.

This is a good thing. It highlights to us that we should put tests that require the internet in a separate test entirely, and put tests that don't in a non-sandboxed VM so we get instant finality on those derivations.

> Overall, I feel like committing to a "CI as the developer runs it" style of work seems very reasonable; and moreover just a bit easier to think about day-to-day.

I think this is an anti-pattern. CI is a massive sledgehammer that you should not be running on a 2-second feedback loop. It's something I would use as a checkpoint every 30 minutes to say "What does CI have to say?", or if I want to link a specific error.

> Some avenues I can think of for exploring this:

Expect hydraJobs, which requires derivations.

> • Use the ./ci folder of scripts approach (I can't find a good article, but I've seen a few projects do this). The technique here is that the GitHub action workflows are very minimal, and just call out to some (well-defined) scripts of this kind. Then, it's much easier to run it locally if you wish.

I do this with Haskell scripts sometimes as well, but derivations are better.

Not sure how this helps us.

> • Garnix - what does it do? Do we need vmTests for it?

Anything nix will expect derivations.

> • nixbuild - you mentioned this one; is it any good?

Requires derivations.

Ideally I would have one GitHub pipeline that consumes the flake and renders it as individual jobs, like a GitLab dynamic pipeline.

> • Use Hydra?! I think this is related to IOGX; I believe we used to use it, but it was too slow (or something?); maybe it's improved?

Requires derivations.

> What are your thoughts?

My final thought is that I would actually use this locally. Running cabal test all jams up ports so I don't use it. I test specific packages with cabal. But when finalising the branch, I want to use nix flake check so that anything that is done doesn't even show up in the terminal a second time - I just get left with the derivations that are still a problem. VM tests are annoyingly un-granular, but that can be improved if we commit to the style.

@noonio (Contributor) commented Sep 20, 2024

> This doesn't bother me. Haskell derivations aren't reproducible either. The flakeyness of test depth hasn't shown up here either. The flakeyness we saw on this and the other branch were due to race conditions which are randomly present and we should just drop those tests.

It has shown up as I myself was testing out this very commit; see my comment earlier in the thread! :)

> This is odd but doesn't bother me either. It's OK to allocate one core per spun up service within the machine, more or less.

I don't see that it is, in general, because of the poor use of resources. Recalling that the central idea of this issue was speeding things up, this doesn't accomplish that in general; i.e. it requires care and curation, which is difficult and time-consuming.

> This is a good thing. It highlights to us that we should put tests that require the internet in a separate test entirely, and put tests that don't in a non-sandboxed VM so we get instant finality on those derivations.

I can be convinced; but how do we make it easy to make this consistent and fast between CI and local dev, so we're not maintaining two different ways of doing the same tests?

re: "Some avenues I can think of for exploring this: ..."

What I was hoping to see is an exploration of how some of these approaches would help our central goal - faster CI.

> My final thought is that I would actually use this locally.

I think if it's useful for you it's fine to add these extra derivations; but I just don't want to switch our CI to it without doing some more investigation into other ways to speed up the CI (see above comment) and then, if we do decide this is best, carefully resolving the problems with the seed/randomness, and the dual-maintenance/dev problems.

Maybe to help make some progress here; do you have some example projects out there that use VM tests really nicely? Would be great to see/learn from!

@locallycompact (Contributor Author)

There are lots of examples in nixpkgs itself.

https://github.com/search?q=repo%3ANixOS%2Fnixpkgs+makeTest&type=code

@ch1bo (Collaborator) commented Sep 20, 2024

> Development incompatibility: I think we've decided on using cabal-style building in the devShell; not least because it allows non-nix people to participate (if they wish); but it also is just better for incremental work (LSP, etc, etc).

I would like to second this. Allowing contributions without needing nix for typical development workflows is a great trait to retain for the project (is it currently true?)

> (Noon) Overall, I feel like committing to a "CI as the developer runs it" style of work seems very reasonable; and moreover just a bit easier to think about day-to-day.

> (Daniel) I think this is an anti-pattern. CI is a massive sledgehammer that you should not be running on a 2-second feedback loop. It's something I would use as a checkpoint every 30 minutes to say "What does CI have to say?", or if I want to link a specific error.

I think your points are orthogonal to each other. From my viewpoint, I would like the continuous integration workflow to

  1. be similar to what I do in development, but only on a longer feedback loop. That means,
    • low maintenance
    • reproducible
    • complete (e.g. in development I might not run all tests)
  2. ensure things do work also in production where our development workflow differs to what we do in production. Examples:
    • build static binaries / for other platforms / package things into docker
    • assemble code docs with the website
    • benchmarks on specific hardware
    • multi-machine integration tests
    • ...

For 1, IMO the current state is too much nix already (see above)
For 2, IMO we are doing too much in GitHub Actions YAML (could be easier with nix?)

> Overall, though, I think the pain of vmTests just doesn't make sense here, for our kind of "day-to-day" workflow. I think there could definitely be a place for the vmTests

I think any kind of fault testing that requires full machines (instead of containers), e.g. using jepsen, would be a nice use case for orchestrated virtual machines?

locallycompact force-pushed the lc/qemu-nix-tests branch 2 times, most recently from a0a57ee to 611e4a6 on September 20, 2024 at 15:06
3 participants