
perf(bin): add criterion benchmarks #1758

Merged · 23 commits · Mar 27, 2024

Conversation

@mxinden (Collaborator) commented Mar 18, 2024

Wraps the neqo-client and neqo-server code, starts the server and runs various benchmarks through the client.

Benchmarks:

  • single-request-1gb
  • single-request-1mb
  • requests-per-second
  • handshakes-per-second

It would be great to have end-to-end benchmarks (in CI) to land performance optimizations like #1741 with confidence.

This pull request is a shot at the above using criterion.

Benefits:

  • Little boilerplate, written in Rust.
  • Easy to extend with further benchmarks (e.g. multiple connections, handshakes per second, ...).
  • Leverages criterion's features (though criterion is arguably the wrong tool for the job).
  • Hooks easily into the existing bench.yml workflow, including its baseline comparison.

Obviously there are a million other ways to do this.

What are people's thoughts? Are there long-term plans for neqo? Are there prominent examples worth taking inspiration from? Does Firefox / Mozilla already have some continuous benchmarking setup worth hooking into?
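For context, the measurement loop that criterion automates around the wrapped client can be sketched with std-only code. This is a simplified stand-in, not the actual harness: `client_request` is a hypothetical placeholder for driving the real neqo-client code, and criterion additionally handles warm-up, sampling, and statistics.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for one client request against the wrapped
// server; the real benchmark drives the actual neqo-client code.
fn client_request(response_size: usize) -> usize {
    // black_box keeps the compiler from optimizing the work away.
    std::hint::black_box(response_size)
}

/// Time `iters` runs of the workload and return the mean duration,
/// roughly what criterion's measurement phase does after warm-up.
fn bench(iters: u32, response_size: usize) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        client_request(response_size);
    }
    start.elapsed() / iters
}

fn main() {
    let mean = bench(10, 1024 * 1024);
    println!("mean per-request time: {mean:?}");
}
```

Criterion replaces this hand-rolled loop with calibrated sampling and the baseline comparison shown in the reports below.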

@larseggert (Collaborator) commented:

Better performance (regression) testing is definitely something we are interested in having.

One part of this is building up a suite of benchmarks that we run in CI. I envision a process where, if we identify a performance issue, we first quantify it through some (micro)benchmarks that from then on become part of the suite. If, based on the benchmark results, we decide that the issue needs to be fixed, we do that.

A second part is macro-level benchmarks, e.g., end-to-end throughput and other tests. I haven't spent much time on how we should best do this. At the moment, I feel that some simple end-to-end tests, like the ones you propose here or the ones we have in bench.yml, are already sufficient to identify performance issues, which we can then address using the process from the previous paragraph.

@mxinden mxinden changed the title feat(bin): add single conn 1GB download benchmark perf(bin): add criterion benchmarks Mar 20, 2024
@mxinden (Collaborator, Author) commented Mar 20, 2024

Making some progress. Actual numbers are still preliminary.

Benchmarking single-request-1gb/client: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 76.6s.
single-request-1gb/client
                        time:   [8.3150 s 9.7354 s 11.247 s]
                        thrpt:  [91.047 MiB/s 105.18 MiB/s 123.15 MiB/s]
                 change:
                        time:   [-29.289% -14.840% +0.7625%] (p = 0.10 > 0.05)
                        thrpt:  [-0.7568% +17.426% +41.421%]
                        No change in performance detected.

Benchmarking single-request-1mb/client: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
single-request-1mb/client
                        time:   [159.50 ms 162.72 ms 167.38 ms]
                        thrpt:  [5.9743 MiB/s 6.1454 MiB/s 6.2697 MiB/s]
                 change:
                        time:   [+30.929% +53.168% +81.193%] (p = 0.00 < 0.05)
                        thrpt:  [-44.810% -34.712% -23.623%]
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low severe

requests-per-second(10_000)/client
                        time:   [217.67 ms 242.75 ms 269.26 ms]
                        thrpt:  [37.140 Kelem/s 41.194 Kelem/s 45.942 Kelem/s]
                 change:
                        time:   [-13.591% -0.6218% +13.808%] (p = 0.92 > 0.05)
                        thrpt:  [-12.133% +0.6256% +15.729%]
                        No change in performance detected.

Benchmarking handshakes-per-second(100)/client: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 25.0s.
handshakes-per-second(100)/client
                        time:   [1.8661 s 1.9559 s 2.0763 s]
                        thrpt:  [48.162  elem/s 51.127  elem/s 53.587  elem/s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
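As a sanity check on the preliminary numbers: criterion derives the thrpt line directly from the time line (thrpt = bytes / seconds), which is also why the thrpt triple reads in reverse order relative to the time triple — the slowest time yields the lowest throughput. A small std-only sketch reproducing the 1 GiB figures above:

```rust
// thrpt = bytes / seconds, reported in MiB/s.
fn throughput_mib_per_s(bytes: u64, secs: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / secs
}

fn main() {
    let one_gib = 1u64 << 30;
    // Slowest (11.247 s) and fastest (8.3150 s) times from the run above.
    let slow = throughput_mib_per_s(one_gib, 11.247);
    let fast = throughput_mib_per_s(one_gib, 8.3150);
    assert!((slow - 91.047).abs() < 0.01);
    assert!((fast - 123.15).abs() < 0.01);
    println!("{slow:.3} MiB/s .. {fast:.3} MiB/s");
}
```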

@mxinden mxinden force-pushed the bin-bench branch 5 times, most recently from 29bed4c to b532192 on March 21, 2024 at 10:45

github-actions bot commented Mar 23, 2024

Benchmark results

Performance differences relative to efc4813.

  • drain a timer quickly time: [390.60 ns 397.57 ns 404.04 ns]
    change: [-1.9385% -0.4140% +1.1917%] (p = 0.61 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1+1 entries
    time: [197.26 ns 197.75 ns 198.27 ns]
    change: [-0.3258% +0.0111% +0.3359%] (p = 0.95 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 3+1 entries
    time: [236.87 ns 237.51 ns 238.20 ns]
    change: [-0.0436% +0.3555% +0.7441%] (p = 0.07 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 10+1 entries
    time: [235.38 ns 236.34 ns 237.55 ns]
    change: [-1.2305% -0.6099% -0.0753%] (p = 0.03 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 1000+1 entries
    time: [213.98 ns 214.23 ns 214.51 ns]
    change: [-1.6919% -0.9648% -0.1586%] (p = 0.01 < 0.05)
    Change within noise threshold.

  • RxStreamOrderer::inbound_frame()
    time: [119.54 ms 119.62 ms 119.71 ms]
    change: [-0.9084% -0.8153% -0.7256%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [118.04 ms 118.29 ms 118.53 ms]
    thrpt: [33.747 MiB/s 33.816 MiB/s 33.887 MiB/s]
    change:
    time: [-1.2476% -0.9168% -0.5991%] (p = 0.00 < 0.05)
    thrpt: [+0.6027% +0.9253% +1.2634%]
    Change within noise threshold.

  • transfer/Run multiple transfers with the same seed
    time: [118.99 ms 119.16 ms 119.34 ms]
    thrpt: [33.517 MiB/s 33.567 MiB/s 33.617 MiB/s]
    change:
    time: [-0.6255% -0.4197% -0.2152%] (p = 0.00 < 0.05)
    thrpt: [+0.2157% +0.4215% +0.6294%]
    Change within noise threshold.

  • 1-conn/1-100mb-resp (aka. Download)/client
    time: [1.1177 s 1.1346 s 1.1565 s]
    thrpt: [86.466 MiB/s 88.139 MiB/s 89.469 MiB/s]

  • 1-conn/10_000-1b-seq-resp (aka. RPS)/client
    time: [431.40 ms 433.45 ms 435.49 ms]
    thrpt: [22.963 Kelem/s 23.071 Kelem/s 23.180 Kelem/s]

  • 100-seq-conn/1-1b-resp (aka. HPS)/client
    time: [3.3576 s 3.3600 s 3.3626 s]
    thrpt: [29.739 elem/s 29.762 elem/s 29.784 elem/s]

Client/server transfer results

Transfer of 134217728 bytes over loopback.

| Client | Server | CC    | Pacing | Mean [ms]      | Min [ms] | Max [ms] | Relative |
|--------|--------|-------|--------|----------------|----------|----------|----------|
| msquic | msquic |       |        |  799.1 ± 259.1 |  491.5   | 1201.7   | 1.00     |
| neqo   | msquic | reno  | on     | 2045.5 ± 170.7 | 1914.0   | 2463.2   | 1.00     |
| neqo   | msquic | reno  |        | 2085.7 ± 258.0 | 1900.0   | 2462.9   | 1.00     |
| neqo   | msquic | cubic | on     | 2056.1 ± 251.4 | 1762.2   | 2490.3   | 1.00     |
| neqo   | msquic | cubic |        | 2080.4 ± 257.9 | 1884.8   | 2453.0   | 1.00     |
| msquic | neqo   | reno  | on     | 4525.2 ± 248.3 | 4263.5   | 5123.2   | 1.00     |
| msquic | neqo   | reno  |        | 4363.4 ± 229.5 | 4093.9   | 4874.4   | 1.00     |
| msquic | neqo   | cubic | on     | 4562.0 ± 292.7 | 4222.7   | 5148.2   | 1.00     |
| msquic | neqo   | cubic |        | 4401.6 ± 225.6 | 4164.1   | 4834.0   | 1.00     |
| neqo   | neqo   | reno  | on     | 3940.2 ± 349.8 | 3521.2   | 4634.8   | 1.00     |
| neqo   | neqo   | reno  |        | 4162.0 ± 247.8 | 3718.0   | 4598.2   | 1.00     |
| neqo   | neqo   | cubic | on     | 4409.7 ± 219.7 | 4060.8   | 4789.8   | 1.00     |
| neqo   | neqo   | cubic |        | 4182.1 ± 179.3 | 3916.7   | 4492.2   | 1.00     |


@mxinden mxinden marked this pull request as ready for review March 24, 2024 10:30
@mxinden (Collaborator, Author) left a comment:

This pull request is ready for a first review.

neqo-bin/benches/main.rs (review thread resolved)
@larseggert (Collaborator) commented:

Is there a way to group the benchmarks, so we can pin those where it makes sense to a core, and let others such as this one be scheduled on multiple cores?

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.05%. Comparing base (47dfb3b) to head (6eaea5c).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1758      +/-   ##
==========================================
- Coverage   93.07%   93.05%   -0.02%     
==========================================
  Files         117      117              
  Lines       36449    36374      -75     
==========================================
- Hits        33926    33849      -77     
- Misses       2523     2525       +2     


@mxinden (Collaborator, Author) commented Mar 27, 2024

@larseggert mind taking another look?

@larseggert (Collaborator) left a comment:

Could you break out the renaming of ClientError and ServerError to Error into its own PR? Are there any other chores that should not be part of this one?

@mxinden (Collaborator, Author) commented Mar 27, 2024

> Could you break out the renaming of ClientError and ServerError to Error into its own PR?

This pull request moves src/bin/server/main.rs to src/server/mod.rs so that the benchmarks can call fn server(). That also moves ServerError from the main module into the server module, where clippy complains that the module name is repeated in the type name. Thus I renamed ServerError to Error, and ClientError to Error for consistency.

Long story short, these renames are not unrelated. Do you still want me to separate them out into a separate pull request @larseggert? If so, shall I include the file move in that pull request as well?

> Are there any other chores that should not be part of this one?

In general, I am very much in favor of atomic, tightly scoped pull requests. That said, in my eyes all of the changes here are related to this pull request.
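The naming situation can be sketched as follows. The modules and variants here are illustrative, not the actual neqo-bin types: once each binary's error type lives in its own module, the plain name `Error` is disambiguated by the module path at the use site, and there is no ClientError/ServerError stutter for clippy's module-name-repetition lint to complain about.

```rust
// Illustrative sketch only (not the real neqo-bin error definitions).
mod client {
    #[derive(Debug)]
    pub enum Error {
        Io(String),
    }
}

mod server {
    #[derive(Debug)]
    pub enum Error {
        Io(String),
    }
}

fn main() {
    // The module path disambiguates the two types at the call site.
    let c = client::Error::Io("connection refused".to_string());
    let s = server::Error::Io("bind failed".to_string());
    println!("client: {c:?} / server: {s:?}");
}
```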

@mxinden (Collaborator, Author) commented Mar 27, 2024

The latest benchmark run results are now within the same ballpark as runs on my laptop.

  • 1-conn/1-100mb-resp (aka. Download)/client
    time: [1.1177 s 1.1346 s 1.1565 s]
    thrpt: [86.466 MiB/s 88.139 MiB/s 89.469 MiB/s]

  • 1-conn/10_000-1b-seq-resp (aka. RPS)/client
    time: [431.40 ms 433.45 ms 435.49 ms]
    thrpt: [22.963 Kelem/s 23.071 Kelem/s 23.180 Kelem/s]

  • 100-seq-conn/1-1b-resp (aka. HPS)/client
    time: [3.3576 s 3.3600 s 3.3626 s]
    thrpt: [29.739 elem/s 29.762 elem/s 29.784 elem/s]

#1758 (comment)

@larseggert larseggert added this pull request to the merge queue Mar 27, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 27, 2024
@larseggert larseggert added this pull request to the merge queue Mar 27, 2024
Merged via the queue into mozilla:main with commit 36fae62 Mar 27, 2024
14 checks passed