Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(client): use process_output and process_multiple_input #1794

Merged
merged 3 commits into from
Apr 9, 2024

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Apr 7, 2024

neqo_transport::Connection offers 4 process methods:

  • process
  • process_output
  • process_input
  • process_multiple_input

Where process is a wrapper around process_input and process_output calling both in sequence.

/// Process input and generate output.
#[must_use = "Output of the process function must be handled"]
pub fn process(&mut self, dgram: Option<&Datagram>, now: Instant) -> Output {
if let Some(d) = dgram {
self.input(d, now, now);
self.process_saved(now);
}
self.process_output(now)
}

Where process_input delegates to process_multiple_input.

/// Process new input datagrams on the connection.
pub fn process_input(&mut self, d: &Datagram, now: Instant) {
self.process_multiple_input(iter::once(d), now);
}
/// Process new input datagrams on the connection.
pub fn process_multiple_input<'a, I>(&mut self, dgrams: I, now: Instant)
where
I: IntoIterator<Item = &'a Datagram>,
I::IntoIter: ExactSizeIterator,
{
let dgrams = dgrams.into_iter();
if dgrams.len() == 0 {
return;
}
for d in dgrams {
self.input(d, now, now);
}
self.process_saved(now);
self.streams.cleanup_closed_streams();
}

Previously neqo-client would use process only. Thus continuously interleaving output and input. Say neqo-client would have multiple datagrams buffered through a GRO read, it could potentially have to do a write in between each process calls, as each call to process with an input datagram might return an output datagram to be written.

With this commit neqo-client uses process_output and process_multiple_input directly, thus reducing interleaving on batch reads (GRO and in the future recvmmsg) and in the future batch writes (GSO and sendmmsg).

By using process_multiple_input instead of process or process_input, auxiliarry logic, like self.cleanup_closed_streams only has to run per input datagram batch, and not for each input datagram.

Extracted from #1741.

`neqo_transport::Connection` offers 4 process methods:

- `process`
- `process_output`
- `process_input`
- `process_multiple_input`

Where `process` is a wrapper around `process_input` and `process_output` calling
both in sequence.

https://github.com/mozilla/neqo/blob/5dfe106669ccb695187511305c21b8e8a8775e91/neqo-transport/src/connection/mod.rs#L1099-L1107

Where `process_input` delegates to `process_multiple_input`.

https://github.com/mozilla/neqo/blob/5dfe106669ccb695187511305c21b8e8a8775e91/neqo-transport/src/connection/mod.rs#L979-L1000

Previously `neqo-client` would use `process` only. Thus continuously
interleaving output and input. Say `neqo-client` would have multiple datagrams
buffered through a GRO read, it could potentially have to do a write in between
each `process` calls, as each call to `process` with an input datagram might
return an output datagram to be written.

With this commit `neqo-client` uses `process_output` and `process_multiple_input`
directly, thus reducing interleaving on batch reads (GRO and in the future
recvmmsg) and in the future batch writes (GSO and sendmmsg).

By using `process_multiple_input` instead of `process` or `process_input`,
auxiliarry logic, like `self.cleanup_closed_streams` only has to run per input
datagram batch, and not for each input datagram.

Extracted from mozilla#1741.
@mxinden mxinden changed the title refactor(client): use process_output and process_input refactor(client): use process_output and process_multiple_input Apr 7, 2024
Copy link

codecov bot commented Apr 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.12%. Comparing base (5dfe106) to head (e32cf8b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1794      +/-   ##
==========================================
- Coverage   93.12%   93.12%   -0.01%     
==========================================
  Files         116      116              
  Lines       36097    36095       -2     
==========================================
- Hits        33614    33612       -2     
  Misses       2483     2483              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Apr 7, 2024

Benchmark results

Performance differences relative to 5dfe106.

  • coalesce_acked_from_zero 1+1 entries
    time: [190.57 ns 190.94 ns 191.35 ns]
    change: [-4.7791% -4.3920% -3.9451%] (p = 0.00 < 0.05)
    💚 Performance has improved.

  • coalesce_acked_from_zero 3+1 entries
    time: [234.45 ns 235.10 ns 235.77 ns]
    change: [-2.3259% -1.4705% -0.1340%] (p = 0.01 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 10+1 entries
    time: [234.10 ns 234.89 ns 235.85 ns]
    change: [-2.7054% -2.1462% -1.6020%] (p = 0.00 < 0.05)
    💚 Performance has improved.

  • coalesce_acked_from_zero 1000+1 entries
    time: [216.01 ns 216.17 ns 216.36 ns]
    change: [-7.6881% -3.5269% -1.0068%] (p = 0.05 < 0.05)
    💚 Performance has improved.

  • RxStreamOrderer::inbound_frame()
    time: [118.27 ms 118.36 ms 118.46 ms]
    change: [-0.8168% -0.7127% -0.5985%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [121.99 ms 122.28 ms 122.57 ms]
    thrpt: [32.634 MiB/s 32.711 MiB/s 32.788 MiB/s]
    change:
    time: [+1.9254% +2.2538% +2.5948%] (p = 0.00 < 0.05)
    thrpt: [-2.5291% -2.2041% -1.8890%]
    Change within noise threshold.

  • transfer/Run multiple transfers with the same seed
    time: [122.53 ms 122.71 ms 122.89 ms]
    thrpt: [32.550 MiB/s 32.598 MiB/s 32.646 MiB/s]
    change:
    time: [+1.4874% +1.6781% +1.8766%] (p = 0.00 < 0.05)
    thrpt: [-1.8420% -1.6504% -1.4656%]
    Change within noise threshold.

  • 1-conn/1-100mb-resp (aka. Download)/client
    time: [1.0894 s 1.1168 s 1.1462 s]
    thrpt: [87.248 MiB/s 89.543 MiB/s 91.792 MiB/s]
    change:
    time: [-1.4535% +2.3258% +5.9524%] (p = 0.27 > 0.05)
    thrpt: [-5.6180% -2.2729% +1.4749%]
    No change in performance detected.

  • 1-conn/10_000-1b-seq-resp (aka. RPS)/client
    time: [388.41 ms 391.83 ms 395.20 ms]
    thrpt: [25.304 Kelem/s 25.522 Kelem/s 25.746 Kelem/s]
    change:
    time: [-0.1760% +0.8961% +2.0684%] (p = 0.13 > 0.05)
    thrpt: [-2.0264% -0.8881% +0.1763%]
    No change in performance detected.

  • 100-seq-conn/1-1b-resp (aka. HPS)/client
    time: [3.6314 s 3.6334 s 3.6354 s]
    thrpt: [27.507 elem/s 27.522 elem/s 27.537 elem/s]
    change:
    time: [+7.3942% +7.5098% +7.6268%] (p = 0.00 < 0.05)
    thrpt: [-7.0863% -6.9853% -6.8851%]
    💔 Performance has regressed.

Client/server transfer results

Transfer of 134217728 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 746.9 ± 307.5 381.0 1345.0 1.00
neqo msquic reno on 1141.7 ± 342.0 780.7 1792.2 1.00
neqo msquic reno 1075.2 ± 200.6 845.8 1411.5 1.00
neqo msquic cubic on 1042.5 ± 213.0 816.5 1386.7 1.00
neqo msquic cubic 1002.9 ± 176.0 787.5 1208.7 1.00
msquic neqo reno on 3452.5 ± 251.8 3290.2 3983.7 1.00
msquic neqo reno 3436.3 ± 252.0 3248.8 3803.9 1.00
msquic neqo cubic on 3361.1 ± 225.5 3168.7 3768.9 1.00
msquic neqo cubic 3365.0 ± 239.3 3223.8 3821.0 1.00
neqo neqo reno on 3243.7 ± 248.7 2931.1 3541.6 1.00
neqo neqo reno 3058.6 ± 177.8 2930.2 3486.5 1.00
neqo neqo cubic on 3325.8 ± 332.7 3090.4 4094.3 1.00
neqo neqo cubic 3209.2 ± 274.8 3054.3 3977.2 1.00

⬇️ Download logs

@mxinden
Copy link
Collaborator Author

mxinden commented Apr 7, 2024

  • 100-seq-conn/1-1b-resp (aka. HPS)/client
    time: [3.6314 s 3.6334 s 3.6354 s]
    thrpt: [27.507 elem/s 27.522 elem/s 27.537 elem/s]
    change:
    time: [+7.3942% +7.5098% +7.6268%] (p = 0.00 < 0.05)
    thrpt: [-7.0863% -6.9853% -6.8851%]
    💔 Performance has regressed.

Regression here should be addressed once #1795 is merged.


Today, neqo-client and neqo-server only do GRO, not GSO, sendmmsg, nor recvmmsg (all 3 will come with #1741). Thus the optimization of this pull request is not yet visible in the benchmarks.

That said, when benchmarking neqo-client against msquic as a server, we see a 2x improvement:

main branch

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 693.3 ± 256.7 448.7 1287.7 1.00
neqo msquic reno on 2229.4 ± 316.3 1904.6 2694.4 1.00
neqo msquic reno 2075.1 ± 203.2 1876.3 2442.8 1.00
neqo msquic cubic on 2053.5 ± 254.7 1818.8 2554.8 1.00
neqo msquic cubic 2037.2 ± 205.5 1881.1 2441.2 1.00

https://github.com/mozilla/neqo/actions/runs/8565335284

client-process-input (this) branch
Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 746.9 ± 307.5 381.0 1345.0 1.00
neqo msquic reno on 1141.7 ± 342.0 780.7 1792.2 1.00
neqo msquic reno 1075.2 ± 200.6 845.8 1411.5 1.00
neqo msquic cubic on 1042.5 ± 213.0 816.5 1386.7 1.00
neqo msquic cubic 1002.9 ± 176.0 787.5 1208.7 1.00

https://github.com/mozilla/neqo/actions/runs/8588223826

@mxinden mxinden marked this pull request as ready for review April 7, 2024 16:39
Copy link
Collaborator

@larseggert larseggert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to refactor http09.rs and http3.rs to further reduce code duplication in process_output and process_multiple_input?

@mxinden
Copy link
Collaborator Author

mxinden commented Apr 8, 2024

Is it possible to refactor http09.rs and http3.rs

Yes. I have a couple of ideas. Thought not directly related to this pull request.

to further reduce code duplication in process_output and process_multiple_input?

What duplication are you referring to @larseggert? The http09.rs process_* and http3.rs process_* are each one-liners, delegating to neqo_transport::Connection and neqo_http3::Http3Client respectively.

@larseggert
Copy link
Collaborator

Can we have one one-liner? :-)

@mxinden
Copy link
Collaborator Author

mxinden commented Apr 8, 2024

I think I am missing something. How would that be possible?

Long term I have in mind for there to be a single process method, that takes both a batch of input and output datagrams. Though that is still a long way to go and likely many iterations away.

fn process<'a>(&mut self, input: impl Iterator<Item = &'a Datagram>, output: impl Iterator<Item = &'a mut Datagram>, now: Instant) -> Option<Duration>

See #1693 for a rough overview.

@larseggert larseggert added this pull request to the merge queue Apr 9, 2024
Merged via the queue into mozilla:main with commit 5f9c3e7 Apr 9, 2024
27 of 28 checks passed
mxinden added a commit to mxinden/neqo that referenced this pull request May 4, 2024
There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, ... Since then, the two implementations diverged significantly.
Especially (1) saw a lot of improvements in recent months:

- mozilla#1564
- mozilla#1569
- mozilla#1578
- mozilla#1581
- mozilla#1604
- mozilla#1612
- mozilla#1676
- mozilla#1692
- mozilla#1707
- mozilla#1708
- mozilla#1727
- mozilla#1753
- mozilla#1756
- mozilla#1766
- mozilla#1772
- mozilla#1786
- mozilla#1787
- mozilla#1788
- mozilla#1794
- mozilla#1806
- mozilla#1808
- mozilla#1848
- mozilla#1866

At this point, bugs in (2) are hard to fix, see e.g.
mozilla#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).
KershawChang pushed a commit to KershawChang/neqo that referenced this pull request May 7, 2024
There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, ... Since then, the two implementations diverged significantly.
Especially (1) saw a lot of improvements in recent months:

- mozilla#1564
- mozilla#1569
- mozilla#1578
- mozilla#1581
- mozilla#1604
- mozilla#1612
- mozilla#1676
- mozilla#1692
- mozilla#1707
- mozilla#1708
- mozilla#1727
- mozilla#1753
- mozilla#1756
- mozilla#1766
- mozilla#1772
- mozilla#1786
- mozilla#1787
- mozilla#1788
- mozilla#1794
- mozilla#1806
- mozilla#1808
- mozilla#1848
- mozilla#1866

At this point, bugs in (2) are hard to fix, see e.g.
mozilla#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).
github-merge-queue bot pushed a commit that referenced this pull request May 8, 2024
* refactor(bin): introduce server/http3.rs and server/http09.rs

The QUIC Interop Runner requires an http3 and http09 implementation for both
client and server. The client code is already structured into an http3 and an
http09 implementation since #1727.

This commit does the same for the server side, i.e. splits the http3 and http09
implementation into separate Rust modules.

* refactor: merge mozilla-central http3 server into neqo-bin

There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, ... Since then, the two implementations diverged significantly.
Especially (1) saw a lot of improvements in recent months:

- #1564
- #1569
- #1578
- #1581
- #1604
- #1612
- #1676
- #1692
- #1707
- #1708
- #1727
- #1753
- #1756
- #1766
- #1772
- #1786
- #1787
- #1788
- #1794
- #1806
- #1808
- #1848
- #1866

At this point, bugs in (2) are hard to fix, see e.g.
#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).

* Move firefox.rs to mozilla-central

* Reduce HttpServer trait functions

* Extract constructor

* Remove unused deps

* Remove clap color feature

Nice to have. Adds multiple dependencies. Hard to justify for mozilla-central.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants