Add command to benchmark Web Vitals via web-vitals library #40

Merged: 12 commits, Mar 1, 2023

Conversation

@felixarntz (Collaborator) commented Feb 17, 2023

This introduces a new benchmark-web-vitals command that, similar to the existing benchmark-url command, allows making one or more requests to one or more URLs, capturing the results, and computing the median for each metric. The notable difference is that this command measures Core Web Vitals and other key client-side metrics, using https://github.com/GoogleChrome/web-vitals.

As part of this PR, the benchmark-url command is renamed to the more appropriate benchmark-server-timing, to clearly differentiate the two commands and clarify their different purposes. They also function quite differently: the existing command simply makes a curl request to the URL, while the new one uses a headless browser.

The initial iteration implemented in this PR only includes load-time metrics (FCP, LCP, and TTFB), as those are the most critical for us to measure and appear to be the easiest to implement. An early draft PR for adding the other Web Vitals (CLS, FID, and INP) exists at #41 and can be explored afterwards as an iteration.
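
For readers who want a feel for the approach before looking at the code, here is a rough sketch of the underlying idea. This is illustrative only, not the actual implementation in cli/commands/benchmark-web-vitals.mjs; it assumes Puppeteer and the web-vitals IIFE bundle from unpkg (which exposes a webVitals global).

    // Illustrative sketch only; not the code added in this PR.
    import puppeteer from 'puppeteer';

    const WEB_VITALS_URL = 'https://unpkg.com/web-vitals@3/dist/web-vitals.iife.js';

    async function measureOnce(url) {
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'networkidle0' });

      // Inject web-vitals after load; its observers use buffered entries,
      // so TTFB/FCP/LCP that already happened are still reported.
      await page.addScriptTag({ url: WEB_VITALS_URL });
      const metrics = await page.evaluate(
        () =>
          new Promise((resolve) => {
            const results = {};
            const { onTTFB, onFCP, onLCP } = window.webVitals;
            // reportAllChanges makes LCP report candidates without waiting
            // for the page to become hidden.
            const opts = { reportAllChanges: true };
            onTTFB(({ value }) => (results.TTFB = value), opts);
            onFCP(({ value }) => (results.FCP = value), opts);
            onLCP(({ value }) => (results.LCP = value), opts);
            // Give the callbacks a moment to fire, then return what we have.
            setTimeout(() => resolve(results), 3000);
          })
      );

      await browser.close();
      return metrics;
    }

    function median(values) {
      const sorted = [...values].sort((a, b) => a - b);
      const mid = Math.floor(sorted.length / 2);
      return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // For example: 10 runs against one URL, then the median per metric.
    const runs = [];
    for (let i = 0; i < 10; i++) {
      runs.push(await measureOnce('https://example.com/'));
    }
    for (const metric of ['TTFB', 'FCP', 'LCP']) {
      console.log(metric, median(runs.map((r) => r[metric])));
    }

The actual command supports multiple URLs and a configurable number of runs per the README; the above is just a mental model.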

Testing

You can test this command by following the instructions included in this PR (see https://github.com/GoogleChromeLabs/wpp-research/tree/add/benchmark-web-vitals/cli#benchmark-web-vitals), which also provide examples.

Comment on lines +102 to +105
    * In the future, additional Web Vitals like CLS, FID, and INP should be
    * added, however they are slightly more complex to retrieve through an
    * automated headless browser test.
    * See https://github.com/GoogleChromeLabs/wpp-research/pull/41.
@tunetheweb (Member) commented:

FID and INP are more complex as they require an interaction, but the load CLS should be similar to FCP/LCP/TTFB, shouldn't it?

Granted, that often isn't the full CLS of a page (as it can be impacted by scroll or other interactions), but it's still usually a good part of CLS. And that "load CLS" is what Lighthouse and PSI already capture, despite its limitations.

The only thing is that, kind of like LCP, CLS is not emitted by web-vitals.js until the page is hidden (including navigating away from it). For the web-vitals.js library tests we stub this visibility change:
https://github.com/GoogleChrome/web-vitals/blob/main/test/utils/stubVisibilityChange.js

And then call it like this:

    // Wait until all images are loaded and rendered, then change to hidden.
    await imagesPainted();
    await stubVisibilityChange('hidden');

Maybe that's enough to make it too complicated here?

The alternative is to use the reportAllChanges flag and only report at the end of the test.
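
For anyone following along, a minimal sketch of that reportAllChanges alternative (variable names here are just for illustration) could look like:

    import { onCLS } from 'web-vitals';

    let latestCLS = 0;

    // With reportAllChanges, the callback fires on every CLS update rather
    // than only once the page is hidden.
    onCLS((metric) => {
      latestCLS = metric.value;
    }, { reportAllChanges: true });

    // At the end of the test, read latestCLS instead of waiting for a
    // visibilitychange to 'hidden'.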

@felixarntz (Collaborator, Author) commented:

Thanks @tunetheweb, that is great context. I am a bit unsure about CLS being more similar to the load-time metrics; based on what you're saying it makes sense, but when I initially tried using all of them, I always got data for FID but not CLS.

In any case, due to those additional complexities I think it makes sense to separate the work into more incremental steps, and starting with only FCP/LCP/TTFB works well since those are all load-time metrics and the simplest to implement here. #41 covers the follow-up work, so I'd prefer to handle CLS, FID, and INP there (or maybe even break that down into further separate PRs).

@tunetheweb (Member) commented:

Yeah, sounds like it's due to not changing the visibility, so CLS is not being "finalised". I realised that and edited my comment after writing that first sentence (but before submitting the review).

cli/commands/benchmark-web-vitals.mjs (outdated review thread; resolved)
@felixarntz (Collaborator, Author) commented:

@oandregal Would you mind giving this PR a review? Have you been able to test the command yet?

@vishalkakadiya (Collaborator) previously approved these changes on Feb 28, 2023:

I tested the command with all the options, and it is working great. Approving PR now!

@10upsimon (Collaborator) previously approved these changes on Feb 28, 2023:

I've tested all commands as listed in the branch's README, and all are functioning correctly for me. Approved from my end.

@mukeshpanchal27 (Collaborator) left a comment:

Thanks @felixarntz. Quick review and approving for merge.

@oandregal (Collaborator) commented Mar 7, 2023:

> @oandregal Would you mind giving this PR a review? Have you been able to test the command yet?

Sorry for the late response. I had a use case in mind and only now was able to create a test case: WordPress/gutenberg#48728 (comment)

This command is very useful, thanks!

Some feedback:

LCP-TTFB.

I miss having data for the LCP-TTFB median computed from the LCP-TTFB of each individual result, rather than by subtracting the LCP and TTFB medians, which is what I had to do in the test case above.
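
To illustrate the difference with made-up numbers (this is just a sketch, not output from the command):

    // Hypothetical per-run results; the numbers are invented.
    const runs = [
      { TTFB: 120, LCP: 210 },
      { TTFB: 135, LCP: 205 },
      { TTFB: 125, LCP: 230 },
    ];

    const median = (values) => {
      const sorted = [...values].sort((a, b) => a - b);
      const mid = Math.floor(sorted.length / 2);
      return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    };

    // What the current output lets me compute: difference of the medians.
    const diffOfMedians =
      median(runs.map((r) => r.LCP)) - median(runs.map((r) => r.TTFB)); // 210 - 125 = 85

    // What I would find more useful: median of the per-run differences.
    const medianOfDiffs = median(runs.map((r) => r.LCP - r.TTFB)); // median of [90, 70, 105] = 90

The two values can differ, which is why having the per-result metric reported directly would help.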

Quality check.

This command has shown better data quality than the other approaches I've used. Nice!

I saw you are adding percentiles in #43, and that will be useful! I am still unsure about how best to address the high variability inherent in performance measurements. For example, using 10 runs, I've seen:

  • differences up to 15ms between the p25 and p75 for TTFB and 25ms between the p25 and p75 for LCP
  • differences up to 9ms in the mean of TTFB (123 vs 132)
  • differences up to 10ms in the mean of LCP (210 vs 220)

The data has more variability than I'd like. Finding ways to reduce it would be nice. Otherwise, executing many runs (hundreds) is still my preferred mechanism.
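
As a reference point for what p25/p75 mean across a set of runs, here is a rough sketch (nearest-rank percentiles; #43 may compute them differently):

    // Per-run TTFB values from a hypothetical 10-run invocation (made up).
    const ttfbRuns = [118, 121, 122, 123, 125, 126, 128, 130, 133, 137];

    // Nearest-rank percentile; illustrative only.
    function percentile(values, p) {
      const sorted = [...values].sort((a, b) => a - b);
      const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
      return sorted[index];
    }

    console.log(percentile(ttfbRuns, 25), percentile(ttfbRuns, 75)); // 122 and 130 here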

Access to the full dataset.

If I have access to the full dataset, I can do analysis after the fact.

@felixarntz (Collaborator, Author) commented:

Thanks @oandregal for the feedback!

Regarding LCP - TTFB, that's a great idea to add as a metric; I'll take a note to do that (though I'm AFK for the next few days, so it'll take a couple of days). Feel free to open a PR if you have capacity, but otherwise I'll get to it once I'm back.

For the variability, are those ms differences you're sharing between different runs of the same command? For example, one time you ran it with 10 runs and got a certain p25, and another time the p25 was 15ms different? I'm definitely interested in exploring this further to see how we can get those numbers down, though partially it will of course be impossible due to varying machine resources etc. For example, I'd be curious how the results differed for the other percentiles in those specific runs you did. Maybe we need to conduct a more detailed analysis of how the results differ between runs of the tool. It would be great if we could use that to find the "sweet spot" for how many runs to do per invocation of the command.
