Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram backed timer #46

Merged
merged 5 commits into from
Apr 30, 2021
Merged

Conversation

avolokhov
Copy link
Contributor

@avolokhov avolokhov commented Apr 19, 2021

Hey @MrLotU,

Thanks for considering this PR. In my project I'm running multiple replicas of my backend service. For a proper monitoring I needed aggregatable timer metrics to get cumulative percentiles across the whole fleet. Here's the solution I came up with. Since the project is in alpha, I decided to start with breaking a few APIs in order to keep the code simple. Please let me know if you think we should approach this differently.

Fixes #44

Checklist

  • [+] The provided tests still run.
  • [+] I've created new tests where needed.
  • [+] I've updated the documentation if necessary.

Motivation and Context

Server side applications are never run in as single instance. Summary backed timer doesn't allow aggregation and therefore doesn't allow to render accumulated latency percentiles across multiple instances. Histogram backed timer provides a solution with it.
Timer implementation is yet another fact that only makes sense for bridging Prometheus to swift-metrics (in addition to Labels that are init parameters in swift-metrics world and call parameters in Prometheus world, and therefore LabelSanitizer). It feels natural to separate non-trivial swift-metrics bridge from Prometheus core, keeping either of them simpler and more testable.

Description

  • MetricsFactory conformance is now a wrapper around PrometheusClient. This allows us to be much more flexible and configurable when bridging swift-metrics api to Prometheus backend (i.e. picking a Timer implementation, setting default quantiles/buckets, in future potentially setting base labels).
  • Introduced configuration object.
  • Introduced Histogram backed Timer implementation.

Copy link
Member

@fabianfett fabianfett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make the API a little more dense?

Sources/Prometheus/PrometheusMetrics.swift Outdated Show resolved Hide resolved
Sources/Prometheus/PrometheusMetrics.swift Outdated Show resolved Hide resolved
Sources/Prometheus/PrometheusMetrics.swift Outdated Show resolved Hide resolved
@grennis
Copy link

grennis commented Apr 21, 2021

+1 for this.

I tried the branch and it seemed to work great. The only problem I saw was that in Vapor using:

        try MetricsSystem.prometheus().collect(into: promise)

gave me Fatal error: leaking promise. But I worked around by using the client I had created:

        client.collect(into: promise)

And this worked.

In any case would love to see this get merged, as providing a histogram backed timer is really a necessity.

@weissi weissi requested a review from MrLotU April 28, 2021 14:05
Copy link
Collaborator

@MrLotU MrLotU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a great patch. Thanks a lot!

I left a few style & nit comments, as well as the request to add timeUnit support to Histogram in the same way Summary supports this.

Other than that, LGTM!

Comment on lines 67 to 69
/// This is a `swift-metrics` timer backed by a Prometheus' `PromHistogram` implementation.
/// This is superior to `Summary` backed timer as `Summary` emits a set of quantiles, which will be impossible to correctly aggregate when one wants to render a percentile for a set of multiple instances.
/// `Histogram` aggregation is possible with some fancy server-side math.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This is a `swift-metrics` timer backed by a Prometheus' `PromHistogram` implementation.
/// This is superior to `Summary` backed timer as `Summary` emits a set of quantiles, which will be impossible to correctly aggregate when one wants to render a percentile for a set of multiple instances.
/// `Histogram` aggregation is possible with some fancy server-side math.

While in general I think it's a good thing to add doc blocks to classes & structs, seeing this is an internal class going along with a handful of other ones without docs, I think we should omit them here too.

Comment on lines 73 to 74
// this class is a lightweight wrapper around heavy prometheus metric type. This class is not cached and each time
// created anew. This allows us to use variable timeUnit without locking.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// this class is a lightweight wrapper around heavy prometheus metric type. This class is not cached and each time
// created anew. This allows us to use variable timeUnit without locking.

Comment on lines 86 to 90
// this is questionable as display unit here affects how the data is stored, and not how it's observed.
// should we delete it and tell preferDisplayUnit is not supported?
func preferDisplayUnit(_ unit: TimeUnit) {
self.timeUnit = unit
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of somewhat supporting displayUnit, instead Histogram should be updated to take a displayUnit into account similar to Summary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much more difficult as Summary produces aggregates on demand, while Histogram precomputes values when metric is reported. So changing timeUnit in Summary will allow you to produce different time units on the fly, while changing it in Histogram will force you to modify data during the reporting. That's why I didn't do it in a first place: there's simply no way to copy and paste Summary approach into Histogram.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. That's my bad. In that case I would remove the functionality in its entirety. As per swift-metrics:

    /// Set the preferred display unit for this TimerHandler.
    ///
    /// - parameters:
    ///     - unit: A hint to the backend responsible for presenting the data of the preferred display unit. This is not guaranteed to be supported by all backends.


func recordNanoseconds(_ duration: Int64) {
// histogram can't be configured with timeUnits, so we have to modify incoming data
histogram.observe(duration / Int64(timeUnit?.scaleFromNanoseconds ?? 1), labels)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment

}

public static func summary(defaultQuantiles: [Double] = Prometheus.defaultQuantiles) -> TimerImplementation {
TimerImplementation(.summary(defaultQuantiles: defaultQuantiles))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a return statement for Swift 5

Suggested change
TimerImplementation(.summary(defaultQuantiles: defaultQuantiles))
return TimerImplementation(.summary(defaultQuantiles: defaultQuantiles))

}

public static func histogram(defaultBuckets: Buckets = Buckets.defaultBuckets) -> TimerImplementation {
TimerImplementation(.histogram(defaultBuckets: defaultBuckets))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a return statement for Swift 5

Suggested change
TimerImplementation(.histogram(defaultBuckets: defaultBuckets))
return TimerImplementation(.histogram(defaultBuckets: defaultBuckets))


public init(labelSanitizer: LabelSanitizer = PrometheusLabelSanitizer(),
timerImplementation: PrometheusMetricsFactory.TimerImplementation = .summary(),
defaultHistogramBuckets: Buckets = .defaultBuckets) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest renaming this to defaultRecorderBuckets to indicate these are for Recorder instances, not for Timers

@MrLotU
Copy link
Collaborator

MrLotU commented Apr 28, 2021

+1 for this.

I tried the branch and it seemed to work great. The only problem I saw was that in Vapor using:

        try MetricsSystem.prometheus().collect(into: promise)

gave me Fatal error: leaking promise. But I worked around by using the client I had created:

        client.collect(into: promise)

And this worked.

In any case would love to see this get merged, as providing a histogram backed timer is really a necessity.

@grennis The issue you mention about leaking promises seems unrelated to this PR, but should not be happening for sure. Would you mind filing a bug-report with some extra information like system setup, full error message and reproduction steps? You can do so here

Thanks!

@avolokhov
Copy link
Contributor Author

@MrLotU thanks for the review! I believe, I've addressed all the comments.

Copy link
Collaborator

@MrLotU MrLotU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for the PR 😄

@MrLotU MrLotU merged commit a5671ba into swift-server:master Apr 30, 2021
@MrLotU
Copy link
Collaborator

MrLotU commented Apr 30, 2021

Released in 1.0.0-alpha.10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support a histogram backed timer.
4 participants