Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support a histogram backed timer. #44

Closed
avolokhov opened this issue Apr 6, 2021 · 0 comments · Fixed by #46
Closed

Support a histogram backed timer. #44

avolokhov opened this issue Apr 6, 2021 · 0 comments · Fixed by #46
Assignees

Comments

@avolokhov
Copy link
Contributor

Feature Request

Motivation Behind Feature

Hi, @MrLotU.
When adapting prometheus data types to swift-metrics api, there's multiple ways to back API types with prometheus types. In particular, both Summary and Histogram can serve as a backend for Timer. In their docs, prometheus compares Summary and Histogram implementations: https://prometheus.io/docs/practices/histograms/#quantiles.
One major difference there is that Summary backed values are not aggregatable. For server side application it's common to have multiple replicas of your backend. This makes Summary based timer a show stopper.

Feature Description

For the reference (it contradicts my point, but I still have to mention it
One solution I see is to make MetricsFactory conformance a wrapper and not an extension to PrometheusClient. This will make the code much more flexible, allow users to pick/write their own MetricsFactory conformance and potentially help us with more elaborate Prometheus -> swift-metrics adapters (see my other issue #36; being able to store baseLabels separately from PrometheusClient will simplify the solution).

Alternatives or Workarounds

It contradicts my point, but I have to mention that java dropwizard adapter uses Summary as a backed for Timer. However, it still forces an anti pattern

avg(http_request_duration_seconds{quantile="0.95"}) // BAD!

Prometheus mentions on their docs.
Python prometheus-flask-exporter uses Histogram as backing type for timer.

In my local fork I wrap PrometheusClient with a custom metrics factory that does this:

    private func makeHistogramBackedTimer(label: String, dimensions: [(String, String)]) -> TimerHandler {
        let createHandler = { (histogram: PromHistogram) -> TimerHandler in
            HistogramBackedTimer(histogram: histogram, dimensions: dimensions)
        }
        if let histogram: PromHistogram<Int64, DimensionHistogramLabels> = prometheus.getMetricInstance(with: label, andType: .histogram) {
            return createHandler(histogram)
        }
        return createHandler(prometheus.createHistogram(forType: Int64.self, named: label, buckets: defaultExponentialBuckets, labels: DimensionHistogramLabels.self))
    }

/// This is a `swift-metrics` timer backed by a Prometheus' `PromHistogram` implementation.
/// This is superior to `Summary` backed timer as `Summary` emits a set of quantiles, which is impossible to correctly aggregate when one wants to render a percentile for a set of instances.
/// `Histogram` aggregation is possible with server-side math magic.
class HistogramBackedTimer: TimerHandler {
    let histogram: PromHistogram<Int64, DimensionHistogramLabels>
    let labels: DimensionHistogramLabels?
    // this class is a lightweight wrapper around heavy prometheus metric type. This class is not cached and each time
    // created anew. This allows us to use variable timeUnit without locking.
    var timeUnit: TimeUnit?

    init(histogram: PromHistogram<Int64, DimensionHistogramLabels>, dimensions: [(String, String)]) {
        self.histogram = histogram
        if !dimensions.isEmpty {
            self.labels = DimensionHistogramLabels(dimensions)
        } else {
            self.labels = nil
        }
    }

    func preferDisplayUnit(_ unit: TimeUnit) {
        self.timeUnit = unit
    }

    func recordNanoseconds(_ duration: Int64) {
        // histogram can't be configured with timeUnits, so we have to modify incoming data
        histogram.observe(duration / Int64(timeUnit?.scaleFromNanoseconds ?? 1), labels)
    }
}

It works ok, but it's clumsy and forces every user with similar use case to wrap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants