Prometheus: Out-of-order samples #471

Open
tr3mor opened this issue Aug 19, 2024 · 4 comments

@tr3mor

tr3mor commented Aug 19, 2024

Hello,
We are using this exporter in HA mode (2 replicas) and scrape the metrics with Prometheus (installed via kube-prometheus-stack).
We are seeing the following warnings in the Prometheus logs due to duplicated metrics:

ts=2024-08-16T12:59:56.213Z caller=scrape.go:1741 level=warn component="scrape manager" scrape_pool=serviceMonitor/kyverno-system/kyverno-policy-reporter-monitoring/0 target=http://172.26.2.131:8080/metrics msg="Error on ingesting out-of-order samples" num_dropped=1776

The main reason is that the ServiceMonitor drops every label that would distinguish the two targets:

relabelings:
    - action: labeldrop
      regex: pod|service|container
    - targetLabel: instance
      replacement: policy-reporter
      action: replace
{{- with .Values.serviceMonitor.relabelings }}
{{- toYaml . | nindent 4 }}
{{- end }}
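
With the pod, service and container labels dropped and instance rewritten to the same static value, both replicas end up exposing identical series, so Prometheus cannot tell their samples apart. Roughly (label values here are illustrative only):

# replica A exposes
policy_report_result{instance="policy-reporter", exported_namespace="default", status="fail"} 3
# replica B exposes the exact same series
policy_report_result{instance="policy-reporter", exported_namespace="default", status="fail"} 5

Prometheus appends both scrapes to the same series and rejects whichever sample arrives with an older timestamp, which produces the "out-of-order samples" warning above.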

And since the labeldrop is declared before .Values.serviceMonitor.relabelings is applied, users can't override this behavior.
Looking at the code, I guess it was done this way so the dashboards work with any number of pods, but it triggers alerts on the kube-prometheus-stack side: https://github.com/prometheus-community/helm-charts/blob/68ba986b2a6283efd3f743f0cf7859d93b615b64/charts/kube-prometheus-stack/templates/prometheus/rules-1.14/prometheus.yaml#L339
I understand that the current behaviour works for most cases, but I would like the ability to override it / disable the labeldrop.
I can create a PR if needed.
Thank you!

@fjogeleit
Member

Hey, yes, this was introduced to keep the metrics correct independent of the number of pods.

If you need to customize this behavior, I would appreciate your contribution.

@tr3mor
Author

tr3mor commented Aug 20, 2024

Hey,
I was able to achieve what I wanted with the current version of the chart, but it required manual changes to the dashboards. Not sure if you want to make these changes the default or just keep them as a workaround.
I did:

  1. Kept the pod name as a label to distinguish the data from the two pods:
serviceMonitor:
  relabelings:
    - action: replace
      sourceLabels:
        - __meta_kubernetes_pod_name
      targetLabel: pod
  2. Updated the Grafana dashboards to work with any number of pods (group by pod first, then take the max).
    For example,

replacing

sum(policy_report_result{policy=~"$policy", category=~"$category", severity=~"$severity", source=~"$source", kind=~"$kind", exported_namespace=~"$namespace" } > 0) by (status, exported_namespace)

with

max(sum(policy_report_result{policy=~"$policy", category=~"$category", severity=~"$severity", source=~"$source", kind=~"$kind", exported_namespace=~"$namespace" } > 0) by (status, exported_namespace, pod)) by (status, exported_namespace)

While the Grafana change is fully backward compatible (it works the same if you don't have the pod label) and will work with any number of pods, I don't believe the pod label should be kept by default, since it might break a lot of dashboards/alerts people have created.
So I see three options here:

  1. Update the Grafana dashboards, and introduce a switch to keep the pod label on metrics to avoid metric duplication (a rough sketch follows this list).
  2. Update the Grafana dashboards, and add a note on how to keep the pod label using serviceMonitor.relabelings.
  3. Change nothing, and explain how this can be achieved with serviceMonitor.relabelings plus manual changes to the Grafana dashboards (if you don't expect many people to use it).

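For option 1, a rough sketch of how such a switch could look (the keepPodLabel value name and the template change below are only a suggestion, nothing like this exists in the chart today):

# values.yaml
serviceMonitor:
  # keep the pod label on scraped metrics so HA replicas don't produce identical series
  keepPodLabel: false

# ServiceMonitor template (excerpt)
relabelings:
    {{- if not .Values.serviceMonitor.keepPodLabel }}
    - action: labeldrop
      regex: pod|service|container
    {{- end }}
    - targetLabel: instance
      replacement: policy-reporter
      action: replace
{{- with .Values.serviceMonitor.relabelings }}
{{- toYaml . | nindent 4 }}
{{- end }}

Combined with the max(sum(...) by (..., pod)) change to the dashboards above, this should keep everything working for any number of replicas.
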
Please let me know which option sounds best to you. I can open a PR if needed =)

@fjogeleit
Member

Hey, thanks a lot for your effort and the detailed solutions, I really appreciate it. I will take a deeper look in the next few days and get back to you.

@fjogeleit
Member

fjogeleit commented Sep 6, 2024

I think we could introduce a "breaking change" in the 3.x major version and update the metrics and dashboards accordingly to keep the pod label, which would also make it possible to use the other Go metrics of the actual policy-reporter pods.
