standard dashboard work incompletly on rke2 with cilium #1376

Open
didlawowo opened this issue Mar 21, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@didlawowo

didlawowo commented Mar 21, 2024

Describe the bug

I'm using Helm with the VM k8s stack chart.
Grafana comes with dashboards, but some of them are not working correctly.

[three screenshots]

To Reproduce

Just install the k8s stack chart.

Version

latest

Logs

No response

Screenshots

No response

Used command-line flags

No response

Additional information

No response

@didlawowo didlawowo added the bug Something isn't working label Mar 21, 2024
@dmitryk-dk
Contributor

Hi @didlawowo! Which dashboards are you using?
VictoriaMetrics has its own dashboards, and you can find them here.

@dmitryk-dk
Contributor

If you want to use this dashboard, you should check the metrics it uses and probably correct them.

@dmitryk-dk dmitryk-dk added question Further information is requested and removed bug Something isn't working labels Mar 21, 2024
@dmitryk-dk dmitryk-dk self-assigned this Mar 21, 2024
@didlawowo
Author

I'm using the dashboards provided by the VM k8s stack.

@dmitryk-dk
Contributor

> I'm using the dashboards provided by the VM k8s stack.

In the k8s stack, VictoriaMetrics provides the dashboards that I shared before. As far as I can see from the domain, you are using Tailscale, so I think you should check which stack you are using.

@dmitryk-dk dmitryk-dk added bug Something isn't working and removed question Further information is requested labels Mar 25, 2024
@dmitryk-dk
Contributor

Hi @didlawowo! I reproduced your bug; we need to check how to fix it.

@dmitryk-dk
Contributor

Hi @didlawowo! Can you check vmagent in your installation? It will show you where the problem with the scrape targets is. Once you fix it, you should see all the information in your dashboards.
[Screenshot 2024-03-25 at 12 47 10]

@didlawowo
Author

Thank you, but could you be more specific? I'm not sure I understand.

@didlawowo
Author

>> I'm using the dashboards provided by the VM k8s stack.

> In the k8s stack, VictoriaMetrics provides the dashboards that I shared before. As far as I can see from the domain, you are using Tailscale, so I think you should check which stack you are using.

The Tailscale service is just for exposing it; it has no impact.

@dmitryk-dk
Contributor

dmitryk-dk commented Mar 27, 2024

> Thank you, but could you be more specific? I'm not sure I understand.

Hi! We found a bug, and the dashboard should be updated. It happens because some Kubernetes setups may be missing the image or container label:
dotdc/grafana-dashboards-kubernetes#18 (comment)

As a small workaround, you can install the stack with the following kubelet configuration and check which panels still have no data.

kubelet:
  spec:
    # drop high cardinality label and useless metrics for cadvisor and kubelet
    metricRelabelConfigs:
      - action: labeldrop
        regex: (uid)
      - action: labeldrop
        regex: (id|name)
      - action: drop
        source_labels: [__name__]
        regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
      - target_label: image
        replacement: placeholder

[Screenshot 2024-03-27 at 16 47 27]

@dmitryk-dk
Contributor

@Haleygo or @zekker6, can you take a look at the issue, please?

@didlawowo
Author

Nice answer. I'm not sure how to configure the kubelet in RKE2.

https://docs.rke2.io/reference/windows_agent_config?_highlight=kubelet&_highlight=conf#windows-rke2-agent-cli-help

I'll take a look.

@AndrewChubatiuk
Contributor

hey @didlawowo
what kubernetes version are you on?
what args are you passing to rke2 agent now?

@didlawowo
Author

I'm using RKE2 with these parameters:

write-kubeconfig-mode: "0600"
server: https://192.168.1.200:9345
token: 
tls-san:
  - "192.168.1.200"
# Make an etcd snapshot every 6 hours
etcd-snapshot-schedule-cron: "0 */6 * * *"
# Keep 56 etcd snapshots (equals 2 weeks at 4 a day)
etcd-snapshot-retention: 56
etcd-expose-metrics: true
cni:
  - cilium
disable:
  - rke2-ingress-nginx
  - rke2-canal
  - rke2-kube-proxy
disable-cloud-controller: true
disable-kube-proxy: true

v1.27.12+rke2r1

@AndrewChubatiuk AndrewChubatiuk transferred this issue from VictoriaMetrics/VictoriaMetrics Aug 29, 2024
@AndrewChubatiuk
Contributor

AndrewChubatiuk commented Sep 14, 2024

Hey @didlawowo,
I finally found time to test the case from this issue locally, as we are not using RKE2 at all.
I was able to reproduce the issues with scraping kube-scheduler, kube-controller-manager and etcd metrics.
All of these services required additional configuration to become scrapable by vmagent:

  1. In the server's /etc/rancher/rke2/config.yaml, I had to add the following values:
etcd-expose-metrics: true
kube-scheduler-arg:
  - bind-address=0.0.0.0               # haven't checked how to pass the address from pod metadata here instead
kube-controller-manager-arg:
  - bind-address=0.0.0.0               # haven't checked how to pass the address from pod metadata here instead
  2. I also updated the default scrape configs in the k8s-stack chart:
kubeControllerManager:
  vmScrape:
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          port: http-metrics
          scheme: https
          tlsConfig:
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            serverName: localhost          # maybe I've misconfigured something, but there was an issue until this value was set
            insecureSkipVerify: true        # haven't tried to pass automatically generated certificates in agent nodes
kubeEtcd:
  service:
    port: 2381
    targetPort: 2381
  vmScrape:
    spec:
      endpoints:  
        - port: http-metrics
          scheme: http
kubeScheduler:
  vmScrape:
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          tlsConfig:
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            serverName: 127.0.0.1
            insecureSkipVerify: true                # haven't tried to pass automatically generated certificates in agent nodes
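
For the first step, the additions merged into the server configuration posted earlier in this thread might look like the sketch below. Only the keys relevant to metrics are shown and the rest of the file is assumed unchanged. RKE2 reads this file at startup, so the server service has to be restarted (for example with `systemctl restart rke2-server`) before the new arguments take effect.

```yaml
# /etc/rancher/rke2/config.yaml on each server node (sketch, other keys omitted)
cni:
  - cilium
disable-kube-proxy: true
etcd-expose-metrics: true        # already present in the posted config
kube-scheduler-arg:
  - bind-address=0.0.0.0         # listens on all interfaces, so limit access at the network level if needed
kube-controller-manager-arg:
  - bind-address=0.0.0.0         # same consideration as for the scheduler
```

Binding to 0.0.0.0 is the simplest way to make the endpoints reachable from vmagent; as the comments in the original snippet note, a narrower bind address was not tested.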

@AndrewChubatiuk AndrewChubatiuk changed the title standard dashboard work incompletly standard dashboard work incompletly on rke2 with cilium Sep 14, 2024
3 participants