Fix: Do not emit health_status event for each health check attempt. #24005

harish2704 · 2024-09-18T17:24:09Z

Emit health_status event only if there is a change in health_status
Fixes #24003

Does this PR introduce a user-facing change?

Emit the `health_status` event only when there is a status/state change for the `healthy` flag

Emit event only if there is a change in health_status

Fixes containers#24003

Signed-off-by: Harish Karumuthil <[email protected]>
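As an illustration of the behavior this PR describes, here is a minimal sketch of change-only emission. It is not the code from libpod/healthcheck.go: `healthTracker` and `update` are hypothetical names, and the sketch assumes the first result after startup counts as a change.

```go
package main

import "fmt"

// healthTracker is a hypothetical helper, not a libpod type. It remembers the
// last reported status so that repeated identical results stay silent.
type healthTracker struct {
	last string // empty until the first check has run
}

// update records the result of one health check attempt and reports whether
// the status differs from the previous one, i.e. whether an event is due.
func (t *healthTracker) update(status string) bool {
	if status == t.last {
		return false
	}
	t.last = status
	return true
}

func main() {
	t := &healthTracker{}
	results := []string{"starting", "healthy", "healthy", "unhealthy", "healthy"}
	for _, s := range results {
		if t.update(s) {
			// Stand-in for emitting a health_status event.
			fmt.Println("health_status:", s)
		}
	}
}
```

With this approach the second consecutive `healthy` result produces no event, which is the behavior cockpit-podman's test would no longer see.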

openshift-ci · 2024-09-18T17:24:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harish2704
Once this PR has been reviewed and has the lgtm label, please assign baude for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

packit-as-a-service · 2024-09-18T17:54:10Z

Cockpit tests failed for commit 2771216. @martinpitt, @jelly, @mvollmer please check.

martinpitt · 2024-09-19T03:18:30Z

This "breaks" cockpit-podman's testHealthcheckSystem, which assumes the current behavior of getting regular "health check ran and passed" events. I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise. So I'll defer to the review of the podman developers, and if this change is approved, I'll adjust cockpit-podman's tests.

Luap99

> I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise

I don't think there is a formal design on how it should work, but the way it works today is how current users will expect it to work. The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such, I consider this a breaking change that is not suitable for a minor version, so this would have to wait for podman 6.0, if we want to do it at all. I think reducing the event spam is a good idea in general.

Now, one thing we should consider: if docker doesn't behave this way, our docker compat API should not behave this way either. One way would be to add a new field to the health_status event that is set when the status changed; we can then filter out the events that do not have it set, so that docker clients at least work correctly.

cc @mheon @Honny1 In case you have opinions as you have been working on other healthcheck events work
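The suggestion above, keeping every event but marking the ones where the status changed, could look roughly like the following sketch. `healthEvent`, `StatusChanged`, and `annotate` are hypothetical names rather than the real libpod event type, and treating the first result as a change is an assumption.

```go
package main

import "fmt"

// healthEvent is a simplified stand-in for an event record; the real libpod
// event type has more fields. StatusChanged is the hypothetical new field.
type healthEvent struct {
	ContainerID   string
	HealthStatus  string
	StatusChanged bool
}

// annotate produces one event per health check result and marks those whose
// status differs from the previous result.
func annotate(id string, results []string) []healthEvent {
	events := make([]healthEvent, 0, len(results))
	prev := ""
	for _, status := range results {
		events = append(events, healthEvent{
			ContainerID:   id,
			HealthStatus:  status,
			StatusChanged: status != prev,
		})
		prev = status
	}
	return events
}

func main() {
	for _, ev := range annotate("c1", []string{"healthy", "healthy", "unhealthy"}) {
		fmt.Printf("%s %s changed=%t\n", ev.ContainerID, ev.HealthStatus, ev.StatusChanged)
	}
}
```

Existing consumers would still receive every event, while a consumer that wants Docker-like behavior could drop the ones where the flag is false.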

libpod/healthcheck.go

Honny1

Why not reduce the noise in the logs? But I would wait for Podman 6.0. I'd also be in favor of adding a flag to enable log "saving" for each run, for podman healthcheck run (and probably also for podman run), for debugging purposes or in case something goes wrong.

harish2704 · 2024-09-19T13:39:10Z

@Luap99

> I don't think there is a formal design on how it should work but the way it works today is how the current users will expect it to work.

I have to agree with this: I searched for such documentation ( https://docs.docker.com/reference/api/engine/version/v1.41/#tag/System/operation/SystemEvents ) and there is no description of the exact behavior of those events. They only provide the list of valid events.

> The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such I consider this a breaking change that is not suitable for a minor version so this would have to wait for podman 6.0 if we want to do that at all

IMHO, Podman has a huge opportunity as a secure docker replacement, and in that sense most users (and applications targeting docker, e.g. Traefik) will expect it to work like docker. For the user base looking to switch to Podman, this should be considered a bug. Please note that I am not referring to the podman CLI tool here; I am talking about the docker-compatible API that podman provides.

> Now one thing we should consider if docker doesn't behave this way our docker compat api should not behave this way either. One way would be to add a new field to the health_status event that is set when the status changed and then we can filter out the events that did not have this set to make the docker clients work correctly at least.

Exactly. This is the point I was trying to convey. I checked how podman-docker works, and it turned out to be a simple bash wrapper script for the podman CLI (at least on my Fedora 40).
So, if my understanding above is correct (i.e., podman-docker is just an alias/wrapper of podman), then podman-docker does not expose a separate docker-compatible API (either via unix socket or via TCP).
All we have is the API exposed by podman, and it is expected to be docker compatible.

So, in short, my bug report is not about the behavior of the podman CLI tool. It is about the docker-compatible API which podman exposes.
If any of you can provide some hints on how to fix this issue without changing the behaviour of the podman CLI, I am happy to try that. To be specific, my question is: at which point is the Golang event converted to HTTP API data?
I will try to add the additional flag as you mentioned in the comment.

mheon · 2024-09-19T14:38:49Z

IMO, this should not be the default. I would not mind adding a config field in containers.conf to enable this more minimal events output, but we should not (and, as @Luap99 pointed out, cannot without a major version) do this by default.

Luap99 · 2024-09-19T16:54:10Z

@harish2704 Our API socket is split in two parts: the normal docker API endpoints and our libpod endpoints, which all start with /version/libpod/... So all the other docker-compatible endpoints can and should be changed to match the docker API as closely as possible.

Look into pkg/api/handlers/compat/events.go. We use that code for both endpoints, but if you look at the logic there you will find !utils.IsLibpodRequest(r) being used, so you can change the behavior based on that. Now of course, because the event stream cannot know when such a healthcheck state change happens, you need to add this info to the event itself so that you can filter on it there, which is why I suggest adding this field or attribute to the event type.

packit-as-a-service · 2024-09-19T18:12:38Z

Cockpit tests failed for commit 5b8d32d. @martinpitt, @jelly, @mvollmer please check.

Fix: Do not emit health_status event for each health check attempt.

2771216

Emit event only if there is a change in health_status

Fixes containers#24003

Signed-off-by: Harish Karumuthil <[email protected]>

openshift-ci bot added the release-note label Sep 18, 2024

Luap99 requested changes Sep 19, 2024

libpod/healthcheck.go (outdated)

Honny1 reviewed Sep 19, 2024

Resolves containers#24005 (comment)

5b8d32d
