Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Do not emit health_status event for each health check attempt. #24005

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

harish2704
Copy link

Emit health_status event only if there is a change in health_status
Fixes #24003

Does this PR introduce a user-facing change?

Emit `health_status` event only when there a status/state change for `healthy` flag

Emit event only if there is a change in health_status
Fixes containers#24003


Signed-off-by: Harish Karumuthil <[email protected]>
Copy link
Contributor

openshift-ci bot commented Sep 18, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harish2704
Once this PR has been reviewed and has the lgtm label, please assign baude for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Cockpit tests failed for commit 2771216. @martinpitt, @jelly, @mvollmer please check.

@martinpitt
Copy link
Contributor

This "breaks" cockpit-podman's testHealthcheckSystem, which assumes the current behavior of getting regular "health check ran and passed" events. I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise. So I'll defer to the review of the podman developers, and if this change is approved, I'll adjust cockpit-podman's tests.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise

I don't think there is a formal design on how it should work but the way it works today is how the current users will expect it to work. The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such I consider this a breaking change that is not suitable for a minor version so this would have to wait for podman 6.0 if we want to do that at all. I think reducing the event spam is a good idea in general.

Now one thing we should consider if docker doesn't behave this way our docker compat api should not behave this way either. One way would be to a new field to the health_status event that is set when the status changed and then we can filter out the events that did not have this set to make the docker clients work correctly at least.

cc @mheon @Honny1 In case you have opinions as you have been working on other healthcheck events work

libpod/healthcheck.go Outdated Show resolved Hide resolved
Copy link
Member

@Honny1 Honny1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not, reduce the noise in the logs, but I would wait for Podman 6.0. I'd also be in favor of adding a flag to enable log "saving" for each run, for podman healtcheck run (probably also for podman run), for debugging purposes or if something goes wrong.

@harish2704
Copy link
Author

@Luap99

I don't think there is a formal design on how it should work but the way it works today is how the current users will expect it to work.

I have to agree on this because I searched for such a documentation ( https://docs.docker.com/reference/api/engine/version/v1.41/#tag/System/operation/SystemEvents ) and there is no description about exact behavior of those events . They have only provided the list of valid events

The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such I consider this a breaking change that is not suitable for a minor version so this would have to wait for podman 6.0 if we want to do that at all

IMHO, Podman has a huge opportunity as secure docker replacement and in that sense most of the users ( and applications targeting docker eg: Traefik ) will expect it work similar to docker. To address the user base who is looking to switch to Podman, it should be considered as a bug. Please note that, here i am not referring the podman cli tool. I am talking about docker compatible API which podman provides.

Now one thing we should consider if docker doesn't behave this way our docker compat api should not behave this way either. One way would be to a new field to the health_status event that is set when the status changed and then we can filter out the events that did not have this set to make the docker clients work correctly at least.

Exactly. This is the point I was trying to convey. I checked how podman-docker works but it is found to be a simple bash wrapper script for podman cli ( at-least in my Fedora-40 ) .
So, If my above understanding is correct ( ie, podman-docker is just an alias/wrapper of podman ) then, podman-docker is not exposing a separate docker compatible API ( either via unix socket or via TCP ) .
All we have is an API exposed by podman and it is expected to be docker compatible.

So, in short, my bug report is not about behavior of podma-cli tool. it is about the docker-compatible API which podman exposes.
if any of you can provide some hints on how to fix this issue without changing behaviour podman cli I am happy to try that. To be specific, my question is , at which point, Golang event is converted to HTTP API data?
I will try to add the additional flag as you mentioned in the comment

@mheon
Copy link
Member

mheon commented Sep 19, 2024

IMO, this should not be default. I would not mind adding a config field in containers.conf to enable this more minimal events output but we should not (and, as @Luap99 pointed out, cannot without a major version) do this by default.

@Luap99
Copy link
Member

Luap99 commented Sep 19, 2024

@harish2704 Our API socket is split in two parts the normal docker api endpoint and the our libpod endpoints that all start with /version/libpod/... so all the other docker compatiable endpoints can and should be changed to match docker api as closely as possible.

Look into pkg/api/handlers/compat/events.go there we use the code for both endpoints but if you look there into the logic you will find !utils.IsLibpodRequest(r) usage there so based on that you can change the behavior there. Now of course because the event stream cannot known when such a healthcheck state change happens you need to add this info into the event itself so that you can filter it there based on that which is why I suggest added this field or attribute to the event type.

Copy link

Cockpit tests failed for commit 5b8d32d. @martinpitt, @jelly, @mvollmer please check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

health_status events are too noisy/redundant
5 participants