-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HealthCheck log output options #23900
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Honny1 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
331ce42
to
ebadb8b
Compare
ee1d02a
to
bf7e1cb
Compare
4d0668f
to
976c7b3
Compare
libpod/container_config.go
Outdated
@@ -413,6 +413,14 @@ type ContainerMiscConfig struct { | |||
HealthCheckConfig *manifest.Schema2HealthConfig `json:"healthcheck"` | |||
// HealthCheckOnFailureAction defines an action to take once the container turns unhealthy. | |||
HealthCheckOnFailureAction define.HealthCheckOnFailureAction `json:"healthcheck_on_failure_action"` | |||
// HealthLogDestination defines the destination where the log is stored | |||
HealthLogDestination string `json:"healthLogDestination"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mark all of these as omitempty
- we shouldn't bother including them in the JSON by default
libpod/container_internal.go
Outdated
if path := c.healthCheckLogPath(); path != "" { | ||
path, err := c.healthCheckLogPath() | ||
if err != nil { | ||
logrus.Error(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this shouldn't be fatal? I think this seems like a hard-stop.
libpod/healthcheck.go
Outdated
@@ -156,7 +148,18 @@ func (c *Container) runHealthCheck(ctx context.Context, isStartup bool) (define. | |||
hcl := newHealthCheckLog(timeStart, timeEnd, returnCode, eventLog) | |||
logStatus, err := c.updateHealthCheckLog(hcl, inStartPeriod, isStartup) | |||
if err != nil { | |||
return hcResult, "", fmt.Errorf("unable to update health check log %s for %s: %w", c.healthCheckLogPath(), c.ID(), err) | |||
path, err := c.healthCheckLogPath() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, from reading this - do we support logging to file and journald at the same time? That seems unnecessary - one or the other should be sufficient.
libpod/healthcheck.go
Outdated
} | ||
|
||
// HealthCheckLogPath returns the path for where the health check log is | ||
func (c *Container) healthCheckLogPath() string { | ||
return filepath.Join(filepath.Dir(c.state.RunDir), "healthcheck.log") | ||
func (c *Container) healthCheckLogPath() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe consider replacing this with a more generic readHealthcheckLog()
to better handle the fact that they can go in journald now?
libpod/healthcheck_linux.go
Outdated
@@ -186,3 +187,21 @@ func (c *Container) hcUnitName(isStartup, bare bool) string { | |||
} | |||
return unitName | |||
} | |||
|
|||
func (c *Container) sendHealthCheckLogToJournal(status string, log string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an existing healthcheck event, right? Can you extend that with optional fields instead of creating a new event for the log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but then the target could be affected by events_logger
. I don't think this is a problem, but I wonder why HealthCheckResults
were originally stored in a separate file.
976c7b3
to
d38a94e
Compare
1232dfc
to
bb12c14
Compare
@mheon PTAL |
### `HealthLogDestination=` | ||
|
||
Set the destination of the HealthCheck log. Directory path or events_logger (none use container state file) | ||
Equivalent to the Podman `--health-log-destination` option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a sentence at the end here saying "the default is none
" so it's very clear - I initially missed the (default) bit below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, none
doesn't feel like the right name. Maybe local
?
### `HealthMaxLogCount=` | ||
|
||
Set maximum number of attempts in the HealthCheck log file. ('0' value means an infinite number of attempts in the log file) | ||
Equivalent to the Podman `--Health-max-log-count` option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the default here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 attempts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put these in the manpage, make it clear to users what the expectation is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added default values to man pages.
### `HealthMaxLogSize=` | ||
|
||
Set maximum length in characters of stored HealthCheck log. ("0" value means an infinite log length) | ||
Equivalent to the Podman `--Health-max-log-size` option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the default here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
500 characters
@Luap99 PTAL, particularly at the events bits. I don't really mind but we're getting a lot of feedback about how we're handling events. |
Signed-off-by: Jan Rodák <[email protected]>
Signed-off-by: Jan Rodák <[email protected]>
bb12c14
to
7ae053c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your PR description is great, but I would love to see that information to fins its way into the git log. I don't see a lot of point in splitting you changes in three different commits without any description in them on its own. I would rather have you squash them and add the PR info to the commit message.
And yes in general I rather would write one event instead of a second one just add the msg to the existing heal_status event sounds fine to me.
|
||
Set the destination of the HealthCheck log. Directory path or events_logger (none use container state file) (Default: none) | ||
|
||
* `none`: (default) HealthCheck logs are stored in overlay containers. (For example: `./run/containers/storage/overlay-containers/<container-ID>/healthcheck.log`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logically looking at this I fine the none name extremely confusing, this makes it sound like there will be no logs at all. But it uses the default current location in this case.
Second the mentioned path is not correct, it is only the default runroot for root but then can be changed via many ways so this should say under $runroot/...
Set the destination of the HealthCheck log. Directory path or events_logger (none use container state file) (Default: none) | ||
|
||
* `none`: (default) HealthCheck logs are stored in overlay containers. (For example: `./run/containers/storage/overlay-containers/<container-ID>/healthcheck.log`) | ||
* `directory`: creates a log file named `<container-ID>-healthcheck.log` with HealthCheck logs in the specified directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I should have asked on the design doc but is there reason to use a directory here? I think it would make sense to just use the full path as file instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that there are multiple containers with the same path, to avoid mixing health check logs, the log file name is generated automatically. Probably the container name could be used instead of the container-id if it is specific, to improve the user experience. @Luap99 WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is properly fine like this then. Prevent using the same path on multiple container like this seems reasonable. Using the ID will be better than the name as the name may be larger than the 255 char file name limit which would break us.
libpod/events.go
Outdated
e.HealthStatus = healthCheckResult.Status | ||
if status == events.HealthLog { | ||
if len(healthCheckResult.Log) > 0 { | ||
log_, err := json.Marshal(healthCheckResult.Log[len(healthCheckResult.Log)-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not use underscore in a variable name
libpod/healthcheck.go
Outdated
sort.SliceStable(containerEvents, func(i, j int) bool { | ||
return containerEvents[i].Time.After(containerEvents[j].Time) | ||
}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Events should be in order why are you sorting them again?
@@ -642,6 +642,14 @@ func createContainerOptions(rt *libpod.Runtime, s *specgen.SpecGenerator, pod *l | |||
options = append(options, libpod.WithHealthCheckOnFailureAction(s.ContainerHealthCheckConfig.HealthCheckOnFailureAction)) | |||
} | |||
|
|||
if healthCheckSet { | |||
if s.ContainerHealthCheckConfig.HealthLogDestination != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a user supplied value here and never validate before we create the container which means we get random runtime failures instead so there must be validation here.
In general we should check that the string is either one of the special values or an existing absolute path, the absolute path is import because otherwise things just break weirdly if we or the users changes the cwd.
This also mean we keep the option to add new special value later that do not start with /
because with your current version defining a new value could mean breaking a users who used that as relative dir (I know super unlikely).
libpod/healthcheck.go
Outdated
"type=container", | ||
} | ||
|
||
containerEvents, err := c.runtime.GetEvents(context.Background(), filters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in general is very inefficient if you have many events as we have to reading every single one of them for every single inspect and healtchcheck update, this will kill performance. On a system with a ton of events this option will be unusable
$ date; bin/podman events --stream=false --filter container=abc >/dev/null; date
Thu 19 Sep 17:50:48 CEST 2024
Thu 19 Sep 17:51:10 CEST 2024
20s on my laptop, and yes I have close to 300k events but I don't think that is to unusual.
Looking at your design doc you seem to mention that you log to both the journal and file but I don't see that here anymore? I would have assumed that the journal should be write only for this use case. And you keep using the file to read it for inspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, from reading this - do we support logging to file and journald at the same time? That seems unnecessary - one or the other should be sufficient.
I did it according to @mheon's comment. But yes. I will extend the healt_status
event with a log and use a file so that it doesn't affect performance as I wrote it in the design doc.
test/system/220-healthcheck.bats
Outdated
|
||
|
||
run_podman inspect $ctrname --format "{{.State.Health.Log}}" | ||
[ $(echo "$output" | grep -o "$msg" - | wc -l) -eq 5 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not acceptable per @edsantiago's rules, have a look at other tests, the template file and the Readme file in the test/system dir to see how to match things
|
||
* `none`: (default) HealthCheck logs are stored in overlay containers. (For example: `./run/containers/storage/overlay-containers/<container-ID>/healthcheck.log`) | ||
* `directory`: creates a log file named `<container-ID>-healthcheck.log` with HealthCheck logs in the specified directory. | ||
* `journal`: The log will be written to the journal. A log with HealthCheck logs will also be written to the default path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not match the description from the cli option.
a7320a7
to
33d3069
Compare
Signed-off-by: Jan Rodák <[email protected]>
33d3069
to
1c85f0b
Compare
This PR creates three new flags that can affect the output of the HealtCheck log.
Currently, when a container is configured with HealthCheck, the output from the HealthCheck command is only logged to the container status file, which is accessible via
podman inspect
. It is also limited to the last five executions and the first 500 characters per execution.This makes debugging past problems very difficult, since the only information available about the failure of the HealthCheck command is the generic
healthcheck service failed
record.The
--health-log-destination
flag sets the destination of the HealthCheck log.none
: (default behavior)HealthCheckResults
are stored in overlay containers. (For example:./run/containers/storage/overlay-containers/<container-ID>/healthcheck.log
)directory
: creates a log file named<container-ID>-healthcheck.log
with JSONHealthCheckResults
in the specified directory.events_logger
: The log will be written with logging mechanism set by events_logger.The
--health-max-log-count
flag sets the maximum number of attempts in the HealthCheck log file.0
indicates an infinite number of attempts in the log file.5
attempts in the log file.The
--health-max-log-size
flag sets the maximum length of the log stored.0
indicates an infinite log length.500
log characters.Does this PR introduce a user-facing change?
Fixes: RHEL-24623