
CI: make 090-events parallel-safe #23987

Merged · 1 commit merged into containers:main on Sep 18, 2024

Conversation

@edsantiago (Member) commented Sep 17, 2024

...or at least as much as possible. Some tests cannot
be run in parallel due to #23750: "--events-backend=file"
does not actually work the way a naïve user would intuit.
Stop/die events are asynchronous, and can be gathered
by ANY OTHER podman process running after it, and if
that process has the default events-backend=journal,
that's where the event will be logged. See #23987 for
further discussion.

Signed-off-by: Ed Santiago [email protected]

Release note: None

@openshift-ci bot added the release-note-none and approved labels Sep 17, 2024
@Luap99 (Member) left a comment

Some tests cannot
be run in parallel due to #23750 ("--events-backend=file"
does not actually mean "events-backend=file").

Well, I think this is just misleading; --events-backend=file works fine like this. What happens is that another process not configured to use the file logger can also see your container and sync its state there, which then creates the died event. The event logger is per process and NOT per container, so this seems totally fine to me.
In general it is not really sane to mix event loggers like this anyway, and no user should do that.

Comment on lines +36 to +40
# Wait for container to truly be gone.
# 99% of the time this will return immediately with a "no such container" error,
# which is fine. Under heavy load, it might actually catch the container while
# it's being cleaned up. Either way, this guarantees the "died" event is logged.
PODMAN_TIMEOUT=4 run_podman '?' wait $id
Member

Looking at this, you could just remove the -d from podman run: since podman run then runs in the foreground, it will ensure the container is deleted before it exits. But I guess we would then need another way to fetch the cid, so in terms of podman commands nothing really changes; the wait here is fine with me.

@edsantiago (Member Author)

The event logger is per process and NOT per container

That's an internal distinction that makes sense to you, but as you admitted on the bug page, it violates POLA for everyone else.

In general it is not really sane to mix the event loggers like this anyway and no user should do that.

This is something that's very hard for me to understand, even with your explanations. I'm sure in six months I will forget it, and run podman with --events-backend because it's an existing documented flag that's very handy for some cases. If it should not be used, maybe there should be a warning emitted every time it's seen on the command line?

run_podman run --name=$cname --rm $IMAGE true
# FIXME FIXME FIXME, please confirm that my 'container=' change is correct
Member Author

Eyeballs requested on this change, please

Member

I don't know what the indentation was, but yes, this seems correct; status=$cname clearly makes no sense.
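
For readers without the diff open, a rough sketch of the kind of check this filter feeds into; the actual events invocation is not visible above, so the last command (and its --stream=false flag) is an illustrative assumption rather than the test's exact code:

# Hypothetical reconstruction of the check under discussion, not the real test code.
run_podman run --name=$cname --rm $IMAGE true
# Filtering events by container name is what the test wants; the old
# status=$cname filter compared a container name against an event status,
# which is why it made no sense.
run_podman events --stream=false --filter container=$cname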

@edsantiago marked this pull request as draft September 17, 2024 17:31
@openshift-ci bot added the do-not-merge/work-in-progress label Sep 17, 2024
@edsantiago (Member Author)

Do not merge until my FIXME question is answered

@Luap99 (Member) commented Sep 17, 2024

That's an internal distinction that makes sense to you, but as you admitted in the bug page it violates POLA for everyone else.

Let me put it another way: if you do podman --events-backend journald run --name c1 ... and then podman --events-backend file stop c1, then which events should be generated where? In practice there is always a race between the foreground podman stop (file) and the background podman container cleanup (journald); both of them can create the died/cleanup and possibly other events, and which backend gets them just depends on who gets there first.

And --events-backend is a global option, not an option set when creating a container, so it applies only to the current libpod runtime. It really is no different from --db-backend, --network-backend, and others, which should just not be mixed without knowing the consequences.

And yes, I realize that in your parallel tests it is a bit more complicated, because you do not interact with the container directly from another test; rather, an operation that runs on all containers, such as podman mount, causes the died event to be generated. The reason for this is that the died event should be created as soon as we notice the exit, and this means every command that acts on this container and syncs its state can and will cause this event.

So if you want to run this in parallel, you cannot have other podman processes running in parallel that act on all containers, and of course that is much more difficult than simply not using --events-backend in parallel.

If it should not be used, maybe there should be a warning emitted every time it's seen on the command line?

You can use it, but it must be used consistently for all commands, like we do in e2e, not only for some.
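
To make the scenario concrete, here is a minimal command-level sketch of the mixed-backend race described above; the sleep command and the final events queries are illustrative, and which backend actually receives the died/cleanup events depends on which process syncs the container state first:

# Container started by a process using the journald event backend.
podman --events-backend journald run -d --name c1 $IMAGE sleep 100

# Stop issued by a process using the file backend. This foreground stop and
# the background "podman container cleanup" process (journald) race to record
# the died/cleanup events, so they can land in either backend.
podman --events-backend file stop c1

# Each backend only shows events written by processes configured to use it.
podman --events-backend journald events --stream=false --filter container=c1
podman --events-backend file events --stream=false --filter container=c1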

...or at least as much as possible. Some tests cannot
be run in parallel due to containers#23750: "--events-backend=file"
does not actually work the way a naïve user would intuit.
Stop/die events are asynchronous, and can be gathered
by *ANY OTHER* podman process running after it, and if
that process has the default events-backend=journal,
that's where the event will be logged. See containers#23987 for
further discussion.

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago (Member Author)

if you do podman --events-backend journald run --name c1 ... and then podman --events-backend file stop c1, then which events should be generated where?

Thank you, that is a good illustration that helps me understand. However... I think we're in a middle situation because there is no podman operation on c1 that does not also have --events-backend=file. What I think is happening (based on your comment in the issue) is something closer to:

  1. podman --events-backend=file run c1
  2. (c1 runs, then completes, and generates the die event)
  3. a completely unrelated test runs podman something, right after c1 finishes but before the podman command from step 1 writes the event. This podman something command uses the default journal backend; it sees the die event and "helpfully" writes it to the journal
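
(As a rough shell-level rendering of those three steps; the -d flag and the unrelated podman ps -a command are illustrative assumptions, not the test's actual commands:)

# Step 1: the test under discussion starts a container via the file backend.
podman --events-backend=file run -d --name c1 $IMAGE true
# Step 2: c1 completes; its die event is due to be written to the file backend.
# Step 3: before the test's own file-backend command records that event, an
# unrelated parallel test runs some podman command with the default journald
# backend; it syncs c1's state, notices the exit, and writes the died event
# to the journal instead.
podman ps -a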

Is that a sensible explanation of what is happening?

Anyhow, I updated the commit message, is that more acceptable?

@Luap99 (Member) commented Sep 18, 2024

Is that a sensible explanation of what is happening?

Yes

Anyhow, I updated the commit message, is that more acceptable?

sounds good

@Luap99 (Member) left a comment

LGTM

openshift-ci bot (Contributor) commented Sep 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago marked this pull request as ready for review September 18, 2024 15:55
@openshift-ci bot removed the do-not-merge/work-in-progress label Sep 18, 2024
@rhatdan (Member) commented Sep 18, 2024

/lgtm

@openshift-ci bot added the lgtm label Sep 18, 2024
@openshift-merge-bot bot merged commit 04d193d into containers:main Sep 18, 2024
56 checks passed
@edsantiago deleted the safename-090 branch September 18, 2024 16:12
Labels: approved, lgtm, release-note-none