Chore: Verify SIGTERM handling behavior #1291

bernardhalas · 2024-03-22T16:43:23Z

Description

By convention, SIGTERM is the signal that asks a process to shut down gracefully. Kubernetes sends it whenever a pod termination is requested (e.g., pod eviction, container restarts after a liveness probe failure, etc.).

We need to ensure that Claudie services react properly to the SIGTERM signal at all levels. This means:

Ensure that all Docker images propagate SIGTERM to the running processes (namely PID 1 within the container). Tini can be used for that purpose.
Services themselves respect SIGTERM.
terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process, because multiple SIGTERM or SIGINT signals sent to the terraform process are a request for a non-graceful shutdown. (ref: I couldn't find the source code, but the referenced issues at least confirm this behavior.)

Exit criteria

Ensure all Claudie images forward SIGTERM to processes
Services respect SIGTERM (if not, a new issue can be opened for each service that doesn't respect SIGTERM)
The terraform process receives exactly one SIGTERM when one SIGTERM is sent to the terraformer pod


bernardhalas · 2024-03-22T16:43:55Z

As agreed during today's grooming, passing this directly to Jakub.

JKBGIT1 · 2024-03-27T12:07:40Z

Ensure all Claudie images forward SIGTERM to processes

All Claudie images forward SIGTERM to the main process of the container, but not to that process's child processes.
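This matches plain POSIX semantics: `kill(2)` with a positive PID signals exactly one process, so a child is unaffected unless the parent relays the signal. The minimal Go sketch below reproduces the observation locally. It assumes a Unix-like system with `sleep` on the `PATH` as a stand-in for a worker child process:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// childSurvivesParentSIGTERM starts a child, sends SIGTERM to the current
// (parent) process only, and reports whether the child is still alive.
func childSurvivesParentSIGTERM() (bool, error) {
	// Catch SIGTERM so the parent itself is not killed by the test signal.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	defer signal.Stop(sigs)

	child := exec.Command("sleep", "30") // stand-in for a worker child process
	if err := child.Start(); err != nil {
		return false, err
	}
	defer func() {
		// Clean up: kill and reap the child.
		_ = child.Process.Kill()
		_ = child.Wait()
	}()

	// kill(2) targets a single PID, so only the parent receives this signal.
	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return false, err
	}
	<-sigs // the parent's handler received it

	time.Sleep(100 * time.Millisecond)
	// Signal 0 only performs error checking: nil means the child still exists.
	return child.Process.Signal(syscall.Signal(0)) == nil, nil
}

func main() {
	alive, err := childSurvivesParentSIGTERM()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("child still running after parent got SIGTERM:", alive)
}
```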

Services respect SIGTERM (if not, a new Issue can be opened to services which don't respect SIGTERM)

I would say that the services respect SIGTERM only partially. Child processes of the Claudie services do the heavy lifting of the workflow, and no SIGTERM is forwarded to them. So upon SIGTERM, the Claudie services have to wait until the child processes finish, and only then do the Claudie processes terminate.

terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process (because multiple SIGTERM or SIGINT signals to the terraform process represent a request for a non-graceful shutdown)

SIGTERM isn't forwarded to the terraform process because it is a child process of the terraformer's main container process.

Also, if you send SIGTERM directly to the terraform process, it shuts down gracefully, but terraformer then re-runs the terraform process as part of its retry loop. In my case, this broke the TF state (an error about an already existing resource) and the workflow failed.

JKBGIT1 · 2024-03-28T08:38:57Z

I'm closing this issue in favor of #1298

We discussed the current way of handling SIGTERM. To summarize: Claudie's services listen for SIGTERM. The signal is delivered to the main process of the container, where we handle it programmatically. Beyond that, we don't want to forward the signal to the child processes of the main container process.

EDIT: @bernardhalas please add anything I missed or anything worth mentioning.

bernardhalas added the labels groomed (task that everybody agrees to pass the gatekeeper) and chore (updating dependencies, etc.; no significant code changes) on Mar 22, 2024

bernardhalas assigned JKBGIT1 Mar 22, 2024

JKBGIT1 closed this as completed Mar 28, 2024
