You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By default, SIGTERM is an important signal for the processes which represents a request for graceful shutdown. It's used by Kubernetes whenever there's a pod termination requested (e.g. pod eviction, pod restarts upon liveness probe failure, etc.).
We need to ensure that Claudie services react properly to SIGTERM signal on all levels. This means:
Ensure that all Docker images propagate SIGTERM to the running processes (namely PID 0 within container). Tini can be used for that purpose.
Services themselves respect SIGTERM.
terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process (because multiple SIGTERM or SIGINT signals to terraform process represent a request for a non-graceful shutdown; ref <<-- I couldn't find the src code, but at least referring to the issues that confirm this behavior)
Exit criteria
Ensure all Claudie images forward SIGTERM do processes
Services respect SIGTERM (if not, a new Issue can be opened to services which don't respect SIGTERM)
terraform process receives just one SIGTERM upon one SIGTERM sent to terraformer pod
The text was updated successfully, but these errors were encountered:
bernardhalas
added
groomed
Task that everybody agrees to pass the gatekeeper
chore
A chore is updating dependencies, etc; no significant code changes
labels
Mar 22, 2024
Ensure all Claudie images forward SIGTERM do processes
All Claudie images forward SIGTERM to main the process of the container, but not to the child processes of this process.
Services respect SIGTERM (if not, a new Issue can be opened to services which don't respect SIGTERM)
I would say that services respect SIGTERM partially. Since the workflow heavy lifting is performed by the child processes of the Claudie services, the Claudie services have to wait until the child processes finish (No SIGTERM is forwarded to the child processes), and then the Claudie processes terminate.
terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process (because multiple SIGTERM or SIGINT signals to terraform process represent a request for a non-graceful shutdown;
SIGTERM isn't forwarded to the terraform process, because it is a child process of the terraformer main container process.
Also, if you SIGTERM terraform process it is shutdown gracefully, but the terraformer re-run the terraform process as part of the retry loop. In my case it broke the TF state (error regarding already existing resource) and the workflow failed.
We had a discussion about the current way of handling the SIGTERM calls. To summarize, Claudie's services listen to SIGTERM calls. The SIGTERM is forwarded to the main process of the container, then the call is handled programmatically by us. Besides that, we don't want to forward this call to the child processes of the main container process.
EDIT: @bernardhalas please complement me if there is something I missed or worth mentioning.
Description
By default, SIGTERM is an important signal for the processes which represents a request for graceful shutdown. It's used by Kubernetes whenever there's a pod termination requested (e.g. pod eviction, pod restarts upon liveness probe failure, etc.).
We need to ensure that Claudie services react properly to SIGTERM signal on all levels. This means:
terraformer
, upon receiving SIGTERM, should issue exactly one SIGTERM to theterraform
process (because multiple SIGTERM or SIGINT signals toterraform
process represent a request for a non-graceful shutdown; ref <<-- I couldn't find the src code, but at least referring to the issues that confirm this behavior)Exit criteria
terraform
process receives just one SIGTERM upon one SIGTERM sent toterraformer
podThe text was updated successfully, but these errors were encountered: