Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chore: Verify SIGTERM handling behavior #1291

Closed
3 tasks
bernardhalas opened this issue Mar 22, 2024 · 3 comments
Closed
3 tasks

Chore: Verify SIGTERM handling behavior #1291

bernardhalas opened this issue Mar 22, 2024 · 3 comments
Assignees
Labels
chore A chore is updating dependencies, etc; no significant code changes groomed Task that everybody agrees to pass the gatekeeper

Comments

@bernardhalas
Copy link
Member

Description

By default, SIGTERM is an important signal for the processes which represents a request for graceful shutdown. It's used by Kubernetes whenever there's a pod termination requested (e.g. pod eviction, pod restarts upon liveness probe failure, etc.).

We need to ensure that Claudie services react properly to SIGTERM signal on all levels. This means:

  1. Ensure that all Docker images propagate SIGTERM to the running processes (namely PID 0 within container). Tini can be used for that purpose.
  2. Services themselves respect SIGTERM.
  3. terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process (because multiple SIGTERM or SIGINT signals to terraform process represent a request for a non-graceful shutdown; ref <<-- I couldn't find the src code, but at least referring to the issues that confirm this behavior)

Exit criteria

  • Ensure all Claudie images forward SIGTERM do processes
  • Services respect SIGTERM (if not, a new Issue can be opened to services which don't respect SIGTERM)
  • terraform process receives just one SIGTERM upon one SIGTERM sent to terraformer pod
@bernardhalas bernardhalas added groomed Task that everybody agrees to pass the gatekeeper chore A chore is updating dependencies, etc; no significant code changes labels Mar 22, 2024
@bernardhalas
Copy link
Member Author

As agreed, during today's grooming, passing directly to Jakub.

@JKBGIT1
Copy link
Contributor

JKBGIT1 commented Mar 27, 2024

Ensure all Claudie images forward SIGTERM do processes

All Claudie images forward SIGTERM to main the process of the container, but not to the child processes of this process.

Services respect SIGTERM (if not, a new Issue can be opened to services which don't respect SIGTERM)

I would say that services respect SIGTERM partially. Since the workflow heavy lifting is performed by the child processes of the Claudie services, the Claudie services have to wait until the child processes finish (No SIGTERM is forwarded to the child processes), and then the Claudie processes terminate.

terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process (because multiple SIGTERM or SIGINT signals to terraform process represent a request for a non-graceful shutdown;

SIGTERM isn't forwarded to the terraform process, because it is a child process of the terraformer main container process.

Also, if you SIGTERM terraform process it is shutdown gracefully, but the terraformer re-run the terraform process as part of the retry loop. In my case it broke the TF state (error regarding already existing resource) and the workflow failed.

@JKBGIT1
Copy link
Contributor

JKBGIT1 commented Mar 28, 2024

I'm closing this issue in favor of #1298

We had a discussion about the current way of handling the SIGTERM calls. To summarize, Claudie's services listen to SIGTERM calls. The SIGTERM is forwarded to the main process of the container, then the call is handled programmatically by us. Besides that, we don't want to forward this call to the child processes of the main container process.

EDIT: @bernardhalas please complement me if there is something I missed or worth mentioning.

@JKBGIT1 JKBGIT1 closed this as completed Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore A chore is updating dependencies, etc; no significant code changes groomed Task that everybody agrees to pass the gatekeeper
Projects
None yet
Development

No branches or pull requests

2 participants