
Broken ClusterConfig with dynamic config results in wiping system services #4721

Open

jnummelin opened this issue Jul 5, 2024 · 1 comment

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

No response

Version

No response

Sysinfo

`k0s sysinfo`
➡️ Please replace this text with the output of `k0s sysinfo`. ⬅️

What happened?

If we end up with a broken config in the dynamic ClusterConfig, e.g. both externalAddress and NLLB (node-local load balancing) enabled, an additional controller can wipe out system services such as kube-router.

In this case, the second controller to join had empty stacks for all system services. At some point it had, presumably, been the leader for a while and thus applied some of the stacks. Since the stacks were empty, kube-router and some other stacks were removed completely.

Steps to reproduce

  1. Create a controller with --dynamic-config and some config
  2. Once the controller is up, create an invalid ClusterConfig: invalid from the k0s point of view, but valid enough to be accepted by the Kubernetes API (see the sketch after this list)
  3. Boot up a second controller with --dynamic-config
  4. Depending on timing and leader elections, the second controller can wipe out the system stacks
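
For illustration, a ClusterConfig along these lines combines the two mutually exclusive settings from the description. The external address is a placeholder; k0s's own validation rejects this combination, while the Kubernetes API server still accepts the object:

```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
  namespace: kube-system
spec:
  api:
    # Setting an external address conflicts with node-local load
    # balancing below; k0s considers the combination invalid.
    externalAddress: 192.0.2.10 # placeholder address
  network:
    nodeLocalLoadBalancing:
      enabled: true
```

Applying such an object to the dynamic ClusterConfig (e.g. with kubectl apply) is enough to feed the invalid config to the reconcilers.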

Expected behavior

When we receive an invalid dynamic config, we need to stop reconciling it completely so that we don't risk breaking an already functioning cluster.

Actual behavior

No response

Screenshots and logs

No response

Additional context

No response

github-actions bot commented

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot added the Stale label on Aug 15, 2024
github-actions bot closed this as not planned on Aug 22, 2024
twz123 reopened this on Aug 23, 2024
github-actions bot removed the Stale label on Aug 23, 2024