Do not reconcile egressIPPod objects that are being deleted when the namespace no longer exists #3751
hey @flavio-fernandes: dualstack is failing again on IC: https://github.com/ovn-org/ovn-kubernetes/actions/runs/5483719976/jobs/9990837646?pr=3751 PTAL! I really don't want to reopen our tracker card again HAHA
… and the namespace no longer exists

When a namespace with pods gets removed and the pod removal event is handled after the namespace is already gone, we should ignore the NotFound error in reconcileEgressIPPod. Any potential configuration will get removed in reconcileEgressIPNamespace.

Signed-off-by: Patryk Diak <[email protected]>
Hope we have coverage for this somewhere in our UTs? I am afraid each if condition we are adding is now for a bug fix (another sample: #3692; these should have tests...) and I don't want someone in the future (like I did for IC, ROFL) to come and change these :D. I won't hold the PR for the missing UT, though.
@@ -327,6 +327,12 @@ func (oc *DefaultNetworkController) reconcileEgressIPPod(old, new *v1.Pod) (err
oldPod = old
namespace, err = oc.watchFactory.GetNamespace(oldPod.Namespace)
if err != nil {
// when the whole namespace gets removed, we can ignore the NotFound error here
// any potential configuration will get removed in reconcileEgressIPNamespace
if new == nil && apierrors.IsNotFound(err) {
@kyrtapz: what about the other paths? reconcileEgressIP might get called and, let's say, the pod is not found or the namespace is not found. Are there other places where this protection needs to be added? (Having said that, each if in the egressIP code at this point is a specific bug fix, so it's like walking on eggshells if we ignore something we shouldn't. As long as at least one path handles the deletion, we should be safe.)
reconcileEgressIPPod doesn't fetch a pod, so it won't be directly affected by the pod not existing anymore.
As for namespace not found, I think we should still return an error on update, since this seems like a messed-up setup if there are pods running in non-existing namespaces.
I can try to think of how to trigger this specific issue in our tests, but I think we might need to get that PR in first as it is affecting CI.
So we don't forget... test could Then use an Eventually() calling oc.getNamespaceLocked("foo", false) and wait for a Then delete the pod and verify that it doesn't get retried.
@tssurya I think the test is not being patient enough. Check this out:
When a namespace with pods gets removed and the pod removal event is handled after the namespace is already gone, we should ignore the NotFound error in reconcileEgressIPPod. Any potential configuration will get removed in reconcileEgressIPNamespace.
Found in downstream CI, tracked in https://issues.redhat.com/browse/OCPBUGS-15804
Example error:
/cc @tssurya @jcaamano