Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config flag 'enable-multi-external-gateway' #3715

Merged
merged 1 commit into from
Jul 7, 2023

Conversation

pliurh
Copy link
Contributor

@pliurh pliurh commented Jun 28, 2023

- What this PR does and why is it needed
Add config flag 'enable-multi-external-gateway'. It allows users to enable/disable the feature.

- Special notes for reviewers

- How to verify it

- Description for the changelog

@tssurya
Copy link
Member

tssurya commented Jun 28, 2023

qq, specially since this came up in our team meeting yesterday @pliurh : Since microshift doesn't run the CNO, how do you use these flags to disable features on the microshift side? cc @jcaamano and @kyrtapz

@pliurh
Copy link
Contributor Author

pliurh commented Jun 28, 2023

qq, specially since this came up in our team meeting yesterday @pliurh : Since microshift doesn't run the CNO, how do you use these flags to disable features on the microshift side? cc @jcaamano and @kyrtapz

If a flag is unset, it means disabled. We use the microshift binary to render ovnkube manifests, we don't need to generate them dynamically.

@pliurh pliurh changed the title WIP: Add config flag 'enable-multi-external-gateway' Add config flag 'enable-multi-external-gateway' Jun 28, 2023
dist/images/ovnkube.sh Outdated Show resolved Hide resolved
Comment on lines 139 to 147
if config.OVNKubernetesFeature.EnableMultiExternalGateway {
nc.apbExternalRouteNodeController, err = apbroute.NewExternalNodeController(
cnnci.apbExternalRouteClient,
nc.watchFactory.PodCoreInformer(),
nc.watchFactory.NamespaceInformer(),
stopChan)
if err != nil {
return nil, err
}
Copy link
Contributor

@jcaamano jcaamano Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe this can be initialized closer to where is used as nc.apbExternalRouteNodeController.Run so we just check EnableMultiExternalGateway once but this is just a thought I had, no problems if it is left as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the controller initialization to where nc.apbExternalRouteNodeController.Run is invoked. But for the master code in default_network_controller.go, I keep the apbroutecontroller.NewExternalMasterController, since there are many other codes depends on the objects in the controller instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pliurh can you explain what happens in those situations where the code depends on apbroutecontroller but EnableMultiExternalGateway is not set?

For example:

  • addGWRoutesForPod seems to be doing something when it might not have to
  • Also apbExternalRouteController.Repair might be doing something different that what it would be supposed to do, if for example, we restart from a situation where EnableMultiExternalGateway was set to a being unset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When EnableMultiExternalGateway is unset. The apbExternalRouteController will not be running. So all the informers in the controller will not be started. I suppose addGWRoutesForPod will do nothing for the apbRoutePolicies.

apbExternalRouteController.Repair is more complicated, since it will also do things based on the annotation of pods and namespaces. If a cluster is always with EnableMultiExternalGateway flag unchanged, it's safe. However with the case you talked about, if the associated annotations for pods and namespace are not cleaned, I am not sure what will happen.

Maybe @jordigilh can shed some light on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that func (oc *DefaultNetworkController) addGWRoutesForPod still looks like it does something.
Is the plan to keep legacy egress gw functionality operating through pod & namespace annotations even if this flag is not set?

In regards to the repair, we decided we should not really be concerned with cleaning up if the flag transitioned from enabled to disabled, which is comparable to what we do with other toggle features. However, it looks like the repair might still be creating things even if the flag is unset, from processOVNRoute called in Repair.

So should we just consider that the flag also applies to the legacy egress gw support and just check for this flag in all these places and just disable everything?

Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess my only real concern here is that now that the sync for ecmp/exgw stuff has been moved from the default network controller to apb controller, by running ovnk without the apb controller enabled, it wont sync that stuff. I guess in microshift we dont care since we never used the legacy feature in the first place and in openshift we will always have the flag enabled, so I guess this isn't really a concern.

@jcaamano
Copy link
Contributor

@pliurh It looks lñike there are several places where we try to use apbExternalRouteNodeController where we will need a check now if EnableMultiExternalGateway is set.

Consequently some tests that depend on it will need to set that as well.

@pliurh pliurh force-pushed the external_gateway_flag branch 6 times, most recently from c72aa93 to 1d0ab36 Compare June 30, 2023 03:30
@pliurh
Copy link
Contributor Author

pliurh commented Jun 30, 2023

/retest

@ovn-robot
Copy link
Collaborator

Oops, something went wrong:

Must have admin rights to Repository.

@pliurh pliurh force-pushed the external_gateway_flag branch 3 times, most recently from 91f2a8b to 4a93a85 Compare July 3, 2023 01:59
@coveralls
Copy link

coveralls commented Jul 3, 2023

Coverage Status

coverage: 53.483% (+0.04%) from 53.444% when pulling 4a93a85 on pliurh:external_gateway_flag into 6870575 on ovn-org:master.

@pliurh pliurh closed this Jul 3, 2023
@pliurh pliurh reopened this Jul 3, 2023
@tssurya
Copy link
Member

tssurya commented Jul 3, 2023

2023-07-03T06:48:41.7227032Z �[91m�[1mSummarizing 4 Failures:�[0m
2023-07-03T06:48:41.7232010Z 
2023-07-03T06:48:41.7232864Z �[91m�[1m[Fail] �[0m�[90me2e ingress traffic validation �[0m�[0mValidating ingress traffic �[0m�[91m�[1m[It] Should be allowed by nodeport services �[0m
2023-07-03T06:48:41.7233568Z �[37m/home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/e2e.go:1844�[0m
2023-07-03T06:48:41.7233933Z 
2023-07-03T06:48:41.7234701Z �[91m�[1m[Fail] �[0m�[90me2e ingress traffic validation �[0m�[0mValidating ingress traffic �[0m�[91m�[1m[It] Should be allowed by nodeport services �[0m
2023-07-03T06:48:41.7235959Z �[37m/home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/e2e.go:1844�[0m
2023-07-03T06:48:41.7236556Z 
2023-07-03T06:48:41.7237088Z �[91m�[1m[Fail] �[0m�[90me2e IGMP validation �[0m�[91m�[1m[It] can retrieve multicast IGMP query �[0m
2023-07-03T06:48:41.7237792Z �[37m/home/runner/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113�[0m
2023-07-03T06:48:41.7238083Z 
2023-07-03T06:48:41.7245509Z �[91m�[1m[Fail] �[0m�[90me2e IGMP validation �[0m�[91m�[1m[It] can retrieve multicast IGMP query �[0m
2023-07-03T06:48:41.7246223Z �[37m/home/runner/go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113�[0m
2023-07-03T06:48:41.7246619Z 
2023-07-03T06:48:41.7246987Z �[1m�[91mRan 82 of 228 Specs in 3571.601 seconds�[0m
2023-07-03T06:48:41.7252357Z �[1m�[91mFAIL!�[0m -- �[32m�[1m80 Passed�[0m | �[91m�[1m2 Failed�[0m | �[33m�[1m0 Flaked�[0m | �[33m�[1m0 Pending�[0m | �[36m�[1m146 Skipped�[0m
2023-07-03T06:48:41.7257607Z 
2023-07-03T06:48:41.7258052Z �[38;5;228mYou're using deprecated Ginkgo functionality:�[0m
2023-07-03T06:48:41.7258670Z �[38;5;228m=============================================�[0m

ah again?! this is the second time i am running into this.. we need to track this as an issue: https://github.com/ovn-org/ovn-kubernetes/actions/runs/5440807530/jobs/9894410748?pr=3715
First occurrence where I saw this: #3663 (comment)

@pliurh pliurh reopened this Jul 4, 2023
@pliurh pliurh closed this Jul 4, 2023
@pliurh pliurh reopened this Jul 4, 2023
@jcaamano
Copy link
Contributor

jcaamano commented Jul 6, 2023

So I guess my only real concern here is that now that the sync for ecmp/exgw stuff has been moved from the default network controller to apb controller, by running ovnk without the apb controller enabled, it wont sync that stuff. I guess in microshift we dont care since we never used the legacy feature in the first place and in openshift we will always have the flag enabled, so I guess this isn't really a concern.

It won't sync whatever is sync'ing but after talking with @jordigilh that should not fundamentally affect the legacy extgw support through annotations.

So I guess the approach we take here is to just disable support through the CRD. So we might be good to go.

@pliurh
Copy link
Contributor Author

pliurh commented Jul 6, 2023

I tested disabling the flag and configuring MEG with annotations. This flag cannot block the annotation from working. So yes, it can only disable the support of the CRD approach. Disabling the annotation approach is more complicated. I suggest we take it as a tech debt, and resolve it in the future.

@jcaamano jcaamano merged commit d3b10e8 into ovn-org:master Jul 7, 2023
24 of 25 checks passed
@pliurh pliurh deleted the external_gateway_flag branch July 7, 2023 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants