Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use libovsdb for node-side operations #3616

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

Conversation

dcbw
Copy link
Contributor

@dcbw dcbw commented May 25, 2023

Instead of doing a lot of ovs-vsctl in the hotpath of pod setup, create an ovsdb client for the node's vswitch database once at startup, and use that to make changes to OVS on the local node.

@dcbw dcbw force-pushed the node-libovsdb branch 2 times, most recently from 1eb8744 to 718e4e9 Compare June 5, 2023 18:08
@dcbw dcbw force-pushed the node-libovsdb branch 10 times, most recently from 688cc62 to 532ab33 Compare July 6, 2023 21:03
@dcbw dcbw changed the title [wip] use libovsdb for node-side operations Use libovsdb for node-side operations Jul 7, 2023
@dcbw dcbw changed the title Use libovsdb for node-side operations [wip] Use libovsdb for node-side operations Jul 10, 2023
@dcbw dcbw force-pushed the node-libovsdb branch 2 times, most recently from 21c8691 to f4acc30 Compare July 11, 2023 20:14
@dcbw dcbw changed the title [wip] Use libovsdb for node-side operations Use libovsdb for node-side operations Jul 12, 2023
@dcbw
Copy link
Contributor Author

dcbw commented Jul 12, 2023

We verified these changes don't fail on both DPU modes too (DPUHost and regular DPU)

@dcbw
Copy link
Contributor Author

dcbw commented Jul 19, 2023

Backporting these patches to OCP 4.13 downstream (to isolate the changes to ovnkube before all the IC stuff landed) shows the following P99 PodReadyLatency for the "density-light" tests that just create 250ppn @ 120 nodes as quickly as possible, all in one namespace:

Before avg: 3.29s
After avg:  3.14s (~5% better)

(runs with this patchset)
5b98b510-node-density-20230719 (2.94s)
311fd91d-node-density-20230718 (3.55s)
f0fbacfa-node-density-20230719 (2.93s)
8c1b8275-node-density-20230719 (3.07s)
4dac55e7-node-density-20230719 (3.21s)

existing runs of OCP 4.13 are 3 - 3.5 seconds, usually the higher side of that range. So we are at least not regressing and perhaps have a minor improvement in latency.

What's even better than a small increase in P99 PodReady latency is the CPU savings (aggregate across 120 nodes):

Before: 1964% avg/2852% max
After:  1741% avg/2352% max (~12% better avg, 18% better max)

dcbw added 6 commits July 20, 2023 08:52
Used when we absolutely do not want to create the object (like a bridge)
if it doesn't exist, because that should be a hard error, but just want
to update it. Mostly to make sure testcases do the right thing, but we
never actually want to create 'br-int' either, so it works for both
testcases and real code.

Signed-off-by: Dan Williams <[email protected]>
Complement to Lookup() for chained operations.

Signed-off-by: Dan Williams <[email protected]>
cmdAdd/cmdDel use similar setup code; consolidate it. Also make
cmdAdd use the same error logging mechanism as cmdDel does.

Signed-off-by: Dan Williams <[email protected]>
foundID, ok := iface.ExternalIDs["iface-id"]
return ok && foundID == ifaceID
}
if err := libovsdbops.DeleteInterfacesWithPredicate(vsClient, p); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you define trhe predicate before the function call where just below you define the predicate in the function call. Why not set this up the same both times?

return ops, err
}

func (m *modelClient) createOrUpdateOps(ops []ovsdb.Operation, opModels ...operationModel) (interface{}, []ovsdb.Operation, error) {
func (m *modelClient) createOrUpdateOps(ops []ovsdb.Operation, doCreate bool, opModels ...operationModel) (interface{}, []ovsdb.Operation, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does doCreate define if we want to actually create it or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacobTanenbaum whether creation is allowed or not since the function is now called from both CreateOrUpdate() and just plain Update(). I think I'll change to allowCreate, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good to me

@@ -437,6 +437,11 @@ func (m *modelClient) Lookup(opModels ...operationModel) error {
return err
}

func (m *modelClient) LookupOps(ops []ovsdb.Operation, opModels ...operationModel) ([]ovsdb.Operation, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lookup will never return any ops, because lookup doesn't change db state. And that is why we have (m *modelClient) Lookup(opModels ...operationModel) error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering the same thing

Comment on lines +256 to +258
a) performs a lookup of the models in the cache by ModelPredicate if provided,
or by Model otherwise. If the models do not exist and ErrNotFound is set,
it returns ErrNotFound
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have copied this from an old comment.

I think currently we do lookups by model first. We do predicates if a model was not provided or a model without indexes was provided.

I also wonder what is going to happen when ErrNotFound is not set. It might become a noop with no error and perhaps we would like to return ErrNotFound regardless.

@@ -437,6 +437,11 @@ func (m *modelClient) Lookup(opModels ...operationModel) error {
return err
}

func (m *modelClient) LookupOps(ops []ovsdb.Operation, opModels ...operationModel) ([]ovsdb.Operation, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering the same thing

}

// FindInterfaceByName looks up an Interface from the cache by name
func FindInterfaceByName(vsClient libovsdbclient.Client, ifaceName string) (*vswitchdb.Interface, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

By convention, these type of methods would have this type of signature

func GetInterface(vsClient libovsdbclient.Client, interface *vswitchdb.Interface) (*vswitchdb.Interface, error) 

The idea was hat we would not have to care on this interface which data within vswitchdb.Interface serves as an index, although in cases where we have a single index, where more are unlikely to be added, matters less.


// CreateQoS deletes QoS records with a "sandbox" ExternalID that
// matches the given sandbox ID
func CreateQoS(vsClient libovsdbclient.Client, sandboxID string, maxRateBPS int64) (*vswitchdb.QoS, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

By convention, these type of methods would have this signature

func CreateQoS(vsClient libovsdbclient.Client, qos *vswitchdb.QoS) error


// CreateOrUpdatePortAndAddToBridge creates or updates the provided Interface
// Interfae template, provided Port template, and adds the Port to the given bridge
func CreateOrUpdatePortAndAddToBridge(vsClient libovsdbclient.Client, bridgeUUID string, portTemplate *vswitchdb.Port, ifaceTemplate *vswitchdb.Interface) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

By convention, these type of methods would have this signature

func CreateOrUpdateInterfacePortOnBridge(nbClient libovsdbclient.Client, bridge *vswitchdb.Bridge, port *vswitchdb.Port, interface *vswitchdb.Interface) error

I guess that we don't care whether we are provided a bridge UUID or a bridge name.

Perhaps we could panic if ifaceTemplate.Name != portTemplate.Name

Comment on lines +222 to +236
portNames := []string{}
opModels := make([]operationModel, 0, 2)
opModels = append(opModels, operationModel{
ExistingResult: &[]*vswitchdb.Interface{},
ModelPredicate: func(item *vswitchdb.Interface) bool {
foundID, ok := item.ExternalIDs["sandbox"]
if ok && foundID == sandboxID {
portNames = append(portNames, item.Name)
return true
}
return false
},
ErrNotFound: false,
BulkOp: true,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use FindInterfacesWithPredicate?

Also this looks like a second order function, or a composition of ops (search on some place, and act in some other place). A nit would be to have a simpler re-usable func DeleteQoSFromPort(vsClient libovsdbclient.Client, port *vswitchdb.Port, qos *vswitchdb.QoS) and move this composition to the user package or, if it is to be shared, to the new libovsdb/util package

go-controller/pkg/libovsdbops/vswitchd.go Show resolved Hide resolved
go-controller/pkg/cni/cnishim.go Show resolved Hide resolved
go-controller/pkg/cni/ovs.go Show resolved Hide resolved
go-controller/pkg/cni/ovs.go Show resolved Hide resolved
@tssurya tssurya self-assigned this May 13, 2024
@tssurya tssurya added area/code-cleanup Issues related to code clean ups, FUPs, refactors, addressing review comments in FUPs area/scale&performance All issues related to scale & performance labels May 13, 2024
@trozet
Copy link
Contributor

trozet commented May 16, 2024

I'll take this PR over for @dcbw as I dont think he has any plans to update it.

@trozet trozet marked this pull request as draft May 29, 2024 13:43
@trozet
Copy link
Contributor

trozet commented May 29, 2024

@jcaamano will take this over

@trozet
Copy link
Contributor

trozet commented May 29, 2024

Related PR: #3784

Copy link

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or reach out to maintainers for code reviews or consider closing this if you do not plan to work on it.

@github-actions github-actions bot added the lifecycle/stale All issues (> 60 days) and PRs (>90 days) with no activity. label Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/code-cleanup Issues related to code clean ups, FUPs, refactors, addressing review comments in FUPs area/scale&performance All issues related to scale & performance lifecycle/stale All issues (> 60 days) and PRs (>90 days) with no activity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants