Missing ct zone entries for container ports #204

Closed
rchicoli opened this issue Jul 19, 2023 · 2 comments

@rchicoli

rchicoli commented Jul 19, 2023

Lately I was trying to find out why some pods enter a CrashLoopBackOff state in a Kubernetes cluster running ovn-controller and ovnkube. It turns out that the issue was caused by a missing ct-zone entry on the bridge.

This happens when the system is overloaded and an interface cannot be created because of an internal error (ofport=-1). A new pod sandbox then reuses the same iface-id:

// Monday, July 17, 2023 7:35:21.982 AM
{"Interface":{"39fa6f4e-c67d-449b-aa88-c7586cf6ab9c":{"name":"8824323d5ff74ba","external_ids":["map",[["attached_mac","01:02:03:04:05:06"],["iface-id","stress-test-density-706_nginx-1-5745dddb7c-ldjkl"],["iface-id-ver","01c5fa41-83d0-439d-9289-b27abe137c87"],["ip_addresses","172.16.0.99/22"],["sandbox","8824323d5ff74bab767acf132914c8edd36c22fcef81c26a40a51a8b670cc7a3"]]]}},"Port":{"db765055-6020-43d2-8a22-35d9d18d9b1f":{"name":"8824323d5ff74ba","interfaces":["uuid","39fa6f4e-c67d-449b-aa88-c7586cf6ab9c"],"other_config":["map",[["transient","true"]]]}},"_date":1689579321982,"Bridge":{"4805dd32-af2a-4ac6-8917-3a27975c1ab5":{"ports":["uuid","db765055-6020-43d2-8a22-35d9d18d9b1f"]}},"_is_diff":true,"_comment":"ovs-vsctl (invoked by /usr/bin/ovnkube): /usr/bin/ovs-vsctl --timeout=30 add-port br-int 8824323d5ff74ba other_config:transient=true -- set interface 8824323d5ff74ba external_ids:attached_mac=01:02:03:04:05:06 external_ids:iface-id=stress-test-density-706_nginx-1-5745dddb7c-ldjkl external_ids:iface-id-ver=01c5fa41-83d0-439d-9289-b27abe137c87 external_ids:ip_addresses=172.16.0.99/22 external_ids:sandbox=8824323d5ff74bab767acf132914c8edd36c22fcef81c26a40a51a8b670cc7a3","Open_vSwitch":{"11d16522-1170-1121-b42b-49f93ee66dae":{"next_cfg":142523}}}
// Monday, July 17, 2023 7:35:25.957 AM
{"Interface":{"0fe370c5-d6a7-4aa4-9243-e818fdf629ba":{"ofport":22553},"38791b57-6f42-42c0-bf79-92ce8d7bc11f":{"ofport":22556},"41619eb1-ed6a-4365-8b48-dbeb0214e31a":{"ofport":22557},"c6b90d74-c38a-4ac5-8904-c43ff6a753db":{"ofport":22554},"28c13ef7-2277-4d4e-88ce-f4ae30a3b610":{"ofport":22555},"39fa6f4e-c67d-449b-aa88-c7586cf6ab9c":{"ofport":-1,"error":"could not open network device 8824323d5ff74ba (No such device)"}},"_date":1689579325957,"_is_diff":true,"Open_vSwitch":{"11d16522-1170-1121-b42b-49f93ee66dae":{"cur_cfg":142555}}}
// Monday, July 17, 2023 7:35:28.124 AM
{"Port":{"5502f07d-d1bb-42ee-b090-00af944a5f88":{"name":"300cf7d291f454f","interfaces":["uuid","39ba0dea-406d-489e-8757-deb799c757f0"],"other_config":["map",[["transient","true"]]]}},"Interface":{"39ba0dea-406d-489e-8757-deb799c757f0":{"name":"300cf7d291f454f","external_ids":["map",[["attached_mac","01:02:03:04:05:06"],["iface-id","stress-test-density-706_nginx-1-5745dddb7c-ldjkl"],["iface-id-ver","01c5fa41-83d0-439d-9289-b27abe137c87"],["ip_addresses","172.16.0.99/22"],["sandbox","300cf7d291f454fee923f5c58dad64411e4c982ccc4e916b6a1681e8667b2a50"]]]}},"_date":1689579328124,"Bridge":{"4805dd32-af2a-4ac6-8917-3a27975c1ab5":{"ports":["uuid","5502f07d-d1bb-42ee-b090-00af944a5f88"]}},"_is_diff":true,"_comment":"ovs-vsctl (invoked by /usr/bin/ovnkube): /usr/bin/ovs-vsctl --timeout=30 add-port br-int 300cf7d291f454f other_config:transient=true -- set interface 300cf7d291f454f external_ids:attached_mac=01:02:03:04:05:06 external_ids:iface-id=stress-test-density-706_nginx-1-5745dddb7c-ldjkl external_ids:iface-id-ver=01c5fa41-83d0-439d-9289-b27abe137c87 external_ids:ip_addresses=172.16.0.99/22 external_ids:sandbox=300cf7d291f454fee923f5c58dad64411e4c982ccc4e916b6a1681e8667b2a50","Open_vSwitch":{"11d16522-1170-1121-b42b-49f93ee66dae":{"next_cfg":142571}}}
// Monday, July 17, 2023 7:35:28.348 AM
{"Interface":{"cf3cc598-a384-4d01-b91e-ebc9da69351f":{"ofport":22563},"ad0ddbdc-faec-4d3e-806d-96285da88ed2":{"ofport":22564},"39ba0dea-406d-489e-8757-deb799c757f0":{"ofport":22565}},"_date":1689579328348,"_is_diff":true,"Open_vSwitch":{"11d16522-1170-1121-b42b-49f93ee66dae":{"cur_cfg":142572}}}
// Monday, July 17, 2023 7:35:49.909 AM
{"Interface":{"39ba0dea-406d-489e-8757-deb799c757f0"... "Bridge":{"4805dd32-af2a-4ac6-8917-3a27975c1ab5":{"external_ids":["map",["ct-zone-ichp-kubelet-density-706_nginx-1-5745dddb7c-ldjkl","183"]...}
// Monday, July 17, 2023 7:36:22.762 AM
{"Interface":{"7ff3cc64-2a9e-46bc-994f-58d363e8e9ac":null,"2ae5e60a-8412-48c9-9f20-253b2b536ce6":null,"5b0f8f5d-bc9e-4732-ac50-e67de72172f7":null,"6bec85ea-a09b-4d6d-98f8-0eaa8e85544e":null,"cbf29c8c-8d1b-4adb-8161-28f8efb7ff0c":null,"3b25eb5a-8a95-49a0-9a31-090ca404938b":null,"8e732e10-76c4-4849-ad1d-8b0df7cffc41":null,"39fa6f4e-c67d-449b-aa88-c7586cf6ab9c":null},"Port":{"8193b121-3d6c-4bb5-acba-c7be2b6bed88":null,"db765055-6020-43d2-8a22-35d9d18d9b1f":null,"d5471baf-764e-4dbc-aa2d-96869b7061d6":null,"388e3385-8573-4240-b914-a08aa3a623d2":null,"61336f6e-3691-435d-9dca-2445301240f1":null,"cb4fcab1-a80c-4819-8342-97b700622bc8":null,"132951aa-3e10-410c-b5d0-1b5b1341bef8":null,"bf2eb754-38a9-479a-a3ca-6f876315830b":null},"_date":1689579382762,"Bridge":{"4805dd32-af2a-4ac6-8917-3a27975c1ab5":{"ports":["set",[["uuid","132951aa-3e10-410c-b5d0-1b5b1341bef8"],["uuid","388e3385-8573-4240-b914-a08aa3a623d2"],["uuid","61336f6e-3691-435d-9dca-2445301240f1"],["uuid","8193b121-3d6c-4bb5-acba-c7be2b6bed88"],["uuid","bf2eb754-38a9-479a-a3ca-6f876315830b"],["uuid","cb4fcab1-a80c-4819-8342-97b700622bc8"],["uuid","d5471baf-764e-4dbc-aa2d-96869b7061d6"],["uuid","db765055-6020-43d2-8a22-35d9d18d9b1f"]]]}},"_is_diff":true,"_comment":"ovs-vsctl (invoked by /usr/bin/ovnkube): /usr/bin/ovs-vsctl --timeout=15 --if-exists --with-iface del-port b2d117d5fc0c4ec -- --if-exists --with-iface del-port 63cfedb3828707c -- --if-exists --with-iface del-port 8824323d5ff74ba -- --if-exists --with-iface del-port 58bdfe3ae2dacc8 -- --if-exists --with-iface del-port 3743fd52d7a73b4 -- --if-exists --with-iface del-port de100db60e6b221 -- --if-exists --with-iface del-port fe7b9333cb2f5ed -- --if-exists --with-iface del-port ddbeccc99518185","Open_vSwitch":{"08d1652-0470-4421-b42b-49f93ee66dae":{"next_cfg":142712}}}
// Monday, July 17, 2023 7:36:27.008 AM
{"_date":1689579387008,"Bridge":{"4805dd32-af2a-4ac6-8917-3a27975c1ab5":{"external_ids":["map",[["ct-zone-stress-test-density-706_nginx-1-5745dddb7c-ldjkl","183"]]]}},"_is_diff":true,"_comment":"ovn-controller\novn-controller: modifying OVS tunnels '55fc8491-3fff-4894-808b-937302978c36'"}
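The overlap visible in the log above (two Interface rows carrying the same iface-id until the stale one is deleted) can be checked mechanically. The following is only a minimal sketch: it assumes each record is a JSON transaction diff shaped like the ones above, that `external_ids` appears as a full `["map", [...]]` value when a row is created, and that a row set to `null` means deletion. The function names and the shortened sample records (`uuid-old`, `uuid-new`, `ns_pod`) are hypothetical.

```python
import json
from collections import defaultdict


def iface_ids_by_interface(records):
    """Map each iface-id to the Interface UUIDs currently carrying it."""
    owners = defaultdict(list)
    for raw in records:
        txn = json.loads(raw)
        for uuid, row in txn.get("Interface", {}).items():
            if row is None:
                # Row deleted: drop the UUID from every iface-id it held.
                for uuids in owners.values():
                    if uuid in uuids:
                        uuids.remove(uuid)
                continue
            ext = row.get("external_ids")
            if not ext:
                continue
            # OVSDB encodes maps as ["map", [[key, value], ...]].
            pairs = dict(ext[1])
            iface_id = pairs.get("iface-id")
            if iface_id is not None and uuid not in owners[iface_id]:
                owners[iface_id].append(uuid)
    return {k: v for k, v in owners.items() if v}


def shared_iface_ids(records):
    """Return iface-ids held by more than one Interface row."""
    return {k: v for k, v in iface_ids_by_interface(records).items() if len(v) > 1}


# Hypothetical, heavily shortened records mirroring the sequence above.
records = [
    '{"Interface": {"uuid-old": {"name": "old", "external_ids": ["map", [["iface-id", "ns_pod"]]]}}}',
    '{"Interface": {"uuid-old": {"ofport": -1}}}',
    '{"Interface": {"uuid-new": {"name": "new", "external_ids": ["map", [["iface-id", "ns_pod"]]]}}}',
]
print(shared_iface_ids(records))
```

Replaying the records up to 7:35:28 would flag the iface-id as shared; appending the later deletion of the old row clears the flag, which is the window in which the release described below occurs.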

After the old port is removed, I can see the logical port being released, which should not happen:

[ovn-controller] 2023-07-17T07:35:25.294Z|36242|binding|INFO|Claiming lport stress-test-density-706_nginx-1-5745dddb7c-ldjkl for this chassis.
[ovn-controller] 2023-07-17T07:35:25.294Z|36243|binding|INFO|stress-test-density-706_nginx-1-5745dddb7c-ldjkl: Claiming 01:02:03:04:05:06 172.16.0.99
[ovn-controller] 2023-07-17T07:35:25.961Z|36252|binding|INFO|Releasing lport stress-test-density-706_nginx-1-5745dddb7c-ldjkl from this chassis (sb_readonly=0)
--->
[ovn-controller] 2023-07-17T07:35:28.353Z|36280|binding|INFO|Claiming lport stress-test-density-706_nginx-1-5745dddb7c-ldjkl for this chassis.
[ovn-controller] 2023-07-17T07:35:28.353Z|36281|binding|INFO|stress-test-density-706_nginx-1-5745dddb7c-ldjkl: Claiming 01:02:03:04:05:06 172.16.0.99
[ovn-controller] 2023-07-17T07:35:49.878Z|36729|binding|INFO|Setting lport stress-test-density-706_nginx-1-5745dddb7c-ldjkl ovn-installed in OVS
[ovn-controller] 2023-07-17T07:35:49.878Z|36730|binding|INFO|Setting lport stress-test-density-706_nginx-1-5745dddb7c-ldjkl up in Southbound
--->
[ovn-controller] 2023-07-17T07:36:22.767Z|37013|binding|INFO|Releasing lport stress-test-density-706_nginx-1-5745dddb7c-ldjkl from this chassis (sb_readonly=0) # this is BAD <--
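To spot this pattern across many pods, the binding messages can be replayed to find logical ports whose last event is a release. This is a rough sketch that assumes only the "Claiming lport" and "Releasing lport" message formats quoted above; the function name and the shortened lport name `ns_pod` are hypothetical.

```python
import re

# Matches the two binding messages quoted above; other messages are ignored.
_BINDING = re.compile(
    r"\|binding\|INFO\|(?P<action>Claiming|Releasing) lport (?P<lport>\S+) "
)


def final_binding_state(lines):
    """Return {lport: "claimed" | "released"} after replaying the log lines."""
    state = {}
    for line in lines:
        match = _BINDING.search(line)
        if match:
            claimed = match.group("action") == "Claiming"
            state[match.group("lport")] = "claimed" if claimed else "released"
    return state


# Hypothetical sample with a shortened lport name.
log = [
    "2023-07-17T07:35:28.353Z|36280|binding|INFO|Claiming lport ns_pod for this chassis.",
    "2023-07-17T07:35:49.878Z|36729|binding|INFO|Setting lport ns_pod ovn-installed in OVS",
    "2023-07-17T07:36:22.767Z|37013|binding|INFO|Releasing lport ns_pod from this chassis (sb_readonly=0)",
]
print(final_binding_state(log))
```

An lport that ends in the "released" state while its pod is still running on the node is a candidate for the problem described here.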

It is worth noting that if the iface-id is removed from the old interface before the del-port command runs, everything seems to work:

ovs-vsctl --timeout=30 remove Interface 39fa6f4e-c67d-449b-aa88-c7586cf6ab9c external-ids iface-id
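The ordering matters here: the iface-id is cleared first, and the port is deleted only afterwards. A small sketch of that ordering follows, using only the `remove` and `del-port` invocations that already appear in this report. The helper name is hypothetical, and running the two steps as separate ovs-vsctl calls (rather than one transaction) is an assumption based on the description above, not a verified requirement.

```python
import shlex


def stale_port_cleanup(stale_iface_uuid, port_name):
    """Return the two ovs-vsctl invocations, in the order they should run."""
    clear_iface_id = [
        "ovs-vsctl", "--timeout=30",
        "remove", "Interface", stale_iface_uuid, "external_ids", "iface-id",
    ]
    delete_port = [
        "ovs-vsctl", "--timeout=15",
        "--if-exists", "--with-iface", "del-port", port_name,
    ]
    return [clear_iface_id, delete_port]


# UUID and port name taken from the failed interface in the log above.
cmds = stale_port_cleanup("39fa6f4e-c67d-449b-aa88-c7586cf6ab9c", "8824323d5ff74ba")
for argv in cmds:
    print(shlex.join(argv))
```

The lists can be passed to `subprocess.run(argv, check=True)` one after the other on a node where ovs-vsctl is available.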

In the end, the pod crashes when the port UUID of its new interface is still listed on the bridge:

sh-4.4# ovs-vsctl list bridge | grep -c 5502f07d-d1bb-42ee-b090-00af944a5f88
1

But the required ct-zone entry is missing, so the logical flows with priority=120 in tables 7 and 12 are missing too:

sh-4.4# ovs-vsctl list bridge | grep -c stress-test-density-706_nginx-1-5745dddb7c-ldjkl
0
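The two grep checks above amount to a consistency test: every iface-id attached to the bridge should have a matching `ct-zone-<iface-id>` key in the bridge's external_ids. The sketch below takes that key format from the last log record above; the function name and sample values are hypothetical, and how the two inputs are collected (for example by parsing ovs-vsctl output) is left out.

```python
def lports_missing_ct_zone(bridge_external_ids, iface_ids):
    """Return iface-ids that have no "ct-zone-<iface-id>" key on the bridge."""
    return sorted(
        iface_id
        for iface_id in set(iface_ids)
        if f"ct-zone-{iface_id}" not in bridge_external_ids
    )


# Hypothetical sample: one pod has a zone, the other lost it.
bridge_external_ids = {"ct-zone-ns_pod-a": "183"}
iface_ids = ["ns_pod-a", "ns_pod-b"]
print(lports_missing_ct_zone(bridge_external_ids, iface_ids))
```

Any iface-id returned by this check corresponds to a pod in the state described here: attached to the bridge but without a conntrack zone.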

The newly created pod has internal networking, but its network packets cannot be routed properly, so the health checks fail and the container enters the CrashLoopBackOff state.

I've created a fix for the ovn-kubernetes code, but I believe this is a bug in ovn-controller, because the ct-zone should not be deleted while a corresponding interface is still attached to the bridge. If not, please feel free to close this issue.

Thanks in advance, and let me know if I should provide more information.

Here is the related PR: ovn-org/ovn-kubernetes#3784

A similar topic found:

@dceara
Collaborator

dceara commented Jan 23, 2024

@rchicoli sorry for the delay in replying. Does this still happen with the latest OVN version that ovn-kubernetes uses upstream (I think that's ovn23.09.x from Fedora)?

@rchicoli
Author

I am not actively taking care of the platform anymore. It is a little upsetting that the required "retry" option wasn't taken into consideration, although it fixed a huge problem when the system was overloaded. Anyway, I've heard the performance is better with the latest releases.
