Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport E2E fixes #4742

Open
wants to merge 14 commits into
base: release-1.0
Choose a base branch
from

Conversation

kyrtapz
Copy link
Contributor

@kyrtapz kyrtapz commented Sep 19, 2024

Fixes: #4733
Fixes: #4719

cherry picked from commit d6f145e
clean backport of #4385
cherry picked from commit 45abadb with slight changes, see commit message.
cherry picked from commit 511e9c6

Fixes: ovn-org#4733

Signed-off-by: Patryk Diak <[email protected]>
(cherry picked from commit d6f145e)
@kyrtapz kyrtapz requested a review from a team as a code owner September 19, 2024 16:37
@kyrtapz kyrtapz requested review from flavio-fernandes and removed request for a team September 19, 2024 16:37
@github-actions github-actions bot added area/e2e-testing feature/services&endpoints All issues related to the Servces/Endpoints API labels Sep 19, 2024
Copy link
Contributor

@dceara dceara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

@tssurya
Copy link
Member

tssurya commented Sep 20, 2024

I will hold merge till we figure out the new flakes that are appearing: the only known flakes I am aware of in release-1.0 are: #4714 (comment)

@dceara
Copy link
Contributor

dceara commented Sep 20, 2024

I will hold merge till we figure out the new flakes that are appearing: the only known flakes I am aware of in release-1.0 are: #4714 (comment)

shard-conformance fails with (we see this often in ovn-org/ovn CI too):

Error: The action 'Run Tests' has timed out after 120 minutes.

The control-plane failures are:

 Summarizing 1 Failure:
  [FAIL] Services of type NodePort [It] should work on secondary node interfaces for ETP=local and ETP=cluster when backend pods are also served by EgressIP
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/e2e.go:351

 Summarizing 2 Failures:
  [FAIL] e2e egress firewall policy validation [AfterEach] Should validate the egress firewall DNS does not deadlock when adding many dnsNames
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/egress_firewall.go:113
  [FAIL] e2e egress firewall policy validation [AfterEach] Should validate the egress firewall allows inbound connections
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/egress_firewall.go:113

The last one seems to be because of:

2024-09-19T18:04:26.7701454Z   �[38;5;9m[FAILED] failed to remove ip address 172.18.1.13 from node ovn-control-plane, err: "failed to run \"docker exec ovn-control-plane ip addr delete 172.18.1.13/32 dev breth0\": exit status 2 (Error: ipv4: Address not found.\n)"�[0m

Which is what the original fix was trying to fix but for egress_firewall.. It seems to me we're just missing this commit on release-1.0:
217ec65

Should we cherry pick that one too?

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit ab00a4b)
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit ba8766a)
Update connectivity timeout to 3 seconds and allow 2 retries both for
positive and negative cases

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 09b08ec)
work with ipv6, since github runners don't have any routes for IPv6.
Split current test that checks allow IP and allow CIDR+port into 2
tests to limit the amount of required external containers.
Bonus: the only test that used external containers doesn't need to
create them anymore, as they are created in beforeEach

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 2372a08)
avoid unneeded container creation. No extra changes

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 82b2bf1)
affected by deny all.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit ded63dd)
To verify no deadlock, we need an intensive follow up workload.
Node-selector testing work the best, as node events handling includes
iterating over all egress firewalls internally.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit e2f4c74)
that needs it. Use defer to cleanup instead of afterEach, as afterEach
should cleanup resources created by beforeEach.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 217ec65)
Remove unneeded external IPs from the deadlock test, as multiple
unresolvable ds names is the main ingredient.
Fix ip:port formatting for ipv6.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 3fac0f3)
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 243c48d)
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 8c1e9ed)
@github-actions github-actions bot added the feature/egress-firewall All issues related to egress firewall label Sep 20, 2024
Test opens a TCP connection that simulates a GCP LB environment where
the packet is redirected via iptables to a local server on a node. Note,
in GCP the LB does not DNAT the VIP, so the packet arrives to the node
with the GCP VIP on it. In OCP, we then redirect that packet to the
local kapi server running on the node.

Once the test opens the TCP connection, it leaves it open for 2 minutes
while ovnkube-node is then deleted. Post ovn-controller starting it
should not flush the conntrack in zone 0, and the test ensures that the
conntrack entry still exists.

Recent OVN regression that prompted this E2E: https://issues.redhat.com/browse/FDP-773

NOTE: The release-1.0 backport generated conflicts because on that
branch the test didn't check IPv6.

Signed-off-by: Tim Rozet <[email protected]>
(cherry picked from commit 45abadb)
Signed-off-by: Dumitru Ceara <[email protected]>
@kyrtapz kyrtapz changed the title Services E2Es: Reset test variables before each test Backport E2E fixes Sep 20, 2024
The previous mask was invalid and docker was failing with:
invalid subnet 2001:db8:abcd:1234:c000::/64: it should be 2001:db8:abcd:1234::/64

Signed-off-by: Patryk Diak <[email protected]>
(cherry picked from commit 511e9c6)
@github-actions github-actions bot added the feature/egress-ip Issues related to EgressIP feature label Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/e2e-testing feature/egress-firewall All issues related to egress firewall feature/egress-ip Issues related to EgressIP feature feature/services&endpoints All issues related to the Servces/Endpoints API
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

5 participants