[CELEBORN-1571] Fix flaky test - pushdata timeout will add to pushExcludedWorker #2697

Open: wants to merge 1 commit into main

cxzl25 (Contributor) commented Aug 20, 2024

What changes were proposed in this pull request?

Why are the changes needed?

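Because the worker's port is already in use, the worker status tracked by the driver may change from shutdown to unknown. The test then sees WORKER_UNKNOWN instead of the expected push-data timeout status and fails. In the log below, the same worker (RpcPort 41487) appears under both "Current unknown workers" and "Current shutdown workers".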

https://github.com/apache/celeborn/actions/runs/10465286274/job/28980278764

- celeborn spark integration test - pushdata timeout will add to pushExcludedWorkers *** FAILED ***
  WORKER_UNKNOWN did not equal PUSH_DATA_TIMEOUT_PRIMARY, and WORKER_UNKNOWN did not equal PUSH_DATA_TIMEOUT_REPLICA (PushDataTimeoutTest.scala:150)

unit-tests.log

24/08/20 05:28:30,400 INFO [celeborn-dispatcher-7] Master: Receive ReportNodeFailure [
Host: localhost
RpcPort: 41487
PushPort: 34259
FetchPort: 45713
ReplicatePort: 35107
InternalPort: 41487

24/08/20 05:29:29,414 WARN [celeborn-client-lifecycle-manager-change-partition-executor-3] WorkerStatusTracker: 
Reporting failed workers:
Host:localhost:RpcPort:42267:PushPort:43741:FetchPort:46483:ReplicatePort:43587   PUSH_DATA_TIMEOUT_PRIMARY   2024-08-19T22:29:29.414-0700
Current unknown workers:
Host:localhost:RpcPort:41487:PushPort:34259:FetchPort:45713:ReplicatePort:35107:InternalPort:41487   2024-08-19T22:29:29.108-0700
Current shutdown workers:
Host:localhost:RpcPort:41487:PushPort:34259:FetchPort:45713:ReplicatePort:35107:InternalPort:41487

Does this PR introduce any user-facing change?

No

How was this patch tested?

GA (GitHub Actions)

cxzl25 force-pushed the CELEBORN-1571 branch 2 times, most recently from ee1a95a to 5ca14c9, on August 29, 2024 05:47

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the stale label on Sep 18, 2024
cxzl25 removed the stale label on Sep 18, 2024