Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make deliberately scaled-in unstarted blocks not be failures #3594

Merged
merged 14 commits into from
Aug 26, 2024

Conversation

benclifford
Copy link
Collaborator

@benclifford benclifford commented Aug 19, 2024

This PR adds a new terminal job state, SCALED_IN. None of the existing providers will return it, but the scaling layer will use it to mark a job as deliberately scaled in, so that error handling code will not regard it as failed.

Fixes #3568

@benclifford benclifford changed the title [not for merge] test some WIP in CI Make deliberately scaled-in unstarted blocks not be failures Aug 23, 2024
@benclifford benclifford requested review from khk-globus, yadudoc and rjmello and removed request for khk-globus August 26, 2024 09:55
@benclifford benclifford marked this pull request as ready for review August 26, 2024 09:56
Copy link
Collaborator

@khk-globus khk-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block comment is unfortunately necessary, so thank you for including it. The rest are window dressing suggestions for clarity.

# permanent htex failure, and so the task execution below would raise
# a BadStateException rather than attempt to run the task.

assert htex.provider.launcher.prepend != "", "Pre-req: prepend attribute should exist and be non-empty"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would use of .startswith() work? If so, it makes for a stronger assertion / good faith in the test. Perhaps:

_inf_sleep = "sleep inf ; "
...
    launcher=WrappedLauncher(prepend=_inf_sleep),
...
    assert htex.provider.launcher.prepend.startswith(_inf_sleep), "..."

Comment on lines +68 to +69
try_assert(lambda: len(htex.status_facade) == 1 and htex.status_facade['0'].terminal,
timeout_ms=10000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the fail_msg argument to explain that this is a block. (Analogous to assert ..., "fail_msg")

Comment on lines +84 to +85
htex.provider.launcher.prepend = ""
assert task().result() == 7
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you distill the comment above into the fail msg for this assert? Not an easy task, I know, but when this does fail, makes it that easier to dig in.

@benclifford benclifford merged commit 4ea3fbc into master Aug 26, 2024
7 checks passed
@benclifford benclifford deleted the benc-talk branch August 26, 2024 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MISSING job detection vs scale in of unstarted blocks
2 participants