Add BlockIdManagerSelector #3560

Open
wants to merge 9 commits into base: master

Conversation

matthewc2003
Contributor

@matthewc2003 matthewc2003 commented Jul 31, 2024

Description

Adds BlockIdManagerSelector to the list of available manager selectors. This selector returns the list of managers sorted by block id, from greatest (newest) to least (oldest). It is intended to distribute tasks more effectively. Eventually, this selector will be used in conjunction with some method of sorting the task queue by runtime to a) ensure tasks are not assigned to managers whose remaining wall time is shorter than the task's execution time and b) put tasks with similar execution times on the same block, making it more likely that a block empties completely so that the scale-down logic can kick in.
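
The ordering described above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: `ManagerInfo` and `sort_managers_by_block_id` are hypothetical names, the record is reduced to a single field, and block ids are assumed to be integers so that they compare numerically.

```python
from typing import Dict, List, Set, TypedDict


class ManagerInfo(TypedDict):
    """Hypothetical stand-in for the per-manager record (assumption)."""
    block_id: int  # assumed to be an integer; larger means a newer block


def sort_managers_by_block_id(
    ready_managers: Dict[bytes, ManagerInfo],
    manager_list: Set[bytes],
) -> List[bytes]:
    """Return manager ids ordered from greatest (newest) block id to least."""
    return sorted(
        manager_list,
        key=lambda manager_id: ready_managers[manager_id]["block_id"],
        reverse=True,
    )


# Three managers, each on a different block.
ready_managers: Dict[bytes, ManagerInfo] = {
    b"mgr-a": {"block_id": 0},
    b"mgr-b": {"block_id": 2},
    b"mgr-c": {"block_id": 1},
}
ordered = sort_managers_by_block_id(ready_managers, set(ready_managers))
print(ordered)  # [b'mgr-b', b'mgr-c', b'mgr-a']
```

If block ids are stored as strings, the key would need a numeric conversion, otherwise "10" would sort before "9".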

Changed Behavior

Users can now choose the BlockIdManagerSelector() in the HTEX config.

Fixes

Work as part of #3323

Type of change

  • New feature

@benclifford
Collaborator

you might elaborate on "Will be used to more effectively distribute tasks."

@benclifford benclifford marked this pull request as draft August 6, 2024 10:10
@benclifford
Collaborator

switching this to draft status because @matthewc2003 is working on gathering evidence on when this does and does not help.

@matthewc2003
Contributor Author

As a result of some experimentation I've found the following:

  1. The block ID manager selector helps with workloads whose number of tasks varies over time. Example: a Cholesky factorization workload run through the TaPS benchmarking suite.
    Random Manager Selector:
    [image: random-lg]
    Block Id Manager Selector:
    [image: seesaw-lg]
    With the block ID manager selector, new blocks are prioritized, which, when used with 'htex_auto_scaling', results in compute cost savings.
  2. It doesn't really help with bag-of-tasks workloads. When all the tasks are put into the queue upfront, all blocks operate at near full utilization for the majority of the workload, so which task goes where doesn't really matter.
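
To make the two findings concrete, here is a toy placement model. It is only a sketch under stated assumptions and not how the interchange works: one manager per block, two worker slots per manager, tasks placed one at a time and never finishing. All names (`place_tasks`, `newest_block_first`, `random_order`, `idle_blocks`) are hypothetical.

```python
import random
from typing import Callable, Dict, List

# Assumptions for this toy model: three blocks, one manager each,
# two free worker slots per manager.
SLOTS_PER_MANAGER = 2
BLOCK_IDS = [0, 1, 2]

Ordering = Callable[[List[int]], List[int]]


def newest_block_first(blocks: List[int]) -> List[int]:
    """Order candidate blocks from greatest (newest) id to least."""
    return sorted(blocks, reverse=True)


_rng = random.Random(0)


def random_order(blocks: List[int]) -> List[int]:
    """Order candidate blocks randomly."""
    return _rng.sample(blocks, len(blocks))


def place_tasks(n_tasks: int, order_blocks: Ordering) -> Dict[int, int]:
    """Place tasks one by one on the first block (in the given order) with a free slot."""
    load = {block: 0 for block in BLOCK_IDS}
    for _ in range(n_tasks):
        candidates = [b for b in BLOCK_IDS if load[b] < SLOTS_PER_MANAGER]
        if not candidates:
            break  # every slot is busy; remaining tasks would wait in the queue
        load[order_blocks(candidates)[0]] += 1
    return load


def idle_blocks(load: Dict[int, int]) -> List[int]:
    """Blocks with no tasks at all, i.e. candidates for scale-down."""
    return [block for block, n in load.items() if n == 0]


# Light load: newest-first packs both tasks onto block 2, leaving 0 and 1 idle.
print(idle_blocks(place_tasks(2, newest_block_first)))  # [0, 1]
# Random order may spread the same two tasks over two blocks.
print(place_tasks(2, random_order))
# Bag of tasks: every slot fills regardless of the ordering.
print(place_tasks(6, newest_block_first))  # {0: 2, 1: 2, 2: 2}
```

The model ignores task runtimes and block start-up, so it only illustrates why the ordering matters under light load and stops mattering once every slot is busy.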

@matthewc2003 matthewc2003 marked this pull request as ready for review August 21, 2024 20:07
@benclifford
Collaborator

some documentation ideas:

the same summary you posted above, put into a docstring for the block id manager selector, and then add to docs/reference.rst (perhaps in a new section)

add a description of manager_selector to the HighThroughputExecutor selector, with a link to the docstring on block id manager selector.

@@ -19,6 +19,14 @@ def sort_managers(self, ready_managers: Dict[bytes, ManagerRecord], manager_list

class RandomManagerSelector(ManagerSelector):

"""Returns a shuffled list of interesting_managers

Maintains the behavior of the original interchange. Works well
Collaborator

the docs should primarily be about the version now, rather than about changes since some other version the user isn't using, so you could get rid of this first sentence.

a related thing to express would be that this is (I think) the default - see how some other docstrings on HighThroughputExecutor talk about defaults

Collaborator

... but the commit message that this gets merged under (which hopefully will come from the description text of this PR) is the right place to describe how the new code is different/similar to the old code - so describing how the default behaviour now is the same or different to the previous behaviour before this is merged is relevant there: the audience for that is people who want to know what has changed in parsl, vs audience for the user guide is what is stuff like now

Contributor Author

Reworded to identify the random manager selector as the default

@@ -78,6 +78,16 @@ Executors
parsl.executors.FluxExecutor
parsl.executors.radical.RadicalPilotExecutor

Executors Miscellaneous
Collaborator

you could call this Manager selectors

Contributor Author

changed

@benclifford
Collaborator

It would be good to get some testing in here for the newly introduced code.

There are a few things you could do:

make a test that invokes BlockIdManagerSelector.sort_managers with some example inputs and checks that it sorts them the right way

launch a DFK with the BlockIdManagerSelector configured and check that a simple no-op task runs.

level 2 of that: run a configuration with 2 blocks, no scaling, run a few tasks and check that they all end up in the 2nd block and never in the first block: you should see that in the PARSL_WORKER_BLOCK_ID environment variable inside a task

@khk-globus might have some other ideas too around this space
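
A rough sketch of the helpers such a block check might use, assuming the second block has id "1". In a real test, `worker_block_id` would be submitted as a Parsl app under a configuration that uses BlockIdManagerSelector; here the environment variable is set by hand to simulate a worker, and both function names are hypothetical.

```python
import os
from typing import Iterable, Optional


def worker_block_id() -> Optional[str]:
    """Body of a no-op task: report which block this worker belongs to."""
    return os.environ.get("PARSL_WORKER_BLOCK_ID")


def all_ran_on_block(observed: Iterable[Optional[str]], expected: str) -> bool:
    """True if at least one task ran and every task reported the expected block."""
    observed = list(observed)
    return len(observed) > 0 and all(block == expected for block in observed)


# Simulation only: pretend this process is a worker in block "1" (assumed id).
os.environ["PARSL_WORKER_BLOCK_ID"] = "1"
observed = [worker_block_id() for _ in range(4)]
print(all_ran_on_block(observed, expected="1"))  # True
```

Requiring at least one observation keeps the check from passing vacuously when no tasks ran.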

2 participants