Distrib #370

Open
wants to merge 135 commits into
base: master

Conversation

NicolasDenoyelle

This branch extends the existing hwloc_distrib() method for distributing the cpusets of the topology.
It adds a new way to iterate over the topology objects of a single level using a hierarchical policy.
The hwloc-distrib utility has been modified to reflect the new capabilities.
The purpose of this branch is to bring thread binding policies to the hwloc toolset when used together with the hwloc-thread-bind branch.
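
As an illustration of what a hierarchical policy over a single level can look like, here is a minimal sketch. It is not the code from this branch: `scatter_leaf` is a hypothetical helper, and it assumes a balanced tree described only by the number of children at each level. It maps the i-th item of a "scatter" walk (spread across the top-level objects first, then across their children) to a leaf index.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of hwloc: return the leaf visited at
 * step i of a "scatter" walk over a balanced tree.
 * arity[0] is the number of children of the root, arity[depth-1] is
 * the number of leaves below each last-level parent.
 */
unsigned scatter_leaf(unsigned i, const unsigned *arity, size_t depth)
{
    unsigned leaves = 1;
    for (size_t l = 0; l < depth; l++)
        leaves *= arity[l];              /* total number of leaves */

    unsigned leaf = 0;
    for (size_t l = 0; l < depth; l++) {
        unsigned digit = i % arity[l];   /* child chosen at level l */
        i /= arity[l];
        leaves /= arity[l];              /* leaves below one child of level l */
        leaf += digit * leaves;
    }
    return leaf;
}
```

With `arity = {2, 2, 2}` (for example 2 packages, 2 cores per package, 2 PUs per core), steps 0 to 7 visit leaves 0, 4, 2, 6, 1, 5, 3, 7, so consecutive items land as far apart as possible.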

bgoglin and others added 29 commits October 22, 2019 13:41
No code change, just add comments about things being official instead of assumptions.

The Gen5 spec is pretty much finalized; things won't change anymore.

Signed-off-by: Brice Goglin <[email protected]>
Check that Linux can add NUMA nodes to x86 CPU information.

And check that Linux can annotate x86 AMD topoext NUMA nodes.

Signed-off-by: Brice Goglin <[email protected]>
If the workspace clone ever ran on another branch (e.g. in my zbgoglin jobs),
git branch returns multiple lines, which causes the 2nd branch name
to be run as a command line after the only expected line "job-0-tarball.sh <firstbranch>".

Use git rev-parse --abbrev-ref HEAD instead.

Signed-off-by: Brice Goglin <[email protected]>
This tells the code not to ever merge that group with structurally-identical
parent or children.

This is useful for Groups implementing new "types" that cannot be backported
to stable releases. New types won't be merged by default, but Groups would.

Requested by Intel for Die objects.

This doesn't break the ABI because the attribute structure has always been
calloc'ed, which means this attribute was "0", which matches the default
"merge group" behavior.

Signed-off-by: Brice Goglin <[email protected]>
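
The ABI argument in the commit above can be sketched as follows. This is a minimal illustration under assumptions: `struct group_attr`, `group_attr_alloc` and `group_may_merge` are hypothetical stand-ins, not the actual hwloc definitions. The point is only that a zero-filled allocation gives a newly added flag the value 0, which is the pre-existing "merge group" behavior.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a group attribute structure;
 * the field names are illustrative, not copied from hwloc headers. */
struct group_attr {
    unsigned depth;
    unsigned kind;
    unsigned char dont_merge; /* new flag: 0 keeps the default "merge group" behavior */
};

/* calloc zero-fills the structure, so dont_merge starts at 0
 * even for callers that never heard of the new field. */
struct group_attr *group_attr_alloc(void)
{
    return calloc(1, sizeof(struct group_attr));
}

/* A group may be merged with a structurally-identical parent or child
 * unless the flag was explicitly set. */
int group_may_merge(const struct group_attr *attr)
{
    return !attr->dont_merge;
}
```

Code that allocates the structure and never touches `dont_merge` keeps getting merged groups, which is why adding the field does not change behavior for existing users.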
Update CPUID.1f x86 test case not to merge Die groups anymore.
Hence there's no need to ignore Caches anymore.

Signed-off-by: Brice Goglin <[email protected]>
…ile/module types

Make them groups.

Signed-off-by: Brice Goglin <[email protected]>
I managed to convince Intel that adding another foo_siblings
between core_siblings and thread_siblings would break userspace
and that the situation could be even worse if they ever add another
intermediate level in the future.

So they are finally renaming to filenames whose semantics don't
depend on intermediate levels: core_cpus and package_cpus.

Signed-off-by: Brice Goglin <[email protected]>
Linux 5.3 will have new "die_cpus" and "die_id" sysfs files
for upcoming architectures with multiple dies per package.

When the die cpuset is different from the package, add a "Die" group.

Don't add it when there's a single Die per package because
most CPUs don't want to show a useless additional Die level.

We don't want to set the Die level to keep_structure because
it would get automerged in L3 caches on CLX, and lstopo displays
everything by default anyway.

Set the "dont_merge" group flag if HWLOC_DONT_MERGE_DIE_GROUPS
is set in the environment, just like in the x86 backend.

Signed-off-by: Brice Goglin <[email protected]>
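
The decision described in the commit above can be sketched like this. It is a simplified illustration, not the Linux backend code: cpusets are reduced to 64-bit masks, all function names are hypothetical, and treating "set in the environment" as "getenv() returns non-NULL" is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one bit per PU. A Die group is only worth adding when
 * the die covers a different set of PUs than its package. */
int die_group_needed(uint64_t die_cpus, uint64_t package_cpus)
{
    return die_cpus != package_cpus;
}

/* Assumption: any value, even an empty string, counts as "set". */
int die_group_dont_merge(const char *env_value)
{
    return env_value != NULL;
}

/* Convenience wrapper reading the variable named in the commit message. */
int die_group_dont_merge_from_env(void)
{
    return die_group_dont_merge(getenv("HWLOC_DONT_MERGE_DIE_GROUPS"));
}
```

For a package with PUs 0-7 split into two dies, `die_group_needed(0x0F, 0xFF)` is true and a group is added; with a single die per package the masks are equal and no extra level appears.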
Old kernels exposed two packages on E5v3 in Cluster-on-Die mode
because the package core_siblings was wrong.
We detected that case when two packages had the same physical_package_id.

This was fixed in Linux 3.18, backported in RHEL7.
Other important distros use a more recent kernel now.

Signed-off-by: Brice Goglin <[email protected]>
Signed-off-by: Brice Goglin <[email protected]>
Just like for cores and packages.

Signed-off-by: Brice Goglin <[email protected]>
Otherwise the matrix would be wrong.

Further fixes commit c1c34a6

Signed-off-by: Brice Goglin <[email protected]>
…vailable

Hence, we don't have to run both on Linux/x86 anymore,
and we don't have to manually tarball the CPUID files.

Refs open-mpi#186

Signed-off-by: Brice Goglin <[email protected]>
Use realpath so that we can change the current directory
without breaking the destination relative directory.

Signed-off-by: Brice Goglin <[email protected]>
Reported by Intel from the output of klocwork.

Signed-off-by: Brice Goglin <[email protected]>
Reported by Intel from the output of klocwork.

Signed-off-by: Brice Goglin <[email protected]>
Reported by Intel from the output of klocwork.

Signed-off-by: Brice Goglin <[email protected]>
bgoglin and others added 26 commits October 22, 2019 13:42
…objects during build

Thanks to Eloi Gaudry for the patch.

Signed-off-by: Brice Goglin <[email protected]>
Instead of having all of them in the main solution file.

Thanks to Eloi Gaudry for the patch.

Signed-off-by: Brice Goglin <[email protected]>
Defined with recent VS.

Signed-off-by: Brice Goglin <[email protected]>
Thanks to Eloi Gaudry for the patch.

We force retarget to an old vs110 for ci.inria.fr.

Signed-off-by: Brice Goglin <[email protected]>
Thanks to Eloi Gaudry for the patch.

Signed-off-by: Brice Goglin <[email protected]>
Move idea of hwloc-ps to a github issue.
Update some comments, add details for command-line build.

Thanks to Eloi Gaudry for the suggestion.

Signed-off-by: Brice Goglin <[email protected]>
Will run in the extended nightly tests.

Runs only on master on the main repo by default.

Signed-off-by: Brice Goglin <[email protected]>
Signed-off-by: Brice Goglin <[email protected]>
They are renamed to PREFIX_hwloc_FOO instead of PREFIX_HWLOC_FOO.
We could fix it but it doesn't matter much (people aren't supposed to
use those renamed names anyway) and it could break existing hacks
(if anybody actually depends on such renamed names).

Thanks to Samuel K. Gutierrez for the report.

Signed-off-by: Brice Goglin <[email protected]>
Don't AND(normal, topology_allowed) in the normal (v2) case
to avoid hiding internal allowed set bugs.

Signed-off-by: Brice Goglin <[email protected]>
In some (old?) corner cases, Linux cpusets may return offline PUs
in the allowed sets of cpusets/cgroups.

Signed-off-by: Brice Goglin <[email protected]>
…ectory

fsroot and cpuid are implemented in tools using environment variables
(those debug cases are not in the API since v2).
Those backends, when forced by environment variables, override the normal
topology thissystem flag that may be set with set_flags() in the API
and with --flags or --thissystem in cli tools. One must use the
HWLOC_THISSYSTEM envvar to force the thissystem flag.
Implement this automatically in the tools (common helpers).

Signed-off-by: Brice Goglin <[email protected]>
Signed-off-by: ndenoyelle <[email protected]>
Signed-off-by: ndenoyelle <[email protected]>
Signed-off-by: ndenoyelle <[email protected]>
Signed-off-by: ndenoyelle <[email protected]>
@bgoglin
Contributor

bgoglin commented Dec 4, 2019

@NicolasDenoyelle can you rebase/squash these commits to ease review?

@NicolasDenoyelle
Author

NicolasDenoyelle commented Dec 4, 2019 via email

@bgoglin
Contributor

bgoglin commented Mar 24, 2020

Note to open pull requests: some things changed in the CI yesterday, you'll need to rebase on top of master to avoid total CI failure.

3 participants