Kokkos::Impl::ParallelReduce< HIP > requested too large team size #6511

Open
trey-ornl opened this issue Jul 12, 2024 · 0 comments

trey-ornl commented Jul 12, 2024

I'm experimenting with stand-alone Homme on Frontier with ROCm 5.7.1 and 128 vertical levels, and my runs fail with the following output:

Kokkos::Impl::ParallelReduce< HIP > requested too large team size

The core file points to this line:

Kokkos::parallel_reduce("caar loop pre-boundary exchange", m_policy_pre, *this, nerr);

I added some debug output and found that m_policy_pre has a team_size() of 16 and an impl_vector_length() of 64, for a total of 1024 threads. That is indeed too many for the definition of m_policy_pre:

#ifndef NDEBUG
  template<typename Tag>
  using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace,Kokkos::LaunchBounds<512,1>,Tag>;
#else
  template<typename Tag>
  using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace,Tag>;
#endif

  TeamPolicyType<TagPreExchange>   m_policy_pre;

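Notice the Kokkos::LaunchBounds<512,1> in the debug branch (when NDEBUG is not defined): it caps kernels launched with this policy at 512 threads per block.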
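The mismatch can be checked at compile time. The helpers below are hypothetical: they are not Kokkos functions and are not how Kokkos implements its own check. They assume the relevant limit is simply team_size * vector_length <= MaxThreads, using the values reported above.

```cpp
// Hypothetical helpers (not part of Kokkos) reproducing the arithmetic behind
// the error: a team needs team_size * vector_length GPU threads, and that
// product must not exceed MaxThreads in Kokkos::LaunchBounds<MaxThreads, MinBlocks>.
constexpr int threads_per_team(int team_size, int vector_length) {
  return team_size * vector_length;
}

constexpr bool fits_launch_bound(int team_size, int vector_length,
                                 int max_threads) {
  return threads_per_team(team_size, vector_length) <= max_threads;
}

// Values reported for m_policy_pre on Frontier.
constexpr int kTeamSize = 16;      // team_size()
constexpr int kVectorLength = 64;  // impl_vector_length()
constexpr int kDebugBound = 512;   // MaxThreads in Kokkos::LaunchBounds<512,1>

static_assert(threads_per_team(kTeamSize, kVectorLength) == 1024,
              "16 x 64 = 1024 threads per team");
static_assert(!fits_launch_bound(kTeamSize, kVectorLength, kDebugBound),
              "1024 threads exceed the 512-thread launch bound");
```

Equivalently, with a 64-wide vector length the 512-thread bound leaves room for at most 512 / 64 = 8 team threads, half of the 16 requested.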

I don't know why this is only showing up now; perhaps a newer version of Kokkos or ROCm checks these settings more strictly. Regardless, I believe m_policy_pre should be allowed 1024 threads (4x4x64), so Kokkos::LaunchBounds<512,1> should not be used on AMD GPUs, where a warp (wavefront) is 64 threads wide instead of 32.
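One possible variant, sketched here only as an assumption, keeps a debug-build bound but derives it from the warp width instead of hard-coding 512. It assumes the 512 came from 4x4 team threads times a 32-wide vector length; the names kWarpWidth, debug_max_threads and kDebugMaxThreads are hypothetical. The simpler option proposed above, dropping LaunchBounds from HIP builds entirely, would use the same preprocessor guard around the policy alias.

```cpp
// Hypothetical sketch (not Homme code): size the debug-build launch bound from
// the warp/wavefront width instead of hard-coding 512.
// KOKKOS_ENABLE_HIP is defined by Kokkos when the HIP backend is enabled; in a
// build without Kokkos it is undefined and the 32-wide branch is used.
#if defined(KOKKOS_ENABLE_HIP)
constexpr int kWarpWidth = 64;  // AMD wavefront width
#else
constexpr int kWarpWidth = 32;  // NVIDIA warp width
#endif

// Assumption: the largest team is 4 x 4 team threads, each with a vector
// length equal to the warp width (4x4x64 on Frontier, per the report above).
constexpr int debug_max_threads(int warp_width) { return 4 * 4 * warp_width; }

constexpr int kDebugMaxThreads = debug_max_threads(kWarpWidth);

static_assert(debug_max_threads(32) == 512,
              "matches the existing LaunchBounds<512,1>");
static_assert(debug_max_threads(64) == 1024, "4x4x64 threads on AMD GPUs");

// In the policy alias, the hard-coded bound would then become, e.g.:
//   template<typename Tag>
//   using TeamPolicyType =
//       Kokkos::TeamPolicy<ExecSpace, Kokkos::LaunchBounds<kDebugMaxThreads, 1>, Tag>;
```

This keeps the existing 512-thread bound on NVIDIA GPUs while allowing the 1024 threads needed on AMD GPUs.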
