I added some debug output and found that m_policy_pre has a team_size() of 16 and an impl_vector_length() of 64, for a total of 16 x 64 = 1024 threads. That value is indeed too big for the definition of m_policy_pre:
#ifndef NDEBUG
template<typename Tag>
using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace,Kokkos::LaunchBounds<512,1>,Tag>;
#else
template<typename Tag>
using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace,Tag>;
#endif
TeamPolicyType<TagPreExchange> m_policy_pre;
Notice the Kokkos::LaunchBounds<512,1>.
I don't know why this is only showing up now. Maybe a newer version of Kokkos or ROCm checks these settings more carefully? Regardless, I think we want to allow m_policy_pre to have 1024 threads (4x4x64), so I think Kokkos::LaunchBounds<512,1> should not be used on AMD GPUs, where warps are 64 threads wide instead of 32.
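To make the arithmetic concrete, here is a minimal stand-alone sketch of the check that appears to be failing. It does not use Kokkos; the helper names (threads_per_team, fits_launch_bounds, bound_for_width) are hypothetical and exist only for illustration, and the rule of scaling the bound with the vector width is an assumption about one possible fix, not something Kokkos or Homme currently does.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; none of these names
// come from Kokkos or Homme.

// Threads launched per team: team_size() * impl_vector_length().
constexpr int threads_per_team(int team_size, int vector_length) {
  return team_size * vector_length;
}

// True when the team fits under the first LaunchBounds parameter
// (the maximum number of threads per block).
constexpr bool fits_launch_bounds(int team_size, int vector_length,
                                  int max_threads) {
  return threads_per_team(team_size, vector_length) <= max_threads;
}

// Assumed rule: scale the bound with the hardware vector width so that
// a 4x4 team keeps fitting (16 * 32 = 512, 16 * 64 = 1024).
constexpr int bound_for_width(int vector_width) {
  return 16 * vector_width;
}

// Values reported in this issue.
static_assert(threads_per_team(16, 64) == 1024,
              "4x4 team with 64-wide vectors");
static_assert(!fits_launch_bounds(16, 64, 512),
              "1024 threads exceed LaunchBounds<512,1>");
static_assert(fits_launch_bounds(16, 64, bound_for_width(64)),
              "1024 threads fit a bound of 1024");

// Runtime variant of the same check, for values known only at run time.
// Like any assert, it is compiled out when NDEBUG is defined.
inline void require_fits(int team_size, int vector_length, int max_threads) {
  assert(fits_launch_bounds(team_size, vector_length, max_threads));
}
```

With a 32-wide vector length the same 4x4 team is exactly 512 threads, which would explain why the 512 bound did not cause trouble on hardware with 32-wide warps.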
I'm experimenting with stand-alone Homme on Frontier with ROCm 5.7.1 and 128 vertical levels, and my runs are failing with the following output.
The core points to this line: E3SM/components/homme/src/theta-l_kokkos/cxx/CaarFunctorImpl.hpp, line 350 in fff7243.