This is the kernel being executed in the first atomic_flag test, but I'm actually wondering what in the OpenCL spec guarantees that it executes as the CTS expects.

More specifically, what prevents a thread from racing with a thread from a different subgroup on destMemory[cnt]? One thread could enter the iteration for e.g. cnt == 5 while another thread has just reached the atomic_flag_clear_explicit(&destMemory[cnt], ...) for the same cnt, which means that two threads will enter the critical section; the second one just won't do anything.

I'm seeing this behavior with rusticl on zink on radv, and at the moment it's not clear to me whether that's my bug or the CTS'. Removing the atomic_flag_clear_explicit(&destMemory[cnt], ...) makes only one thread execute the "critical section" for each value of cnt, but then the test fails later.

So what actually guarantees the correctness of this test? Or would this test need to be rewritten? I'm also mildly wondering what clvk did in order to pass this test, or whether it was never a problem in the first place. I can probably come up with a fix to guarantee this behavior; I'm just wondering if there is some undefined behavior at play here.
Program source:

```c
__kernel void test_atomic_kernel(uint threadCount, uint numDestItems,
                                 __global int *finalDest, __global int *oldValues,
                                 volatile __local atomic_flag *destMemory)
{
    uint tid = get_global_id(0);

    // initialize atomics not reachable from host (first thread
    // is doing this, other threads are waiting on barrier)
    if (get_local_id(0) == 0)
        for (uint dstItemIdx = 0; dstItemIdx < numDestItems; dstItemIdx++)
        {
            if (finalDest[dstItemIdx])
                atomic_flag_test_and_set_explicit(destMemory + dstItemIdx,
                                                  memory_order_relaxed,
                                                  memory_scope_work_group);
            else
                atomic_flag_clear_explicit(destMemory + dstItemIdx,
                                           memory_order_relaxed,
                                           memory_scope_work_group);
        }
    barrier(CLK_LOCAL_MEM_FENCE);

    uint cnt, stop = 0;
    // each thread must find critical section where it is the first visitor
    for (cnt = 0; !stop && cnt < threadCount; cnt++)
    {
        bool set = atomic_flag_test_and_set_explicit(&destMemory[cnt],
                                                     memory_order_relaxed,
                                                     memory_scope_work_group);
        atomic_work_item_fence(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE,
                               memory_order_acquire, memory_scope_work_group);
        if (!set)
        {
            uint csIndex = get_enqueued_local_size(0) * get_group_id(0) + cnt;
            // verify that thread is the first visitor
            if (oldValues[csIndex] == 1000000000)
            {
                oldValues[csIndex] = tid; // set the winner id for this critical section
                stop = 1;
            }
            atomic_work_item_fence(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE,
                                   memory_order_release, memory_scope_work_group);
            atomic_flag_clear_explicit(&destMemory[cnt], memory_order_relaxed,
                                       memory_scope_work_group);
        }
    }

    // Copy final values to host reachable buffer
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 0) // first thread in workgroup
        for (uint dstItemIdx = 0; dstItemIdx < numDestItems; dstItemIdx++)
            finalDest[dstItemIdx] =
                atomic_flag_test_and_set_explicit(destMemory + dstItemIdx,
                                                  memory_order_relaxed,
                                                  memory_scope_work_group);
}
```
mhh, no, that would mean that something with atomic_flag_test_and_set_explicit is going wrong, but that's kind of weird... maybe there is a coherency issue somewhere. Anyway, it's probably not a CTS bug then; I initially thought something else was going on.