Some hardware supports branchless conditional instructions, e.g. CMOV on x86 and CSEL on ARM64.
PowerPC and RISC-V seem to have fewer of those.
Such branchless instructions, as their name suggests, do not involve any control-flow branching and hence do not create ctrl-dependencies. Instead, they result in data-dependencies.
This can have subtle effects. Consider this code:
int r = load(&x);
int s = load(&y);
if (r == 0)
    s = 42;
store(&x, s);
With proper branching, assume r==0 holds. Then store(&x, s); has a data-dep on s = 42;, which in turn has a ctrl-dep on int r = load(&x);.
However, there is no dependency chain that connects int s = load(&y); and store(&x, s);, which allows those two operations to be reordered.
Now consider the branchless version:
int r = load(&x);
int s = load(&y);
s = ITE(r==0, 42, s); // e.g. a CMOV on x86
store(&x, s);
Here we have no ctrl-deps anymore, but there is a data-dep chain int s = load(&y); -> s = ITE(r==0, 42, s); -> store(&x, s) connecting load and store. This disallows any reordering no matter the result of r==0.
The difference in behavior can be observed in the full example given in #362.
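To make the two shapes concrete, here is a minimal sketch in C11 with relaxed atomics. The function names `branching_version` and `select_version` are made up for illustration, and the ternary only stands in for ITE: a C compiler is free to emit a branch, a CMOV/CSEL, or something else for either function, so this shows the source-level shapes only, not a guaranteed lowering.

```c
#include <stdatomic.h>

/* Branching shape: s = 42 is control-dependent on the load of x.
 * When r == 0, the value loaded from y does not flow into the store. */
int branching_version(atomic_int *x, atomic_int *y) {
    int r = atomic_load_explicit(x, memory_order_relaxed);
    int s = atomic_load_explicit(y, memory_order_relaxed);
    if (r == 0)
        s = 42;
    atomic_store_explicit(x, s, memory_order_relaxed);
    return s;
}

/* Select shape: the ternary plays the role of ITE(r==0, 42, s).
 * If it stays branchless, the stored value has data-deps on both loads. */
int select_version(atomic_int *x, atomic_int *y) {
    int r = atomic_load_explicit(x, memory_order_relaxed);
    int s = atomic_load_explicit(y, memory_order_relaxed);
    s = (r == 0) ? 42 : s;
    atomic_store_explicit(x, s, memory_order_relaxed);
    return s;
}
```

Run single-threaded, both functions store and return the same value; the difference discussed above only shows up in which dependencies order the accesses under a weak memory model.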
Now, LLVM may generate those ITE instructions in its IR, and currently we keep the instructions as such.
However, it is not clear that the branchless-ness is preserved when the code is lowered to hardware.
In particular, I think for PowerPC and RISC-V this may not be the case.
We might want to revise the compilation we do to those architectures, or at least allow the user to force branch-full compilation via options.
Another option would be to always create branching code and avoid ITE altogether.
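As a rough illustration of what "avoid ITE altogether" would mean at the value level, the sketch below rewrites a select into an explicit two-armed branch. The names `ite_form` and `branch_form` are hypothetical and not part of the tool; the actual rewrite would happen on the tool's own IR, not in C, and a C compiler may turn either form into the other.

```c
/* Select form: corresponds to t = ITE(r==0, 42, s). */
int ite_form(int r, int s) {
    return (r == 0) ? 42 : s;
}

/* Branching form: the same value, computed with explicit control flow.
 * Each arm assigns the result, so the choice is made by a branch on r
 * rather than by a data-dependent select. */
int branch_form(int r, int s) {
    int t;
    if (r == 0)
        t = 42;
    else
        t = s;
    return t;
}
```

Both forms agree on every input; what changes is whether the result depends on `r` via a ctrl-dep (branching form) or via a data-dep (select form).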
As a short follow-up: we definitely need to avoid ITE for all language-level compilation targets, because those certainly do not have such instructions.