Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSA-like form for better encodings #576

Open
ThomasHaas opened this issue Nov 25, 2023 · 0 comments
Open

SSA-like form for better encodings #576

ThomasHaas opened this issue Nov 25, 2023 · 0 comments

Comments

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Nov 25, 2023

We have decided to not enforce SSA-form in our IR, however, this can cause worse encodings.
Whenever encoding a read access of register r, we need to iterate over all possible (last) assignments to r which can be O(n) many. With O(n) many registers, this can yield an encoding of size O(n^2).
This is expected, because the size of a Use-Def Graph can be quadratic for non-SSA programs.
However, it is known to be only linear for SSA programs.

Consider the following program:

int i
for (i = 0; i < N; i++) {
   if (__VERIFIER_nondet_bool()) {
       break;
   }
}
int square = 0;
for (int j; j < N; j++) {
    square += i;
}

In the second loop, each read of i may read N-many different values (one from each iteration of the first loop). The encoding is quadratic.
Introducing a temporary variable int temp = i; // ~PHI node after the first loop and reusing that one in the second loop will produce a linear encoding.

Since our parsed programs are originally in SSA form, one might think that only loop unrolling produces non-SSA parts of the program (which are rather small, given the usual loop bounds). However, we also do copy propagation which actually removes SSA form by eliminating phi nodes (it would remove the int temp = i; assignment from the above example).

So I suggest that we try to recover an SSA-like form after all processing (so that SCCP cannot destroy it again) by inserting simple register copy events (r_phi := r;) to act as phi nodes.

Edit: I quickly checked how many opportunities we have for this optimization, and the numbers are not really high.
Even in qspinlock there are just around 9 cases per thread, most of which have to do with the non-deterministic CAS I believe. Linux RCU had even less cases. Our usual LFDS benchmarks have 1-2 cases per thread (safe_stack is the biggest outlier with 8 cases per thread!).
The largest benchmark was ck_epoch with 33 cases per thread.
I believe this is the case because ck_epoch has a function that tries to find an element in a data structure (via a loop) and returns it. The value of the return register could come from any iteration, thus every read access to that register has many possible source events.
In conclusion, it seems that the room for optimizations is low and one would only expect benefit for benchmarks that search in data structures.
In case we try to implement this, safe_stack seems to be the best choice among our unit tests to measure the impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant