Initial state: lock1 = lock2 = lock3 = var = 1.
Note that "wait" is a spinlock: it spins until the given lock is equal to 0.

Thread 1:
wait(lock1)
cmp lock3, 0 # lock3 should be zero, Thread 2 already ran.
je end # Thus I take this path
mov var, 0 # And this is never run
end:

Thread 2:
mov lock3, 0
mov lock1, 0
mov ebx, var # I should know that var is 1 here.
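
For reference, here is a minimal sketch of the two threads in Rust. It assumes the intent is "if Thread 1 sees lock1 == 0, it must also see lock3 == 0", and spells that intent out with a release store and an acquire load. The function name `run_once` and the choice of orderings are illustrative assumptions, not part of the listing above; lock2 is left out because neither thread touches it.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Runs both threads once and returns the value Thread 2 reads into `ebx`.
/// (`run_once` is a hypothetical helper name for this sketch.)
fn run_once() -> u32 {
    // Same initial state as the listing: lock1 = lock3 = var = 1.
    let lock1 = AtomicU32::new(1);
    let lock3 = AtomicU32::new(1);
    let var = AtomicU32::new(1);

    thread::scope(|s| {
        // Thread 1
        s.spawn(|| {
            // wait(lock1): spin until lock1 == 0.
            // Assumption: Acquire pairs with the Release store in Thread 2.
            while lock1.load(Ordering::Acquire) != 0 {
                std::hint::spin_loop();
            }
            // cmp lock3, 0 / je end
            if lock3.load(Ordering::Relaxed) != 0 {
                var.store(0, Ordering::Relaxed); // mov var, 0
            }
        });

        // Thread 2
        let t2 = s.spawn(|| {
            lock3.store(0, Ordering::Relaxed); // mov lock3, 0
            lock1.store(0, Ordering::Release); // mov lock1, 0
            var.load(Ordering::Relaxed) // mov ebx, var
        });

        // Thread 1 is joined automatically when the scope ends.
        t2.join().unwrap()
    })
}
```

Under the Rust memory model, the acquire load that observes lock1 == 0 also makes the earlier store to lock3 visible, so `mov var, 0` is never reached and `run_once()` returns 1. Whether plain `mov` instructions give the same guarantee is the question the rest of this post is about.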

First, consider Thread 1:
If the branch predictor guesses that `wait(lock1)` will find the lock free, the CPU speculatively adds `cmp lock3, 0` to the pipeline.
In the pipeline, `cmp lock3, 0` reads lock3 and finds that it is equal to 1.

Now, assume Thread 1 is taking its sweet time, and Thread 2 begins running quickly:
lock3 = 0
lock1 = 0

Now, let's go back to Thread 1:
Let's say `wait(lock1)` reads lock1, finds that lock1 is 0, and is happy about its branch prediction. This instruction commits, and nothing gets flushed.
(As I understand it, a correct branch prediction means nothing is flushed, even with out-of-order reads, because the processor has deduced that there is no internal dependency. lock3 isn't dependent on lock1 in the eyes of the CPU, so this is all okay.)
(Presumably this is also why optimizing compilers interleave consecutive independent for loops, so that their work can execute simultaneously.)
Now `cmp lock3, 0`, which read lock3 as 1 (correct at the time of the read), commits. `je end` is not taken, and `mov var, 0` executes.
In Thread 2, ebx is equal to 0. This should have been impossible.
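
The sequence of events above can be replayed step by step in a single-threaded model. This is only a sketch of the hypothesised ordering, written in Rust: the `Memory` struct and the function name are made up for illustration, and the key assumption (that the early read of lock3 is kept and never re-checked) is exactly the point in question, not a statement about what real hardware does.

```rust
/// Shared memory as seen by both threads in this single-threaded model.
/// (`Memory` is a hypothetical type for this sketch; lock2 is omitted
/// because neither thread uses it.)
struct Memory {
    lock1: u32,
    lock3: u32,
    var: u32,
}

/// Replays the hypothesised interleaving and returns Thread 2's `ebx`.
fn replay_hypothesised_order() -> u32 {
    let mut mem = Memory { lock1: 1, lock3: 1, var: 1 };

    // Thread 1, speculative: the load for `cmp lock3, 0` executes early.
    let early_lock3 = mem.lock3; // reads 1

    // Thread 2 runs its two stores.
    mem.lock3 = 0; // mov lock3, 0
    mem.lock1 = 0; // mov lock1, 0

    // Thread 1: wait(lock1) now observes 0, so the spin loop exits.
    assert_eq!(mem.lock1, 0);

    // Thread 1: the compare commits with the early value.
    // ASSUMPTION: the stale read of lock3 is not replayed or flushed.
    if early_lock3 != 0 {
        mem.var = 0; // mov var, 0
    }

    // Thread 2: mov ebx, var
    mem.var
}
```

Under that assumption `replay_hypothesised_order()` returns 0, which is the "impossible" result described above. If the early read were redone after `wait(lock1)` completed, `early_lock3` would be 0 and the function would return 1 instead.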