Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- Three things to understand that really make this algorithm faster:
- If A and B are binary digits (0 or 1), then:
- ~a=1-a, where ~ is the "not" function and 'a' is a binary digit.
- ~x=-x-1, where x is a binary number, modulo some base power of 2 (in assembly, this is commonly 2^8, 2^16, 2^32, or 2^64)
- a(1-a) is 0, always [it is equivalent to a&~a].
- a(1-b)+b(1-a) = a+b-2ab is the same as a^b, where ^ is the xor function.
- So then suppose you have x is a number whose bits are abcdefgh... where 'a' is the least-significant bit.
- In base 10, a product might look at the 1s place, 10s place, 100s place, etc for the digits. In binary, we looks at 1s, 2s, 4s, etc. for the bits.
- So then the 1s place of ~x is (1-a), the second digit is (1-b), etc.
- Then the product of x*~x is:
- 1s a(1-a)
- 2s a(1-b)+b(1-a)
- 4s a(1-c)+b(1-b)+c(1-a)
- 8s a(1-d)+b(1-c)+c(1-b)+d(1-a)
- 16s a(1-e)+b(1-d)+c(1-c)+d(1-b)+e(1-a)
- 32s a(1-f)+b(1-e)+c(1-d)+d(1-c)+e(1-b)+f(1-a)
- 64s a(1-g)+b(1-f)+c(1-e)+d(1-d)+e(1-c)+f(1-b)+g(1-a)
- 128s a(1-h)+b(1-g)+c(1-f)+d(1-e)+e(1-d)+f(1-c)+g(1-b)+h(1-a)
- ...
- So applying the fact that a(1-a)=0 (this works the same for b(1-b), etc.):
- 1s 0
- 2s a(1-b)+b(1-a)
- 4s a(1-c)+c(1-a)
- 8s a(1-d)+b(1-c)+c(1-b)+d(1-a)
- 16s a(1-e)+b(1-d)+d(1-b)+e(1-a)
- 32s a(1-f)+b(1-e)+c(1-d)+d(1-c)+e(1-b)+f(1-a)
- 64s a(1-g)+b(1-f)+c(1-e)+e(1-c)+f(1-b)+g(1-a)
- 128s a(1-h)+b(1-g)+c(1-f)+d(1-e)+e(1-d)+f(1-c)+g(1-b)+h(1-a)
- ...
- And now applying the trick following that:
- 1s 0
- 2s a^b
- 4s a^c
- 8s a^d+b^c
- 16s a^e+b^d
- 32s a^f+b^e+c^d
- 64s a^g+b^f+c^e
- 128s a^h+b^g+c^f+d^e
- ...
- Oh hey, now *that* looks pretty easy! In fact, we've already filled out what the first 8 bits look like!
- Since this is x*~x, this is x*(-x-1)=-x^2-x. So if we add the original x and negate, we get the first 8 bits of x^2.
- If x is an 8-bit int and you want the full 16-bit result of x*~x
- 1s 0
- 2s a^b
- 4s a^c
- 8s a^d+b^c
- 16s a^e+b^d
- 32s a^f+b^e+c^d
- 64s a^g+b^f+c^e
- 128s a^h+b^g+c^f+d^e
- 256s +b^h+c^g+d^f
- 512s +c^h+d^g+e^f
- 1024s +d^h+e^g
- 2048s +e^h+f^g
- 4096s +f^h
- 8192s +g^h
- 16384s 0
- 32768s 0
- Now if this is being applied to a circuit, you can reduce the number of additions:
- 1s 0
- 2s a^b
- 4s a^c
- 8s a^d+b^c
- 16s a^e+b^d
- 32s a^f+b^e+c^d
- 64s a^g+b^f+c^e
- 128s a^h+b^g+c^f+d^e
- 256s b^h+c^g+d^f
- 512s c^h+d^g+e^f
- 1024s d^h+e^g
- 2048s e^h+f^g
- 4096s f^h
- 8192s g^h
- 16384s 0
- 32768s 0
- a^b+b
- a+2b-2ab
- And hey, why not add in the original x in there to make it compute -x^2 :
- We'll use the identity that a+b-2ab=a^b
- 1s 0 +a =a
- 2s a^b +b =a+2b-2ab
- 4s a^c +c =a+2c-2ac
- 8s a^d+b^c +d =a+2d-2ad+b^c
- 16s a^e+b^d +e =a+2e-2ae+b^d
- 32s a^f+b^e+c^d +f =a+2f-2af+b^e+c^d
- 64s a^g+b^f+c^e +g =a+2g-2ag+b^f+c^e
- 128s a^h+b^g+c^f+d^e +h =a+2h-2ah+b^g+c^f+d^e
- 256s b^h+c^g+d^f
- 512s c^h+d^g+e^f
- 1024s d^h+e^g
- 2048s e^h+f^g
- 4096s f^h
- 8192s g^h
- Anything multiplied by 2 can get shifted to the next digit position:
- 1s a
- 2s a
- 4s a +b-ab
- 8s a+b^c +c-ac
- 16s a+b^d +d-ad
- 32s a+b^e+c^d +e-ae
- 64s a+b^f+c^e +f-af
- 128s a+b^g+c^f+d^e +g-ag
- 256s b^h+c^g+d^f +h-ah
- 512s c^h+d^g+e^f
- 1024s d^h+e^g
- 2048s e^h+f^g
- 4096s f^h
- 8192s g^h
- Now we'll use a+b-ab=a|b (| is logical OR):
- 1s a
- 2s a
- 4s a|b
- 8s a|c+b^c
- 16s a|d+b^d
- 32s a|e+b^e+c^d
- 64s a|f+c^e+b^f
- 128s a|g+d^e+b^g+c^f
- 256s c^g+d^f-a +b
- 512s d^g+f^e-bh +c
- 1024s g^e-ch +d
- 2048s f^g+e -dh
- 4096s f -eh
- 8192s g -fh
- 16384s h~g
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement