Disclaimer: I would ask that none of this code actually be used in software; it doesn't offer anything, if your CPU has an ADD instruction, use it. Otherwise this is just a teaching aid.

Addition is a basic mathematical operation which can be emulated with some bitwise operations. Lets look at a canonical implementation:

**uint8_t add(uint8_t x, uint8_t y) {**

**uint8_t a;**

**uint8_t b;**

**do {**

**a = x & y;**

**b = x ^ y;**

**x = a << 1;**

**y = b;**

**} while (a);**

**return b;**

**}**

**To understand what is going on lets follow out with a basic example:**

Let in:x = 00000011 (3)

Let in:y = 00000100 (4)

When we invoke this add function the steps taking place are:

- AND -- a = (x&y)
- XOR -- b = (x^y)
- LSHFT -- x = (a<<1)

Resulting in the following:

- x = 00000011
- y = 00000100
- a = 00000000
- b = 00000100
- x = 00000000
- y = 00000111 -- XOR a:b = 7

This canonical implementation works, but it's slow. We can make it slightly faster, to see why, lets take a look at the assembly that code translates to (note: all code is x86_64 non-optimized)

**add:**

**pushq %rbp**

**movq %rsp, %rbp**

**movl %edi, %edx**

**movl %esi, %eax**

**movb %dl -20(%rbp) ; y**

**movb %al -24(%rbp) ; x**

**.loop:**

**movzbl -24(%rbp), %eax**

**movzbl -20(%rbp), %edx**

**andl %edx, %eax**

**movb %al, -1(%rbp)**

**movzbl -24(%rbp), %eax**

**movzbl -20(%rbp), %edx**

**xorl %edx, %eax**

**movb %al, -2(%rbp)**

**movzbl -1(%rbp), %eax**

**addl %eax, %eax**

**movb %al, -20(%rbp)**

**movzbl -2(%rbp), %eax**

**movb %al, -24(%rbp)**

**cmpb $0x0, -1(%rbp)**

**jne .loop**

**movzbl -2(%rbp), %eax**

**popq %rbp**

**ret**

There is 9+(15*iterations) worth instructions here. Assuming the best case senerio, that is 24 instructions. We can refine our implementation to be slightly faster.

**void add(uint8_t *x, uint8_t *y) {**

**do {**

***y = (*x^*y);**

*x = (*x&(*x^*y)) << 1;

*x = (*x&(*x^*y)) << 1;

**} while (*x);**

**}**

In this version we've removed the return, and replaced our arguments with

**pointers**to our variables, storing the result of the addition in *y, and using no temporaries!. This is (best case) 14! instruction less:**add:**

**movzbl (%rsi), %eax**

**.loop:**

**movzbl (%rdi), %ecx**

**movl %eax, %edx**

**andl %ecx, %edx**

**xorl %ecx, %eax**

**leal (%rdx,%rdx), %ecx**

**testb %dl, %dl**

**movb %cl, (%rdi)**

**movb %al, (%rsi)**

**jne .loop**

**rep**

**ret**

Of course since this function is small we can most likely make it inline, (which could allow for better optimization). I'd personally perfer a macro before the inline keyword.

**#define add(x,y) do{**

***y=(*x^*y),*x=(*x&(*x^*y))<<1;**

**} while(*x)**

That concludes addition by bit hacking. I've played around with many other methods, but this is 3+(9*iterations) worth instructions, or best case (11 instructions). I challenge anyone to beat this for best case. Must be C code (no assembly) .. post solution in the comments. Hint: abuse stack.