Conditional Logic without Jumps

SIMD

This is an old, simple, but powerful idea-- "SIMD", which stands for Single Instruction Multiple Data:

Single, meaning just one.
Instruction, as in a machine code instruction, executed by hardware.
Multiple, as in more than one--from 2 to a thousand or so.
Data, as in floats or ints.

You can do lots interesting SIMD work without using any special instructions--plain old C++ will do, if you treat an "int" as 32 completely independent bits, because any normal "int" instructions will operate on all the bits at once. This use of bitwise operations is often called "SIMD within a register (SWAR)" or "word-SIMD"; see Sean Anderson's "Bit Twiddling Hacks" for a variety of amazing examples.

One universal aspect of SIMD programming is that you can't use jump instructions to implement conditional branches. The problem is you're doing multiple things at once, so if some should branch, and others shouldn't branch, you can't just jump to the right branch. So they've developed a whole variety of other ways to implement conditional logic without using jump instructions. The common aspect to all these approaches is that they evaluate *both* sides of the branch, and then selectively combine the results into one answer.

Conditionals via the Multiply Instruction

The simplest example of this sort of non-branch switching is just a multiply instruction. For example, here "doit" determines whether the output will be 4, or 6:

for (int doit=0;doit<=1;doit++) {
	int blended=4+doit*2;
	std::cout<<"doit=="<<doit<<"  -> "<<blended<<"\n";
}

(Try this in NetRun now!)

This outputs:

doit==0  -> 4
doit==1  -> 6
Program complete.  Return 0 (0x0)

More generally, if you want to return A if doit is false (0), and B if doit is true (1), then you can do:

for (int doit=0;doit<=1;doit++) {
	int A=4;
	int B=6;
	int blended=A+doit*(B-A);
	std::cout<<"doit=="<<doit<<"  -> "<<blended<<"\n";
}

(Try this in NetRun now!)

You also see this written as the algebraically equivalent "blended = (1-doit)*A+doit*B;". This has four operations instead of the three above, so you don't see it as much anymore.

There are some interesting advantages of the multiply approach to conditionals. One is you can do "fuzzy conditionals", where your code blends between A and B if doit is somewhere between 0 and 1:

for (float doit=0.0;doit<=1.0;doit+=0.25) {
	int A=4;
	int B=6;
	float blended=A+doit*(B-A);
	std::cout<<"doit == "<<doit<<"	->	"<<blended<<"\n";
}

(Try this in NetRun now!)

This outputs:

doit == 0	->	4
doit == 0.25	->	4.5
doit == 0.5	->	5
doit == 0.75	->	5.5
doit == 1	->	6
Program complete.  Return 0 (0x0)

Multiply-based conditionals work fine for a single digit, but they don't scale well to multiple digits. For example, "0x4444 + 0x0100*(0x6666-0x4444)" gives you "0x226644", not "0x4644".

Conditionals via Bitwise Operations

You can compute a conditional via a bitwise AND in a manner almost identical to multiplication. In fact, for a single bit value, bitwise AND and multiplication are the same operation! (Think about the "multiplication table" for one bit. It's AND.)

for (int doit=-1;doit<=0;doit++) {
	int A=4;
	int B=6;
	int blended=A+(doit&(B-A));
	std::cout<<"doit == "<<doit<<"	->	"<<blended<<"\n";
}

(Try this in NetRun now!)

This outputs:

doit == -1	->	6
doit == 0	->	4
Program complete.  Return 0 (0x0)

Note that:

0 is stored in binary as 000000 ... 0000 (all zeros). Bitwise AND with zero bits: always outputs zero.
-1 is stored in binary as 111111 ... 1111 (all ones). Bitwise AND with one bits: works like a copy.
The parenthesis are really important. If you forget them, the compiler treats "A+doit&(B-A)" like "(A+doit)&(B-A)", which will give you a WRONG, confusing answer!

One *huge* advantage of bitwise conditionals is that they apply to each bit independently. So, for example, if my "doit" value is half ones, and half zeros, then half the output bits will have the one output, and half the zero output.

For example:

void blend(int doit)
{
	int A=0x444444;
	int B=0x666666;
	int blended=A+(doit&(B-A));
	std::cout<<"doit == "<<std::hex<<doit<<"	->	"<<blended<<"\n";
}
int foo(void) {
	blend(0x000000);
	blend(0xffffff);
	blend(0xfff000);
	blend(0xff00ff);
	blend(0xf0f0f0);
	blend(0xff0f00);
	return 0;
}

(Try this in NetRun now!)

doit == 0	->	444444
doit == ffffff	->	666666
doit == fff000	->	666444
doit == ff00ff	->	664466
doit == f0f0f0	->	646464
doit == ff0f00	->	664644
Program complete.  Return 0 (0x0)

The basic idea here is we're just turning the appropriate bits on and off, to get the answer we need. Here I've printed the results in hex (base 16), but you can actually work with any group of bits you like, by chosing your mask correctly.

One caveat: the subtraction based method above can give really weird values if B-A is negative:

	int blended=A+(doit&(B-A));

This formulation doesn't do any subtraction, so it works for any A and B:

	int blended=((~doit)&A)+(doit&B);

Beyond If-then-else

Similarly to using bitwise AND for if-then-else, you can do SIMD versions of other operations:

Bitwise XOR can be used to perform per-bit inversion. XOR by 0 does nothing, XOR by 1 flips that bit.
Ordinary addition and subtraction actually work fine on a per-bit basis, except for the carry (or borrow) bits. You can stop carry by making sure there's at least a single zero bit as a "guard bit", and can stop borrow by making sure there's at least a single one bit as a "guard bit".

The only trouble with guard bits is you need to make space to store the guard bits. Sometimes you can throw out some of the input data to make space for guard bits--for example, I used this trick to make 8-bit RGB blending much faster, by actually doing 7-bit blending (I threw away the low bit of each channel, to make room for the guard bit).

If your input values are all good data, and you can't get away with trashing even a few bits to make space for guards, you need to use the "even odd trick", where you make two overlapping passes on the data:

long A=0x4444444488888888; // if doit==0, return this
long B=0x6666666666666666; // if doit==1, return this
long pick_4_or_6(long doit)
{
/* first, compute even hex digits, 
   using the odd digits as guards against borrows */
	long guards=0x1010101010101010;
	long goodma=0x0f0f0f0f0f0f0f0f;
	long gA=A&goodma;
	long gB=(B&goodma)+guards;
	long blendedE=gA+(doit&(gB-gA));

/* same thing, for odd hex digits */
	long guardsO=0x101010101010101;
	long goodmaO=0xf0f0f0f0f0f0f0f0;
	long gAO=A&goodmaO;
	long gBO=(B&goodmaO)+guardsO;
	long blendedO=gAO+(doit&(gBO-gAO));

/* recombine even and odd digits to get entire answer */
	return (goodma&blendedE)|(goodmaO&blendedO);
}

int foo(void) {
	long doit=0xffff000ff0f000ff;
	long blended=pick_4_or_6(doit);
        std::cout<<"doit=="<<doit<<"  -> "<<std::hex<<blended<<"\n";
	return 0;
}

(Try this in NetRun now!)

Of course, this particular value is much easier to compute with "long blended=((~doit)&A)+(doit&B);", but it illustrates how to do a subtraction if you need it. It's a total of fifteen bitwise operations, which sounds like too many, but that's for *sixteen* if-then-else operations, which are way more than one instruction each!

There's a beautiful way to do per-bitfield comparisons, where you clear out some zeros ahead of the values you want to compare, add a single one bit as a guard, and subtract the values. If the difference is zero or positive, the zeros stay zeros and can be used as a mask. If the difference is negative, the zeros get borrowed into ones right up to the guard bit, and you can use those borrows as a bitmask.

UAF's own Jim Long wrote a portable version of the Cray Bioinformatics Library using SWAR techniques. He could pack 32 nucleotides (2-bit ACGT DNA "letters") into a 64-bit "long"!