Conditional Logic without Jumps
CS 301 Lecture, Dr. Lawlor
SIMD
This is an old, simple, but powerful idea-- "SIMD", which stands for Single Instruction Multiple Data:
- Single, meaning just one.
- Instruction, as in a machine code instruction, executed by hardware.
- Multiple, as in more than one--from 2 to a thousand or so.
- Data, as in floats or ints.
You can do lots interesting SIMD work without using
any special instructions--plain old C++ will do, if you treat an "int" as
32 completely independent bits, because any normal "int" instructions
will operate on all the bits at once. This use of
bitwise operations is often called "SIMD within a register (SWAR)" or
"word-SIMD"; see Sean Anderson's "Bit Twiddling Hacks" for a variety of amazing examples.
One universal aspect of SIMD programming is that you can't use jump
instructions to implement conditional branches. The problem is
you're doing multiple things at once, so if some should branch, and
others shouldn't branch, you can't just jump to the right branch.
So they've developed a whole variety of other ways to implement
conditional logic without using jump instructions. The common
aspect to all these approaches is that they evaluate *both* sides of
the branch, and then selectively combine the results into one answer.
Conditionals via the Multiply Instruction
The simplest example of this sort of non-branch switching is just a
multiply instruction. For example, here "doit" determines whether
the output will be 4, or 6:
for (int doit=0;doit<=1;doit++) {
int blended=4+doit*2;
std::cout<<"doit=="<<doit<<" -> "<<blended<<"\n";
}
(Try this in NetRun now!)
This outputs:
doit==0 -> 4
doit==1 -> 6
Program complete. Return 0 (0x0)
More generally, if you want to return A if doit is false (0), and B if doit is true (1), then you can do:
for (int doit=0;doit<=1;doit++) {
int A=4;
int B=6;
int blended=A+doit*(B-A);
std::cout<<"doit=="<<doit<<" -> "<<blended<<"\n";
}
(Try this in NetRun now!)
You also see this written as the algebraically equivalent "blended =
(1-doit)*A+doit*B;". This has four operations instead of the
three above, so you don't see it as much anymore.
There are some interesting advantages of the multiply approach to
conditionals. One is you can do "fuzzy conditionals", where your
code blends between A and B if doit is somewhere between 0 and 1:
for (float doit=0.0;doit<=1.0;doit+=0.25) {
int A=4;
int B=6;
float blended=A+doit*(B-A);
std::cout<<"doit == "<<doit<<" -> "<<blended<<"\n";
}
(Try this in NetRun now!)
This outputs:
doit == 0 -> 4
doit == 0.25 -> 4.5
doit == 0.5 -> 5
doit == 0.75 -> 5.5
doit == 1 -> 6
Program complete. Return 0 (0x0)
Multiply-based conditionals work fine for a single digit, but they
don't scale well to multiple digits. For example, "0x4444 +
0x0100*(0x6666-0x4444)" gives you "0x226644", not "0x4644".
Conditionals via Bitwise Operations
You can compute a conditional via a bitwise AND in a manner almost
identical to multiplication. In fact, for a single bit value,
bitwise AND and multiplication are the same operation! (Think
about the "multiplication table" for one bit. It's AND.)
for (int doit=-1;doit<=0;doit++) {
int A=4;
int B=6;
int blended=A+(doit&(B-A));
std::cout<<"doit == "<<doit<<" -> "<<blended<<"\n";
}
(Try this in NetRun now!)
This outputs:
doit == -1 -> 6
doit == 0 -> 4
Program complete. Return 0 (0x0)
Note that:
- 0 is stored in binary as 000000 ... 0000 (all zeros). Bitwise AND with zero bits: always outputs zero.
- -1 is stored in binary as 111111 ... 1111 (all ones). Bitwise AND with one bits: works like a copy.
- The parenthesis are really important. If you forget them, the compiler treats "A+doit&(B-A)" like "(A+doit)&(B-A)", which will give you a WRONG, confusing answer!
One *huge* advantage of bitwise conditionals is that they apply to each
bit independently. So, for example, if my "doit" value is half ones,
and half zeros, then half the output bits will have the one output, and
half the zero output.
For example:
void blend(int doit)
{
int A=0x444444;
int B=0x666666;
int blended=A+(doit&(B-A));
std::cout<<"doit == "<<std::hex<<doit<<" -> "<<blended<<"\n";
}
int foo(void) {
blend(0x000000);
blend(0xffffff);
blend(0xfff000);
blend(0xff00ff);
blend(0xf0f0f0);
blend(0xff0f00);
return 0;
}
(Try this in NetRun now!)
doit == 0 -> 444444
doit == ffffff -> 666666
doit == fff000 -> 666444
doit == ff00ff -> 664466
doit == f0f0f0 -> 646464
doit == ff0f00 -> 664644
Program complete. Return 0 (0x0)
The basic idea here is we're just turning the appropriate bits on and
off, to get the answer we need. Here I've printed the results in
hex (base 16), but you can actually work with any group of bits you
like, by chosing your mask correctly.
One caveat: the subtraction based method above can give really weird values if B-A is negative:
int blended=A+(doit&(B-A));
This formulation doesn't do any subtraction, so it works for any A and B:
int blended=((~doit)&A)+(doit&B);
Beyond If-then-else
Similarly to using bitwise AND for if-then-else, you can do SIMD versions of other operations:
- Bitwise XOR can be used to perform per-bit inversion. XOR by 0 does nothing, XOR by 1 flips that bit.
- Ordinary addition and subtraction actually work fine on a per-bit
basis, except for the carry (or borrow) bits. You can stop carry
by making sure there's at least a single zero bit as a "guard bit", and
can stop borrow by making sure there's at least a single one bit as a
"guard bit".
The only trouble with guard bits is you need to make space to store the
guard bits. Sometimes you can throw out some of the input data to
make space for guard bits--for example, I used this trick to make 8-bit
RGB blending much faster, by actually doing 7-bit blending (I threw
away the low bit of each channel, to make room for the guard
bit).
If your input values are all good data, and you can't get away with
trashing even a few bits to make space for guards, you need to use the
"even odd trick", where you make two overlapping passes on the data:
long A=0x4444444488888888; // if doit==0, return this
long B=0x6666666666666666; // if doit==1, return this
long pick_4_or_6(long doit)
{
/* first, compute even hex digits,
using the odd digits as guards against borrows */
long guards=0x1010101010101010;
long goodma=0x0f0f0f0f0f0f0f0f;
long gA=A&goodma;
long gB=(B&goodma)+guards;
long blendedE=gA+(doit&(gB-gA));
/* same thing, for odd hex digits */
long guardsO=0x101010101010101;
long goodmaO=0xf0f0f0f0f0f0f0f0;
long gAO=A&goodmaO;
long gBO=(B&goodmaO)+guardsO;
long blendedO=gAO+(doit&(gBO-gAO));
/* recombine even and odd digits to get entire answer */
return (goodma&blendedE)|(goodmaO&blendedO);
}
int foo(void) {
long doit=0xffff000ff0f000ff;
long blended=pick_4_or_6(doit);
std::cout<<"doit=="<<doit<<" -> "<<std::hex<<blended<<"\n";
return 0;
}
(Try this in NetRun now!)
Of course, this particular value is much easier to compute with "long
blended=((~doit)&A)+(doit&B);", but it illustrates how to do a
subtraction if you need it. It's a total of fifteen bitwise
operations, which sounds like too many, but that's for *sixteen*
if-then-else operations, which are way more than one instruction each!
There's a beautiful way to do per-bitfield comparisons, where you clear
out some zeros ahead of the values you want to compare, add a single
one bit as a guard, and subtract the values. If the difference is zero
or positive, the zeros stay zeros and can be used as a mask. If the
difference is negative, the zeros get borrowed into ones right up to
the guard bit, and you can use those borrows as a bitmask.
UAF's own Jim Long wrote a portable version of the Cray Bioinformatics Library using SWAR techniques. He could pack 32 nucleotides (2-bit ACGT DNA "letters") into a 64-bit "long"!