Optimizing Code in General

CS 301 Lecture, Dr. Lawlor

(Also start reading Chapter 5 of the Bryant and O'Hallaron textbook for more info on performance and optimization)

So you need to make your code go faster.  Here are the steps you go through to do this:
  1. Understand what the code does.  This is really important.  There's a great story about a bunch of maintainance programmers that got handed a huge database system that was running too slow.  They found this one piece of code that was running all the time, and doing a bunch of silly slow stuff, and they worked for a few months to speed it up.  In the end, that part of the code got 10 times faster, but the overall database performance was exactly the same.  Why?  Because they'd sped up the "idle loop", the part of the program that only runs when nothing is happening!
  2. Understand where the code spends its time.  Usually, 99% of the code can be totally ignored--it runs once, or it's error-handling code, or it's not being used anymore.  It's the remaining 1% of the code that takes 90% of the runtime.  Find that 1% by understanding the code (step 1), using timers (see below), using debug statements (like std::cout<<"foobar.c is finally running!\n"), and using good profiling tools like gprof or VTune.
  3. Use a better algorithm (like you cover in CS 311 and 411).  You can't ever tweak bubblesort (an n-squared algorithm) to run faster than quicksort (an n-log(n) algorithm) for big arrays, so start your optimization session by switching to quicksort.  Use search trees instead of brute-force searching a whole array; loop over the data once instead of a zillion times; and so on.  The STL makes this sort of thing trivial, so do it.  Stupid algorithms are a really common source of inefficiency; a good algorithm written sloppily will usually beat even the tightest implementation of a dumb algorithm.  Joel on Software has an entertaining illustration of this with his Shlemeil the painter story.  Shlemeil doesn't need to run faster in order to get more work done!
  4. Let the compiler help you.  Make constants really constant, so the compiler can lift them out of loops.  Use "inline" to expose more subroutines to the optimizer.
  5. Replace slow calls with fast calls.  See the speed of stuff below.
  6. Read the assembly code the compiler generates.  If you're sure the compiler's optimized away all the "+=117" calls, but you see the constant 117 all over the assembly, you know it hasn't optimized it.  If you wanted everything inlined, but the compiler's calling subroutines left and right, you know you need to make your code inlineable.  If right in the middle of your fast inner loop, the compiler's putting in a call to "__moddi3", you need to figure out what the heck that is (a gcc compiler builtin routine), and how to get rid of it (use "int" instead of "long long", or don't use % on long longs).
  7. If you can't get the compiler to spit out the code you want (which is rare!), write small bits of assembly code.  It does take a fair amount of effort to beat the compiler, so this should not be your first choice!
  8. If you can't write assembly code for all the cases you need to handle, consider some sort of dynamic code generation (coming before the end of the semester).

Understanding Time

So optimization is about reducing the amount of time your code uses.  Step 1 of this process is understanding how to measure time.

Name
Second
Millisecond
Microsecond
Nanosecond
Abbrev.
1s
1ms
1us  (or µs)
1ns
Duration
1.0e0s
1.0e-3s
1.0e-6s
1.0e-9s
Same as
1 heartbeat
1 bullet-foot
1 light-block
1 light-foot
1 cpu clock cycle
1 cpu instruction

Human scales are seconds or so.  Your eyes can detect flashing from a CRT going at 50Hz refresh (50 cycles per second, or 0.02 seconds per cycle, or 20ms/cycle).  You can't see flashing in a CRT at 100Hz refresh, so your eyes can't follow motions that are faster than 10ms or so.  You can hear sounds up to 20 or 30 KHz, so your ears can detect air movements that take only 30us.

A 1GHz computer has a 1ns clock period.  This means the central circuit that drives everything else in the CPU rises and falls once every nanosecond.  Most machines can get a few instructions done per clock cycle, which is unbelievably fast!

Using Timers

See the NetRun help page for info on the timing routines built into NetRun.  In CS321, we'll talk about all the timers that are available on various machines.

What's Fast, and What's Slow?

1s
100ms
10ms
1ms
100us
10us
1us
100ns
10ns
1ns
Waiting for the human being
Reading or writing from CD, DVD, floppy
Typical remote network latency
Reading or writing from Hard Drive

Typical local network latency
fopen
cout
cin
OS call
sqrt
sin
cos
pow
malloc
new
/, %
CALL
+, -, *
&, |, ^, <<, >>
[i]