Biological Computing
CS 321 2007 Lecture, Dr. Lawlor
(Warning: Dr. Lawlor is pretty far outside his expertise here!)
Glossary
- Nucleotide:
the fundamental unit of biological information storage (DNA) and
communication (RNA). Nucleotides are to biology as bits are to normal
computing. A nucleotide is one of the four letters A, G, C, or
T/U; these stand for the chemical compounds Adenine, Guanine, Cytosine,
or Thymine (in DNA) / Uracil (in RNA). Since there are four
nucleotides, one nucleotide requires two bits of binary storage, but
it's more common to use one ASCII byte per nucleotide, and just give
the letter, like 'A'.
- Codon: a group of three nucleotides representing a command--either start, stop, or an amino acid.
This is the biological equivalent of one machine-language
instruction. The start codon is AUG. There are actually
three stop codons: UAG, UAA, and UGA.
- Amino Acid:
one fairly simple biological molecule. Specified in DNA/RNA with
one codon. Normally strung together into proteins/enzymes.
- Gene:
a sequence of codons representing amino acids that get assembled into a
single protein. The biological equivalent of one complete
program. The human genome contains only twenty or twenty-five
thousand genes. A typical human gene is tens of thousands of
nucleotides long.
- Protein: a
useful and complicated biological molecule constructed from a sequence
of amino acids (and sometimes a few metal ion "cofactors", like the
iron in hemoglobin). There are proteins that collect waste, tough
watertight proteins that form a cell's waxy coat, proteins that help
DNA
- Enzyme: a protein that helps convert one chemical to another, or "catalyzes" a chemical reaction. For example, the enzyme Amylase,
found in saliva, breaks down starches into sugars. Enzymes exist
for all the important reactions in your body--the metabolism of sugar
and oxygen into carbon dioxide and water, the reduction of free
radicals, and so on.
- DNA:
the biological program storage mechanism. Consists of a string of
matching
("complementary") nucleotides held together by a backbone. DNA
doesn't actually do any work itself--it's the master copy of the
genetic code, and it stays inside the cell's nucleus packaged in chromosomes. The only thing DNA does is get copied
("transcribed") to RNA, which migrates out into the cell proper to do work.
- RNA: a biological
loaded-and-runnable program. Like DNA, RNA consists of a string
of nucleotides, but RNA is missing the complementary double-helix--it's
a single helix, with the nucleotides dangling out ready for use.
RNA is used in one of two ways. "Coding" DNA gets copied to Messenger RNA,
which contains three-nucleotide codon groups representing a sequence of
amino acids to be assembled into a protein. "Non-coding" DNA
(formerly known as "junk" DNA!) does not contain codons, but instead
seems to use RNA's nucleotides directly to do useful work. Many of the functions performed by non-coding RNA are still being worked out.
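The "two bits per nucleotide" storage claim in the glossary can be made concrete with a small packing routine. This is only a sketch: the bit pattern assigned to each letter in `CODE` is an arbitrary choice for illustration, and the `pack`/`unpack` names are made up here, not any standard format.

```python
# Hypothetical 2-bit code: which letter gets which bit pattern is arbitrary.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTER = {bits: letter for letter, bits in CODE.items()}

def pack(seq):
    """Pack a DNA string into bytes, four nucleotides per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, letter in enumerate(seq):
        out[i // 4] |= CODE[letter] << (2 * (i % 4))
    return bytes(out)

def unpack(data, count):
    """Recover `count` nucleotides from packed bytes."""
    return "".join(
        LETTER[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)
    )

seq = "ATGCACGGG"
packed = pack(seq)
print(len(seq), "ASCII bytes ->", len(packed), "packed bytes")
print(unpack(packed, len(seq)) == seq)
```

The packed form has to carry the nucleotide count separately, because the last byte may be only partly used. That bookkeeping is one reason the one-ASCII-byte-per-letter form is more convenient in practice.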
Example: DNA to Protein
Say in your cell's nucleus, your DNA contains a gene with this unusually-short sequence of nucleotides:
...TA ATG CAC GGG GGC GGG TGG GGG CAA CCA TAG AAA G...
This will get transcribed into a short string of Messenger RNA with this sequence (replacing T - Thymine, with U - Uracil):
UA AUG CAC GGG GGC GGG UGG GGG CAA CCA UAG AAA G
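In code, the transcription step as described here is just a letter substitution. This sketch follows the lecture's simplification of copying the gene's letters directly, with the DNA written using T throughout. In a real cell the machinery reads the opposite (template) strand and builds its complement, which gives the same letters. The function name `transcribe` is made up for illustration.

```python
def transcribe(dna):
    """Copy a DNA coding sequence to messenger RNA by swapping T for U."""
    return dna.upper().replace("T", "U")

dna = "TA ATG CAC GGG GGC GGG TGG GGG CAA CCA TAG AAA G"
print(transcribe(dna))
```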
Using the cell's ribosomes, this string of Messenger RNA will get executed as follows:
- U, A -> nothing happens
- AUG -> valid START sequence codon. The ribosomes bind to this spot, and begin assembling a protein.
- CAC -> codon for the Histidine amino acid, which is the first amino acid in the new protein.
- GGG -> codon for the Glycine amino acid, which is the second
amino acid in the protein. The Glycine sticks to the existing
Histidine, forming a two-acid "polypeptide".
- GGC -> another different codon for Glycine amino acid, which
is the third amino acid in the chain. There are 64 possible
codons, but only 20 used amino acids, so some amino acids are
represented by several different codons.
- GGG -> Glycine again, which gets added as the fourth amino acid.
- UGG -> Tryptophan, which gets stuck on as the fifth acid.
- GGG -> yet more Glycine.
- CAA -> Glutamine.
- CCA -> Proline.
- UAG -> STOP codon. The ribosome lets go of the newly
formed chain of amino acids, which is a new protein. The protein
floats away.
- A, A, A, G, etc. -> nothing happens. Stuff outside of START and
STOP does not bind ribosomes, and so does not make proteins.
So this Messenger RNA has just created a new eight-amino-acid protein:
Histidine - Glycine - Glycine - Glycine - Tryptophan - Glycine - Glutamine - Proline
(or HGGGWGQP using the confusing amino-acid-to-letter substitution).
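The ribosome walk-through above can be sketched as a small interpreter loop. This is only a sketch under the lecture's simplifications: the `CODONS` table is a partial, hand-typed subset of the standard genetic code covering just this example, the function name `translate` is made up for illustration, and START is treated as a marker that adds no amino acid (real ribosomes also add methionine for AUG).

```python
# Partial codon table: only the codons used in the lecture's example.
CODONS = {"CAC": "H", "GGG": "G", "GGC": "G", "UGG": "W", "CAA": "Q", "CCA": "P"}
STOP = {"UAG", "UAA", "UGA"}
START = "AUG"

def translate(mrna):
    """Find the first START codon, then read codons until a STOP codon."""
    rna = mrna.replace(" ", "")
    start = rna.find(START)
    if start < 0:
        return ""  # no START codon: the ribosome never binds
    protein = []
    for i in range(start + 3, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP:
            break  # the ribosome lets go of the chain
        protein.append(CODONS[codon])  # KeyError for codons outside the partial table
    return "".join(protein)

mrna = "UA AUG CAC GGG GGC GGG UGG GGG CAA CCA UAG AAA G"
print(translate(mrna))  # HGGGWGQP
```

Note that GGG and GGC both map to "G": that is the many-codons-per-amino-acid redundancy from the list above, visible directly in the table.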
(I'm skipping over lots of complexity here. Real genes start with a promoter sequence that tends to attract the RNA transcription machinery, and often include introns that get spliced out of the RNA before it's executed into a protein.)
Why You Care: Disease & Bioterror
One particular folding of the protein above is human prion protein 61-68, which is the cause of Creutzfeldt-Jakob disease, a brain-destroying disease that can either be inherited from the bad gene listed above, or acquired by eating the poorly-cooked brains of infected "mad" cows.
The problem with this protein is that it functions as an enzyme--it
converts other useful proteins into more copies of itself. Such
self-catalyzing proteins are called prions.
Prions aren't nearly as infectious as viruses (you have to eat them in
large quantities to become infected, and they take years to begin causing
problems), but they're incurable and currently mostly undetectable.
Read that again. The gene sequence above, when executed into a protein, can kill you. For under a hundred dollars, online you can mail-order physically expressed copies of that gene sequence from a gene synthesis lab. You can order the copies as fully-assembled proteins (peptides), short RNA or DNA snippets (oligos), or even as working DNA inside living (non-human) cells like bacteria.
Bacteria are just little independent single-cell organisms living in
your body. Viruses are more interesting--they're just DNA in a
cheap protein coat. When executed, the DNA codes for... more
viruses. So a virus just hijacks the code of a working cell to start
manufacturing viruses--nanotechnology used for evil.
Here's the nucleotide sequence for smallpox
(variola virus). It's 185.5 thousand nucleotides long, or 46.4KB
in binary form. Luckily, it's currently not possible to
artificially synthesize such extremely long-chain sequences into working DNA (the
per-nucleotide error rate is too high), but in a few years these 46.4KB
of *binary* data could be converted to *physical* form and cause
horrific human suffering!
Also, cancer. Cancer is very simple--it's when your body's normal
cells stop doing what they're supposed to do, and change their own DNA
to start reproducing without bound, like little single-cell
organisms. Your genes contain all sorts of interesting hacks to
prevent this, like the ticking time-bomb of telomeres
at the end of each chromosome, but cancer (evolution at work!) is
pretty good at changing the cell DNA to evade these defenses. A
woman, Henrietta Lacks,
who died in 1951, had a cervical cancer culture taken that still lives
on to this day, having evolved into a successful experimental and wild
single-celled organism, which to this day will occasionally infect other people's cancer biopsy results.
Why You Care: Information Density
Again, online you can order fluorescent probe molecules to tag a
particular protein or sequence you're interested in. These probes
are short little proteins that have one glowy end (for example, that
glows green under UV light), and one "sticky" end, where by "sticky" I
mean that end is designed to bind to whatever biological object you
like. For example, say you're interested in determining if a cow
brain contains the prions above. So you design a probe that will
stick to the prion. Then you just wash your cow brain (or plants,
or toads, or whatever) with the probes, and then shine on a UV
light--if it glows green, the probes have stuck to prions, so don't eat
it!
How expensive are these useful little probes to fabricate? Well, there's a special where $100 will buy you 1 "nano-mol" of probes. 1 mol is 6.022 x 10^23 molecules (Avogadro's number). So 1 nano-mol is 10^-9 moles, or 6.022 x 10^14 molecules. That's 6 trillion probe molecules per dollar!
This is really cheap compared with the price of, for example, cars (0.00009 Kia Rios
per dollar) or even fast food (2 Taco Bell tacos per
dollar). It's still cheap compared to CPU transistors (300
million transistors/$100 = 3 million transistors per dollar) or even
DRAM storage cells (1GB/$50 = 8 billion bits/$50 = 160 million bits per
dollar).
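The per-dollar comparison is easy to check in a few lines. All the prices are the lecture's own figures, and 1GB is taken as 8 billion bits as in the text. The variable names are just for illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole

probes_per_dollar = 1e-9 * AVOGADRO / 100   # 1 nanomole of probes for $100
transistors_per_dollar = 300e6 / 100        # 300 million transistors for $100
dram_bits_per_dollar = 8e9 / 50             # 1GB (8 billion bits) for $50

print(f"probes:      {probes_per_dollar:.3g} per dollar")
print(f"transistors: {transistors_per_dollar:.3g} per dollar")
print(f"DRAM bits:   {dram_bits_per_dollar:.3g} per dollar")
print(f"probes per DRAM bit: {probes_per_dollar / dram_bits_per_dollar:.3g}")
```

On these numbers a dollar buys tens of thousands of probe molecules for every DRAM bit it buys.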
Biological information storage is so cheap, in fact, that almost every
cell in your body contains its own complete copy of your DNA.
Human DNA has about 3 billion nucleotide pairs, or 6 billion bits, or
750MB of data--about one CD-ROM worth. There are something like 5
million cells per cubic centimeter of human flesh, which means
(counting only the DNA) the information density of human flesh is over
3,000 terabytes per
cubic centimeter! And that's not even trying very hard--pure DNA
could be thousands of times more efficient than this, since DNA is only
a tiny portion of the complete cell.
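Here is the density arithmetic spelled out, using only the lecture's rough figures (3 billion base pairs, 2 bits each, about 5 million cells per cubic centimeter). Decimal megabytes and terabytes are assumed.

```python
base_pairs = 3e9                    # nucleotide pairs in human DNA
bits_per_cell = 2 * base_pairs      # two bits per pair
bytes_per_cell = bits_per_cell / 8
cells_per_cm3 = 5e6                 # the lecture's rough estimate

bytes_per_cm3 = bytes_per_cell * cells_per_cm3
print(f"{bytes_per_cell / 1e6:.0f} MB per cell")
print(f"{bytes_per_cm3 / 1e12:.0f} TB per cubic centimeter")
```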
The bottom line is that DNA is a spectacularly awesome information
storage mechanism--one pair of nucleotides is only a few dozen atoms
across, and stores two bits. I feel like DNA and proteins
represent amazing nanotechnology--atomic-scale fabrication done right.
Why You Care: Processing Speed
We saw above that $1 buys you six trillion (6 x 10^12) probe
proteins. At room temperature, they're all wiggling all over the
place at molecular speeds, "trying" to react with something nearby.
For example, this page's NAMD simulations
of the cell-membrane protein aquaporin show the crucial atoms inside the
protein wiggling around. The atoms make complete wiggles on a
timescale of picoseconds (10^-12 seconds).
Viewed as a computer, this means you've got trillion-way parallelism,
and your clock rate is in the terahertz. This means you're doing
trillions of trillions of total wiggles per second--in this case, something like 6 x 10^24
wiggles per second--per dollar! So if you can figure out how to
express your computation in terms of atomic wiggles, you can get
absolutely insane performance.
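The "wiggles per second" estimate is just parallelism times clock rate. This sketch assumes one wiggle per molecule per picosecond, which is the lecture's rough figure and not a measured rate.

```python
molecules_per_dollar = 6e12       # probe molecules bought with $1
wiggle_period_seconds = 1e-12     # about one wiggle per picosecond

clock_rate_hz = 1 / wiggle_period_seconds
wiggles_per_second = molecules_per_dollar * clock_rate_hz

print(f"clock rate: {clock_rate_hz:.0e} Hz")
print(f"wiggles per second per dollar: {wiggles_per_second:.0e}")
```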
Ecosystem Design
A single cell uses a number of interesting design principles.
First, because everything's on the scale of atoms (and wiggling around
like crazy), it's quite easy for things to get knocked out of
alignment, for crucial parts to break off, or for random unknown
molecules to arrive and disrupt the functioning of the system.
The cell has to work even in the face of all that, and it does a
wonderful job of it. The main trick is simply
replication--there are 500 copies of the Messenger RNA for every gene in
the cell that matters, so losing one of the copies is no big
deal. It's a totally different design philosophy than normal
computers are based on.
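To see why replication is such a strong trick, suppose each copy is destroyed independently with some probability. Both the independence and the 50% loss rate below are assumptions chosen for illustration, not numbers from biology.

```python
def p_total_loss(p_copy_lost, copies):
    """Chance that every one of `copies` independent copies is lost."""
    return p_copy_lost ** copies

# Hypothetical: each copy has a 50% chance of being destroyed.
for copies in (1, 10, 500):
    print(f"{copies:4d} copies -> P(all lost) = {p_total_loss(0.5, copies):.3g}")
```

Even with a coin-flip chance of losing any single copy, the chance of losing all 500 at once is vanishingly small. The catch is the independence assumption: anything that destroys all the copies together defeats the scheme.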
Many of these same cell-design principles are shared by healthy ecosystems, economic markets,
functioning democracies, and piles of gravel. I've come to call
these principles "ecosystem design":
- Many independent decision-makers. Examples: A cell is a
complicated collection of separate but interacting proteins. A
market is a collection of interacting buyers and sellers. A
gravel pile is a collection of interacting pebbles. Results: no
central decision-maker means no single point of failure, so the loss of
any few small pieces doesn't affect the overall result. Anything
important is decided by hundreds of independent parts--and it's just
inconceivable that they'd all get it wrong.
- Dynamic equilibrium--lots of small-scale stuff is changing all
the time, but the overall averages remain quite constant.
Examples: the metabolic rate of a cell is the result of all its pieces working,
but it's quite predictable overall. A market's overall prices
tend to remain fairly stable. A gravel pile's average slope can
be predicted to within a few degrees.
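The dynamic-equilibrium principle shows up in even a toy simulation. In this sketch each "part" is a random number between 0 and 1, a made-up stand-in for a protein, trader, or pebble, and we measure how much the system-wide average moves from one trial to the next.

```python
import random
import statistics

def spread_of_average(parts, trials=50, seed=0):
    """Standard deviation of the system-wide average across repeated trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    averages = [
        sum(rng.random() for _ in range(parts)) / parts
        for _ in range(trials)
    ]
    return statistics.stdev(averages)

for parts in (10, 1000, 10000):
    print(f"{parts:6d} parts -> spread of the average: {spread_of_average(parts):.4f}")
```

Every individual part is changing completely at random, yet the average gets steadier as the number of parts grows (roughly as one over the square root of the number of parts, for independent parts like these).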
As an example, I claim an automobile, CPU, dictatorship, and orderly stack of
bricks use "machine design", not ecosystem design: these systems have
crucial decisionmaker parts (e.g., the braking system, control unit,
dictator, or bottom brick) whose failure can dramatically change the
entire system. Machine design requires somebody to go in
and fix these crucial parts now and then, which is inefficient and error-prone.
Ecosystems, by contrast, harness the power of probability--chaos--in order to get stuff done.
Fault Tolerance
A computer is really not at all a robust system--outside a very narrow
temperature and electrical voltage range, it will stop working.
Computers can be totally destroyed by dust, humidity, or even
microscopic conductive "zinc whiskers".
Mammals are, of course, also quite easy to disrupt. Mammals
depend on the circulation of air and blood to continue to operate, and
contain these fluids within quite delicate structures, such that poking
even a small .223 caliber hole in the aorta, for example, will cause virtually all mammals to stop working.
Even a single cell in some ways functions in a machinelike, non-ecosystem fashion--tearing a hole in a cell wall is called lysis, and results in death. However, note that many cells are quite difficult to kill.
For example, Deinococcus radiodurans,
also known as "Conan the Bacterium", can survive radiation sufficient
to kill even cockroaches (by reassembling its own DNA), hard
vacuum (by tolerating extreme drying, not by forming spores), and various noxious chemicals. A
strain of this bacterium was recently engineered to reclaim mercury and
toluene-contaminated nuclear waste.
Even the tiny, recently-emerged 5kbp canine parvovirus
can survive alcohol, acids, lye, freezing, and 120 degree water.
The only known way to kill it is with bleach, which dissolves its tough
coat.
But we can use ecosystem design to keep our machines running even in
the face of these threats! For example, one beautiful design your
body uses to repel invaders is a set of proteins with selectively
sticky parts. These are designed to stick to foreign material such as a
virus, and flag it for disposal by the immune system. When found,
the immune system also creates more proteins with that kind of
stickiness. Even better, each protein, called an antibody, has two
identical sticky pads, which tends to bind together antibodies and
viruses into long folded-up chains that can easily be identified and
destroyed. Within a few days, the immune system cranks up
antibody production to the point where viruses floating around in the
blood almost immediately get stuck to antibodies and eliminated.
This is why you're immune to a disease you've already been exposed to,
either through catching the disease naturally, or by having the disease
proteins artificially introduced into your body during vaccination.
So, the bottom line is that biology is nanotech on an amazing scale, and offers spectacular
possibilities for information density, processing speed, and
reliability. The downsides are that designing systems that work
is a lot trickier on the small scale, and that there's the possibility of a
mankind-killing genetically-engineered superflu.