Tuesday, November 29, 2016

225: A Crazy Kind of Computer

Audio Link

Before we start, just a quick reminder:  the Math Mutation book would make a perfect holiday present for the math geeks in your life!   You can find it on Amazon, or follow the link at mathmutation.com .   And if you do have the book, we could use a few more good reviews on Amazon.   Thanks!   And now, on to today’s topic.

Recently I learned about a cool trick that can enable very rapid computation of seemingly difficult mathematical operations.  This method, known as stochastic computing, makes clever use of the laws of probability to dramatically cut down the amount of logic, essentially the number of transistors, needed to compute elementary functions.   In particular, a multiplication operation, which takes hundreds or thousands of transistors on a conventional computer, can be performed by a stochastic computer with a single AND gate.   Today we’ll look at how such an amazing simplification of modern computing tasks is possible.
    
First, let’s review the basic elements of standard computation, as realized in the typical design of a modern computer.   At a logical level, a computer is essentially built out of millions of instances of three simple gate types, each of which takes one or two single-bit inputs, which can be either 0 or 1.  These are known as AND, OR, and NOT gates.   An AND gate returns a 1 if both its inputs are 1, an OR gate returns a 1 if at least one of its inputs is 1, and a NOT gate transforms a single bit from 0 to 1 or vice versa.   An operation like multiplication would occur by representing your pair of numbers in binary, as a set of 0s and 1s, and running the bits through many AND, OR, and NOT gates, more or less replicating the kind of long multiplication you did in elementary school, with a few optimizations thrown in.    That’s why it takes so many gates to implement the multiplier in a conventional computer.

So, with this being said, how can we reduce the multiplication operation down to a single AND gate?   The key is that instead of representing our numbers in binary, we send a stream of bits down each input to the gate, such that at any moment the probability of that input being 1 is equal to one of the numbers we are multiplying.   So for example, if we want to multiply 0.8 times 0.4, we would send a stream where 1s appear with 80% probability the first input, and one with 40% 1s down the second input.   The output of the AND gate would be a stream whose proportion of 1s is 0.8 times 0.4, or 0.32.
    
The reason this works is due to a basic law of probability:   the probability of two independent events both happening is equal to the product of their probabilities.   For example, if I tell you there is an 80% (or 0.8) chance I will record a good podcast next year, and a 40% (or 0.4) chance I will record a bad podcast next year, the chance I will record both a good and bad podcast is 0.8 times 0.4, or 0.32.  Thus there is a 32% chance I will do both.    The fact that this ANDing of two probabilistic events is equivalent to multiplying the probabilities is the key to making a stochastic computer work.   
    
Now if you’re listening carefully, you may have noticed a few holes in my description of this miraculous form of computation.   You may recall from elementary probability that there is one major limitation to this probability-multiplying trick:  the two probabilities must be *independent*.  So suppose you want your stochastic computer to find the square of 0.8.   Can you just connect your 80%-probability wire to both inputs of the AND gate, and expect an output result that is 0.8 time 0.8 (or 0.64) ones?   No— the output will actually just replicate the input value of 0.8, since at any given time, it will be 1 if and only if the input stream had a value of 1.   Think about my real-life example again:  if I tell you there’s an 80% chance I’ll record a good podcast next year, what’s the chance I’ll record a good podcast AND I’ll record a good podcast?   Stating it redundantly doesn’t change the probability, it’s still 80%.   To compute a square operation in a stochastic computer, I need to ensure I have two *independent* bit streams that each represent the number I’m squaring.   So I need two separately-generated streams, each with that 80% probability, and can’t take the shortcut of connecting the same input twice.   If performing a series of computations in a stochastic computer, a designer needs to be very careful to take correlations into account at each stage.  
    
To look at another challenge of this computing style, let’s ask another basic question:  why doesn’t everyone implement their computers this way?  It’s not a question of technology— stochastic computing was first suggested in the 1950s, and such devices were actually constructed starting in the late 1960s.   Most importantly, generating independent bit streams with the probabilities equal to all the inputs of your computing problem, and then decoding the resulting output stream to find the probability of a 1 in your results, are both very complex computing tasks in themselves.   So I kind of cheated when I said the multiplication was done solely with the single AND gate.   Processing the inputs and outputs requires major designs with thousands (or more) of logic gates, quickly wiping out the benefit of the simplified multiplication.    This isn’t a total barrier to the method though.   The key is to find cases where we are able to do a lot of these probabilistic operations in relation to the number of input bit streams we need to create.    This can even be an advantage in some ways:  unlike a conventional computation, if a stochastic computer spends extra time performing the same computation and observing the output, it can increase the precision of the result.   But these issues do mean that finding good applications for the method can be rather tricky, as complex input and output generation must be balanced against the major simplification of certain operations.

Although stochastic computers were first constructed almost a half-century ago, they were then mostly abandoned for several decades.   With the breakneck pace of advances in conventional computing, there just wasn’t that much interest in exploring such exotic methods.    But with the recent slowdown of Moore’s Law, the traditional doubling of computer performance every 18 months or so, there has been a renewed interest in alternate computing models.   Certain specific applications, such as low-density parity check codes and image processing, are well-suited to the stochastic style of computing, and are current subjects of ongoing research.    I’m sure that in coming years, we’ll hear about more clever solutions to common problems that can be better performed by this unusual computing method.

And this has been your math mutation for today.



References: