Tuesday, October 7, 2008

Why Would Anyone Need a Computer Cluster?

"How is a cluster used, and can you give a real-life example of an application?" That was a question a commenter asked in response to a previous blog post called 24-Core Linux Cluster in a $29.99 Case from IKEA.

A computer cluster simply gives you more processing power than you have with one computer alone. For normal computer use and typical user applications, one computer is all you need. But people who write their own algorithms to solve specific problems sometimes find that crunching the numbers ties up their single computer for hours, days, or weeks before it converges on a solution! Suppose I wanted to do a very nerdy thing and create a 20,000 by 20,000 pixel high-resolution image of the Mandelbrot Set to hang on my bedroom wall. The 400 x 400 pixel Mandelbrot Set images I created with a simple Java program, here, took about 15 seconds each to render. The big image has 2,500 times as many pixels, so at the same rate it would take roughly 37,500 seconds, or 10.4 hours, to finish my 400,000,000 pixel image with a single computer. Finding a printer to print that would be another story. That's a pretty impractical example, but people who render computer-generated imagery, such as for the movie Toy Story, where each frame can take 3 hours to render, really do need more computational power.
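The 10.4-hour figure above is a back-of-the-envelope estimate, and the arithmetic can be checked in a few lines. This sketch assumes render time grows in direct proportion to pixel count (same region of the set, same iteration limit), which is only roughly true in practice. The variable names are made up for illustration.

```python
# Rough render-time estimate using the figures quoted in the post.
small_side = 400        # pixels per side of the test image
small_seconds = 15      # approximate render time for the test image
large_side = 20_000     # pixels per side of the wall-sized image

small_pixels = small_side ** 2    # 160,000 pixels
large_pixels = large_side ** 2    # 400,000,000 pixels

# Assumption: time scales linearly with the number of pixels.
scale = large_pixels / small_pixels
est_seconds = scale * small_seconds
est_hours = est_seconds / 3600

print(f"{large_pixels:,} pixels is {scale:,.0f}x the test image")
print(f"estimated time: {est_seconds:,.0f} s, or about {est_hours:.1f} h")
```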

A computer cluster is just a bunch of computers connected to a local network, or LAN. An application has to be designed specifically to make use of a cluster. Usually a "main" application running on one computer breaks the problem up into manageable chunks and sends them out to "worker" applications running on the other nodes of the cluster. Using the Mandelbrot Set example from above, if I had an 11-node cluster to render my 400,000,000 pixel image, the computer running the main application would break the image space up into ten 40,000,000 pixel chunks and send each worker a command telling it which part of the image to render. The workers just sit there doing nothing until they receive a command through the network. When they finish, they send their rendering back to the main computer and wait for further instructions. Once all ten workers have sent their renderings back, the main computer can stitch the 10 sub-images together to create the full image. Ignoring the time spent sending data over the network, the entire rendering would be about ten times faster. In addition, if each worker computer had a quad-core processor and you designed your application to use multi-threading in the right way, you could get the job done up to roughly 40 times faster! It's all about true parallel processing.
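The main/worker split described above can be sketched on a single machine. This is only an illustration, not the code from the post: a local thread pool from Python's standard library stands in for the networked worker nodes, the image is tiny, and the function names (split_rows, render_band, render) and the iteration cap are invented for the example. Threads in CPython will not actually speed up this kind of number crunching; the point is the shape of the divide, render, and stitch steps.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITER = 50  # iteration cap per pixel (arbitrary choice for this sketch)

def split_rows(height, workers):
    """Main's job: divide rows 0..height-1 into one contiguous band per worker."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands

def render_band(band, width, height):
    """Worker's job: compute escape counts for every pixel in one band of rows."""
    start, stop = band
    rows = []
    for py in range(start, stop):
        y = -1.5 + 3.0 * py / height
        row = []
        for px in range(width):
            c = complex(-2.0 + 3.0 * px / width, y)
            z, n = 0j, 0
            while abs(z) <= 2.0 and n < MAX_ITER:
                z = z * z + c
                n += 1
            row.append(n)
        rows.append(row)
    return start, rows

def render(width, height, workers):
    """Main: hand out bands, collect the results, stitch them back in row order."""
    bands = split_rows(height, workers)
    # Stand-in for the cluster: on real hardware each band would go to a node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: render_band(b, width, height), bands))
    image = []
    for _, rows in sorted(results, key=lambda r: r[0]):
        image.extend(rows)
    return image

image = render(width=60, height=40, workers=10)
print(len(image), "rows of", len(image[0]), "pixels")
```

Because each band depends only on its own row range, no worker needs to talk to any other worker. That independence is what makes the Mandelbrot rendering such an easy fit for a cluster.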

Another application I can think of that would be fun to design is swarm intelligence modeling. The individuals in ant swarms, bird flocks, or schools of fish, for example, only interact locally with each other and their immediate environment. They follow very simple rules at an individual level, but there are no global flock or swarm rules for group control. Yet out of these local interactions of hundreds or thousands of individuals, complex global behavior emerges. Maybe with some investigation, a practical application could be derived from swarm intelligence. Other applications that try to predict a future event within a highly complex environment would also benefit from a cluster. I would guess that the application the U.S. Forest Service uses to predict the path of forest fires runs on a cluster, because massive numbers of burn simulations over a 3-D surface need to be executed to make a reasonable guess as to where the fire will go next. And it has to be done really fast because the fire isn't going to wait for you and your single computer!
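To make "simple local rules, no global control" a bit more concrete, here is a toy flocking sketch. It is an assumption-laden illustration rather than a real swarm model: each agent has a position and a heading, and its only rule is to turn toward the average heading of agents within a fixed radius. The function names, the radius, the speed, and the number of agents are all invented for the example.

```python
import math
import random

def step(agents, radius=2.0, speed=0.1):
    """Advance every agent one tick using only its local neighbourhood."""
    updated = []
    for x, y, _ in agents:
        # Local rule: adopt the mean heading of agents within `radius`.
        # The agent itself is at distance 0, so `near` is never empty.
        near = [h for (ox, oy, h) in agents
                if math.hypot(ox - x, oy - y) <= radius]
        heading = math.atan2(sum(math.sin(h) for h in near),
                             sum(math.cos(h) for h in near))
        updated.append((x + speed * math.cos(heading),
                        y + speed * math.sin(heading),
                        heading))
    return updated

def alignment(agents):
    """1.0 when all headings agree; close to 0.0 when they point every which way."""
    n = len(agents)
    return math.hypot(sum(math.cos(h) for _, _, h in agents),
                      sum(math.sin(h) for _, _, h in agents)) / n

rng = random.Random(1)  # fixed seed so runs are repeatable
flock = [(rng.uniform(0, 10), rng.uniform(0, 10),
          rng.uniform(-math.pi, math.pi)) for _ in range(60)]

print(f"alignment before: {alignment(flock):.2f}")
for _ in range(100):
    flock = step(flock)
print(f"alignment after:  {alignment(flock):.2f}")
```

Each tick compares every agent with every other agent, so the work grows with the square of the flock size. With thousands of individuals, one plausible way to use a cluster would be to give each node a region of the space to simulate.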

The Helmer cluster could be used to trade stocks. One could develop stock prediction algorithms that trade in real time and are able to react to market changes the second they occur, which would come in handy in today's crazy market. The algorithms would need to be optimized using evolutionary algorithms to match the unique behaviors of different stocks! Yes, that's right - an algorithm used to optimize an algorithm. Evolutionary algorithms, inspired by actual biological evolution - the tweaking and mutation of DNA at every generation - require immense computation. A starting point is defined, the traits of the final desired trading behavior are defined, and the evolutionary algorithm starts crunching numbers until it converges on a solution. It would be prohibitively slow to try to do that on just one computer, hence the need for a computer cluster.
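For readers who have not seen one, the loop inside an evolutionary algorithm looks roughly like the sketch below. Everything in it is a stand-in: the fitness function just rewards being close to a made-up TARGET vector, whereas a real trading setup would presumably score each candidate by running it against market data. The population size, mutation strength, and function names are likewise assumptions for illustration.

```python
import random

TARGET = [0.3, -1.2, 2.5, 0.8]   # made-up "ideal" parameter values

def fitness(params):
    """Hypothetical score: higher is better, with the peak at TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolve(generations=100, pop_size=40, sigma=0.2, seed=0):
    rng = random.Random(seed)
    # Starting point: a population of random candidate parameter sets.
    population = [[rng.uniform(-5, 5) for _ in TARGET] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 4]            # selection: keep the top quarter
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]    # crossover
            child = [g + rng.gauss(0, sigma) for g in child]    # mutation
            children.append(child)
        population = parents + children   # parents survive unchanged
    best = max(population, key=fitness)
    return best, history

best, history = evolve()
print("best parameters:", [round(g, 2) for g in best])
print("best score, first vs. last generation:", round(history[0], 3), round(history[-1], 3))
```

Every candidate in a generation is scored independently of the others, so a main node can farm the fitness evaluations out to workers in the same way as the image chunks. When each evaluation is expensive, that is where the extra nodes pay off.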


Anonymous said...

Hi Tim, I've followed your comment on my blog about Helmer and stumbled upon this nice introductory article on clusters. We're running a research project in our university based on a cluster (7 pcs with RHE). I'm not directly involved in it apart from the fact that we searched with no luck for an implementation of Ubuntu.

Unknown said...

What would you recommend for a cluster setup? I have a mini search engine (MySQL database) that's hosted by bluehost and I've peaked their server (dual 6-core Intel). This week I'm building my first AMD quad server and want to build more and make a cluster.

You can see how slow my site is running. www.webgazing.com

Tim Molter said...

Admin, I'm not an expert by any means, but when I was looking into clusters, it didn't make any sense at all to spend outrageous dollars on rack servers with cutting edge server processors. I liked the idea of building quad core AMD boxes. First off, I could do it for ~$400 each. And secondly, if one of the nodes crashed, I could cheaply fix it myself relatively quickly. I then realized that a large proportion of the $400 was spent on the computer case. Also, having all those computers would eventually take up a lot of physical space. I then decided to build 6 nodes into an IKEA cabinet to save space and money on cases. It turned out very nice. Here's the URL if you haven't seen it.


Another option I would consider is simply buying heavy-duty shelving, laying the nodes out on the shelves, and forgoing computer cases or any kind of cabinet altogether.

Theophilus said...

So, with all this in mind, my question arises:

What do YOU use YOUR helmer cluster for?

I was on a video project last year, and in retrospect, one of these would have been glorious to be able to have... Maybe next time :P.. video seems to be one of the biggest home uses I could see from a personal cluster system... Most of these other ones seem to be better applied in theory than in real life (thus my question :P)

Tim Molter said...

@Theophilus - We built it to run genetic algorithms to continuously optimize a stock trading algorithm that had 10 to 20 free variables. In the meantime, we reduced the number of free variables down to just a few, so we don't need the cluster anymore. The individual nodes of the cluster are now used as HTTP servers. Cheers!