Can Netezza Search for Aliens?

By cwardell • June 3rd, 2009

 

While I am on the innovation “rant,” it reminds me of a few discussions I had with some of the brass at Netezza. I have been fortunate to have been invited to the Netezza campus a few times and we had a few blue-sky sessions. (This was before they got their new digs and their IPO.) My first impressions were great. The office had an entrepreneurial spirit and, to my surprise, the R&D team seemed to be well-seasoned veterans. After talking with them for a while, you just knew they had been around the block once or twice.

My name is Charles, but my friends call me Charlie. A running joke with those who know me is that I am a terrible wine snob. Or should I say I am terrible at wine snobbery. I know all the moves and can put on a good show but couldn’t tell the difference between two-buck-chuck and Grey Poupon.

 

Netezza and my "2-Buck-Chuck"

As a joke one evening, the Netezza folks brought a bottle of two-buck-chuck along to dinner and I just happened to love it. Needless to say, they left me with a wonderful collection for my wine basement as well as a new nickname from my friends at Netezza. A great memory and a pretty funny one, but I wanted to share one of the thoughts that came out of that evening.

Netezza is a massively parallel processing (MPP) appliance. In the Netezza world, your data repository is spread out as thinly as possible across as many SPU/Blades as it can. Each SPU/Blade is responsible for a small part of the entire data repository.

Let’s keep it simple for now and forget aggregation steps, summations, group by, etc. For my illustration, let’s just say I want to perform a straight full table scan query against 1 billion records. If I have an even distribution of 1 billion records across 100 SPU/Blades, the MAX LOAD, or the MAX # OF RECORDS, that any one SPU/Blade would have to read and filter is 10 million records.

In this particular example, a 1 billion record query would respond like a 10 million record query. And the scaling is linear: if I had 200 SPU/Blades, the same 1 billion record query would respond like a 5 million record query. This is MPP in a nutshell. I know that I am cutting out a LOT of details here but those details are not important for this post.
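The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the even-distribution math from the post; the function name `max_records_per_spu` is made up for this example and has nothing to do with any Netezza API.

```python
def max_records_per_spu(total_records: int, spu_count: int) -> int:
    """Largest number of records any one SPU/Blade scans under even distribution."""
    if spu_count <= 0:
        raise ValueError("spu_count must be positive")
    # Ceiling division: if the records do not divide evenly,
    # at least one SPU/Blade ends up holding one extra record.
    return -(-total_records // spu_count)


if __name__ == "__main__":
    # The two cases from the post: 100 and 200 SPU/Blades, 1 billion records.
    for spus in (100, 200):
        print(spus, "SPU/Blades ->", max_records_per_spu(1_000_000_000, spus), "records each")
```

Doubling the SPU/Blade count halves the per-blade load, which is the linear scaling described above. Real systems add overhead this sketch ignores (skewed distribution, merge steps, network traffic).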

 

The innovation

Netezza SPU Blade

Each SPU/Blade is essentially a dedicated computer that has its own dedicated:

  • CPU
  • RAM
  • Cache
  • I/O Channel
  • Network Adapter Card
  • Disk
  • FPGA

Although Netezza is focused on crunching enormous amounts of data, there is a huge opportunity waiting to be capitalized on. Netezza has opened up the FPGA (Field Programmable Gate Array) to developers. The FPGA is programmed in a subset of the C language, and the code is compiled and deployed as a user defined function on each and every SPU/Blade. When each record passes through the FPGA, it can be zapped by a natively compiled User Defined Function.

You may be saying, so what!? User Defined Functions have been around for as long as I can remember. What’s the big deal? How about distributed computing? What if the Netezza Performance Server were used as a massively parallel compute engine? Universities and academia, health research (e.g., cancer, AIDS, the human genome), or the ever important search for extraterrestrials and large prime numbers could all be candidates for an MPP appliance in a box.
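To make the idea concrete, here is a toy simulation in Python, not Netezza code. It assumes a round-robin spread of records across a handful of pretend SPU/Blades, each of which runs the same user defined function over its own slice before the host merges the results. All names (`distribute`, `scan_partition`, `parallel_scan`, `is_prime`) are hypothetical, and worker threads stand in for blades, so this shows the shape of the work rather than real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """The 'user defined function' applied to every record (simple trial division)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def distribute(records, spu_count):
    """Spread records round-robin across spu_count simulated SPU/Blades."""
    return [records[i::spu_count] for i in range(spu_count)]


def scan_partition(partition, udf):
    """What one simulated SPU/Blade does: filter its own slice with the UDF."""
    return [record for record in partition if udf(record)]


def parallel_scan(records, spu_count, udf):
    """Run the UDF on every partition concurrently, then merge on the 'host'."""
    partitions = distribute(records, spu_count)
    with ThreadPoolExecutor(max_workers=spu_count) as pool:
        results = list(pool.map(lambda p: scan_partition(p, udf), partitions))
    return sorted(record for part in results for record in part)


if __name__ == "__main__":
    print(parallel_scan(list(range(1, 31)), 4, is_prime))
```

Swapping `is_prime` for any other per-record function is the whole point: the distribution and merge stay the same while the compute kernel changes.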

 

The next two paragraphs are from Wikipedia:

SETI@home (“SETI at home”) is a distributed computing (grid computing) project using Internet-connected computers, hosted by the Space Sciences Laboratory, at the University of California, Berkeley, in the United States. SETI is an acronym for the Search for Extra-Terrestrial Intelligence. SETI@home was released to the public on May 17, 1999.[1][2][3]

Statistics

With over 5.2 million participants worldwide, the project is the distributed computing project with the most participants to date. The original intent of SETI@home was to utilize 50,000-100,000 home computers.[citation needed] Since its launch on May 17, 1999, the project has logged over two million years of aggregate computing time. On September 26, 2001, SETI@home had performed a total of 10^21 floating point operations. It is acknowledged by the Guinness World Records as the largest computation in history.[12] With over 334,155 active computers in the system (1.8 million total) in 210 countries, as of August 04, 2008, SETI@home has the ability to compute over 528 TeraFLOPS.[13] For comparison, Blue Gene (one of the world’s fastest supercomputers) peaks at just over 596 TFLOPS with sustained rate of 478 TFLOPS.

 

Certain industries, like life sciences, pharmaceuticals, and finance, can’t go distributing data across thousands of volunteer computers worldwide. Procuring large grids to run their computational workloads is challenging and costly. Setting up a 50,000 node grid in a data center is not something most people would want to take on. The HVAC requirements for a grid that size are just enormous.

The choice of a PowerPC CPU on the Netezza SPU/Blade was no accident. It runs significantly cooler than Intel or AMD. The reduced heat at the SPU/Blade level allows for a MUCH higher density per rack and a smaller footprint on the data center floor, and it reduces the HVAC and electrical requirements substantially. Come on Netezza! Where are the “Going Green” campaigns?

Netezza already handles the hardware, distribution, connectivity, and compiler for the SPUs. As a result, this may become a very viable approach for computationally heavy studies. The next time your screen saver downloads a job packet from SETI@home, think of the potential opportunity that Netezza could provide in finding aliens, the Human Genome Project, or FightAIDS@Home.

I wonder if Netezza could remove the DBOS and storage and go to market with a streamlined HPC (High Performance Computing) grid. Let me know what you think and what industry you see the potential in.

 

That’s my blue sky for tonight.

Charlie (AKA: 2buckchuck)

 

 
