Forum



…and this database is Just Right
Read original blog post


6:54 pm
May 26, 2009


cwardell

Admin


I have been fortunate to have created data warehouses in Oracle, Teradata, Netezza, SQL Server and Vertica. (I did a POC with Greenplum when it was called Metapa but it was not quite ready for prime time.)

I have never been able to achieve acceptable performance on multi-terabyte databases using SMP (symmetric multiprocessing) servers and OLTP databases. In my opinion, relational/OLTP databases like Oracle, SQL Server, MySQL, etc. are just the wrong tool for ad hoc queries, analytics and reporting on VLDBs (very large databases). The results I obtained with MPP have been staggering when compared to the query and analytics response times of OLTP databases.

There is no theoretical size limit on databases built on MPP (massively parallel processing) architectures. The more nodes, SPUs, blades, or pizza boxes you have in your cluster, the more capacity your database has and the faster it will perform. With that said, I am pretty confident that Teradata, Netezza and Vertica can scale to petabyte size. I know of no real inhibiting factor.
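A minimal Python sketch of the shared-nothing idea behind that scaling claim: rows are hash-distributed across nodes, each node aggregates only its own slice, and a coordinator merges the partial results. The function names (`distribute`, `node_partial_sum`, `mpp_sum`) and the in-memory lists standing in for nodes are hypothetical illustrations, not any vendor's API.

```python
from collections import defaultdict

def distribute(rows, num_nodes):
    """Assign each (key, value) row to a node by its integer key modulo the node count."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[key % num_nodes].append((key, value))
    return nodes

def node_partial_sum(slice_rows):
    """Work done locally on one node: sum values per key for its slice only."""
    partial = defaultdict(int)
    for key, value in slice_rows:
        partial[key] += value
    return dict(partial)

def mpp_sum(rows, num_nodes):
    """Scatter rows, aggregate per node, then merge the partials on a coordinator."""
    totals = defaultdict(int)
    for slice_rows in distribute(rows, num_nodes):
        for key, subtotal in node_partial_sum(slice_rows).items():
            totals[key] += subtotal
    return dict(totals)

# Hypothetical fact table: 1000 customers, three 10-unit purchases each.
rows = [(customer_id, 10) for customer_id in range(1000)] * 3
result = mpp_sum(rows, num_nodes=4)
largest_slice = max(len(s) for s in distribute(rows, 4))
```

In a real MPP system the per-node step runs concurrently on separate hardware, so adding nodes shrinks the slice each one has to scan. The loop here is sequential and only shows the data flow.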

 

So what does one choose?

I maintained a Teradata environment for about 6 years and have gone under the hood pretty deeply with Netezza and Vertica. I have programmed at the FPGA level for Netezza and had the opportunity to pick the brains of the senior engineers at Vertica for a while. I am pretty confident that at this point, the MPP technologies I mention below should handle the workload.

The question comes down to your budget and business needs.

 

Teradata

Teradata has a wide range of logical models and applications they can sell along with the warehouse. Teradata supports transactional processing and ad hoc queries very nicely. Teradata also has a professional services team, which contributes a large portion of their revenue, but all this comes at a premium price.

 

Netezza

Netezza is a beast of a machine that does full table scans extremely well. Their FPGA technology is very impressive, and the zone indexing (zone maps) does a nice job of minimizing I/O. Netezza’s secret sauce is in their FPGAs (field-programmable gate arrays). Not too long ago, Netezza opened up the FPGA to developers, who can now harness tremendous power by creating FPGA-based UDFs (user-defined functions). These UDFs perform exceptionally well and can put the afterburners on your queries and analytics if done correctly. Netezza has stood the test of time, has a pretty impressive client list, and is a very viable alternative to Teradata.
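A minimal sketch of the general min/max-per-block technique behind zone indexing, to show how it cuts I/O during a scan. This is a toy in Python under stated assumptions, not Netezza's actual implementation; `build_zone_map`, `scan_with_zone_map` and the sample column are hypothetical.

```python
def build_zone_map(values, block_size):
    """Record (start, end, min, max) for each fixed-size block of a column."""
    zones = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zones.append((start, start + len(block), min(block), max(block)))
    return zones

def scan_with_zone_map(values, zones, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for start, end, zone_min, zone_max in zones:
        if zone_max < low or zone_min > high:
            continue  # block cannot contain a match, so its I/O is skipped
        blocks_read += 1
        matches.extend(v for v in values[start:end] if low <= v <= high)
    return matches, blocks_read

# Hypothetical column loaded in ascending order, e.g. an order id or a date key.
order_ids = list(range(1000))
zones = build_zone_map(order_ids, block_size=100)
matches, blocks_read = scan_with_zone_map(order_ids, zones, 350, 360)
```

The payoff depends on how the data is laid out: when the filtered column is loaded in roughly sorted order, most blocks are excluded by their min/max alone, whereas randomly ordered data would force nearly every block to be read.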

 

Vertica

Vertica is a relative newcomer to the VLDB space and is definitely the one to watch. I like the technology and the fact that Vertica is led by Stonebraker, who is a veteran of the industry. Also quite notable is Jerry Held (a former exec at Oracle and Tandem), who sits on the board. You will need to grasp a slight paradigm shift in the way their database operates, but it supports ANSI-standard SQL and ODBC connections, so you shouldn’t have any problems. Vertica’s compression algorithms are extremely clever and, when combined with a columnar data store, they are breaking records. Vertica will definitely give you an A-HA! moment once you understand how the magic happens.

By the way, Syncsort and Vertica broke the world TPC record for data loads at a fraction of the price of Netezza and Teradata.
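On the compression point, here is a minimal sketch of why a columnar layout compresses so well. It uses run-length encoding, one common column-store technique, on a sorted column. This is a toy illustration in Python, not Vertica's actual code; the sample rows and the helper names `rle_encode` and `rle_count` are hypothetical.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_count(encoded, target):
    """Count rows equal to target without expanding the runs."""
    return sum(length for value, length in encoded if value == target)

# Hypothetical row-oriented data: (state, amount).
rows = [("NY", 5), ("CA", 7), ("NY", 3), ("CA", 1), ("TX", 9), ("NY", 2)]

# Column store: each column is kept separately; sorting by state makes the runs long.
sorted_rows = sorted(rows, key=lambda r: r[0])
state_column = [state for state, _ in sorted_rows]
encoded_states = rle_encode(state_column)
ny_rows = rle_count(encoded_states, "NY")
```

Here six stored values shrink to three runs, and the count is answered from the runs directly. On a low-cardinality column with millions of rows the same idea shrinks both the storage and the amount of data a query has to touch.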

 

Summary

Technically they will all crunch the data, and they will all do it MUCH faster than OLTP-based databases. Some questions you will need to consider:

 

  • Q) Do I need or am I looking at other logical data models and applications to kick start my project?
  • Q) Do I want to leverage a professional services team?
  • Q) Do I need user defined functions or stored procedures?
  • Q) Does purchasing proprietary hardware scare me?
  • Q) Do I have a substantial budget or is this a warehouse on a shoestring?
  • Q) Is my DBA adaptable?
  • Q) Do I have a mixed workload?
  • Q) How many users will be hitting the database concurrently?

 

I know both Vertica and Netezza offer proofs of concept.

TAKE ADVANTAGE OF THE POCS!


One approach you may want to consider is to start with a technology like Vertica. They offer a free trial and it installs easily on commodity hardware. This will provide the lowest cost of ownership and it just may fit your needs. In parallel, work on setting up the POC with Netezza. This may take a little longer because staging a Netezza environment within your walls involves a lot of logistics.

Choosing the right technology is a crucial step in the development of a very large data warehouse. You should take your time on this critical phase and choose wisely.

