Welcome Guest | |
Follow Us:
    
Newsletter Signup:
Columns Better Than Rows For Data Warehouses
When are columns better than rows as the way to organize your database?The standard business database is a relational system that organizes data into tables, built with vertical columns and horizontal rows By Charles Babcock, NWC, March 01, 2008

When are columns better than rows as the way to organize your database?
The standard business database is a relational system that organizes data into tables, built with vertical columns and horizontal rows. Oracle, DB2, SQL Server, Sybase, and MySQL all are organized around the rows. But when it comes to data warehouses, said Michael Stonebraker, the former Berkeley database guru, it's better for performance reasons to focus on the columns.
He should know. He was the original architect of row-based Ingres and pioneer of relational databases used for the past 25 years in business. The row-based system is great for executing transactions, but a column-oriented system is a natural fit for data warehouses, he said. "You get a factor of 50 times better performance on all classes of queries used to run most data warehouses," he says. Stonebraker is now CTO of Vertica Systems, supplier of the column-based Vertica database system. Vertica 2.0 was unveiled at the Data Warehousing Institute's World Conference last month.
Columns cut across individual transactions, and store an element of information that is standard to each transaction, such as the customer name or address or amount of purchase.
That's because a data warehouse frequently stores information on a company's transactions, with each transaction having many parts. A row may hold the 20 to 200 different elements of the typical transaction, such as customer name, address, and amount of purchase. Columns, on the other hand, cut across transactions and represent one piece of information from all transactions. Data warehouse queries typically want to ask questions that that draw information out of all transactions, or large sets of them, rather than look at individual transactions. For example, a typical query might ask: What was the average sale during the month of February? Retrieving the Sales column gets all the data the query needs to determine the average.
A standard relational database, however, would retrieve all the rows reflecting individual sales for the month, load them into system memory, then find all sales records and generate an average from them. So being able to focus on the Sales column, the way Stonebraker and Vertica 2.0 does, leads to data warehouse query performance improvement.
Stonebraker said a second performance benefit flows from taking a column approach. Because columns contain similar information from each transaction, it's possible to derive a compression scheme for the data type, then apply it uniformly throughout the column. Rows can't be compressed as easily because the nature of the data varies widely from record to record in the row, such as name, ZIP code, and balance owed. Each would require a different compression scheme.
Compressing data in columns makes them quicker to store and retrieve and reduces the amount of disk required for storage. "In every data warehouse I see, compression is a good thing," said Stonebraker. And he claimed Vertica's compression approach is "2-3 times more effective. Columns compress better than row stores," he said.
Arnette said he expects Sonian's e-mail archive to grow rapidly from a few terabytes to 10 petabytes (10,000 TB) sometime in 2009. "We needed a system that was optimized for high-volume reporting," said Arnette. In Sonian tests, "it was a lot faster than MySQL and requires less CPU power," he claimed. He didn't test Vertica versus Oracle because of Oracle's "license fees and cost of administration. "Stonebraker said the performance advantage is most evident with large data warehouse systems. "I expect the data warehouse market to become completely column-store based, just because it is inherently higher performance," he said.



blog comments powered by Disqus
Featured Videos


 
    
 
Future Strategist Award
Who's next in line for the CIO position?
As a CIO you mentor someone in your organization for the future IT leadership role. InformationWeek would like to acknowledge and felicitate that special person at an awards ceremony at Interop
Top Stories
Interview
CIOs must leverage social media to increase their presence in the boardroom
Arun Sundararajan, NEC Faculty Fellow and Associate Professor at New York University’s Stern School of Business, discusses with InformationWeek the relevance of social media to the overall business, and how CIOs must handle social media
BankTech India - IT News for BFSI Segment
We're on Google+
InformationWeek India on Facebook