Posts From This Author
About Our Authors
Seeing Our World Through Columnar Eyes
By Steve Hoberman on September 20, 2010View Full Bio →
Glad to see you again for this blog posting. I have been talking about the Cloud the last few blog postings. In this one let’s talk about columnar databases. A columnar DBMS is a database management system (DBMS) which stores its content by column rather than by row. The analogy I frequently use for a database table is a spreadsheet. Each row in the spreadsheet represents one entity instance or database row. For example, ‘Order Number 123’, ‘Placed on April 1st, 2010’, ‘Amount of $50’ – would all appear on one row in a traditional database. In a columnar database it would also appear on one row logically, yet physically in the database it would be equivalent to turning the spreadsheet on its side, where one row might contain all of the order numbers, another row all of the order placed dates, and another row all of the amounts.
This has advantages for business analytics applications, where calculations are performed over large numbers of similar data items. For example, if someone wanted to see the total of all order amounts it would be quicker to perform a sum if all of the order amounts were stored next to each other in the database, as is the case with columnar. Scans, joins, and aggregations can be inefficient for row store (traditional) databases to perform because they must read an entire record to use any of its fields.
It’s important to note that how the data is stored is transparent to all but the DBA. That is, columnar databases behave like any other relational database such as use of SQL and primary and foreign keys. Also it is interesting how many synonyms there are for columnar databases, such as column-oriented database, read-mostly DBMS, column store database, column-vectored database, column-based relational database (CBRD) and column-wise DBMS.
You may think columnar databases are relatively new, yet the first columnar databases actually appeared back in the 1970s alongside bell bottom jeans and long hair. The most well-known in the 1970s was the RAPID system developed by Statistics Canada in 1976 and used for processing and retrieval of the Canadian Census of Population and Housing as well as several other statistical applications. For many years, the Sybase IQ product was the only commercially available column-oriented DBMS. Today there are many vendors offering columnar databases.
So what’s so exciting about columnar databases? For me as a modeler, it is possible based on the analytic requirements that I can actually deliver a fully normalized (Fifth Normal Form or pretty close to 5NF) data model as my physical data model! No need for materialized views, star schemas, non-unique indexes, denormalization – all of these techniques are compromises made to a logical data model for data retrieval performance gains in the physical database. Because columnar databases are so speedy to begin with for analytics, we don’t have to waste the time thinking about how to best compromise the design, and then after all this time and effort still wind up with a physical design that is not very robust or extensible. We can deliver an optimal model and have it perform brilliantly!
One columnar database vendor, ParAccel (PADB for short), said this in a recent white paper which supports this idea: PADB allows companies to rely on a more normalized schema which can reduce data redundancy as compared to other analytic solutions. This heightens extensibility and efficiency, and facilitates adoption of server technology improvements as they occur (e.g., Moore’s Law). With schema flexibility, database administrators can devote more time to providing new business applications and serving the ever broadening needs of users, instead of tuning existing applications for small incremental gains in performance.
Exciting stuff! Until the next blog… Glad to see you again for this blog posting. I have been talking about the Cloud the last few blog postings. In this one let’s talk about columnar databases. A columnar DBMS is a database management system (DBMS) which stores its content by column rather than by row. The analogy I frequently use for a database table is a spreadsheet. Each row in the spreadsheet represents one entity instance or database row. For example, ‘Order Number 123’, ‘Placed on April 1st, 2010’, ‘Amount of $50’ – would all appear on one row in a traditional database. In a columnar database it would also appear on one row logically, yet physically in the database it would be equivalent to turning the spreadsheet on its side, where one row might contain all of the order numbers, another row all of the order placed dates, and another row all of the amounts.
This has advantages for business analytics applications, where calculations are performed over large numbers of similar data items. For example, if someone wanted to see the total of all order amounts it would be quicker to perform a sum if all of the order amounts were stored next to each other in the database, as is the case with columnar. Scans, joins, and aggregations can be inefficient for row store (traditional) databases to perform because they must read an entire record to use any of its fields.
It’s important to note that how the data is stored is transparent to all but the DBA. That is, columnar databases behave like any other relational database such as use of SQL and primary and foreign keys. Also it is interesting how many synonyms there are for columnar databases, such as column-oriented database, read-mostly DBMS, column store database, column-vectored database, column-based relational database (CBRD) and column-wise DBMS.
You may think columnar databases are relatively new, yet the first columnar databases actually appeared back in the 1970s alongside bell bottom jeans and long hair. The most well-known in the 1970s was the RAPID system developed by Statistics Canada in 1976 and used for processing and retrieval of the Canadian Census of Population and Housing as well as several other statistical applications. For many years, the Sybase IQ product was the only commercially available column-oriented DBMS. Today there are many vendors offering columnar databases.
So what’s so exciting about columnar databases? For me as a modeler, it is possible based on the analytic requirements that I can actually deliver a fully normalized (Fifth Normal Form or pretty close to 5NF) data model as my physical data model! No need for materialized views, star schemas, non-unique indexes, denormalization – all of these techniques are compromises made to a logical data model for data retrieval performance gains in the physical database. Because columnar databases are so speedy to begin with for analytics, we don’t have to waste the time thinking about how to best compromise the design, and then after all this time and effort still wind up with a physical design that is not very robust or extensible. We can deliver an optimal model and have it perform brilliantly!
One columnar database vendor, ParAccel (PADB for short), said this in a recent white paper which supports this idea: PADB allows companies to rely on a more normalized schema which can reduce data redundancy as compared to other analytic solutions. This heightens extensibility and efficiency, and facilitates adoption of server technology improvements as they occur (e.g., Moore’s Law). With schema flexibility, database administrators can devote more time to providing new business applications and serving the ever broadening needs of users, instead of tuning existing applications for small incremental gains in performance.
Exciting stuff! Until the next blog…
Follow all Expert Blog updates by subscribing to the
RSS feed.
About the Author
Steve Hoberman is one of the world’s most well-known data modeling gurus. He understands the human side of data modeling and has evangelized “next generation” techniques. Steve taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then.
There have been no comments yet.




















