HRG

 

Column Oriented Database

hrgresearch.com

 

 

Column Oriented Database

 

Databases have always been the most important asset of a company and therefore, have continued to evolve over the past few decades. Initially there were hierarchical, VSAM, BDAM, and other similar non-relational databases. These databases were usually used in transactional environments and batch processing. Their performance was excellent for business needs at that time. The problem with these databases was the amount of work required to extract and manipulate data, where the rules were constantly changing. To solve this problem, the relational database technology emerged on the scene. The relational database management (RDBM) systems were slower than their predecessors in the area of transaction processing, but were much easier to use. They allowed for the extraction and manipulation of data with one database statement (SQL – structured query language), and also allowed for ad hoc queries. These RDBM systems were row based relational databases, which meant that a database consisted of many rows or records, where a typical record or row consisted of information pertaining to a unit of information. For example, a record may consist of a person’s name, home address, telephone number, gender, SSN, etc. And when a request for the person’s information was requested by the RDBM system, it used the name as a key and read the record in the database that matched the name. As systems and technologies advanced, RDBM system performance also improved to where they were suitable for transaction processing, which is based on processing one or more records to complete a transaction.

 

RDBM systems also started to be very popular for generating data warehouses, where the data from disparate sources could be placed into a RDBM system. With the use SQL, data could be extracted, combined, and manipulated for analysis while generating reports with a minimum of effort. Ad hoc queries on the data were available to extract data very quickly and efficiently. However, these ad hoc queries had to be managed, only a few could execute at one time due to database performance and contention.

 

As in any business environment, things change. Companies grow and merge with other companies, and therefore, the data continues to grow. Business organizations also discovered the power of analyzing the data to predict trends and promote existing and new business opportunities, thus the emergence of data analytics. Unfortunately, the existing RDBM systems could not keep up with the demand for extracting and analyzing the data. Some queries could take up to three or four hours to complete, so other techniques had to be developed like data aggregation and in-memory data. Of course, these techniques took more resources, time, and management, which limited the analysis to a subset of the data.

 

To address these shortcomings of the current RDBM systems, a relatively new technology has emerged and is gaining ground, column oriented DBMS. In the column oriented DBMS, the data is stored not as rows but as columns, thus, instead of retrieving a record or row at a time, an entire column is retrieved. In our example above, instead of retrieving the record which contains only the columns for one person (name, address, telephone number, etc), the DBMS would retrieve an entire column of data, like the telephone numbers column. This becomes very powerful and efficient since data analytics is usually concerned with only one field or column of a record. The access becomes much faster, and much more relevant data can be extracted from the database in a shorter period of time.

 

Technical

 

With column oriented DBMS the data is stored differently allowing for faster retrieval of data for the purpose of analyzing the data. In addition, vendors of column oriented DBMS use data compression to speed up the access of data and reduce the size of the database. By compressing the data, more data can be read from hard disk at one time. Seek time is incredibly slow compared to the other delays in computers, and seek time is improving at a slow rate relative to CPU power, so efficiency in this area is extremely important.

 

There are other advantages of column oriented DBMS. Column data is of uniform type therefore, there are some opportunities for storage size optimizations available in column oriented data that are not available in row oriented data. For example, many popular modern compression schemes, such as LZW (Lempel-Ziv-Welch: is a universal data compression algorithm), make use of the similarity of adjacent data to compress. While the same techniques may be used on row-oriented data, a typical implementation will achieve less effective results. Further, this behavior becomes more dramatic when a large percentage of adjacent column data is either the same or not-present, such as in a sparse column. In addition, some of the column oriented DBMS vendors can perform operations on compressed data, i.e., they don’t have to uncompress the data to perform the operation.

 

Normal SQL can be used to access column oriented DBMS, so no new language is required. Column based DBMS are mostly for query only access, but they do provide update capabilities. Because they are basically query only, they allow for many more concurrent Ad hoc queries without degradation to the system. This allows for different business organizations to access the data without restriction due to performance or database contention.

 

Benefits of Column Oriented DBMS

 

Highly complex query environments that support strategic and operational decisions can be used to gain a competitive edge by better understanding customers, competition, risk positions, revenue leaks, and fraud. Column oriented DBMS allow companies to perform these data analytics. It also allows multiple organizations to access the data at the same time, thus saving time and resources. Data analytics can now be effectively used to grow the business and investigate other avenues of revenue. Many different scenarios or ad hoc queries can be performed on the data to generate trends and analysis in a much shorter time providing a jump on the competition. Comparisons of queries have shown that column based DBMS out perform RDBMS up to and over 100 times. Companies that specialize in data aggregation can also use column oriented DBMS to cost effectively aggregate data for various business analytics processing. 

 

Many vendors provide their column oriented DBMS on standard or open systems hardware and software, thus eliminating additional training by using the same infrastructure. The cost to migrate to a column oriented DBMS is relatively cost effective with the tools provided by the vendor, and in many cases tuning and management of the database is automated, to a degree. Vendors either provide or integrate with the various data integration tools to allow disparate databases to be integrated into the same column oriented DBMS.

 

Conclusions

 

In practice, row oriented architectures are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions.  Column stores are well-suited for OLAP-like workloads (eg. Data warehouses) which typically involve a smaller number of highly complex queries over all data. The architecture and compression techniques of column based data provides unparalleled performance that allows companies to generate and analyze data in a much shorter period of time, which results in increased revenues, potential new markets, and trend analysis.

 

 


 

 

 

 

 
  .LegalPrivacy PolicySitemaptxt Copyright 2009 Harvard Research Group, Inc. All Rights Reserved.