Sybase's data warehousing approach is fundamentally different from other relational database providers. Sybase believes that the traditional relational approach and the ROLAP method is inefficient, in order to achieve sufficient performance, must be through high cost, the additional hardware, software, resources, money, time to invest heavily, otherwise impossible to achieve. As a result, Sybase has developed a new reverse relational database-it is based on columns rather than on rows.
A typical comparison between Sybase IQ and traditional database in data compression
When we began to look at Sybase IQ, we found that Sybase's argument about the benefits of using the column method was quite convincing. However, through the increasing demand for data warehouses, the rapid growth of data from the web and the analysis and reporting from the user (not to mention the upcoming RFID applications), and the customer experience, we can now prove that Sybase IQ provides a much better approach than traditional relational database vendors. Let's take a look at the Sybase IQ's revolutionary Innovations in column storage, data compression, indexing, and more.
Column Storage
Unlike traditional relational databases whose data is stored in rows in a table, Sybase IQ is stored and accessed through columns in the table. Although this approach is clearly not suitable for trading environments where a transaction is valid for a single row of data, in a query process environment it is clear that the query is selected based on a specific column.
One of the important benefits of the column approach is that the entire database is automatically indexed because the selection rules in the query are defined by columns. In fact, the situation is not as simple as the above, and Sybase IQ has various methods to support the array-based index, which we will discuss below.
Another result of using column methods is that Sybase IQ is more efficient in compression than traditional relational databases (the effect can be increased by 5 times times, according to Sybase). This reason, needless to say, is because all data fields in the same column have the same data type. Therefore, each column can be compressed for optimized efficiency and retrieval. In contrast, based on row storage, each domain has different data types, which is ideal for trading processes. In such an environment, it is not feasible to constantly transform the ideal compression algorithm, which means that any compression will probably be a minimum common rule.
Another important advantage of a column based approach is that it is entirely based on the amount of data that is read out. Whenever you access data from a traditional database, you need to read out the entire line, regardless of which fields you are interested in. In fact, this could mean reading 300 bytes of data and retrieving only 20 characters of data. But to read data based on columns, you just need to read out the data you want to know. Of course, when you read a single record, the difference in performance can be ignored, but many queries require a full table scan. When reading millions of rows of data, the difference in performance can be significant.
Sybase believes that the Sybase IQ column storage provides better performance than normal ROLAP methods, and IQ does not need to support parallel processing of hardware like most competitors or Sybase adaptive Server Enterprise (ASE). In particular, Sybase points out that the problem associated with data partitioning is the need to support parallel mechanisms for hardware. Obviously, partitioning poses a lot of problems (not to mention additional maintenance), regardless of partitioning, but it opens up a substantial path to performance improvement. However, Sybase further elaborated that this is merely a compensation mechanism for the poor performance inherent in the line based approach.
Sybase has many facts to support its assertion, but that does not mean that Sybase avoids any form of data partitioning. However, unlike a horizontal partition, Sybase IQ implements a vertical partition, which means that the partitions are based on columns rather than rows. One of the advantages of this approach is that partitions never become unbalanced because each column in each table has the same number of fields. This greatly reduces the maintenance requirements of the management partition, while eliminating the organization of the database, which needs to be organized when the partitions become unbalanced and begin to affect performance.
Finally, it should be noted that Sybase IQ does not avoid using OLAP. For those who want to query in a relatively predefined way at the aggregation level, OLAP has an obvious advantage. Based on this, Sybase supports OLAP feature attributes such as permutations, percentages, and averages.
Data compression
Traditional database engines do not compress data in a common way, mainly because of the following three problems:
1. Storage by row data storage is not conducive to compression. This is because data, mostly binary data, is not duplicated in this way. We found that the data stored in rows can have a 5%~10% compression ratio.
2. For many 2KB and 4KB binary data pages, the additional overhead for compression and decompression is too high;
3. In an OLTP environment, a large number of reads and updates are mixed together. Each update requires a compression operation, while the read only decompression operation, most of the data compression algorithm in the compression than the decompression 4 times times slower. This overhead will significantly reduce the transaction efficiency of the OLTP database engine, making data compression costly and almost unbearable.
In Data warehouse applications, data compression can be a much smaller cost in exchange for greater benefits. This includes reducing requirements for storage and increasing data throughput, which is equivalent to reducing query response time.
Sybase IQ uses data compression. This is because data is stored in columns, adjacent field values have the same data type, and the binary value is usually much smaller, so compression is easier and compression is higher. Sybase IQ typically has a compression greater than 50% for data stored by columns. Larger compression ratios, plus large page I/O, make Sybase iq less demanding for storage while gaining excellent performance in queries.
In a traditional database, the index built to improve query performance often takes up 3-10 times more disk space than the data itself requires. Sybase IQ Storage data occupies the disk space is usually only the original data file 40%-60%, is a traditional database occupies a fraction of the space.
Intelligent compression technology, combined with sophisticated index structures and column storage, gives IQ a much higher storage effect than other database engines. This will result in lower storage costs and higher performance (because the system requires only a small amount of disk I/O to read or write to any given database block).