An in-depth interpretation of Oracle Exadata Technology

Source: Internet
Author: User
Keywords Join each
Tags access applications based block cache compared data database server

"IT168 Technical Documentation" Since Oracle and HP launched Exadata, I have been very concerned about this product, and previously wrote an Oracle database machine introduced it. Last year, Oracle and Sun merged to launch Oracle Exadata V2, which has several changes compared to previous generations: first, using sun hardware; second, it claims to support OLTP applications; third, Oracle 11g R2 offers more new features.

Exadata Smart Flash Cache

Exadata V2 The overall architecture has not changed much, with sun hardware, in addition to using Intel's latest Nehalem CPU, each storage cell is configured with 384GB of Flash, which is why V2 can support OLTP applications.

Flash cache is completely automated management, Oracle will be based on data access to determine which data in the flash cache. All data are first written to the normal disk, and then read the flash cache according to the access situation, so if the flash card failure, the data will not be lost. Of course, Oracle provides a way for users to manually put a table or index pin in the flash cache.

In addition to automating management, Oracle also allows users to manually create Flash workloads, like regular disks, where these flash workloads are exported to the database via ASM, where users can put data files that are accessed very frequently. These flash workloads are not just cache, so ASM mirrors the cell and cell. If a block card fails, then the entire storage cell flash workloads will be offline to ensure that the data will not be lost.

Smart Scan

Smart Scan is the most important function of exadata, it is to put SQL on each cell to run, and then each cell only return the eligible data to the database, which greatly reduces the database server load and network traffic, And make full use of cell computing resources and IO resources.

The traditional way: all data needs to return to the database server, network bandwidth requirements, all the calculations on the database server completed.

Smart Scan: Only returns eligible data, reduces network bandwidth, and leverages computing and IO resources on the cell.

One thing to note here is that when using smart Scan, each cell returns to the DB server as a result set, not as a traditional block,db server completes the processing of the result set and returns to the client.

What does Smart scan do with joins? It's something I've been trying to figure out. In fact, smart Scan can only handle join filtering, and the real join work must be done on DB server, and Smart Scan is only suitable for complex joins that handle the DSS environment, and for simple OLTP types Join,smart Scan does not play its advantage. Imagine the following query:

Select E.ename,d.dname from EMP E, Dept D where and E.ename= ' Jacky ' and e.deptno=d.deptno;

Assuming that the nested loops Join,smart Scan can only complete the filter of the e.ename= ' Jacky ' condition, then return the data of the Eligible EMP table to DB Server, and then the DB server completes the join work. Query Dept Table (E.DEPTNO=D.DEPTNO) data. So smart scan is not suitable for nested loop join (I think the smart scan is enabled only if it is appropriate), and only a complex join with large data volumes in the DSS environment will play an advantage. and smart Scan can only complete filtering work, and can not really complete join work, this and Greenplum database is different (interested can see my article, Greenplum Technology Analysis). Imagine the following query (EMP and dept are large tables):

Select E.ename,d.dname from EMP E, dept D where E.deptno=d.deptno;

Assuming a hash join, the smart Scan can only return all two table data to the DB Server for a JOIN operation because there are no filtering conditions, but smart scan is not a bit useless, at least for column filtering, Just return the fields you need.

Oracle's documentation has mentioned that for a large table and a small table join, smart Scan will use Bloom filter to quickly position (see my previous article, interesting bloom filter). The method is to make the small table a bloom filter, and then scan the large table on each storage cell, using Bloom filter to quickly locate the eligible results and return to the DB Server for a join.

Storage Index

Store indexes, as the name implies is the index established at the storage level, simply to create an index for each column of data in the table, each index entry record the maximum value of a data interval, the minimum value and their physical location, the document says 1MB data corresponds to a index entry, see the following figure:

If we query b<2, or b>8 data, based on the storage index, we can skip these blocks of data that are not between Min and Max, greatly improving the speed of the scan, which is the meaning of storing the index.

Hybrid Columnar Compressed

First we need to figure out, what is row compression, what is called column compression. Our familiar databases, such as Oracle,mysql, are database based on rows, where different fields of rows are physically stored together, and one is a database based on columns, where the different rows of each field are physically stored together. Their strengths and weaknesses are equally prominent:

A row-based database is convenient to access a row, but because the data in the same column is stored separately, the entire table is scanned for almost any results if you want to query for a column. Based on row database compression, called row compression.

Columns based databases, because the data for the same column is physically placed together, so it's handy to access a column, which means that if you're querying for a column, you don't need to scan the entire table, you just need to scan the data for that column, but it's very inconvenient (and nonsense) to access all the fields in a row. Compression based on a column database, called column compression.

Oracle's usual compress functionality (including the 11g R2 advanced compress) is row compression because Oracle is a row based database. The approximate method is to hold a symbol table in the block header, and then place the same value there, with the same data on each line pointing to symbol table to compress the object. The effect of row compression is usually bad because we know that there is not much of the same data between rows and rows. Column compression is different, however, because the same column has the same data type, so it is easy to achieve a good compression effect.

Both row and column compression have their advantages and disadvantages, and Oracle's mixed-column compression technology, in fact, combines the high compression ratio of column compression and the access characteristics of the row database, combining the advantages of both. Oracle proposed the concept of Cu (Compress unit), within a CU, is a column based storage method, with column compression, but a CU holds all the field information of the row, so between Cu and Cu, Oracle is a row based database, access a row, Always only within a CU. Each CU is made up of contiguous blocks, and the CU header records the distribution of each column of each row in the Cu, and in mixed-column compression mode, a row is typically spanned across multiple blocks.

So mixed column compression, combined with column compression and row access characteristics, that can provide a very high compression rate, but also a good guarantee based on row type access.

Another important feature of Exadata is IO resource management, which we can use to manage IO resources If we deploy many databases on a exadata.

At present, I have not yet learned that there are exadata applications in China, and the data are relatively few. Hopefully there's a chance to actually test its performance, and I don't doubt his performance in a DSS environment, but the application of OLTP types is really as strong as Oracle says, yet to be validated.

Original address: http://www.dbanotes.net/

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.