Tim Vincent: Hello, everyone. I have been working on DB2 since 1991, when I first started doing DB2 database research, and this is my second visit to Beijing this year. I have a great deal of experience with DB2, and today I am very happy to have the opportunity to tell you about our product's architecture, its features, and my view of its future prospects.
First, I will introduce the architecture of DB2.
Let me start with a brief overview of the architecture and of why we developed this product. First of all, it is important to talk about TCO, because some of the innovative technologies we have developed are very important to DB2's total cost of ownership. The second thing I want to talk about is partitioning, which becomes more and more important as we hold more data; related to that is I/O management, and how we can minimize I/O. Then I will introduce XML: why did we develop XML support, and what were we trying to achieve when we first built it? In addition, I would like to explain how our customers have benefited from XML.
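As an aside, to make the XML discussion concrete: here is a minimal sketch of storing and querying XML natively with DB2 9 pureXML. The table, column and element names (customers, info, city) are hypothetical examples, not taken from the talk.

-- Store XML natively in its own column type instead of shredding it.
CREATE TABLE customers (id INTEGER NOT NULL, info XML);

-- Query inside the documents with XQuery embedded in SQL.
SELECT id, XMLQUERY('$d/customer/name' PASSING info AS "d")
FROM customers
WHERE XMLEXISTS('$d/customer[city = "Beijing"]' PASSING info AS "d");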
Now let me introduce another issue. People are building more and more databases, putting a great deal of data in one place in the hope that it will help them make timely decisions in their business processes, so there is a growing need for data warehouse management. What they have found is that workloads have exploded. Traditional databases were built to support traditional workloads, and many new problems have emerged since then: systems are becoming more complex, there is more data, and workloads are growing explosively, so how to manage workloads has become very important.
Now I would like to introduce some of our best practices. Take compression: DB2 fuses together many technologies, and they are used in many different ways. We have done a great deal of work on best practices, and in the end it pays off; compression is a good example, and I will give you one here. There are also the highlights of our products, such as DB2 9.5, which is available on Linux, UNIX and Windows, plus Data Studio.
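As a minimal sketch of the compression example mentioned above, assuming a hypothetical sales table: in DB2 9 you enable row compression on a table and then build the compression dictionary with a reorganization.

-- Enable row compression on the table (the table name is hypothetical).
ALTER TABLE sales COMPRESS YES;

-- Build the compression dictionary and compress the existing rows.
REORG TABLE sales;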
Let me begin, then, with the architecture of DB2.
I started on DB2 in 1991. There was a trend at that time: systems were being given more and more work, so how do you deal with an ever-growing workload? We built a parallel engine: we can parallelize SQL statements and utility programs, and we support parallelism both within a partition and between partitions. We use SMP to exploit multi-core architectures, and there is also a query rewrite facility for SQL. Parallelism within a single query is often required in the data warehouse; this technique allows your query to run in parallel on different computers. The utilities can also run in parallel, and you can use the same architecture across different computers.
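As a minimal sketch of how intra-partition parallelism is switched on, using standard DB2 configuration parameters (the database name mydb is hypothetical):

-- Allow the instance to use intra-partition parallelism.
UPDATE DBM CFG USING INTRA_PARALLEL YES;

-- Let the optimizer pick the degree of parallelism for each query...
UPDATE DB CFG FOR mydb USING DFT_DEGREE ANY;

-- ...or override it for the current session.
SET CURRENT DEGREE = 'ANY';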
Using SMP usually refers to parallelism within a single query. On Linux and UNIX we typically use all of the CPUs through the operating system's threads and processes, so we can make the best use of SMP parallelism. We have published TPC-C benchmark results, a benchmark that measures how effectively the system is used. (figure) For making good use of large memory, another very important technology is the buffer pool. We store data in pages, which gives our data and I/O a clear mapping: the buffer pool connects us to the stored pages, we can keep XML on those pages as well, and we then operate on files through the file system. A file system can sit underneath, and with database-managed storage you can manage table spaces through the file system, so we can support parallel I/O, striping of data and automated intelligent striping, and we can do large-block I/O. Going back to the file system topic, we now use direct I/O more, which lets us do scatter/gather I/O against the file system. Why do we not use the file system's own buffering? Because the buffer pool is enough for us to do the file buffering ourselves, and we can copy pages into the buffer pool faster.
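A minimal sketch of the buffer pool and direct I/O points just made, using standard DB2 DDL (the buffer pool and table space names are hypothetical):

-- A dedicated buffer pool: DB2 caches pages here itself rather than
-- relying on the operating system's file system cache.
CREATE BUFFERPOOL warehouse_bp SIZE 50000 PAGESIZE 8K;

-- A table space that uses that pool and direct I/O, avoiding double buffering.
CREATE TABLESPACE warehouse_ts
  PAGESIZE 8K
  BUFFERPOOL warehouse_bp
  NO FILE SYSTEM CACHING;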
(figure) If we look at our overall architecture, we have a balanced data warehouse in which each partition has its own resources. The partitions are logical, with physical characteristics: you can run a number of different logical partitions on a single machine, each with its own buffer pool, its own lock management, and its own disks. This shared-nothing technique avoids the limitations that commonly restrict scalability, because we do not need distributed lock management or consistency protocols between buffers. As a result, our partitioning is very scalable: no matter how many terabytes of capacity you have, partitioning it is no problem.
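A minimal sketch of how a table is spread across those database partitions, using the DB2 9.5 DISTRIBUTE BY HASH clause (table and column names are hypothetical):

-- Hash-distribute rows across the database partitions, so each partition
-- owns its own slice of the data with its own buffer pool and disks.
CREATE TABLE sales_fact (
  sale_id  BIGINT NOT NULL,
  store_id INTEGER,
  amount   DECIMAL(12,2)
)
DISTRIBUTE BY HASH (sale_id);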
Each partition also has the Fast Communication Manager (FCM), which manages the flow of messages and data between partitions. In terms of optimization, we can optimize for various common operations, which makes running anything across nodes very efficient and very fast. If we look at the client side, we support several kinds of drivers: the Type 2 and Type 4 JDBC drivers, CLI, and now pureQuery, which takes SQL further so that you can develop well in Java. When I talk about Data Studio, I will tell you how to use it to improve your queries. We have a communication protocol on this layer; it connects clients to the database, and it behaves well in terms of connectivity. On the agent side, we support the whole runtime environment: agents do the processing and evaluation within a query. The data being processed sits in the buffer pool; the agent gets the data from the buffer pool and then computes on it. An agent can read data from disk itself, but we also have prefetchers that fetch data in advance, so I/O becomes asynchronous and we get good CPU utilization; the point is to make sure I/O is not synchronous.
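A minimal sketch of a client connecting to a remote DB2 server over that communication layer, using standard CLP commands (the host name, port, database alias and user are hypothetical):

-- Tell the client where the server is.
CATALOG TCPIP NODE srvnode REMOTE dbhost.example.com SERVER 50000;
CATALOG DATABASE warehse AT NODE srvnode;

-- Connect through the communication layer described above.
CONNECT TO warehse USER dbuser;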
In addition, we have a number of page cleaners, which write changed pages out of the buffer pool to disk. There is also the logging system, with write-ahead logging, a technique proposed in the 1980s; it is an important technology that is central to the entire database. Then there is the deadlock detector. When transactions take locks, the lock work feeds into the log, and in many cases, and in many benchmarks, locking and logging are where you hit bottlenecks. The deadlock detector can greatly improve processing speed, because deadlock situations in particular are found quickly.
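A minimal sketch of the standard configuration knobs behind these components, assuming DB2 9.5 and a hypothetical database mydb: prefetchers (NUM_IOSERVERS), page cleaners (NUM_IOCLEANERS), and the deadlock check interval (DLCHKTIME).

-- Prefetchers do the asynchronous read-ahead; page cleaners write
-- dirty pages back to disk in the background.
UPDATE DB CFG FOR mydb USING NUM_IOSERVERS AUTOMATIC;
UPDATE DB CFG FOR mydb USING NUM_IOCLEANERS AUTOMATIC;

-- How often, in milliseconds, the deadlock detector checks for deadlocks.
UPDATE DB CFG FOR mydb USING DLCHKTIME 10000;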
Now let me talk about some of the technical highlights.
In terms of total cost of ownership, we have talked with a lot of CIOs. Looking at the cost model from every angle, 70% of a CIO's budget is invested in people, not in hardware. So when we talk about TCO, we mainly want to reduce the cost of manpower, because operating and maintaining DB2 adds to that cost, and we need technologies that give us maximum utilization. We need a database that can manage itself, with automatic balancing of resources and automatic storage. Our TCO should be spent on managing the business rather than managing the data, so the database needs more and better management capabilities so that it can optimize itself automatically.
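A minimal sketch of the automatic storage just mentioned, where the database manages its own containers on the storage paths you give it (the database name and paths are hypothetical):

-- The database allocates and grows its own storage on these paths.
CREATE DATABASE mydb AUTOMATIC STORAGE YES ON /data1, /data2;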
Next, adaptive self-tuning memory management. The buffer pool is the most important aspect of optimization and matters greatly to the CPU. Sort memory is also important, since it drives your current I/O, as is the lock list, which is configured as a memory size in MB, and then the overall database memory capacity. The control logic in this area makes it possible to adapt quickly to redistributions of and changes in the workload: the buffers are balanced automatically, so memory across the entire load is rebalanced automatically, and that can save a lot of cost. You no longer have to watch for every change in your workload, and no manual adjustment is needed.
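A minimal sketch of turning on the Self-Tuning Memory Manager (STMM) in DB2 9 and putting the memory consumers it balances under automatic control (the database name mydb is hypothetical):

-- Turn on self-tuning memory for the database.
UPDATE DB CFG FOR mydb USING SELF_TUNING_MEM ON;

-- Let STMM balance the main memory consumers against each other.
UPDATE DB CFG FOR mydb USING DATABASE_MEMORY AUTOMATIC;
UPDATE DB CFG FOR mydb USING LOCKLIST AUTOMATIC MAXLOCKS AUTOMATIC;
UPDATE DB CFG FOR mydb USING SORTHEAP AUTOMATIC SHEAPTHRES_SHR AUTOMATIC;
UPDATE DB CFG FOR mydb USING PCKCACHESZ AUTOMATIC;

-- Buffer pools participate when their size is set to AUTOMATIC.
ALTER BUFFERPOOL IBMDEFAULTBP SIZE AUTOMATIC;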
Here are some charts to explain this. Suppose the query workload is index-driven, and then something like a large rollback occurs; at that moment the demands on the entire table space rise sharply and I/O goes up. So with an adaptive tuning system, the system can adjust itself and get higher performance from the same memory.
When we first rolled out the self-tuning technology, we started using it on a smaller system. Our engineers spent one month on it, and after two months of adding more systems, we gave the self-tuning system the same starting configuration and let it tune toward the benchmark configuration; within an hour the entire system not only got better performance, improving on the previous results, but also reduced the overall cost.