http://blog.csdn.net/hzhsan/article/details/9384885
1. Preface
When databases are mentioned, most people think of SQL Server, Oracle, and other relational databases. In fact, many kinds of databases have appeared over the history of computing.
Early on, relational databases coexisted with hierarchical and network databases, but relational databases won the competition thanks to their simple description and ease of implementation. By the early 1990s, from FoxPro and Access to Oracle, Informix, Sybase, and SQL Server, relational databases dominated.
In particular application domains, however, relational databases do not perform well, and this has produced new database types: document-based databases (such as Notes) used in collaborative office work, embedded databases (such as SQLite) used in embedded applications, real-time databases (such as PI) used in industrial monitoring, and so on.
This article provides a simple introduction to the real-time database.
2. Characteristics of database applications in industrial monitoring
The term "industrial monitoring system" is very broad: any system that automatically monitors and controls operating equipment can be called one, including the plant-level monitoring system (SIS) of a thermal power plant. In such applications, database use has the following characteristics:
- Large number of measuring points
The SIS of a new 300 MW thermal power plant has to process more than 10,000 measurement points. These points usually change within a cycle of 1 second, so more than 10,000 values must be stored in the database every second.
The core job of a real-time database is to handle a large volume of real-time information, and the accumulated historical data occupies a great deal of disk space. For example, a 10,000-point system that stores every point once per second at 8 bytes per value accumulates 10000*8*10*365*86400 = 25,228,800,000,000 bytes over 10 years, which is about 23 TB. Stored on 80 GB hard drives, that would take about 294 drives!
The value of every measurement point is tied to time: the data one second later may differ from the data one second earlier. The corresponding time must therefore be recorded in some form together with each measured value.
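The ten-year storage estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function and constant names are made up for illustration, and it assumes 1 GB = 2^30 bytes and 1 TB = 2^40 bytes, which is the convention under which the total comes out at about 23 TB (rounding the drive count up gives 294).

```python
import math

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def raw_storage_bytes(points, bytes_per_value, years):
    """Bytes needed to store every point once per second, uncompressed."""
    return points * bytes_per_value * years * DAYS_PER_YEAR * SECONDS_PER_DAY


total_bytes = raw_storage_bytes(points=10_000, bytes_per_value=8, years=10)
total_tb = total_bytes / 1024**4                        # binary terabytes
drives_80gb = math.ceil(total_bytes / (80 * 1024**3))   # whole 80 GB drives

print(f"{total_bytes:,} bytes")
print(f"about {total_tb:.1f} TB")
print(f"{drives_80gb} drives of 80 GB")
```

Timestamps are not counted here; storing a time alongside each value would make the raw figure larger still.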
3. Reasons for not selecting a relational database
Relational databases have difficulty handling data from the industrial control field. The main reasons are:
A general-purpose relational database is transaction-based and rolls work back when processing fails. This mechanism slows writes down: even on a current Intel P4 or dual-Xeon class CPU, inserting 2,000 points takes more than 10 s, whereas an industrial monitoring system has to store tens of thousands of points of real-time data at second-level intervals.
To guarantee integrity, commercial databases often keep all content in a single file, which makes large volumes of data hard to maintain. A full backup of a 200 GB database may take a day, and a single error in the backup file can invalidate the entire 200 GB backup, so this is not practical.
- Cannot meet the needs of real-time applications
The storage and query methods of a relational database cannot meet the needs of real-time applications. A simple example: if data is stored at 1-second intervals, a week holds 604,800 groups of data, yet it is difficult to express in standard SQL a query that takes 10,080 of them at 60-second intervals, or 8,888 of them at equal intervals.
Therefore, industrial monitoring and power plant SIS applications need a real-time database system designed for real-time use.
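To make the two sampling queries in the example above concrete, here is a minimal Python sketch that works on an in-memory list. The function names are hypothetical and do not correspond to any real-time database API; a real product would run this kind of retrieval inside its archive engine rather than over a list.

```python
def sample_every(samples, step):
    """Return every step-th sample, starting with the first one."""
    return samples[::step]


def sample_evenly(samples, count):
    """Pick `count` samples spread evenly over the sequence, by index."""
    if count <= 0 or not samples:
        return []
    if count >= len(samples):
        return list(samples)
    if count == 1:
        return [samples[0]]
    last = len(samples) - 1
    # Integer arithmetic keeps the first and last samples in the result.
    return [samples[i * last // (count - 1)] for i in range(count)]


# One week of 1-second data, represented here simply by second offsets.
week = list(range(604_800))

per_minute = sample_every(week, 60)   # one value per 60 seconds
evenly = sample_evenly(week, 8_888)   # 8,888 values at equal spacing

print(len(per_minute), len(evenly))
```

Index-based picking is the simplest possible interpretation of "equal interval"; interpolating a value at each target time would be another reasonable choice.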
4. Introduction to compression algorithms for real-time databases
The core technology of a real-time database system is data compression. Data is compressed before it is saved to disk and decompressed when it is read back. The compression algorithms used in real-time databases at home and abroad currently fall into two types: lossless compression and lossy compression.
4.1 Lossless compression
Most representations of information contain some redundancy, which can be reduced with a suitable model and coding method. Huffman coding is one of the best-known lossless compression algorithms, and software such as WinRAR and WinZip uses compression methods similar to Huffman coding. What these methods have in common is that no information changes between compression and decompression.
Lossless compression can also be used in a real-time database, but the implementation must take compression and decompression speed into account. An algorithm with a very high compression ratio but very slow decompression is not suitable for a real-time database, because people querying the data would lose patience waiting.
4.2 Lossy compression
In contrast to lossless compression, lossy compression always discards some information, so it is important to make sure that the lost information does not affect the accuracy of the system's data. Lossy compression is common in other fields as well: JPG image compression and MP3 audio compression are both lossy.
In real-time databases there are two main lossy methods: dead-zone compression and trend compression.
The dead zone of a measurement point is the range within which its value is treated as unchanged. Dead-zone compression records only values that fall outside the point's dead zone. For example, suppose point A has a dead zone of 1% and its last recorded value is 110.00. If the next acquired value is 111.00, the difference is (111-110)/110 < 1%, so the value lies inside the dead zone, is considered unchanged, and is not recorded. If the next value is 120.00, the difference is (120-110)/110 > 1%, so the value lies outside the dead zone, is considered changed, and is recorded.
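The dead-zone rule just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the `(timestamp, value)` tuple layout, and the handling of a last recorded value of zero are assumptions, not taken from any particular product.

```python
def deadband_compress(samples, deadband=0.01):
    """Keep a sample only if it differs from the last *recorded* value
    by more than `deadband`, expressed as a fraction of that value.

    `samples` is an iterable of (timestamp, value) pairs.
    """
    recorded = []
    for timestamp, value in samples:
        if not recorded:
            # Always keep the first sample as the reference value.
            recorded.append((timestamp, value))
            continue
        last_value = recorded[-1][1]
        if last_value == 0:
            # Assumption: with a zero reference, any non-zero value counts as a change.
            changed = value != 0
        else:
            changed = abs(value - last_value) / abs(last_value) > deadband
        if changed:
            recorded.append((timestamp, value))
    return recorded


# The example from the text: 110.00, then 111.00 (inside), then 120.00 (outside).
point_a = [(0, 110.00), (1, 111.00), (2, 120.00)]
print(deadband_compress(point_a, deadband=0.01))
```

Note that each new value is compared with the last recorded value, not with the previous raw value, so a slow drift is still recorded once it has moved far enough.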
Trend compression compresses data according to the trend of a measurement point over a period of time; in principle it records only the start and end points of each span that satisfies the trend condition. PI's swinging door compression is a model example of this kind of algorithm.
In general trend compression, if the value of a measurement point holds a trend from T1 to T2, the two tolerance lines of that trend define the dead zone for the next point. If the value at the next moment T3 lies between the two tolerance lines, it is not recorded, and the same two tolerance lines apply to the following moment. If the value at the next moment T4 lies outside the two tolerance lines, it is recorded: the trend has changed, the tolerance lines are redrawn, and subsequent points are judged against the new lines.
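As an illustration of trend compression, the sketch below implements one common textbook formulation of the technique usually called swinging door compression. It is not PI's actual implementation; the function name, the `(timestamp, value)` layout, and the absolute `tolerance` parameter are assumptions, and timestamps are assumed to be strictly increasing.

```python
def swinging_door_compress(samples, tolerance):
    """Keep only the points where the trend breaks.

    Two "doors" pivot at (t0, v0 + tolerance) and (t0, v0 - tolerance),
    where (t0, v0) is the last recorded point. Each new sample can only
    open the doors wider. Once they open past parallel, no single straight
    line from the pivot stays within the tolerance of every skipped sample,
    so the previous sample is recorded and becomes the new pivot.
    """
    samples = list(samples)
    if len(samples) <= 2:
        return samples

    recorded = [samples[0]]
    t0, v0 = samples[0]
    upper = float("-inf")  # steepest slope seen from the upper pivot
    lower = float("inf")   # shallowest slope seen from the lower pivot
    previous = samples[0]

    for t, v in samples[1:]:
        dt = t - t0
        upper = max(upper, (v - (v0 + tolerance)) / dt)
        lower = min(lower, (v - (v0 - tolerance)) / dt)
        if upper > lower:
            # Trend broken: record the previous sample and restart from it.
            recorded.append(previous)
            t0, v0 = previous
            dt = t - t0
            upper = (v - (v0 + tolerance)) / dt
            lower = (v - (v0 - tolerance)) / dt
        previous = (t, v)

    # Always keep the final sample so the last trend has an end point.
    if recorded[-1] != samples[-1]:
        recorded.append(samples[-1])
    return recorded


# A ramp up followed by a ramp down: only the start, the peak, and the end remain.
ramp = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 2), (5, 1), (6, 0)]
print(swinging_door_compress(ramp, tolerance=0.1))
```

On replay, values between two recorded points are reconstructed by linear interpolation, which is why only the start and end of each trend need to be stored.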
5. How to access the real-time database
- Using the API provided by the real-time database
This is the most efficient and also the simplest way.
- Using ODBC
Most real-time databases provide a standard ODBC interface and a SQL query language, so users can treat a real-time database like a standard database. But this approach is slow and does not bring out the full benefits of a real-time database.
- Using OPC (OLE for Process Control)
Because so many databases and DCS products expose their own APIs, programs written for one cannot be reused with another. The industrial monitoring field therefore proposed a standard access interface, OPC. More than 200 manufacturers have now joined the OPC organization, which has become very large.
6. Real-time database products to choose from
The more mature real-time database products currently on the domestic market are:
- OSIsoft (USA): PI;
- Wonderware (USA): IndustrialSQL Server, abbreviated InSQL;
- GE Intellution (USA): iHistorian;
- InStep (USA): eDNA;
- Honeywell (USA): Process History Database, abbreviated PHD;
- AspenTech (USA): InfoPlus;
In general, foreign products hold a monopoly position in real-time databases for industrial monitoring. OSIsoft's PI uses patented swinging door compression and a distinctive two-stage filtering technique, so data entering the PI database is compressed very effectively, which saves a great deal of disk space. It is the most efficient, easiest to use, and most widely used real-time database. On the strength of this performance, PI has raised its price several times.
7. Summary
The real-time database plays a very important role in an SIS: it is the foundation on which the SIS stores its data. A real-time database is a special type of database system that differs from a relational database in many ways. Only by understanding both the similarities and the differences can you understand real-time databases in depth.
----------------------------------------------------------------------------------------------------------------------------
Research status and development
The foreign real-time database products currently most common in China are Wonderware's Industrial SQL, OSIsoft's PI, and AspenTech's IP21 (InfoPlus.21). For a database of 5,000 points with 20 client applications, for example, the prices are:
PI: USD 100,000, plus USD 6,000 per interface; the OPC interface is charged for;
InfoPlus.21: USD 110,000, plus USD 10,000 per interface; the OPC interface is free;
Industrial SQL Server: USD 65,000, plus USD 1,200 per IDAS; the OPC link is free.
Real-time database products
PI stores all data in a proprietary database. For the storage format of process data, a user can define only 6 fields per point. PI is a pure real-time database; more advanced enterprise applications also require a relational database alongside it.
PI uses patented swinging door compression and a distinctive two-stage filtering technique, so data entering the PI database is compressed very effectively, which saves a great deal of disk space. By one calculation, storing 10,000 points per second for a year takes only 4 GB, so an ordinary hard disk can hold 5 to 10 years of data. PI is the most efficient, easiest to use, and most widely used real-time database. On the strength of this performance it has raised its price several times, and its documentation discloses its algorithms, such as the swinging door compression and two-stage filtering mentioned above.
OSI does not provide Chinese versions of the PI server or client software. A Chinese version is said to be under development, and a Chinese version of DataLink can already be seen. Some domestic companies also carry out authorized Chinese localization for customers.
System Performance
Each PI server can hold 1.5 million points (on high-end servers), and PI's high-end server product can process 1.5 million points of data per second. In the ProcessBook client, 1,000 points can be retrieved from two or three years of history within about a second.
According to data provided by OSI, the value of a real-time information point changes on average 500 times per day, and each change occupies 3 bytes of storage. Keeping 10,000 points online for 1 year therefore requires:
Changes per day × bytes per change × number of points × 365 days = total space required
500 × 3 × 10,000 × 365 = 5,475,000,000 bytes ≈ 5.475 GB
Hard drives are cheap, of course, so saving space is not the main goal; the key is being able to replay the data quickly.
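The sizing formula above is easy to turn into a small calculation. The sketch below is illustrative only: the function name is made up, and the uncompressed comparison simply reuses the earlier assumption from this article of one 8-byte value per point per second.

```python
def archive_bytes_per_year(records_per_day, bytes_per_record, points, days=365):
    """Bytes stored per year for `points` measurement points."""
    return records_per_day * bytes_per_record * points * days


# OSI's figures: 500 changes per day, 3 bytes per change, 10,000 points.
compressed = archive_bytes_per_year(500, 3, 10_000)

# For comparison: every point stored once per second at 8 bytes per value.
uncompressed = archive_bytes_per_year(86_400, 8, 10_000)

print(f"compressed:   {compressed / 1e9:.3f} GB per year")
print(f"uncompressed: {uncompressed / 1e9:.1f} GB per year")
print(f"ratio:        about {uncompressed / compressed:.0f} to 1")
```

The ratio depends entirely on how often the points really change, so it should be read as an estimate for these particular figures rather than a property of the compression method.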
Device Interface Support
OSI provides more than 600 DCS interface packages worldwide and also supports OPC data acquisition, which makes the PI database a bridge between an enterprise's production control systems and its management information systems.
Client Tools
PI-ProcessBook: Organizes and displays process information, provides flowcharts and real-time and historical trends, and includes a VBA programming environment.
PI-DataLink: A macro package embedded in Excel. It gives end users a very flexible tool for analyzing process data. The necessary mathematical analysis can be done with VBA, which provides the most common analysis tools, including average, moving average, geometric average, standard deviation, cumulative value, and so on.
PI-Batch: Handles production batch processes.
PI-ML: Manual input terminal (available in either software or hardware form).
PI-ACE: Lets the user write equations, which can then be reused without extra effort on similar data sets.
PI ICE: A web application for enterprise-class real-time performance management.
PI ActiveView: Publishes existing PI ProcessBook displays seamlessly to the Web.
PI AlarmView: Summarizes PI alarm server information and displays it in a hierarchical tree structure to any on-site or off-site client.
PI BatchView: Displays PI batch data on a Windows desktop computer.
PI System Management Tools: A set of applications for managing the PI system from a client PC.
PI Module Database: Divides the contents of the PI system into useful groups, helps organize real-time data, and stores parameters or specifications so that they are available to programs and displays at any time.
RLINK: A two-way gateway between production data (PI) and ERP or EAM (Enterprise Asset Management) systems such as SAP R/3, JD Edwards OneWorld, MRO Software Maximo, and Indus International PassPort/EMPAC.
Secondary development tools
PI provides the API, the SDK, and PI ODBC & PI OLE DB as secondary development tools for reading data from, or writing data to, the real-time database. In addition, the PI client ProcessBook embeds VBA, which makes it easy to extend and customize ProcessBook.
PI API (Application Programming Interface)
PI-API is a set of C functions that provides a common programming interface to the PI system; an application accesses or manipulates the PI system by calling PI-API functions. Many PI client programs, including PI ProcessBook, PI DataLink, and PI ProfileView, communicate with the PI server through PI-API. It can be called on both Unix and Windows platforms.
PI SDK (Software Development Kit)
The PI SDK is another set of programming tools that PI provides for accessing the PI server and related subsystems. It is built on Microsoft ActiveX technology and contains an ActiveX in-process component, an ActiveX control, and other supporting code libraries, along with online documentation, sample code, and support documentation. ActiveX is based on COM component technology and is language-independent on the Microsoft Windows platform. Users can access the PI server through the PI SDK from .NET, where development is simple, from C++, where efficiency is high, or from any other development tool that supports COM.
PI ODBC & PI OLE DB
ODBC is an open industry standard in the database field. OLE DB is Microsoft's COM-based standard for database access interfaces. With PI ODBC and PI OLE DB, users can access the PI system as if it were a relational database. Interface programs and client applications built on PI ODBC or PI OLE DB use standard SQL statements to access the PI database, including the PI Archive, the PI Point Database, and other configuration data.
Domestic users
OSI positions its database as an integration platform for enterprise informatization and concentrates on developing and servicing real-time data software, mainly for large process-industry users in sectors such as power, petroleum, chemicals, and metallurgy. Enterprises in China that use PI software are concentrated mainly in electric power, petroleum, and chemicals. Domestic users in the metallurgical industry include a Tangshan Iron and Steel Company plant, the No. 2 steelmaking plant of Taiyuan Iron and Steel Group, and Shijiazhuang Iron and Steel Company.
----------------------------------------------------------------------------------------------------------------------------
Introduction to the real-time database (first draft)
Http://blog.okbase.net/bruceteen/archive/85.html
Relational databases are so widely used and so familiar that the word "database" is taken by default to mean a relational database. In fact, other kinds of databases are also widely used in production and daily life. One of them is the real-time database discussed here, which is used in production control, where requirements are very strict and data volumes are very large.
Only three real-time database products are currently in use internationally and domestically:
A. PI (Plant Information System) from OSI (USA)
B. PHD (Process History Database) from Honeywell (USA)
C. IP21 (InfoPlus.21) from AspenTech (USA)
These real-time databases are very expensive, costing millions of renminbi, but not all of them are priced by the number of points (the data points the database can hold), so their prices cannot be compared number for number. Through my work I have been fortunate enough to use all three, and I compare them here.
1. PI
PI uses patented swinging door compression and a distinctive two-stage filtering technique, so data entering the PI database is compressed very effectively, which saves a great deal of disk space. By one calculation, storing 10,000 points per second for a year takes only 4 GB, so an ordinary hard disk can hold 5 to 10 years of data. PI is the most efficient, easiest to use, and most widely used real-time database. On the strength of this performance it has raised its price several times without disgracing the OSI name, and its documentation discloses its algorithms, such as the swinging door compression and two-stage filtering mentioned above.
2. PHD
Honeywell has a large share of the DCS market, so PHD is widely used. PHD uses an Oracle relational database internally, so buying PHD means buying Oracle first. Because building on Oracle simplified development, and because Oracle's performance limits are severe, PHD is the cheapest of the three databases and is not a true real-time database. But do not conclude from PHD's internal use of Oracle that Oracle is up to the job: if you used Oracle directly, the sheer volume of data could bring it down within two or three seconds. Honeywell's strength is not in real-time databases but in its DCS.
3. IP21
IP21 has largely not yet entered the Chinese market, and it is trying to open the market by giving the product away first.
Before I evaluate IP21, I need to state that my view of IP21 is a personal view, and I am not a shill for any product.
IP21 is the worst database I have ever seen, and one of the worst pieces of software I have ever seen:
A. Running its installer requires a hardware dongle. This stingy approach bears no comparison with PI publishing its algorithms, and worse, the dongle often fails.
B. Even its own professional staff cannot guarantee a successful installation; out of 10 computers, they manage to install it on only one or two.
C. The software comes on a single installation disk, yet installation takes more than four hours, and from time to time you can watch it install one version of the Java interpreter and then delete it.
D. There is no truly automatic installation; their engineers have to modify many files on the computer before installing.
E. Errors during installation are not reported immediately. Only when installation finishes four hours later do you learn that it failed, and even then you are not told at which step.
F. The management and maintenance software is very complex. Unless someone is willing to sacrifice their career prospects to learn it, only the vendor's own staff can operate it.
G. Operating efficiency is very low and system resource usage is very heavy; a server can run nothing but a single IP21.
How real-time databases are accessed
A. Using the database's own API. This is the most efficient way and also the simplest.
B. Using ODBC. This is not really a good way: a real-time database differs from a relational database, so ODBC is of limited use, it comes with many restrictions, and most features are not available through it.
C. Using OPC (OLE for Process Control)
Because so many databases and DCS products expose their own APIs, programs written for one cannot be reused with another, so a standard access interface, OPC, was proposed. More than 200 manufacturers have now joined the OPC organization, which has grown very large, and its members include the infamous M$. M$ is notorious here because it forced COM/DCOM into this standard interface, so OPC can be used only on Windows and is inefficient (and in an industrial setting efficiency matters a great deal). M$ is very active in the OPC organization, so OPC has largely drifted away from its original goal, to the dissatisfaction of many manufacturers, OSI among them: OSI's PI provides an OPC interface, but OSI does not recommend that customers use it and does not provide technical support for it. COM causes another big problem in OPC. COM rules require continued support for previously published interfaces, but industrial requirements are very strict and the cost and time of development and testing are very high, so no vendor is willing to keep supporting old COM interfaces. As a result, there is no truly COM-compliant OPC.
Reposted from: http://www.cnblogs.com/liushui-sky/articles/5209560.html (Introduction to Real-time databases)