Many readers, especially beginners, are unclear about the concepts of heterogeneous databases, heterogeneous data sources, and distributed databases, and sometimes confuse the three. The author has therefore studied some material and organized it here in the hope that it helps.
Heterogeneous Database (HDB)
A heterogeneous database system is a collection of multiple related database systems that provides data sharing and transparent access. Each database system exists before it joins the heterogeneous database system and has its own DBMS. Each component of a heterogeneous database keeps its autonomy: while data is shared, each database system retains its own application features, integrity control, and security control. The heterogeneity of a heterogeneous database system shows mainly in the following aspects:
- Heterogeneous computer architecture: each participating database can run on a mainframe, minicomputer, workstation, PC, or embedded system.
- Heterogeneous operating systems: the underlying operating system of each database system can be UNIX, Windows NT, Linux, and so on.
- Heterogeneous DBMSs: the members may all be relational database systems, such as Oracle or SQL Server, or they may be databases with different data models, such as relational, hierarchical, network, object-oriented, and functional databases, which together form a heterogeneous database system.
---- The goal of a heterogeneous database system is to merge and share data resources, hardware resources, and human resources across different databases. The key point is to build a global data schema, or a global external view, on top of the local database schemas. This global schema is particularly important for building advanced decision support systems.
---- Large organizations have branches in many locations, and each branch's database holds its own data, but decision makers generally care only about the macro-level information described by the global schema. Describing the global schema of a heterogeneous database with data warehouse technology is a good solution. A data warehouse can collect information from the multiple databases in a heterogeneous database system and build a unified global schema. The collected data also supports access to historical data, and users can run decision-support queries through the unified data interface that the data warehouse provides.
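The warehouse idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two branch databases, their table and column names, and the `global_sales` schema are hypothetical, and in-memory SQLite stands in for the real, different DBMSs a heterogeneous system would contain.

```python
import sqlite3

# Two autonomous branch databases with different local schemas (hypothetical).
branch_a = sqlite3.connect(":memory:")
branch_a.execute("CREATE TABLE sales (region TEXT, amount REAL, sold_on TEXT)")
branch_a.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 120.0, "2024-01-05"), ("north", 80.0, "2024-02-11")],
)

branch_b = sqlite3.connect(":memory:")
branch_b.execute("CREATE TABLE orders (area TEXT, total_cents INTEGER, order_date TEXT)")
branch_b.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("south", 5000, "2024-01-20"), ("south", 2500, "2024-03-02")],
)

# The warehouse holds one unified global schema (hypothetical).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE global_sales (branch TEXT, region TEXT, amount REAL, day TEXT)"
)


def load(branch_name, conn, query):
    """Extract rows with a branch-specific query and load them into the warehouse."""
    rows = conn.execute(query).fetchall()
    warehouse.executemany(
        "INSERT INTO global_sales VALUES (?, ?, ?, ?)",
        [(branch_name, *row) for row in rows],
    )


# Each extraction query maps a local schema onto the global one.
load("A", branch_a, "SELECT region, amount, sold_on FROM sales")
load("B", branch_b, "SELECT area, total_cents / 100.0, order_date FROM orders")


def totals_by_region():
    """A decision-support query that only sees the global schema."""
    return warehouse.execute(
        "SELECT region, SUM(amount) FROM global_sales GROUP BY region ORDER BY region"
    ).fetchall()
```

The branch databases stay untouched and keep their own schemas; only the extraction queries know about local column names and units.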
For heterogeneous database systems, data sharing is achieved in two ways: database conversion and transparent data access. DM3, a commercial database management system developed by Huazhong University of Science and Technology with its own copyright, implements both through the database conversion tool and the API it provides.
---- DM3 provides a database conversion tool that converts a schema defined in one database system into a schema in another and then loads the data as needed, so users can share data using the database system and query language they already know. The conversion tool first performs type conversion: it accesses the source database system and converts the source data definition schema into the data definition schema of the target database. It then reorganizes the data and loads the data from the source database system into the target database.
---- Strictly equivalent conversion is difficult to achieve. The first step is to identify the syntactic and semantic conflicts between the two schemas. These conflicts may include:
1. Name conflict: an identifier in the source schema may be a reserved word in the target schema, in which case it must be renamed.
2. Format conflict: the same data type may have different representations and semantics, in which case conversion functions between the two schemas must be defined.
3. Structure conflict: if the two database systems use different data definition models, for example relational and hierarchical, object attributes and relationships must be redefined so that no attribute or relationship information is lost.
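The first two conflict types can be sketched in a few lines. This is not DM3's conversion tool, only an illustration: the reserved-word set, the type mapping, and the `_col` renaming rule are hypothetical choices for an imaginary target system, and structure conflicts are not handled because they need model-specific redesign.

```python
# Hypothetical reserved words and type mappings for an imaginary target system.
TARGET_RESERVED = {"order", "user", "group"}
TYPE_MAP = {"VARCHAR2": "VARCHAR", "NUMBER": "NUMERIC", "DATE": "TIMESTAMP"}


def convert_column(name, source_type):
    """Resolve a name conflict and a format conflict for one column."""
    # Name conflict: rename identifiers that are reserved in the target.
    new_name = name + "_col" if name.lower() in TARGET_RESERVED else name
    # Format conflict: map the base type, keeping any "(length)" suffix.
    base, paren, rest = source_type.partition("(")
    if base.upper() not in TYPE_MAP:
        raise ValueError(f"no mapping defined for source type {source_type!r}")
    return new_name, TYPE_MAP[base.upper()] + paren + rest


def convert_table(table, columns):
    """Build a target CREATE TABLE statement from (name, type) pairs."""
    converted = [convert_column(name, ctype) for name, ctype in columns]
    names = [name for name, _ in converted]
    if len(set(names)) != len(names):
        # Renaming must not silently merge two attributes into one.
        raise ValueError("renaming produced duplicate column names")
    body = ", ".join(f"{name} {ctype}" for name, ctype in converted)
    return f"CREATE TABLE {table} ({body})"


ddl = convert_table(
    "invoice",
    [("id", "NUMBER(10)"), ("order", "VARCHAR2(20)"), ("created", "DATE")],
)
```

Raising an error on an unmapped type, rather than guessing, reflects the point that strictly equivalent conversion cannot be assumed.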
---- In short, after conversion, all the information to be shared in the source database schema must be present in the target database, and the conversion must not introduce redundant association information.
---- The database conversion tool can convert data models between different database systems. One problem needs further study: if database conversion converts both the data definition schema and the data, the same data set may exist as multiple copies within the heterogeneous database system, so a new access control mechanism is needed. Global access control, concurrency control, and security control must be provided for the heterogeneous database system while preserving each database's autonomy, integrity, and security.
---- If database conversion converts only the data definitions and produces no data copies, then accessing data under the new target schema is still, in implementation, access to the data in the source database system. In this case, transactions written in the data manipulation language of the new database system cannot run directly against the source database; they must first be translated at the transaction level before they can be executed.
---- Transparent data access in a heterogeneous database system lets users treat a heterogeneous distributed database system as an ordinary distributed database system and access any of its databases with a familiar data manipulation language, as if it were their own database system. However, there is currently no universally adopted data definition model or data query language. Transparent data access can be achieved with techniques such as many-to-one conversion and two-way middleware.
---- Open Database Connectivity (ODBC) is a standard application programming interface for accessing data in relational and non-relational database management systems. ODBC gives applications a set of high-level call interface specifications and a runtime support environment based on dynamic link libraries. Common front-end tools for database application development, such as PowerBuilder and Delphi, connect to various database systems through ODBC interfaces. Most database management systems (such as Oracle, Sybase, and SQL Server) provide ODBC drivers, which makes them very open. The biggest advantage of the ODBC interface is interoperability. Ideally, every driver and data source would support exactly the same ODBC function calls and SQL statements, so that an ODBC application could work with every database system. In practice, databases differ in how much SQL syntax they support, so the ODBC specification defines driver conformance levels; the ODBC API conformance level determines which ODBC functions an application can call. ODBC 2.0 defines three levels of functions, and the DM3 ODBC API currently supports all Level 2 extension functions in the ODBC 2.0 specification.
---- As Internet applications spread, heterogeneous distributed information systems on the Internet are developing rapidly. Java, with its platform independence, portability, strong security, stability, distributed nature, and object orientation, has become a preferred language for Internet application development. Implementing database applications across heterogeneous platforms on the Internet requires a unified programming interface that is independent of any particular database management system, along with a general SQL-based database access method. JDBC (Java Database Connectivity), the interface specification between Java and databases, is a general application programming interface that supports basic SQL functionality. It provides a unified user interface across different database function modules and offers a new solution for direct Web access to heterogeneous databases. JDBC is supported by a growing number of database vendors, connectivity providers, Internet service providers, and application developers.
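The uniform-interface idea behind ODBC and JDBC can be shown with a small sketch in Python, whose DB-API 2.0 specification (PEP 249) plays a comparable role. This is an analogy, not ODBC or JDBC code: the `staff` table and helper names are hypothetical, and two in-memory SQLite databases stand in for different DBMSs. With real systems only the `connect` call would change, although the placeholder style (`?` here) varies between drivers, which is a small instance of the uneven support described above.

```python
import sqlite3


def fetch_rows(conn, sql, params=()):
    """Run a query using only DB-API calls shared by conforming drivers."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


def make_demo_db(rows):
    """Create a stand-in database; a real system would use another driver here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", rows)
    conn.commit()
    return conn


db_one = make_demo_db([("Ann", "sales"), ("Bo", "ops")])
db_two = make_demo_db([("Cy", "sales")])


def names_in_dept(conn, dept):
    """Application code: identical no matter which database sits behind conn."""
    rows = fetch_rows(
        conn, "SELECT name FROM staff WHERE dept = ? ORDER BY name", (dept,)
    )
    return [name for (name,) in rows]
```

The application function never mentions a specific DBMS, which is the property the text calls transparent access.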
Heterogeneous Data Sources (Disparate Data Sources)
Understanding of heterogeneous data sources:
As the name suggests, the term refers to data held in different database management systems.
Example of heterogeneous data sources:
In the course of enterprise informatization, the data management systems of the various business systems are built and deployed in phases and are shaped by technical, economic, and human factors. As a result, enterprises accumulate large amounts of business data stored in different ways, and the data management systems in use also differ widely, from simple file databases to complex network databases. Together these make up the enterprise's heterogeneous data sources.
Enterprise data sources are heterogeneous in three aspects:
1. System heterogeneity: the business application systems, database management systems, and even operating systems that the data sources depend on differ from one another.
2. Schema heterogeneity: different data sources store data under different models. Storage models mainly include the relational model, the object model, the object-relational model, and the nested document model, with the relational model (relational databases) being the mainstream. Even under the same storage model, schema structures can differ. For example, relational database management systems such as DB2, Oracle, Sybase, Informix, SQL Server, and FoxPro do not have fully consistent data types.
3. Source heterogeneity: the heterogeneity between an enterprise's internal and external data sources.
Distributed Database (DDB)
Understanding of distributed databases:
The distributed database management system (DDBMS) is the product of the combination of network technology and database technology.
A distributed database system is composed of several sites, also called nodes, that are connected by a communication network. Each node is an independent database system with its own database, central processor, terminals, and local database management system. A distributed database system can therefore be seen as a combination of a series of centralized database systems. Logically they belong to one system, while physically they are distributed.
Features of distributed databases:
1. A distributed database system does not emphasize centralized control. It has a hierarchical control structure based on a global database administrator, but each local database administrator has a high degree of autonomy.
2. Data independence is just as important in distributed database systems, but a new concept is added: distribution transparency. Distribution transparency means that programs are written as if the data were not distributed, so moving the data does not affect the correctness of a program, although it may reduce the program's execution speed.
3. Unlike centralized database systems, distributed systems treat data redundancy as a desirable feature. First, replicating data at the nodes that need it improves local access. Second, when a node fails, the copies at other nodes can be used instead, which increases the availability of the system. Evaluating the optimal degree of redundancy in a distributed system is, however, complicated.
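Distribution transparency and redundancy can be pictured with a toy model. This is a sketch under strong assumptions, not how a real DDBMS works: `Node`, `DistributedDB`, and the catalog are hypothetical names, tables are plain Python lists, and there is no networking, concurrency control, or replica synchronization.

```python
class Node:
    """One site holding local copies of some tables."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.tables = {}  # table name -> list of rows


class DistributedDB:
    """Hides data placement behind a catalog so callers never name a node."""

    def __init__(self):
        self.catalog = {}  # table name -> nodes holding a replica

    def place(self, table, rows, nodes):
        """Store a replica of the table on each given node and record it."""
        for node in nodes:
            node.tables[table] = list(rows)
        self.catalog[table] = list(nodes)

    def read(self, table):
        """Return (rows, serving node name) from the first live replica."""
        for node in self.catalog[table]:
            if node.up:
                return list(node.tables[table]), node.name
        raise RuntimeError(f"no live replica of {table!r}")


n1, n2 = Node("n1"), Node("n2")
db = DistributedDB()
db.place("customers", [(1, "Ann"), (2, "Bo")], [n1, n2])

rows_before, served_by_before = db.read("customers")
n1.up = False  # simulate a node failure
rows_after, served_by_after = db.read("customers")
```

The caller issues the same `read("customers")` before and after the failure; only the serving node changes, which is the sense in which replication raises availability while the program stays correct.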
Distributed systems can be roughly classified into three types:
1. The data is distributed, but there is only one overall database and no local databases.
2. Hierarchical processing, in which each layer has its own database.
3. A fully decentralized distributed network with no central control. Nodes can be connected in several ways, such as loose, tight, dynamic, broadcast, and notification connections.