Data transformation conflicts and their handling
Data transformation conflicts:
Strictly equivalent conversion is very difficult to achieve during data conversion. The syntactic and semantic conflicts between the two models must be identified; these may include:
(1) Naming conflict: an identifier in the source data source may be a reserved word in the destination data source.
(2) Format conflict: the same data type may have different representations and different semantics.
(3) Structural conflict: if the two DBMSs use different data models, such as the relational model and the hierarchical model, the entities, attributes, and relationships must be redefined to prevent the loss of attribute or relationship information.
(4) Type conflict: the same data type may have different precision in different databases.
(5) Other conflicts: different databases place different constraints on large object types, and some special types exist. In SQL Server, an error can occur when a table has more than one text or image field. Oracle likewise does not allow more than one LONG (or LONG RAW) column in a table.
Conflict handling methods:
Each of the conflicts above can be handled with a corresponding method.
For naming conflicts, first examine the reserved words of the destination data source and build a set of them, then rename any identifier that appears in that set as needed.
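The reserved-word check can be sketched roughly as follows. The word list, the trailing-underscore renaming rule, and the function names are all assumptions made for illustration; a real list would come from the destination DBMS.

```python
# Hypothetical reserved-word set for the destination data source;
# a real list would come from the target DBMS documentation or driver.
RESERVED_WORDS = {"SELECT", "TABLE", "ORDER", "GROUP", "USER", "DATE"}


def rename_if_reserved(identifier, reserved=RESERVED_WORDS, suffix="_"):
    """Return the identifier unchanged unless it is a reserved word."""
    if identifier.upper() in reserved:
        return identifier + suffix
    return identifier


def rename_columns(columns, reserved=RESERVED_WORDS):
    """Map each source column name to a name usable in the destination."""
    mapping = {}
    used = set()
    for name in columns:
        new_name = rename_if_reserved(name, reserved)
        # Keep appending "_" until the name is neither reserved nor taken.
        while new_name.upper() in used or new_name.upper() in reserved:
            new_name += "_"
        used.add(new_name.upper())
        mapping[name] = new_name
    return mapping
```

The returned mapping can be kept alongside the column mapping so that later statements refer to the renamed identifiers consistently.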
For format conflicts, specific types can be given special handling based on the ODBC SQL type that the data source's driver reports for each data type of the data source. Character data that contains the "'" character must be escaped during conversion; otherwise the character will be mistaken for a string delimiter.
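A minimal sketch of the quote handling, assuming the destination follows the standard SQL convention of doubling an embedded single quote. The function names are hypothetical, and Python's built-in sqlite3 module is used only to check that the escaped literal round-trips.

```python
import sqlite3


def quote_sql_string(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def roundtrip(value):
    """Send the escaped literal through SQLite and return what comes back."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute("SELECT " + quote_sql_string(value)).fetchone()
        return row[0]
    finally:
        conn.close()
```

Where the driver supports parameter markers, binding the value as a parameter avoids the escaping step altogether.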
For precision conflicts between the same data type in different databases, type conversion combines the ODBC SQL type and the precision to determine the mapping between the source data type and the target data type. The data type in the destination data source whose precision most closely matches that of the source type is taken as the default mapping.
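One way to read "most closely matches" is "the narrowest target type that still holds the source precision". The sketch below follows that assumption; the candidate list and the function name are hypothetical.

```python
# Hypothetical candidate types in the destination, grouped by ODBC SQL
# type identifier, each with the maximum precision it can hold.
TARGET_TYPES = {
    "SQL_INTEGER": [("NUMBER(5)", 5), ("NUMBER(10)", 10), ("NUMBER(19)", 19)],
    "SQL_VARCHAR": [("VARCHAR2(255)", 255), ("VARCHAR2(4000)", 4000)],
}


def default_mapping(odbc_type, source_precision, targets=TARGET_TYPES):
    """Pick the narrowest target type that still holds the source precision.

    Falls back to the widest candidate when none is wide enough.
    """
    candidates = targets[odbc_type]
    wide_enough = [c for c in candidates if c[1] >= source_precision]
    if wide_enough:
        return min(wide_enough, key=lambda c: c[1])[0]
    return max(candidates, key=lambda c: c[1])[0]
```

When the fallback branch is taken the data may be truncated, so a real converter would probably report that case to the user.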
Data types are matched during conversion. Date data is best converted to character type first and then processed separately for each target data source: for example, Oracle uses the TO_DATE function and FoxPro uses the CTOD function to convert a date-formatted string to a date.
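The date handling might look like the sketch below, which renders a date as text and wraps it in the conversion function of the target. The helper names and the chosen string formats are assumptions; in particular, CTOD interprets its argument according to the session's date settings, and month/day/year is assumed here.

```python
from datetime import date


def oracle_date_expression(d):
    """Build a TO_DATE call with an explicit format mask."""
    return "TO_DATE('{}', 'YYYY-MM-DD')".format(d.strftime("%Y-%m-%d"))


def foxpro_date_expression(d):
    """Build a CTOD call; assumes month/day/year date settings."""
    return 'CTOD("{}")'.format(d.strftime("%m/%d/%Y"))


# Hypothetical registry of per-target builders.
DATE_BUILDERS = {
    "oracle": oracle_date_expression,
    "foxpro": foxpro_date_expression,
}


def date_expression(d, target):
    """Return the date expression for the named target data source."""
    return DATE_BUILDERS[target](d)
```

Adding another target then only requires registering one more builder.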
For the text and image types in SQL Server, a choice must be made when converting to Oracle. The text type can be mapped to VARCHAR2(4000) or to the LONG type, but a table can have only one LONG column. Alternatively, text can be mapped to CLOB and image to BLOB; an Oracle table can have more than one CLOB column.
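The one-LONG-per-table rule can be respected with a small bookkeeping step, sketched below. The function name and the prefer_long switch are hypothetical; the sketch maps at most one text column to LONG and sends the rest to CLOB.

```python
def map_sqlserver_lobs(columns, prefer_long=False):
    """Map SQL Server text/image columns to Oracle types.

    columns is a list of (name, sqlserver_type) pairs. At most one text
    column is mapped to LONG; any further text columns become CLOB.
    """
    result = []
    long_used = False
    for name, src_type in columns:
        kind = src_type.lower()
        if kind == "text":
            if prefer_long and not long_used:
                target = "LONG"
                long_used = True
            else:
                target = "CLOB"
        elif kind == "image":
            target = "BLOB"
        else:
            # Other types are left to the general type mapping.
            target = src_type
        result.append((name, target))
    return result
```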
Reading metadata for a data source
Metadata types of the data source:
| Metadata type | Metadata information | Purpose |
| --- | --- | --- |
| Data source connection information | Database name, driver, server, DSN name, data source description, user name, etc. | Connecting to the source data source and the destination data source |
| Table information | Table name, table owner, table schema, table type | Creating tables during data transformation |
| Column information | Column name, type, width, precision, scale, nullability | Table creation and column mapping during data transformation |
| Type information | Type name, maximum column width, maximum and minimum scale, prefix characters, nullability, keyword list | Table creation and type mapping during data transformation |
| Key information | Primary key name, primary key column, foreign key name, foreign key column, foreign key associated columns | Conversion of table schemas during data transformation |
| Other object information | Index information, stored procedure information, permission information, etc. | Conversion of database objects during data transformation |
Methods for reading data source metadata:
(1) Calling ODBC API functions:
The metadata of a data source can be read by calling ODBC API functions directly. The functions SQLTables, SQLColumns, SQLDescribeCol, SQLGetTypeInfo, SQLForeignKeys, SQLPrimaryKeys, SQLProcedures, SQLStatistics, SQLTablePrivileges, and SQLColumnPrivileges are called to obtain table information, column information, type information, key information, and other object information from the data source. The main steps are to connect to the data source, allocate a statement handle, and then call the ODBC API functions to obtain the various kinds of metadata.
However, calling the ODBC API directly is complex: the parameters are hard to understand, and accessing the returned data directly is harder still. The MFC class library in VC++ wraps the ODBC API and simplifies part of ODBC programming (especially recordset operations), but obtaining the structure of heterogeneous databases with the MFC classes alone is still difficult, so MFC needs to be combined with direct ODBC API calls. Some member functions of the MFC CRecordset class are overridden to call the ODBC interface functions, producing classes such as CTable, CColumns, CTypes, and CPrimaryKeys. With these new classes, the structure of heterogeneous databases can be obtained conveniently.
(2) Through ADO objects:
The table information of a data source can be obtained with the GetTableNames method of the Connection object in ADO, and the column and type information of a dataset can be obtained from the FieldDefs property of the Recordset object. Before metadata can be read, a Connection object must be created to connect to the data source and the corresponding table opened through the Recordset dataset object; the metadata of the data source can then be read.
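As a rough illustration of the kind of metadata read described above (table, column, and key information), the sketch below uses Python's built-in sqlite3 module as a stand-in data source. It does not call the ODBC API or ADO; the sqlite_master query and PRAGMA table_info play the role that SQLTables, SQLColumns, and SQLPrimaryKeys play in ODBC, and read_metadata is a hypothetical helper name.

```python
import sqlite3


def read_metadata(conn):
    """Collect table, column, and primary-key information from SQLite."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the table name so unusual identifiers are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each row is: cid, name, type, notnull, dflt_value, pk
        info = conn.execute("PRAGMA table_info({})".format(quoted)).fetchall()
        key_rows = sorted((r for r in info if r[5]), key=lambda r: r[5])
        metadata[table] = {
            "columns": [
                {"name": r[1], "type": r[2], "nullable": not r[3]}
                for r in info
            ],
            "primary_key": [r[1] for r in key_rows],
        }
    return metadata
```

The resulting dictionary holds the table and column information that table creation and column mapping in the destination would need.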
Data type conversion
Overview of data types in heterogeneous data sources:
Each DBMS defines its own set of data types. However much the types vary from system to system, they all serve the user's basic data processing needs: numeric types (integer, real, floating point, double precision); character types (fixed length and variable length); date types (year, month, and day, plus hours, minutes, and seconds); long character types such as text; and currency types. As database systems develop and new versions appear, the number of data types keeps growing, for example hypertext types and binary types for multimedia and large text data.
These commonalities make data conversion between systems possible and convenient, but the data types of different databases also differ, and the differences in their definitions and extensions cause many difficulties for data transformation between systems. For example, the format in which date and time data is returned varies significantly from one DBMS to another: some systems return dates and times as 8-byte integers, while others return them in floating-point format. Some DBMSs also provide a LONG type that other DBMSs lack. The key to converting data types between heterogeneous databases is therefore to find the correspondence between them.
Data type conversion method one (designing a type mapping table):
To convert data in both directions, several bidirectional conversion programs must be designed, each solving its own data type matching problems. When a new database system is added, type matching between it and every existing heterogeneous database must be solved, and a corresponding conversion program added for each. To keep the program extensible, type conversion can instead be handled by designing a type mapping table.
The correspondence between the data types of different database systems is separated from the data conversion program: the conversion program stays relatively independent, and the type conversion relationships are stored in a dedicated table structure. Through detailed analysis of the data types of different database systems, the default correspondence and the other possible correspondences between different versions of different database systems are determined and stored in the type mapping table in advance.
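A type mapping table kept as data, apart from the conversion code, might be sketched like this. The row layout, the sample rows, and the function names are assumptions for illustration; in practice the rows would be loaded from wherever the mapping table is stored.

```python
# Hypothetical mapping rows:
# (source DBMS, source type, target DBMS, target type, is_default)
TYPE_MAP_ROWS = [
    ("sqlserver", "int", "oracle", "NUMBER(10)", True),
    ("sqlserver", "datetime", "oracle", "DATE", True),
    ("sqlserver", "text", "oracle", "CLOB", True),
    ("sqlserver", "text", "oracle", "LONG", False),
    ("sqlserver", "image", "oracle", "BLOB", True),
]


def build_type_map(rows):
    """Index the rows by (source DBMS, source type, target DBMS)."""
    table = {}
    for src_db, src_type, dst_db, dst_type, is_default in rows:
        key = (src_db, src_type.lower(), dst_db)
        entry = table.setdefault(key, {"default": None, "alternatives": []})
        if is_default:
            entry["default"] = dst_type
        else:
            entry["alternatives"].append(dst_type)
    return table


def lookup_type(table, src_db, src_type, dst_db):
    """Return the default target type, or raise KeyError if none is stored."""
    entry = table.get((src_db, src_type.lower(), dst_db))
    if entry is None or entry["default"] is None:
        raise KeyError(
            "no default mapping for {}.{} -> {}".format(src_db, src_type, dst_db)
        )
    return entry["default"]
```

Supporting a new database system then means adding rows rather than writing another conversion program.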
Data type conversion method two (using ODBC SQL types):
Data stored in a data source has a data type, called the data source data type or SQL data type. SQL data types are defined by each DBMS according to the SQL-92 standard and can be specific to a data source. ODBC also defines a set of data types called ODBC SQL data types, whose identifiers begin with the SQL_ prefix. Each driver is responsible for mapping the SQL data types of its data source to ODBC SQL data type identifiers. Different data sources can therefore use the ODBC SQL data type identifiers as a benchmark to obtain the default mapping between data types during data conversion. The driver returns the mapping between the data source's SQL data types and the ODBC SQL data types through the SQLGetTypeInfo function; in SQLColAttributes, SQLDescribeCol, and SQLDescribeParam, the driver also uses ODBC SQL data types to describe the data types of columns and parameters.
In addition, ODBC provides a set of ODBC C data types, whose identifiers begin with the SQL_C_ prefix. An ODBC C data type indicates the data type of the C buffer that the application uses to store data. All drivers must support all C data types, because every driver must support all C types to which the SQL types it supports can be converted, and every driver supports at least one character SQL type. Any DBMS data type can therefore be mapped to a C data type, and data is not altered in transfer. Each SQL data type corresponds to an ODBC C data type. Before returning data from the data source, the driver converts it to the specified C data type; before sending data to the data source, the driver converts it from the specified C data type to the SQL data type.
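Using the ODBC SQL type identifiers as the common benchmark amounts to a two-step lookup: source type to ODBC SQL type, then ODBC SQL type to target type. The sketch below assumes two small hand-written dictionaries in place of what each driver would report through SQLGetTypeInfo; the entries and the function name are hypothetical.

```python
# Hypothetical excerpt of the source driver's type information:
# native source type -> ODBC SQL type identifier.
SOURCE_TO_ODBC = {
    "int": "SQL_INTEGER",
    "varchar": "SQL_VARCHAR",
    "datetime": "SQL_TYPE_TIMESTAMP",
}

# Hypothetical excerpt of the target driver's type information:
# ODBC SQL type identifier -> native target type.
ODBC_TO_TARGET = {
    "SQL_INTEGER": "NUMBER(10)",
    "SQL_VARCHAR": "VARCHAR2",
    "SQL_TYPE_TIMESTAMP": "DATE",
}


def map_via_odbc(source_type, source_to_odbc=SOURCE_TO_ODBC,
                 odbc_to_target=ODBC_TO_TARGET):
    """Map a source type to a target type through its ODBC SQL type.

    Returns None when either step has no entry.
    """
    odbc_type = source_to_odbc.get(source_type.lower())
    if odbc_type is None:
        return None
    return odbc_to_target.get(odbc_type)
```

Because both sides are expressed against the same identifiers, neither data source needs to know the other's native type names.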
Processing large objects during data conversion
Overview of large object types:
BLOB stands for binary large object. BLOBs can be divided into three forms: audio-visual data, binary data, and large text data. Their most common use is therefore to store images, sound, and similar objects. Besides binary objects, OLE objects can also be stored in the database as a BLOB type, and a text object that is too large for the text type must be stored in a BLOB field. Commonly used programming environments do not support BLOB fields directly, so the corresponding functions must be called to work with BLOBs.
Different database systems support different large object types; those supported by common database systems are shown in Table 4:
Table 4. Large object data types supported by common database systems:
| Database system | Large object data types |
| --- | --- |
| SQL Server | sql_variant, ntext, image, varbinary, binary, text |
| Oracle | BLOB, LONG RAW, BFILE, RAW, CLOB, LONG |
| Sybase | LONG VARCHAR |
| VFP | MEMO |
| Access | OLE OBJECT, MEMO |
| Kingbase | BLOB, text, bytea, varbinary, binary |
Access methods for large objects:
(1) Using the CLongBinary class provided by MFC:
VC++ offers several ways to access large object data, such as OLE and ActiveX, and the CLongBinary class provided by MFC makes access to BLOB fields straightforward. CLongBinary can handle data larger than MAXINT, up to the amount of available memory. However, the data is held entirely in memory, which costs too much for very large data.
(2) Using the ODBC SQLGetData and SQLPutData functions:
For data that cannot be stored in a single buffer, the application can retrieve the data from the driver in parts with SQLGetData after the other data in the row has been fetched. To retrieve long data from a column, the application first calls SQLFetchScroll or SQLFetch to move to a row and fetch the data for the bound columns, and then calls SQLGetData for the long data column. SQLPutData allows an application to send data for a parameter or column to the driver at statement execution time. This function is used to send a character or binary value to.