Application Scenarios:
This week during SIT I helped one of our testers with data quality verification for some tables. The first step is to compare the table definitions of the source and stage tables and check the consistency of the data content.
This project uses Oracle as the data warehouse (DW); the sources are Oracle, SQL Server, and XLS files.
Without permission to create a database link, the testers' usual approach is to compare the total row counts and, when the data set is too large for a full check, sample the data, export it to XLS, and compare it with Beyond Compare.
If the values differ, you need to find out which rows differ, or at least locate one specific differing value.
When I tried this, I found it hard to do manually, so I wrote the following small tool to assist with the testing.
Features:
1. Supports comparing data from Oracle, SQL Server, and XLS.
2. For Oracle and SQL Server, a traditional comparison opens a read-only, forward-only DataReader on each side and compares the data row by row; click the CompareData button.
3. For Oracle vs. Oracle, you can click the GetDiffRange button to locate differences with a binary search (see the sketch after the test data below).
The test data is as follows:
Execute the following code in the Oracle database to generate the test data.
-- Assumes tables mytest(id, rid, name) and mytest2, plus the sequence mysequence, already exist.
declare
  i integer := 1;
begin
  while i < 110000 loop
    insert into mytest (id, rid, name) values (mysequence.nextval, i, 'DataItem' || i);
    i := i + 1;
  end loop;
end;
/

-- Copy the data and introduce one difference.
insert into mytest2 select * from mytest order by id;
update mytest2 set rid = 100 where id = 100000;
commit;

-- Row-level difference check.
select * from mytest minus select * from mytest2;

-- Count/average-hash summaries of both tables.
select count(*) as TotalCount, to_char(avg(ora_hash(id || rid || name))) as AvgHash
  from (select id as RN, t.* from mytest t);

select count(*) as TotalCount, to_char(avg(ora_hash(id || rid || name))) as AvgHash
  from (select id as RN, t.* from mytest2 t);

-- Introduce a second difference.
update mytest2 set name = 'Evantest' where id = 1000;
commit;
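To make feature 3 concrete: each step of a binary search can run the count/average-hash query restricted to one half of the RN range on both tables and keep the half whose results differ. The statements below are only a sketch of that idea against the test tables; the range bounds are written as literals here, and the exact SQL the tool generates may differ.

-- Summary of the lower half of the RN range for the first table (bounds are illustrative).
select count(*) as TotalCount, to_char(avg(ora_hash(id || rid || name))) as AvgHash
  from (select id as RN, t.* from mytest t)
 where RN between 1 and 55000;

-- The same statement against mytest2; if TotalCount or AvgHash differs,
-- the mismatch lies in this half, otherwise the upper half is searched next.
select count(*) as TotalCount, to_char(avg(ora_hash(id || rid || name))) as AvgHash
  from (select id as RN, t.* from mytest2 t)
 where RN between 1 and 55000;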
Range: you can set this yourself. If it is 1, the search drills down to the first differing record; if it is greater than 1, the result is an interval, meaning the differing rows lie somewhere within it.
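Once the search has narrowed things down to a small interval, the rows inside it can be pulled from both tables and compared directly. This is a hand-written check rather than something the tool does for you; in the test setup both tables sit in the same schema, so a plain MINUS works, and the bounds below are illustrative.

-- Rows in the narrowed RN interval that are present in mytest but not in mytest2.
select * from (select id as RN, t.* from mytest t) where RN between 99990 and 100010
minus
select * from (select id as RN, t.* from mytest2 t) where RN between 99990 and 100010;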
To support both Oracle and SQL Server, I developed the tool against the IDbConnection and IDataReader interfaces in the class library. Reading XLS is not implemented for the moment.
SQL Requirements:
Binary search requires the sample to be ordered, so the query has to be written in this style:
select count(*) as TotalCount, to_char(avg(ora_hash(id || rid || name))) as AvgHash
  from (select id as RN, t.* from mytest t)
There are two requirements: the query must return the total number of rows, and it must expose a fixed column named RN.
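In the test tables the ID column already serves as RN. If a source table has no such ready-made numeric key, one way to satisfy the RN requirement (my own suggestion, not something taken from the tool) is to generate it with ROW_NUMBER over a unique ordering key:

-- Generate RN from a window function when no dense numeric key exists;
-- the ordering key must be unique so both sides number their rows identically.
select count(*) as TotalCount, to_char(avg(ora_hash(id || rid || name))) as AvgHash
  from (select row_number() over (order by id) as RN, t.* from mytest t);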
I use the ORA_HASH function, provided by Oracle, to hash the value of each row and then average the hashes, which shows whether the contents of a table range are the same.
It is similar to a checksum function, but a checksum is much slower to compute, especially when the table holds a large volume of data.
The similar function in SQL Server is HASHBYTES. Note that if you do not use TO_CHAR, the calculated value overflows, and when the result is received on the .NET side an OCI data overflow error is reported, so TO_CHAR is necessary as well.
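For completeness, a rough SQL Server counterpart is sketched below. The post points at HASHBYTES, but HASHBYTES returns varbinary and cannot be averaged directly, so this sketch uses CHECKSUM, the closer analogue of ORA_HASH; the column names assume the same mytest structure, and the resulting values are only comparable with another SQL Server table, not with the Oracle hashes.

-- Hypothetical SQL Server version of the count/average-hash summary.
-- CHECKSUM is fast and non-cryptographic, like ORA_HASH; cast to bigint
-- before averaging to avoid integer overflow on large tables.
select count(*) as TotalCount,
       convert(varchar(40), avg(cast(checksum(id, rid, name) as bigint))) as AvgHash
  from (select id as RN, t.* from dbo.mytest t) x;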
You are welcome to share your own experience with BI testing.
Software: files.cnblogs.com/huaxiaoyao/datacompare.rar
BI test tool for cross-database data comparison, supporting Oracle and SQL Server.