This article is pure trivia and has basically no practical value. Oracle uses SQL_ID to identify a unique SQL statement, and SQL_ID corresponds one-to-one with SQL_TEXT. If two SQL texts differ in any way, including spaces and other invisible characters, their SQL_IDs differ. This article details how Oracle hashes SQL_TEXT into a 13-character string. Why is this string 13 characters long? Why does it so often start with a digit?
This article explains the conversion in detail, based on the work of Tanel Poder and Slavik (1, 2), and provides implementation code in PHP and Perl.
0. Overview
Oracle first computes the MD5 hash of SQL_TEXT. It then takes the low-order 64 bits of the hash and, 5 bits at a time (the final group, which becomes the first character, has only 4 bits), converts them into visible characters with Base32. The resulting string is the SQL_ID you see. That is the general idea.
However, there are a few points to watch out for in the actual conversion:
(A) When Oracle computes the MD5 hash, an invisible character \0 is appended to the end of SQL_TEXT. SQL_TEXT with this trailing character often shows up in AWR reports.
(B) Pay attention to byte order: the integers are read as little-endian.
(C) The visible characters used for the Base32 conversion are 0123456789abcdfghjkmnpqrstuvwxyz.
(D) Pay attention to the precision of large numbers when writing programs. The Perl and PHP programs in this article use big-number math functions.
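Precaution (D) is easy to demonstrate. The following is a minimal Python sketch, assuming the 64-bit value that the worked example in section 1 arrives at (bytes 78 0c ba eb 8b a8 4e 61). The variable names are illustrative only. It shows that the value does not survive a round trip through a double-precision float, so the low 5 bits, which decide the last character of SQL_ID, are lost.

```python
# 64-bit value from the worked example (bytes 78 0c ba eb 8b a8 4e 61).
n = 0x780CBAEB8BA84E61

# A double has a 53-bit mantissa, so a value this large gets rounded.
as_float = float(n)
round_trip = int(as_float)

# Python integers are arbitrary precision, so integer arithmetic stays exact.
low_5_bits_exact = n % 32
low_5_bits_float = round_trip % 32

print(n == round_trip)                     # False: precision was lost
print(low_5_bits_exact, low_5_bits_float)  # 1 0
```

In a language whose default number type is a float, the same loss happens silently, which is why the programs below rely on big-number functions.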
1. Detailed process
1.1 Example
We consider the following given SQL:
select sysdate from dual;
Run it and query v$sql in Oracle 10g. The SQL_ID of this statement is:

SQL> select sql_id, hash_value from v$sql where sql_text = 'select sysdate from dual';

SQL_ID        HASH_VALUE
------------- ----------
7h35uxf5uhmm1 2343063137
1.2 Detailed process of the SQL_ID calculation
1.2.1 Calculating the hash value
Append a NUL character \0 to the end of SQL_TEXT, then compute the MD5 hash:

use Digest::MD5 qw(md5 md5_hex md5_base64);
$stmt = "select sysdate from dual\0";
$hash = md5 $stmt;
The MD5 hash of "select sysdate from dual\0" is abd4dbb3096b15f1ebba0c78614ea88b, 128 bits in total. (It is written as 32 hexadecimal characters, and each hexadecimal character encodes 4 bits, hence 128 bits.) The low-order 64 bits are "ebba0c78 614ea88b".
The bytes of the MD5 hash in binary (128 bits):
|10101011|11010100|11011011|10110011||00001001|01101011|00010101|11110001||11101011|10111010|00001100|01111000||01100001|01001110|10101000|10001011|
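The same step can be sketched in Python with the standard hashlib module. This is only an illustration: the ASCII encoding is an assumption (the real bytes depend on the database character set), and the variable names are made up for this example. It prints the digest as hex and as groups of 8 bits, which makes the relation between 32 hex characters and 128 bits visible.

```python
import hashlib

stmt = "select sysdate from dual"

# Append the NUL byte before hashing, as described above.
# Assumption: the statement is plain ASCII.
digest = hashlib.md5(stmt.encode("ascii") + b"\x00").digest()

hex_form = digest.hex()                                 # 32 hex characters
bit_form = "|".join(f"{byte:08b}" for byte in digest)   # 16 groups of 8 bits

print(hex_form)
print(len(digest) * 8, "bits")
print(bit_form)
```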
1.2.2 The low-order 64-bit integer
The low-order 64 bits of the MD5 hash are:
|11101011|10111010|00001100|01111000||01100001|01001110|10101000|10001011|
These are split into a high 32-bit half and a low 32-bit half, ebba0c78 and 614ea88b, whose binary byte streams are |11101011|10111010|00001100|01111000| and |01100001|01001110|10101000|10001011|. You can use the unpack function to decode the hash into integers. Note that the later modulo operation works on integers, and the environment here is x86_64, which is little-endian. Each 32-bit half is therefore read with its bytes reversed, so the integers used in the modulo operation (written the way a person reads numbers) are:
|01111000|00001100|10111010|11101011||10001011|10101000|01001110|01100001|
Perl code:
my($a,$b,$msb,$lsb) = unpack("V*",$hash);
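For comparison, here is a small Python sketch of the same unpacking step, using the standard struct module and the digest quoted above. The names a, b, msb, lsb and sqln simply mirror the Perl code. Note that the last word comes out equal to the HASH_VALUE column shown in section 1.1.

```python
import struct

# MD5 digest quoted in the text for "select sysdate from dual\0".
digest = bytes.fromhex("abd4dbb3096b15f1ebba0c78614ea88b")

# "<4I" reads four little-endian unsigned 32-bit integers,
# which is what Perl's unpack("V*", ...) does for a 16-byte string.
a, b, msb, lsb = struct.unpack("<4I", digest)

# Combine the last two words into one 64-bit integer.
sqln = (msb << 32) | lsb

print(hex(msb), hex(lsb))   # 0x780cbaeb 0x8ba84e61
print(lsb)                  # 2343063137
print(hex(sqln))            # 0x780cbaeb8ba84e61
```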
1.2.3 Converting to visible characters with Base32
Oracle uses Base32 to convert the byte stream into visible characters.
Each Base32 character encodes 5 bits of the byte stream. There are 64 bits in total, and 64/5 rounds up to 13 characters: 12 of them carry 5 bits each, and one (the first character of SQL_ID) always carries only 4 bits.
Let's look at the byte stream in this case. Starting from the low-order end, every five bits are converted into one Base32 character. The last five bits are 00001 (decimal 1), whose Base32 character is 1; the second-to-last five bits are 10011 (decimal 19), and the third-to-last five bits are also 10011 (decimal 19), and so on.
The Base32 characters used by Oracle are:
0123456789abcdfghjkmnpqrstuvwxyz
Code-to-character mapping

Code       00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
Character   0  1  2  3  4  5  6  7  8  9  a  b  c  d  f  g
-----------------------------------------------------------
Code       16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Character   h  j  k  m  n  p  q  r  s  t  u  v  w  x  y  z
Therefore, the numbers 1, 19, and 19 map to 1, m, and m. Read from right to left, they are the last three characters (mm1) of the SQL_ID:
7h35uxf5uhmm1
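The encoding step on its own can be sketched in a few lines of Python. This is an illustrative sketch, not Oracle's code: the function name to_sql_id is made up here, and the input is the 64-bit integer formed from the reordered bytes in section 1.2.2. It prints the 13-character identifier and the three lowest 5-bit groups discussed above.

```python
ALPHABET = "0123456789abcdfghjkmnpqrstuvwxyz"

def to_sql_id(sqln: int) -> str:
    """Encode a 64-bit integer as 13 characters, 5 bits per character."""
    chars = []
    for _ in range(13):
        chars.append(ALPHABET[sqln & 0x1F])  # take the lowest 5 bits
        sqln >>= 5                           # move on to the next group
    # Groups were produced lowest-first, so reverse them for display.
    return "".join(reversed(chars))

sqln = 0x780CBAEB8BA84E61   # bytes 78 0c ba eb 8b a8 4e 61
sql_id = to_sql_id(sqln)
low_groups = [(sqln >> (5 * i)) & 0x1F for i in range(3)]

print(sql_id)
print(low_groups)
```

After twelve 5-bit groups only 4 bits remain, so the thirteenth group (the first character) can never exceed 15.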
3. Code snippets
The code in the three "P" languages (Perl, PHP, Python) is as follows:
3.1 Perl
#!/usr/bin/perl -w
use strict;
use Digest::MD5 qw(md5);
use Math::BigInt;

# SQL text plus the trailing NUL character
my $stmt = "select sysdate from dual\0";
my $hash = md5($stmt);

# Four little-endian 32-bit integers; only the last two are used
my ($a, $b, $msb, $lsb) = unpack("V*", $hash);

# Build the 64-bit value with Math::BigInt so no precision is lost
my $sqln = Math::BigInt->new($msb)->blsft(32)->badd($lsb);

my $charbase32 = '0123456789abcdfghjkmnpqrstuvwxyz';
my @chars = split //, $charbase32;
my $sqlid = '';

# 13 characters: take 5 bits at a time, lowest bits first
for my $i (0 .. 12) {
    my $seq = $sqln->copy->brsft(5 * $i)->bmod(32);
    $sqlid = $chars[$seq->numify] . $sqlid;
}
print "SQL is:\n $stmt \nSQL_ID is\n $sqlid\n";
3.2 PHP
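function stmt_2_sqlid($stmt)
{
    // Raw MD5 of the SQL text plus the trailing NUL character
    $h = md5($stmt . "\0", true);
    // Four little-endian 32-bit integers (the array is indexed from 1)
    $un = unpack("V*", $h);
    // On 32-bit PHP the values can come back negative; make them unsigned
    $msb = $un[3] + 0;
    if ($msb < 0) { $msb = pow(2, 32) + $msb; }
    $lsb = $un[4] + 0;
    if ($lsb < 0) { $lsb = pow(2, 32) + $lsb; }
    // Build the 64-bit value with bcmath to keep full precision
    $sqln = bcadd(bcmul((string)$msb, bcpow('2', '32')), (string)$lsb);
    $sqlid = '';
    $alphabet = '0123456789abcdfghjkmnpqrstuvwxyz';
    // 13 characters, 5 bits each, lowest bits first
    for ($i = 0; $i < 13; $i++) {
        $seq = bcmod(bcdiv($sqln, bcpow('32', (string)$i), 0), '32');
        $sqlid = $alphabet[(int)$seq] . $sqlid;
    }
    return $sqlid;
}

$stmt = 'select sysdate from dual';
echo stmt_2_sqlid($stmt);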
3.3 Python
Reference: Oracle sql_id and hash value
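For completeness, here is a minimal Python sketch of the whole procedure. It is not taken from the referenced article; it simply follows the steps described in sections 0 and 1. The function name mirrors the PHP version, and the encoding parameter is an assumption, since the bytes Oracle actually hashes depend on the database character set.

```python
import hashlib
import struct

ALPHABET = "0123456789abcdfghjkmnpqrstuvwxyz"

def stmt_2_sqlid(stmt: str, encoding: str = "utf-8") -> str:
    """Sketch of the SQL_ID calculation described in this article."""
    # MD5 of the SQL text plus the trailing NUL character
    digest = hashlib.md5(stmt.encode(encoding) + b"\x00").digest()
    # Four little-endian 32-bit integers; only the last two are used
    _, _, msb, lsb = struct.unpack("<4I", digest)
    # Python integers are arbitrary precision, so no big-number library is needed
    sqln = (msb << 32) | lsb
    sqlid = ""
    # 13 characters, 5 bits each, lowest bits first
    for _ in range(13):
        sqlid = ALPHABET[sqln % 32] + sqlid
        sqln //= 32
    return sqlid

print(stmt_2_sqlid("select sysdate from dual"))
```

Because the loop always runs 13 times, the result is padded with leading 0 characters when the high bits of the hash happen to be zero.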
4. Finally
A slightly interesting fact is that the first character of SQL_ID is often a digit. The value has 64 bits, and when it is split into groups of five, the leading group always has only 4 bits. Its range is therefore always 0 to 15, and the corresponding characters are 0123456789abcdfg. Ten of these 16 characters are digits, so if the hash bits are evenly distributed, more than 50% of SQL_IDs begin with a digit.
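The proportion can be worked out directly. The short Python sketch below assumes the top 4 bits of the hash are uniformly distributed, which is an assumption of this example rather than something Oracle documents, and counts how many of the 16 possible leading characters are digits.

```python
ALPHABET = "0123456789abcdfghjkmnpqrstuvwxyz"

# The leading character encodes only the top 4 bits, so its value is 0..15.
leading = ALPHABET[:16]

# Count the digits among the possible leading characters.
digits = [c for c in leading if c.isdigit()]
share = len(digits) / len(leading)

print(leading)   # 0123456789abcdfg
print(share)     # 0.625
```

Under that assumption, 62.5% of SQL_IDs would start with a digit.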
Okay, that's the end of the trivia.
Original article: How does Oracle generate SQL_ID based on SQL_TEXT? Thanks to the original author for sharing.