Method for Extracting and cleansing varchar2 to number data (from traditional to simplified)
Background]
When extracting the "contact number" field for data extraction, it is found that some Chinese and English characters exist. You need to clear this field.
[Cause of spam Data]
If a field such as "contact number" is set to number at the beginning of the design, rather than varchar2, A constraint is imposed on the front-end software, I believe there will be no such interference in English or Chinese, and no junk data will appear. Now, I have to find a solution.
[Solution process]
I started to think about the complexity. I wanted to use a function for judgment and then use case for processing. But then I got the DBA prompt from other companies and used a regular expression to solve it!
The following is a simulated experiment:
1. Create an experiment table (source)
Createtable test_num
(
Telvarchar2 (32)
);
2. Create some experimental data (source)
3. Create an experiment target table (target end)
Createtable test_num_new
(
Telnumber
);
4. insert data from the source end to the target end
5. Create a judgment Function
Create or replace function isnum (v_in varchar2)
Return varchar is
Val_err exception;
Pragma exception_init (val_err,-6502 );
Scrub_num number;
Begin
Scrub_num: = to_number (v_in );
Return 'y ';
Exception when val_err then
Return 'n ';
End;
Function purpose: Pass the value to the isnum function. If number is used, return Y. If not number, return N.
6. Use case and functions for cleaning
Insert into test_num_new select case isnum (TEL) when 'y' then TEL
When 'n' then''
End from test_num;
Commit;
7. view the target table after cleaning
The cleaning of junk data is completed. But is this complicated?
Use a regular expression.
8. Clear Target Data
Truncate table test_num_new;
9. insert data to the target table using regular expressions
Insert into test_num_new select REGEXP_REPLACE (tel, '\ d', '') from test_num;
Commit;
10. view the target table
We can see that the regular expression can be used to remove Chinese and English interference from the processed data and only save the number to achieve the expected effect. Compared with the previous case and function implementation, simplified a lot.
The experiment is complete.
[Summary]
Use regular expressions to retain only the number to remove Chinese and English interference.
Syntax: REGEXP_REPLACE (field name or string, '\ d ','')
Meaning: remove non-number characters in the field
\ D: indicates a non-number character.