Data cleansing note: converting strings to dates (one date field stored in multiple formats)
[Background]
While cleaning data, I found that one time field in the source system holds values in three different formats. The likely cause is that the field receives data from three or more upstream systems, each with its own format. The source stores this time field as varchar2, and the receiving system enforced no format specification when accepting data uploaded by the different systems. The values in this field therefore need to be classified by format and cleaned category by category.
[Solution]
We can use a CASE expression to classify the different types of data and apply a separate conversion to each, for example:
SELECT CASE
         WHEN condition_1 THEN processing_method_1
         WHEN condition_2 THEN processing_method_2
         ELSE processing_method_3
       END AS column_alias
  FROM source_table;
[Experiment]
Create an experiment table as follows:
CREATE TABLE experiment_table (ID VARCHAR2(32) DEFAULT sys_guid(), DATE_TIME VARCHAR2(50), MEMO VARCHAR2(32));
Insert experiment data to simulate three time formats:
INSERT INTO experiment_table (DATE_TIME, MEMO) VALUES ('2017-08-11 2017:23.0:18.0', '1');
INSERT INTO experiment_table (DATE_TIME, MEMO) VALUES ('2017-05-27 2015:12.0:24.0', '1');
INSERT INTO experiment_table (DATE_TIME, MEMO) VALUES ('2017 11:00:12 PM', '2');
INSERT INTO experiment_table (DATE_TIME, MEMO) VALUES ('2017 10:10:00 AM', '2');
INSERT INTO experiment_table (DATE_TIME, MEMO) VALUES ('2017 02 08:12:23:000 PM', '3');
INSERT INTO experiment_table (DATE_TIME, MEMO) VALUES ('2017 01 31 09:00:00:000 PM', '3');
COMMIT;
SELECT * FROM experiment_table;
Create the target table as follows:
CREATE TABLE target_table (ID VARCHAR2(32), RESULT_TIME DATE, LEVEL_NUMBER VARCHAR2(32));
Inserting the raw strings into the DATE column without handling each format returns an error, so convert them by category as follows:
INSERT /*+ append */ INTO target_table NOLOGGING
SELECT ID,
       CASE
         WHEN DATE_TIME LIKE '%-%' THEN
           TO_DATE(REPLACE(DATE_TIME, '.0', ''), 'yyyy-mm-dd hh24:mi:ss')
         WHEN DATE_TIME LIKE '%:%' THEN
           TO_DATE(REPLACE(DATE_TIME, ':000', ''), 'yyyy mm dd hh:mi:ss am', 'NLS_DATE_LANGUAGE = American')
         ELSE
           TO_DATE(DATE_TIME, 'yyyy mm dd hh:mi:ss am', 'NLS_DATE_LANGUAGE = American')
       END RESULT_TIME,
       MEMO LEVEL_NUMBER
  FROM experiment_table;
COMMIT;
SELECT * FROM target_table;
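The same classify-then-convert idea can be prototyped outside the database before running the load. The following is only a minimal Python sketch, not the SQL used above: the function name to_datetime and the three sample strings are hypothetical, and the samples assume well-formed values of each format (a '.0' suffix on the seconds, a 12-hour clock with AM/PM, and a ':000' millisecond suffix).

```python
from datetime import datetime

def to_datetime(raw):
    """Classify a raw string by pattern, then parse it with the matching format."""
    text = raw.strip()
    if "-" in text:
        # Dash-separated date, 24-hour time, trailing '.0' removed before parsing.
        return datetime.strptime(text.replace(".0", ""), "%Y-%m-%d %H:%M:%S")
    if ":000" in text:
        # Space-separated date, 12-hour time, ':000' milliseconds removed.
        return datetime.strptime(text.replace(":000", ""), "%Y %m %d %I:%M:%S %p")
    # Fallback: space-separated date, 12-hour time with AM/PM.
    return datetime.strptime(text, "%Y %m %d %I:%M:%S %p")

# Hypothetical samples, one per format.
for sample in ("2017-08-11 20:23:18.0",
               "2017 08 11 11:00:12 PM",
               "2017 01 31 09:00:00:000 PM"):
    print(sample, "->", to_datetime(sample))
```

As in the CASE expression, the branches are tested in order and the first match wins, so the most specific patterns should come first. A value that matches none of the formats raises ValueError, which makes unexpected formats visible instead of silently loading wrong dates.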
A small tip, noted here so it is easy to remember.
Supplement: processing dates with English month names
select to_date('1-JULY-15 22:23:11','DD-MON-YY hh24:mi:ss') FROM DUAL;
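For comparison, the same string can be parsed in Python. This is a sketch under the assumption that the default C locale is in effect, so month names are English; the helper name parse_english_date is hypothetical. The directive %B matches a full month name and strptime ignores letter case, which plays the role that the session's date language plays for TO_DATE.

```python
from datetime import datetime

def parse_english_date(raw):
    """Parse day-monthname-two-digit-year plus a 24-hour time, e.g. '1-JULY-15 22:23:11'."""
    # %B: full month name (locale dependent); %y: two-digit year.
    return datetime.strptime(raw, "%d-%B-%y %H:%M:%S")

print(parse_english_date("1-JULY-15 22:23:11"))
```

If the session or process language is not English, the month name will not be recognized; in Oracle this is what the 'NLS_DATE_LANGUAGE = American' argument of TO_DATE guards against.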