Data Cleansing Notes: Converting Strings to Dates, and the Problems Timestamps Cause
[Background]
During data extraction, the source field that records the time holds its values in timestamp format, but the column type is a string. The target, however, requires the data to be loaded into a DATE column, so the values must be cleansed first.
[Solution]
This problem may look tricky at first, but a quick analysis of the timestamp format reveals a workable approach.
Take, for example, a value in the following format:
'14-JUN-15 08.23.35.048000 PM +08:00','DD MON YYYY HH.MI AM'
You can use this method:
select to_date(replace(substr('14-JUN-15 08.23.35.048000 PM +08:00',1,18),'.',':')||substr('14-JUN-15 08.23.35.048000 PM +08:00',26,3),'DD-MON-YY HH:MI:SS AM') from dual;
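The result is the DATE 2015-06-14 20:23:35, that is, 08:23:35 PM converted to 24-hour time. How the value is displayed depends on the session's NLS_DATE_FORMAT.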
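The method is simple: it cuts the timestamp string into two pieces and joins them back together. substr(..., 1, 18) takes '14-JUN-15 08.23.35', and replace turns its dots into colons. substr(..., 26, 3) takes ' PM'. The fractional seconds and the time-zone offset are discarded, which leaves '14-JUN-15 08:23:35 PM'. That string matches the format mask 'DD-MON-YY HH:MI:SS AM'.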
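The Python sketch below mirrors the SQL expression to make the character positions concrete. Python slices are 0-based and exclude the end index, so Oracle's substr(ts, 1, 18) becomes ts[0:18] and substr(ts, 26, 3) becomes ts[25:28]. The function name clean_timestamp is hypothetical. The sketch assumes the fixed layout of the sample value: a two-digit day and six fractional-second digits.

```python
from datetime import datetime

def clean_timestamp(ts: str) -> datetime:
    """Mirror of the Oracle expression
    to_date(replace(substr(ts,1,18),'.',':') || substr(ts,26,3), 'DD-MON-YY HH:MI:SS AM').
    Assumes the layout 'DD-MON-YY HH.MI.SS.FFFFFF AM +TZH:TZM'."""
    # substr(ts, 1, 18) -> ts[0:18]; dots become colons: '14-JUN-15 08:23:35'
    date_part = ts[0:18].replace(".", ":")
    # substr(ts, 26, 3) -> ts[25:28]: the space plus the AM/PM marker, ' PM'
    meridian = ts[25:28]
    cleaned = date_part + meridian  # '14-JUN-15 08:23:35 PM'
    # %b matches 'JUN' case-insensitively; %y maps 15 to 2015; %I + %p gives 20h
    return datetime.strptime(cleaned, "%d-%b-%y %I:%M:%S %p")

print(clean_timestamp("14-JUN-15 08.23.35.048000 PM +08:00"))  # 2015-06-14 20:23:35
```

Because the slice positions are hard-coded, a value in any other layout (for example, with a different number of fractional digits) would be cut in the wrong place.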
Next, let's look at the correct way to store a timestamp, as in the following example:
create table experiment_table (id varchar2(32) default sys_guid(), date_timestamp date default systimestamp, memo varchar2(32));
insert into experiment_table (memo) values ('1');
insert into experiment_table (memo) values ('2');
insert into experiment_table (memo) values ('3');
commit;
select * from experiment_table;
You can also use the following method:
create table experiment_table2 (id varchar2(32) default sys_guid(), date_timestamp date default current_timestamp, memo varchar2(32));
insert into experiment_table2 (memo) values ('1');
commit;
select * from experiment_table2;
These demonstrations show that either CURRENT_TIMESTAMP or SYSTIMESTAMP can supply the current timestamp. When a timestamp is written into a DATE column, the stored value displays in the normal date format because Oracle implicitly converts it to a DATE. But what does the timestamp look like when it is queried on its own? Run the following query:
select sessiontimezone,current_timestamp from dual;
As the output shows, a timestamp is displayed by default in a format such as '14-JUN-15 08.23.35.048000 PM +08:00'; the session parameter NLS_TIMESTAMP_TZ_FORMAT controls this display. Suppose a column is not declared as DATE: it is VARCHAR2, and the timestamp is stored without an explicit conversion. That column then holds text in exactly this format. When such a column is extracted into a DATE column of a target table, the implicit conversion to DATE fails, because the string contains more than a date format mask expects.
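As a cross-check on which fields this default format contains, the Python sketch below parses every field of the sample string, including the fractional seconds and the +08:00 offset. It then discards what a DATE cannot hold. This is a swapped-in technique, not the method used in this note. The format string is an assumption that matches only this exact layout, and parse_full and as_date are hypothetical names.

```python
from datetime import datetime

RAW = "14-JUN-15 08.23.35.048000 PM +08:00"  # sample value from the text

def parse_full(raw: str) -> datetime:
    # Assumed layout: DD-MON-YY HH.MI.SS.FFFFFF AM +TZH:TZM
    # (%z accepts "+08:00" with a colon on Python 3.7 and later)
    return datetime.strptime(raw, "%d-%b-%y %I.%M.%S.%f %p %z")

def as_date(ts: datetime) -> datetime:
    # A DATE keeps neither fractional seconds nor a time zone,
    # so drop both while leaving the wall-clock time unchanged
    return ts.replace(microsecond=0, tzinfo=None)

full = parse_full(RAW)
print(full)           # 2015-06-14 20:23:35.048000+08:00
print(as_date(full))  # 2015-06-14 20:23:35
```

as_date yields the same value as the substr/replace approach. Both simply drop the fractional seconds and the offset without shifting the clock time to another zone.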
[Experiment]
Create the source data table (experiment table 3) and target table, as shown below:
create table experiment_table3 (id varchar2(32) default sys_guid(), date_timestamp varchar2(50) default current_timestamp, memo varchar2(50));
insert into experiment_table3 (memo) values ('1');
insert into experiment_table3 (memo) values ('2');
insert into experiment_table3 (memo) values ('3');
insert into experiment_table3 (memo) values ('4');
commit;
select * from experiment_table3;
create table target_table (id varchar2(32), date_time date, memo varchar2(50));
An unprocessed extraction, which inserts DATE_TIMESTAMP directly into the DATE column, runs into exactly the conversion problem described above.
With the cleansing step added, the extraction becomes:
insert /*+ append */ into target_table
select id,
       to_date(replace(substr(date_timestamp, 1, 18), '.', ':') || substr(date_timestamp, 26, 3), 'DD-MON-YY HH:MI:SS AM') as date_time,
       memo
from experiment_table3;
commit;
select * from target_table;
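For readers who cleanse data in an application layer instead of in SQL, here is a hypothetical Python sketch of the same extraction step. It applies the identical slicing to each (ID, DATE_TIMESTAMP, MEMO) row. One design choice differs from the SQL. In Oracle, a single unconvertible string makes the whole INSERT ... SELECT fail; this sketch instead sets such rows aside in a rejected list. The names slice_to_date and cleanse_rows are illustrative only.

```python
from datetime import datetime
from typing import List, Tuple

Row = Tuple[str, str, str]            # (ID, DATE_TIMESTAMP, MEMO) from the source table
CleanRow = Tuple[str, datetime, str]  # (ID, DATE_TIME, MEMO) for the target table

def slice_to_date(ts: str) -> datetime:
    # Same as the SQL: substr(ts,1,18) with '.' -> ':' followed by substr(ts,26,3)
    cleaned = ts[0:18].replace(".", ":") + ts[25:28]
    return datetime.strptime(cleaned, "%d-%b-%y %I:%M:%S %p")

def cleanse_rows(rows: List[Row]) -> Tuple[List[CleanRow], List[Row]]:
    """Convert each row; rows whose DATE_TIMESTAMP does not follow
    the expected layout are collected instead of aborting the batch."""
    cleaned: List[CleanRow] = []
    rejected: List[Row] = []
    for row_id, raw_ts, memo in rows:
        try:
            cleaned.append((row_id, slice_to_date(raw_ts), memo))
        except ValueError:
            rejected.append((row_id, raw_ts, memo))
    return cleaned, rejected

rows = [
    ("a1", "14-JUN-15 08.23.35.048000 PM +08:00", "1"),
    ("a2", "2015-06-14 20:23:35", "2"),  # not in the expected layout
]
good, bad = cleanse_rows(rows)
print(good)  # [('a1', datetime.datetime(2015, 6, 14, 20, 23, 35), '1')]
print(bad)   # [('a2', '2015-06-14 20:23:35', '2')]
```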
The data has been cleaned and extracted.
A small trick, but worth remembering.