Introduction
This document describes the Java Implementation of A DB2 User-Defined Function UDF. This function has two input parameters: a utc timestamp, for example, 2004-04-04-04.00.00.000000) and a region name such as "America/Guayaquil"), and return the corresponding timestamp in the new region. Here, the timestamp is 2004-04-03-23.00.00.000000 ).
The two main challenges encountered when performing such conversions are:
- All different names that may appear in a region and all different regions that may be passed in as input parameters are supported.
- For each region, the daylight savings time is calculated.
For data warehouse projects that contain data from many different applications and distributed in several continents, we have proven that this function is applicable to Extract Transform Load (ETL) the process required for data analysis and processing is extremely useful.
Background
The time continuum in a continuous interval.Date,TimeTo be more accurate, you can also addDecimal point after second. This definition is calledTimestampIn DB2, the ISO representation is as follows:
2004-07-24-16.18.28.410002
|
In this case, the precision can be reduced to 1 microsecond.
Record the timestamp when a transaction-related event occurs. This activity is called "record timestamping)" for the transaction )". The timestamp must be recorded multiple times throughout the lifecycle of a transaction, so that you can record the time of creation, last modification, and last access to a transaction.
For centralized storage transactions that occur in multiple geographic locations and multiple regions, the common design is to use the Universal Time Coordinated, UTC Time corresponding to the Time of the region) write down each timestamp record. Record the location of the transaction at the same time, such as the customer number or business department number when the transaction occurs, you can reconstruct the local time of the transaction.
When data from a large number of different sources must be processed, analyzed, and reported, we need to reconstruct the local time of the UTC timestamp at the original transaction location. Frequently asked questions include:
- I know the UTC time, but what is the customer's local time?
- Is it morning or afternoon?
- Is it within the working hours of our geographic region or outside the working hours?
- Other problems.
Challenges
Seemingly simple and easy-to-complete tasks prove to be neither simple nor easy-to-complete.
Real-time conversion from local time to UTC time is usually completed in the application. It can only be used for the current time, not for any time point in the past or in the future), and can only be used for the region of the server when the application is running, rather than any other region ).
One challenge we encountered was to understand the region names we received:
- For the same region, different names are provided for each data source in our data warehouse, such as Eastern Daylight, Eastern Standard Time, America/New_York, and EST.
- Each data source processes a group of different regions. For example, if a data source has only North American region, and other data sources have offices in Europe, a larger region list is required.
Another challenge is implementing the Daylight Savings Time, DST) rules for all regions we are dealing with-these rules cannot be easily used yet.
Another requirement is that the API used to call the conversion function can be easily used.
For example, to obtain a local time with the UTC timestamp in "America/Nassau", the SELECT statement used by a fictitious Acme Intl. company must be as simple as follows:
SELECT ACME.F_CONVERT_TIMEZONE(TRANSACTION_TIMESTAMP, "America/Nassau")FROM ACME.TRANSACTION_TABLE;
|
Solution
Some data sources are filled by Java applications. Therefore, the names used by each application for these data sources can be different Java JDK versions 1.1.8 and 1.4. Since Java JDK has compiled code for all DST rules in a fairly comprehensive set of regions, we chose to write conversion functions in Java and run the function on the Java JDK of DB2.
For ease of use, we wrap the Java class in another DB2 UDF.
Solution details
The Java class has a class variable used to store the search dictionary, which contains all possible spelling and names that may be used as input for each region.
tz_map = new Hashtable();… tz_map.put("Eastern Daylight", "EST");tz_map.put("Eastern Standard Time", "EST");tz_map.put("America/New_York", "EST");…
|
For example, all the keys above correspond to the value "EST ". This is the internal value used by the class method for timestamp conversion.
Note: long names are encouraged for time zones, such as "America/New_York ". However, in this specific implementation, we use a short name, because UDB DB2 version 7.2 uses JDK 1.1.8, and this version can only use a short name.
You can manually fill the search table. We spent a lot of time looking for internal Java settings for each region and grouping long and short names with the same DST rules and time zones.
We have mapped more than 250 regions. You can also add a new ing if needed. In this way, we have the required flexibility to add new data sources and new regions to the data warehouse.
For actual timestamp conversion, we use the following class method:
public static java.lang.String J_CONVERT_TIMEZONE(java.lang.String ivc_UTCtimestamp, java.lang.String ivc_timezone)
|
First, parse the input timestamp string into its components, and generate a Java calendar from those values, and then use the format converter formatter) generates a new converted timestamp. The subtle part is not converted and directly converted into the output, Because Java Calendar is not accurate to that extent.
You can use the following statement to register the Java class method as a UDF:
public static java.lang.String J_CONVERT_TIMEZONE(java.lang.String ivc_UTCtimestamp, java.lang.String ivc_timezone) throws Exception { // get the short name equivalent of the input ivc_timezone = (String)tz_map.get(ivc_timezone); if (ivc_timezone == null) ivc_timezone = "GMT"; // default to UTC if entry not found // replace the . with - so that we only have one token separator instead of two String ivc_UTCtimestamp_new = ivc_UTCtimestamp.replace('.', '-' ); // parse, validate and convert the TS string to integers, based on the one separator StringTokenizer st = new StringTokenizer(ivc_UTCtimestamp_new, "-"); int year = Integer.parseInt(st.nextToken()); int month = Integer.parseInt(st.nextToken()); int day = Integer.parseInt(st.nextToken()); int hour = Integer.parseInt(st.nextToken()); int min = Integer.parseInt(st.nextToken()); int sec = Integer.parseInt(st.nextToken()); String micro = st.nextToken(); // just carried over from the input // create with the above a calendar in UTC Calendar calUTC = Calendar.getInstance(); calUTC.clear(); calUTC.setTimeZone(TimeZone.getTimeZone("GMT")); calUTC.set(year, month-1, day, hour, min, sec ); // prepare the formatter for the specified timezone DateFormat formatter = new SimpleDateFormat("yyyy'-'MM'-'dd'-'HH.mm.ss", Locale.US); TimeZone tz = TimeZone.getTimeZone(ivc_timezone); formatter.setTimeZone(tz); // return the new value return formatter.format(calUTC.getTime()) + "." + micro; }
|
The preceding DB2 UDF can be called from SQL, but for convenience, we create another DB2 UDF to convert the input from the DB2 timestamp to a string, converts the output from the string to the DB2 timestamp, so that both the input and output are timestamps compatible with DB2. The code used is as follows:
CREATE FUNCTION ACME.F_CONVERT_TIMEZONE ( IPTS_TIMESTAMP TIMESTAMP, IPCH_TIMEZONE VARCHAR(30))RETURNS TIMESTAMPBEGIN ATOMIC DECLARE vvch_result VARCHAR(30); SET vvch_result = j_convert_timezone(char(IPTS_TIMESTAMP), rtrim(IPCH_TIMEZONE)); RETURN CASE vvch_result WHEN 'null' THEN NULL ELSE timestamp(vvch_result) END;END
|
Finally, we can call this function in the following simple SQL statement:
SELECT ACME.F_CONVERT_TIMEZONE(TRANSACTION_TIMESTAMP, "America/Nassau")FROM ACME.TRANSACTION_TABLE;
|
Future improvements and changes
We can easily add the following changes to enhance this solution:
- The new region method is to extend the search dictionary ).
- The new synonym Method for adding a region is to extend the search dictionary ).
- When you can get the JDK of the changed version, you can start to use the internal conversion to use the long name to update the Java code ).
- Create an inversion function -- this function takes the timestamp of a region as the parameter and returns the corresponding UTC timestamp.
Note: UTC is gradually becoming a synonym for Greenwich Mean Time (GMT.
Conclusion
In the ETL process of the Data Warehouse project, the function of converting the timestamp from one region to another is proved to be the main force in the conversion. This article provides the code for such Java functions to run in the form of UDB DB2 UDF.