While a Spark job was writing data to Hive, a very strange phenomenon appeared: when Report_time was converted into the hourly partition, the resulting hour was wrong, and only for part of the data, with a difference of 12 hours.
Because the problem did not occur on other clusters, the first step was to check whether the code had a logic error. Reading the code, the logic is easy to find:
After Report_time is acquired, the partition time is obtained directly through a time conversion function. Next, look at the time conversion function:
It uses Java's SimpleDateFormat to convert the given time. This kind of conversion is very common in Java, so it should not be the problem.
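The original conversion function is not reproduced here, so the following is only a minimal sketch of what such a function typically looks like. The class name HourPartition, the method name toHourPartition, the "yyyyMMddHH" pattern and the assumption that Report_time is an epoch timestamp in milliseconds are all hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class HourPartition {

    // Hypothetical helper: turns an epoch-millisecond report time into an
    // hourly partition key. No time zone is set on the formatter, so it
    // silently uses the JVM default time zone of whichever node runs it.
    static String toHourPartition(long reportTimeMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH");
        return fmt.format(new Date(reportTimeMillis));
    }

    public static void main(String[] args) {
        // The printed value depends on the time zone of the machine.
        System.out.println(toHourPartition(1530403200000L));
    }
}
```

Nothing in this sketch is wrong in isolation, which is why the code looks fine on review: the dependency on the node's time zone is implicit.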
The next suspicion was the time configuration of the cluster, so the time configuration of two machines in the cluster was checked:
Host node time configuration (NTP time synchronization is not enabled):
The time configuration of the other node is inconsistent with it: that node turns out to be configured with the New York time zone:
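Besides the operating system settings, it can help to check which time zone the JVM itself picks up on each node. The following is a small sketch for that purpose; the class name ShowDefaultZone and the method describeDefaultZone are made up for illustration.

```java
import java.util.TimeZone;

public class ShowDefaultZone {

    // Returns the JVM default time zone ID and its raw (non-DST) UTC offset in hours.
    static String describeDefaultZone() {
        TimeZone tz = TimeZone.getDefault();
        double rawOffsetHours = tz.getRawOffset() / 3600000.0;
        return tz.getID() + " (raw UTC offset " + rawOffsetHours + " h)";
    }

    public static void main(String[] args) {
        // Run this on every node and compare the output.
        System.out.println(describeDefaultZone());
    }
}
```

If two nodes print different zone IDs, any formatter that relies on the default zone will produce different results on them.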
The suspicion was that correct code produces wrong results when it runs under the abnormal time configuration, so a piece of test code was written:
Run on a machine with the normal time zone, the result is correct:
Run on the abnormal node, which is in a different time zone, the problem reproduces:
At this point it is clear that SimpleDateFormat depends on the system time zone.
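The original test code is not shown, so here is a minimal reproduction sketch under stated assumptions. It simulates the two nodes by switching the JVM default time zone with TimeZone.setDefault; the zone IDs Asia/Shanghai and America/New_York, the sample instant and the class name DefaultZoneDemo are chosen for illustration only.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DefaultZoneDemo {

    // Formats the same instant with a formatter that has no explicit zone,
    // after temporarily switching the JVM default zone to simulate a node
    // with a different time configuration.
    static String formatUnderDefault(String zoneId, long epochMillis) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH");
            return fmt.format(new Date(epochMillis));
        } finally {
            TimeZone.setDefault(saved); // restore the original default
        }
    }

    public static void main(String[] args) {
        long instant = 1530403200000L; // 2018-07-01T00:00:00Z
        System.out.println(formatUnderDefault("Asia/Shanghai", instant));    // 2018-07-01 08
        System.out.println(formatUnderDefault("America/New_York", instant)); // 2018-06-30 20
    }
}
```

The same instant lands in two hours that are 12 hours apart, which matches the offset seen in the partitions.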
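According to the relevant documentation, the problem can be resolved by calling setTimeZone(TimeZone.getTimeZone("CST")) on the SimpleDateFormat object to specify the time zone explicitly. Note that in Java the ID "CST" resolves to US Central time, not China Standard Time, so if China time is intended a full ID such as "Asia/Shanghai" is the safer choice. However, it is more important to keep the time configuration consistent across all cluster nodes.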
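A sketch of the fix follows, assuming the partitions are meant to be in China time. That is a guess based on the 12-hour offset from New York in summer, and the zone ID Asia/Shanghai, the class name FixedZonePartition and the "yyyyMMddHH" pattern are assumptions, not taken from the original code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FixedZonePartition {

    // Assumed target zone. A full region ID is used instead of "CST",
    // which Java maps to US Central time.
    private static final TimeZone PARTITION_ZONE = TimeZone.getTimeZone("Asia/Shanghai");

    // Pins the formatter to an explicit zone so that the result no longer
    // depends on the time configuration of the node that runs it.
    static String toHourPartition(long reportTimeMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH");
        fmt.setTimeZone(PARTITION_ZONE);
        return fmt.format(new Date(reportTimeMillis));
    }

    public static void main(String[] args) {
        // 2018-07-01T00:00:00Z is 08:00 in Shanghai on any machine.
        System.out.println(toHourPartition(1530403200000L)); // 2018070108
    }
}
```

A new SimpleDateFormat is created on every call because the class is not thread-safe, which matters when the function runs inside parallel Spark tasks.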
Java SimpleDateFormat time zone problem resolution