Recent work summary

Source: Internet
Author: User

The data warehouse has encountered several problems recently, which are summarized as follows.

1. Migration from mysql to oracle. This is a complicated problem, because we have no plans to invest in an ETL tool such as datastage, so at first I decided to write my own code and import mysql Data into the text, use sqlldr to import data to oracle. This process is not very complicated, but it is annoying because I encountered the following problems:

1.1 NULL value. When mysql exports a file, the null value will be processed as "\ N". In this way, oracle will directly process it as a character when it comes to the database, so this problem occurs, I may still need to perform an update operation after the data is imported successfully. I will recognize 40 thousands or 50 thousands pieces of data, but I am not patient with millions of data records. So I chose to modify it in SQL, and wrote the case when statement to process null into '', so that sqlldr will process it into a null value when it is written into the database.

1.2 Chinese characters. Programmers may love and hate Chinese characters, because our programmers are not good at English, and most people hate reading English documents. On the one hand, they must face the problem of frequent garbled Chinese characters. The Chinese Character garbled characters occurred when exporting mysql data into the database. What's strange is that Chinese characters are displayed normally when I directly query data using sqlplus on the database server, on a remote windows system, using plsql dev to query Chinese characters is a bunch of garbled characters. Later, I added the "characterset UTF8" in the second line of the control file to solve this problem. I hope you can solve this problem using my method.

1.3 line feed. The business system is user-oriented and does not require users to perform standard input. For example, some users prefer to add many special characters to their registration nicknames, and I have been plagued by various line breaks for a long time. Line breaks are common line breaks. In operating systems such as android, line breaks are in the form of linux. When exported, mysql does not export line breaks in some form, instead, a line break is directly carried out in the text, for example, a person's nickname is

"Sunshine Boys

I love XXX"

The output is as follows (the table structure is id, nickname ):

123, sunny boys

I love XXX

In this way, sqlldr cannot identify the database and can only store the database in the second row, but the id is number type. When the database is in the database, "I love XXX" will report an error. To solve this problem, we had to use case when to process "\ n" and process it into ''. In any case, the Data Warehouse does not need to import these non-mainstream names into the database so accurately. Next, we will also encounter a line break for windows. For this problem, we still use case when for processing. Therefore, some non-mainstream information can only be processed with several layers of case when. It makes me unable to speak.

2. In fact, the most famous tool in the ETL industry is datastage, but IBM has always been very expensive, and the deployment cost is also relatively large. Therefore, Internet companies generally turn to open-source software. At this time, google told me that kettle is very useful.

Kettle consists of three parts: spoon, kitchen, and pan. They are all kitchen utensils, and kettle is the meaning of kettle. It is really interesting for foreigners sometimes. Spoon is a graphical interface that can be used to create conversions and jobs. kitchen is used to schedule jobs and pan is used to schedule conversions. Because kettle does not have a server and a client like DS, jobs created on the client will be scheduled regularly on the server. If kettle is required to implement this function, spoon must be kept on, in addition, the job is set to the running state, which can easily cause errors in java heap. Therefore, we recommend that you use crontab to schedule kitchen regularly.

For more than a month, I have tried the two methods mentioned above. The first method was my first enterprise, which the research institute liked to use. But I found through actual operations that I could write my own code, the maintenance cost will be very high, and it is easy to make mistakes. If this person leaves, the person who takes over will be very uncomfortable. The use of tools is a popular practice in the industry. In fact, if you use a tool well and are proficient in it, you can also be called a master in this field, now I have a deep understanding of the phrase "to do good deeds, you must first sharpen your tools ".

I will summarize Kettle's usage.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.