Kettle FAQ (2)

Source: Internet
Author: User
Tags version control system
Kettle FAQ (2)

Author:Gemini5201314

10. Character Set
Kettle uses UTF8, which is commonly used in Java to transmit character sets. Therefore, no matter what database you are using or any database type character set, kettle is supported. If you encounter Character Set problems, the following prompts may help you:
1. There will be no garbled characters between a single database and a single database, regardless of the type and character set of the original database and target database.
2. if you do not know what the maximum compatible Character Set of the character set of multiple source databases is, the best way is to use UTF8 to create a database.
3. do not judge character sets in the environment where you work: a tester now has an oracle database based on the xxx character set, unfortunately, the xxx character set is not of the utf8 type, so he converts another oracle database based on the yyy character set to oracle through an ETL process, later, he found that garbled characters may occur regardless of the settings. This is because the character set of your database is not supported. No matter how you set it, it is useless. the tested database does not represent the database in which the final product runs. In particular, sometimes unrelated databases of different projects are installed on the same machine to save time, this environment was not clearly analyzed during the test, so the importance of describing the physical environment was also emphasized.
4. what you see does not necessarily mean the actual storage: when mysql processes character sets, it needs to add character set parameters to the jdbc connection parameters, oracle requires the server and client to use the same character set to display the data correctly. Therefore, it is not necessarily true that the character set garbled characters you see are garbled characters, this requires you to check whether there will be garbled characters in the character set before conversion and whether there will be garbled characters After conversion. You may need to change some parameters in your desktop environment to adapt to this change.
5. Do not use multiple character sets as data sources in one conversion.

11. predefined time dimension
Kettle provides a small tool to help us pre-fill the time dimension. This tool is available in kettle_home/samples/transformations/General-populate date dimension. the data produced in this example may not meet various needs, but you can modify this example to meet your needs.

12. SQL tab and Options tab
When creating a database connection, you can specify the connection pool parameters you need to initialize (under the Pooling tab). It also includes an Options tab and an SQL tab, the Options tab mainly sets connection parameters, such as whether autocommit is on or off, defaultFetchSize, and useCursorFetch (which is supported by mysql by default). oracle also supports defaultExecuteBatch and oracle. jdbc. streamBufferSize, oracle. jdbc. freeMemoryOnEnterImplicitCache. You can check the connection parameters supported by the corresponding database. Another tips: when creating a database connection, select your database type and select the Options tab, the following is a Show help text on options usage. Clicking this button will take you to an official parameter list page corresponding to the connection parameters of each database, by querying this list page, you can know the database parameters you can use.
The SQL tab is the SQL statement that Kettle will immediately execute after you connect to this Connection. I personally recommend that you execute an SQL statement that unified all date formats into the same format, for example, in oracle:
Alter session set nls_date_format = xxxxxxxxxxxxx
Alter session set nls_xxxxxxxxx = xxxxxxxxxxxx
This prevents you from using to_date () in large quantities during conversion. The to_char function is only used to unify the date format and is especially applicable to incremental updates.

13. Data Replication
Sometimes we may need data replication or a backup database. In this case, you need a private database solution. Kettle may not be your first choice, for example, for Oracle, there may be rman, oracle stream, oracle replication, etc. mysql also has private solutions such as mysql rmaster/slave replication, if you are sure that your demand is not in data integration, kettle may not be a good solution of first choice. You should consult professional DBAs for better performance.

14. How to Control version changes
Each transformation and job in Kettle has a version field (when you save it), but this function is not practical yet. If you need version control, we recommend that you convert transformation and job into text files and save them with svn or cvs or any version control system you are familiar, kettle will add version control functions in the next version (more easy to use ).

15. supported data sources
Kettle supports a wide range of data sources, such as some uncommon Access, MaxDB (sap db), Hypersonic, sap r/3 system, Borland Interbase, Oracle RDB, teradata and 3.0 newly added Sybase IQ.
It also includes Excel, CSV, LDAP, and OLAP Server Mondrian. Currently, Web services are supported, but SOAP is not yet supported.

16. debugging and Testing
When there is an unpredictable problem with ETL conversion, or you are not clear about the functions of a step, you may need to create a simulated environment to adapt the program, the following suggestions may be helpful:
Try to use the generate row step or a fixed text file to create a simulated data source.
The simulated data source must be representative, and the data set must be as small as possible (for performance considerations), but the data itself should be sufficiently scattered.
After creating a simulated dataset, You should know exactly what the data you want to convert will look like.

17. handle errors
It is normal for ETL tasks to encounter conversion errors due to data problems. You should not design an ETL process that depends on temporary tables or has transaction characteristics, in the face of the huge challenge of data source quality, Error handling is not uncommon. kettle also provides a very convenient way to handle errors. Right-click the step you may encounter and choose Define Error handing, it requires you to specify a step to handle the error. You can use a text file or a database table to store the error information. The error information contains an id and an error field, after you get the error information, you need to analyze the cause of the Error. For example, if you violate the primary key constraint, the primary key generation method may be incorrect or the data is duplicated, the violation of foreign key constraints may be that the data in some tables you depend on has not been converted or the foreign key table itself has filtered out the data. after you adjust these errors, make sure that all dependent data is correctly processed. kettle user guide provides a more detailed explanation, which also includes an example of using javascript to handle errors. This method can be used as a way to process simple data quality.

18. documents, documents, documents
Kettle provides a wide range of documents and user manuals, including how to connect to a database, how to implement a function, and a list of all parameters, the meaning of each input and output in the dialog box is explained. When you encounter problems, you should read these documents as soon as possible. in addition, kettle has a very active community where you can ask questions, but remember to search the Forum before asking questions to see if you have asked similar questions, if you do not remember to clearly describe your problem

Summary
This series of articles mainly discusses how to use kettle to process slow growth dimensions in data warehouses, how to design dynamic ETL, and some design skills for incremental updates, how to integrate kettle in applications and some frequently asked questions when using kettle. if you are looking for a tool to help you solve database integration problems or you are planning to build a data warehouse for a business intelligence project, kettle is a good choice, you don't have to pay any fee to get a lot of data integration features, a lot of documentation and community support. isn't that what you want from a commercial tool? What are you waiting for? Start your data integration journey.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.