The purpose of indexing is to greatly improve query efficiency and read/write efficiency.
Kettle inserts data into the SQL statement. indexes must be created during the update, which greatly improves the processing time.
However, when an index is created, the following error occurs: Specified key was too long; max key length is 1000 bytes.
In mysql, the index field type settings are too long. Let's take a look at the Field Types in fields. The total value cannot exceed 1000 bytes.
Note that the field length is the char length multiplied by the number of digits of the field type. The length of the accumulated and cannot exceed the limit is 1000:
Latin1 = 1 byte = 1 character
Uft8 = 3 byte = 1 character
Gbk = 2 byte = 1 character
Take GBK as an example:
Create unique index 'unique _ record' ON reports ('report _ name', 'Report _ client', 'Report _ City ');
Report_name varchar (200), report_client varchar (200), report_city varchar (200)
(200 + 200 + 200) * 2 = 1200> 1000, a 1071 error is reported. If you change report_city to varchar (100), the index can be created successfully.
However, if the table is a UTF8 character set, the index cannot be created. Therefore, multiply the value by 3.
KEY 'idx _ 1' ('packagename', 'storename', 'app _ name', 'version', 'category ', 'app _ link_hash') USING BTREE
All fields are of utf8 type. After 3, the value exceeds 1000.
In the table I created, the field length exceeds the limit, and the data cannot be moved. So I used the following tips:
1. If the applink field is greater than 200, it must exceed. Solution: Use the MD5 Algorithm to convert applink_hash code of a specific length (32-bit length) to a one-to-one correspondence with a specific length. Use this new field to only 32 bits. Exclude long fields and convert them to specific hashcodes.
2. The unique key is analyzed when kettle is inserted. The relationship between fields. Many fields can be combined to uniquely identify a field, so this field does not need to appear in the index. Kettle does not need to compare this field during insertion.
3. Change it to the compatible type of small fields. If the content of a field is acceptable, that is, the content contained is very small, such as the utf8 type, you can change it to gbk to save a field. However, the data content must be contained by gbk. Otherwise, the content is not recognized and it is useless to insert the data.