Hbase for mysql Data Migration

Source: Internet
Author: User

Cannot directly dump, write a java multi-thread program for Migration

Problem 1: Operation not allowed after ResultSet closed

In a bare jdbc Statement, a thread runs 7 tables. This problem occurs only when the first table is run. Changing the method to static synchronized can solve this problem, but the slow speed cannot be tolerated. Finally, at the suggestion of a colleague, use spring jdbctemplate to solve the problem

Question 2: SELECT 'Token', 'Count' FROM {TABLE_NAME} order by 'Token' desc limit ?, ? The speed cannot be tolerated after

The problem scenario is to traverse the database. You can record the maximum token of the last time, and then where token>? Limit?

A single limit value is equivalent to limit 0 ,? Fast. It is a problem how to retrieve a real page with more than 1 million pages. I did not expect a solution.

Mysql> explain select token, count from 'trackurl _ 0'. 'click _ count_00 'order by token limit;
+ ---- + ------------- + ---------------- + ------- + --------------- + --------- + ------ + --------- + ------- +
| Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + ---------------- + ------- + --------------- + --------- + ------ + --------- + ------- +
| 1 | SIMPLE | click_count_00 | index | NULL | PRIMARY | 30 | NULL | 1010000 |
+ ---- + ------------- + ---------------- + ------- + --------------- + --------- + ------ + --------- + ------- +

 

Explain select token, count from 'trackurl _ 0'. 'click _ count_00 'where token> "asdf" limit 10000;
+ ---- + ------------- + ---------------- + ------- + --------------- + --------- + ------ + --------- + --------------- +
| Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + ---------------- + ------- + --------------- + --------- + ------ + --------- + --------------- +
| 1 | SIMPLE | click_count_00 | range | PRIMARY | 30 | NULL | 2865959 | Using where |
+ ---- + ------------- + ---------------- + ------- + --------------- + --------- + ------ + --------- + --------------- +

 

Type: Join type. From this option, we can preliminarily determine the query efficiency. There are several different types (sorted by the best to the worst ):
(1). system: The table has only one record, which is a special case of const.
(2). const: a maximum of one row in the table that meets the query conditions is read at the beginning of the query. Because there is only one row, the column value of this row can be considered as a constant by the rest of the optimizer. Const tables are fast because they are read only once! (Query above)
(3). eq_ref: for each row combination from the preceding table, read a row from the table. For example, select * from A, B where A. id = B. id. If id is unique or primary key in Table B, this type is returned. For each row in Table A, read A row that matches the record in table B. Except for const, this is the best join type.
(4 ). ref: This type is similar to eq_ref. The difference is that eq_ref can select a unique row in the following table based on the unique or primary key, but cannot determine the unique row. This type is used.
(5). ref_or_null: The join type is like ref, but MySQL can be added to search for rows containing NULL values. The optimization of this join type is often used in solving subqueries.
(6). index_merge: The index merge method is used to scan and search rows through range and combine the results into one. Merging will generate the union of the Union, intersection, or ongoing scanning intersection. In the EXPLAIN output, this method is represented by index_merge in the type column. In this case, the key column contains the index used by a column, and key_len contains the longest key element of these indexes.
(7). unique_subquery: unique_subquery is an index lookup function that can replace subqueries completely, improving efficiency. Explain select * from jos_content where id in (select id from jos_categories); this type is used.
(8). index_subquery: The join type is similar to unique_subquery. You can replace IN subqueries, but it is only applicable to non-unique indexes IN subqueries.
(9). range: searches for rows within a specified range and uses an index to select rows. The key column shows the index used. Key_len contains the longest key element of the index used. In this type, the ref column is NULL. When the =, <>,>, >=, <, <=, is null, <=>, BETWEEN, or IN operator IS used to compare the keyword columns with constants, this type can be used.
(10). index: This is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL because index files are usually smaller than data files.
(11). all: for each row combination from the previous table, a complete table scan will be performed. If the table is the first table without the const mark, the effect is not very good and it is very poor in all other cases. You can avoid ALL by adding more indexes so that rows can be retrieved from earlier tables based on constant values or column values.

(Original: http://www.linlook.com/detail_MTE3MA==.html)

Problem 3: ArrayBlockingQueue blocks data loss

offer(E e) 
Inserts the specified element at the tail of this queue if it is possible to do so immediately without exceeding the queue's capacity, returningTrueUpon success andFalseIf this queue is full.

When the queue is full, offer returns false. If the queue is set to full, when the timeout time is set, false is returned. If the queue is full, data is lost.

Problem 4: hbase single-thread insertion is only about 500

Multi-threaded or batch insert solutions: the speed is about 50000 records per second, about 50 MB of data, half of the bandwidth.

 

Hbase shell wiki: http://wiki.apache.org/hadoop/Hbase/Shell

 

Hbase dependent jar package:

Hadoop core needs to use commons. logging

Zookeeper uses log4j

In fact, hbase client only needs several jar packages to complete common functions.

Commons-configuration-1.6

Commons-lang-2.6

Commons-logging-1.1.1

Hadoop-core-1.0.0

Hbase-0.92.1

Log4j-1.2.16

Slf4j-api-1.6.1

Slf4j-log4j12-1.5.8

Zookeeper-3.4.3

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.