Open Source Cloud Computing Technology Series (IV): A Hands-On Look at Cloudera


Cloudera's positioning is:

bringing Big Data to the Enterprise with Hadoop

Cloudera standardizes the packaging and configuration of Hadoop; it helps companies install, configure, and run Hadoop to process and analyze large-scale enterprise data.

Since it targets enterprise use, the Cloudera software distribution is not based on the latest Hadoop 0.20; instead it packages Hadoop 0.18.3-12.cloudera.CH0_3 and integrates mature Hadoop-based tools such as Hive, the SQL interface on Hadoop contributed by Facebook, and Pig, contributed by Yahoo, which lowers the cost of installing, configuring, and using this software and standardizes the setup. Beyond the packaging and integration of these mature tools, the more interesting piece of Cloudera's distribution is Sqoop, which is not currently available as a standalone tool. That is why our overall look at Cloudera starts with experiencing the convenience of Sqoop.

Sqoop ("Sql-to-hadoop"), a tool designed to easily import information from SQL databases into your Hadoop Cluster. Through Sqoop, you can easily import data from a traditional RDBMS into a Hadoop cluster, for example, from MySQL and Oracle import data, very convenient, from the export to import a command to fix, and can be screened, compared to the current more mature through the text file or pipeline relay, the development of efficiency and The simplicity of the configuration is the feature of this tool.

Sqoop can:

- Import individual tables or entire databases into files in HDFS
- Generate Java classes that allow you to interact with your imported data
- Import from SQL databases straight into your Hive data warehouse

After setting up an import job in Sqoop, you can get started working with SQL database-backed data from your Hadoop MapReduce cluster in minutes.

Here we first walk through an example to get a feel for Sqoop, and introduce the complete configuration of the cloud computing environment later.

This example demonstrates how, when customer table data needs to be analyzed on the Hadoop cluster, you can export the data from the USERS table, automatically import it into Hive, and then run ad-hoc SQL analysis through Hive. It shows off Hadoop's data processing power without putting any load on the production database.

First, create the test USERS table:

mysql> CREATE TABLE USERS (
    ->   user_id INTEGER NOT NULL PRIMARY KEY,
    ->   first_name VARCHAR(32) NOT NULL,
    ->   last_name VARCHAR(32) NOT NULL,
    ->   join_date DATE NOT NULL,
    ->   zip INTEGER,
    ->   state CHAR(2),
    ->   email VARCHAR(128),
    ->   password_hash CHAR(64));
Query OK, 0 rows affected (0.00 sec)

Insert a row of test data:

mysql> INSERT INTO USERS (user_id, first_name, last_name, join_date, zip, state, email, password_hash)
    ->   VALUES (1, 'A', 'B', '20080808', 330440, 'ha', 'test@test.com', 'xxxx');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT * FROM USERS;
+---------+------------+-----------+------------+--------+-------+---------------+---------------+
| user_id | first_name | last_name | join_date  | zip    | state | email         | password_hash |
+---------+------------+-----------+------------+--------+-------+---------------+---------------+
|       1 | A          | B         | 2008-08-08 | 330440 | ha    | test@test.com | xxxx          |
+---------+------------+-----------+------------+--------+-------+---------------+---------------+
1 row in set (0.00 sec)

Now we use Sqoop to import the USERS table from the MySQL database into Hive. In the command below, --local selects the mysqldump-based fast path for a MySQL server running on the same machine, and --hive-import loads the data into Hive after it lands in HDFS (both steps are visible in the log that follows).

sqoop --connect jdbc:mysql://localhost/test --username root --password xxx --local --table USERS --hive-import


09/06/20 18:43:50 INFO sqoop.Sqoop: Beginning code generation
09/06/20 18:43:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:50 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
09/06/20 18:43:50 INFO orm.CompilationManager: Found Hadoop core jar at: /usr/lib/hadoop/hadoop-0.18.3-12.cloudera.CH0_3-core.jar
09/06/20 18:43:50 INFO orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /etc/hadoop/conf:/home/hadoop/jdk1.6/lib/tools.jar:/usr/lib/hadoop:/usr/lib/hadoop/hadoop-0.18.3-12.cloudera.CH0_3-core.jar:/usr/lib/hadoop/lib/commons-cli-2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/commons-codec-1.3.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/hadoop-0.18.3-12.cloudera.CH0_3-fairscheduler.jar:/usr/lib/hadoop/lib/hadoop-0.18.3-12.cloudera.CH0_3-scribe-log4j.jar:/usr/lib/hadoop/lib/hsqldb.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-5.1.4.jar:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.1.3.jar:/usr/lib/hadoop/lib/libfb303.jar:/usr/lib/hadoop/lib/libthrift.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api.jar:/usr/lib/hadoop/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-ext/commons-el.jar:/usr/lib/hadoop/lib/jetty-ext/jasper-compiler.jar:/usr/lib/hadoop/lib/jetty-ext/jasper-runtime.jar:/usr/lib/hadoop/lib/jetty-ext/jsp-api.jar:/usr/lib/hadoop/hadoop-0.18.3-12.cloudera.CH0_3-core.jar:/usr/lib/hadoop/contrib/sqoop/hadoop-0.18.3-12.cloudera.CH0_3-sqoop.jar ./USERS.java
09/06/20 18:43:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/USERS.jar
09/06/20 18:43:51 INFO manager.LocalMySQLManager: Beginning mysqldump fast path import
09/06/20 18:43:51 INFO manager.LocalMySQLManager: Performing import of table USERS from database test
09/06/20 18:43:52 INFO manager.LocalMySQLManager: Transfer loop complete.
09/06/20 18:43:52 INFO hive.HiveImport: Loading uploaded data into Hive
09/06/20 18:43:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:52 WARN hive.TableDefWriter: Column join_date had to be cast to a less precise type in Hive
09/06/20 18:43:53 INFO hive.HiveImport: Hive history file=/tmp/root/hive_job_log_root_200906201843_1606494848.txt
09/06/20 18:44:00 INFO hive.HiveImport: OK
09/06/20 18:44:00 INFO hive.HiveImport: Time taken: 5.916 seconds
09/06/20 18:44:00 INFO hive.HiveImport: Loading data to table users
09/06/20 18:44:00 INFO hive.HiveImport: OK
09/06/20 18:44:00 INFO hive.HiveImport: Time taken: 0.344 seconds
09/06/20 18:44:01 INFO hive.HiveImport: Hive import complete.

The import succeeded. Now we verify it in Hive.

hive
Hive history file=/tmp/root/hive_job_log_root_200906201844_376630602.txt
hive> select * from USERS;
OK
1    'A'    'B'    2008-08-08    330440    'ha'    test@test.com    'xxxx'
Time taken: 5.019 seconds
hive>

The data is exactly the same as in the MySQL database.

This completes the import from MySQL into HDFS.
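With the table registered in Hive, the ad-hoc SQL analysis mentioned at the start can now run entirely on the Hadoop cluster. The query below is only a sketch of such an analysis and is not part of the original session; note that join_date was imported as a string (see the TableDefWriter warning in the log above), so date values would be compared as text.

hive> SELECT state, COUNT(1) FROM USERS GROUP BY state;

Hive compiles a query like this into a MapReduce job, so the heavy lifting happens on the cluster rather than on the MySQL server.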

Sqoop also provides an automatically generated USERS.java class that can be used for MapReduce analysis.

more USERS.java


// ORM class for USERS
// WARNING: This class is auto-generated. Modify at your own risk.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import org.apache.hadoop.sqoop.lib.JdbcWritableBridge;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;

public class USERS implements DBWritable, Writable {
  public static final int PROTOCOL_VERSION = 1;

  private Integer user_id;
  public Integer get_user_id() {
    return user_id;
  }

  private String first_name;
  public String get_first_name() {
    return first_name;
  }

  private String last_name;
  public String get_last_name() {
    return last_name;
  }

  private java.sql.Date join_date;
  public java.sql.Date get_join_date() {
    return join_date;
  }

  private Integer zip;
  public Integer get_zip() {
    return zip;
  }

  private String state;
  public String get_state() {
    return state;
  }

  private String email;
  public String get_email() {
    return email;
  }

  private String password_hash;
  public String get_password_hash() {
    return password_hash;
  }

  public void readFields(ResultSet __dbResults) throws SQLException {
    this.user_id = JdbcWritableBridge.readInteger(1, __dbResults);
    this.first_name = JdbcWritableBridge.readString(2, __dbResults);
    this.last_name = JdbcWritableBridge.readString(3, __dbResults);
    this.join_date = JdbcWritableBridge.readDate(4, __dbResults);
    this.zip = JdbcWritableBridge.readInteger(5, __dbResults);
    this.state = JdbcWritableBridge.readString(6, __dbResults);
    this.email = JdbcWritableBridge.readString(7, __dbResults);
    this.password_hash = JdbcWritableBridge.readString(8, __dbResults);
  }

  public void write(PreparedStatement __dbStmt) throws SQLException {
    JdbcWritableBridge.writeInteger(user_id, 1, 4, __dbStmt);
    JdbcWritableBridge.writeString(first_name, 2, 12, __dbStmt);
    JdbcWritableBridge.writeString(last_name, 3, 12, __dbStmt);
    JdbcWritableBridge.writeDate(join_date, 4, 91, __dbStmt);
    JdbcWritableBridge.writeInteger(zip, 5, 4, __dbStmt);
    JdbcWritableBridge.writeString(state, 6, 1, __dbStmt);
    JdbcWritableBridge.writeString(email, 7, 12, __dbStmt);
    JdbcWritableBridge.writeString(password_hash, 8, 1, __dbStmt);
  }

  public void readFields(DataInput __dataIn) throws IOException {
    if (__dataIn.readBoolean()) {
      this.user_id = null;
    } else {
      this.user_id = Integer.valueOf(__dataIn.readInt());
    }
    if (__dataIn.readBoolean()) {
      this.first_name = null;
    } else {
      this.first_name = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.last_name = null;
    } else {
      this.last_name = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.join_date = null;
    } else {
      this.join_date = new Date(__dataIn.readLong());
    }
    if (__dataIn.readBoolean()) {
      this.zip = null;
    } else {
      this.zip = Integer.valueOf(__dataIn.readInt());
    }
    if (__dataIn.readBoolean()) {
      this.state = null;
    } else {
      this.state = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.email = null;
    } else {
      this.email = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.password_hash = null;
    } else {
      this.password_hash = Text.readString(__dataIn);
    }
  }

  public void write(DataOutput __dataOut) throws IOException {
    if (null == this.user_id) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      __dataOut.writeInt(this.user_id);
    }
    if (null == this.first_name) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, first_name);
    }
    if (null == this.last_name) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, last_name);
    }
    if (null == this.join_date) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      __dataOut.writeLong(this.join_date.getTime());
    }
    if (null == this.zip) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      __dataOut.writeInt(this.zip);
    }
    if (null == this.state) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, state);
    }
    if (null == this.email) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, email);
    }
    if (null == this.password_hash) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, password_hash);
    }
  }

  public String toString() {
    StringBuilder sb = new StringBuilder();
    sb.append("" + user_id);
    sb.append(",");
    sb.append(first_name);
    sb.append(",");
    sb.append(last_name);
    sb.append(",");
    sb.append("" + join_date);
    sb.append(",");
    sb.append("" + zip);
    sb.append(",");
    sb.append(state);
    sb.append(",");
    sb.append(email);
    sb.append(",");
    sb.append(password_hash);
    return sb.toString();
  }
}

As you can see, the auto-generated class is very readable and can serve as a starting point for further custom development.
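One way to reuse the generated class in your own MapReduce code is through Hadoop's DBInputFormat, since USERS implements the DBWritable interface it expects (the org.apache.hadoop.mapred.lib.db import in the generated file suggests that package is present in this distribution, but that is an assumption). The sketch below is not from the original article; it is a minimal, hypothetical job that counts users per state by reading the table directly over JDBC, so unlike the Sqoop-to-Hive flow above it does query the source database. Connection parameters, paths, and class names are illustrative.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

// Hypothetical job: count USERS rows per state, using the Sqoop-generated USERS class.
public class UsersByState {

  public static class StateMapper extends MapReduceBase
      implements Mapper<LongWritable, USERS, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    public void map(LongWritable key, USERS value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      // get_state() is one of the accessors in the generated USERS.java shown above.
      String state = value.get_state();
      output.collect(new Text(state == null ? "unknown" : state), ONE);
    }
  }

  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(UsersByState.class);
    conf.setJobName("users-by-state");

    // JDBC connection settings (illustrative; adapt to your own environment).
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://localhost/test", "root", "xxx");
    conf.setInputFormat(DBInputFormat.class);
    // Read the USERS table as (LongWritable row id, USERS record) pairs.
    DBInputFormat.setInput(conf, USERS.class, "USERS", null /* conditions */,
        "user_id" /* order by */, "user_id", "first_name", "last_name",
        "join_date", "zip", "state", "email", "password_hash");

    conf.setMapperClass(StateMapper.class);
    conf.setReducerClass(SumReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(conf, new Path("users_by_state"));

    JobClient.runJob(conf);
  }
}

Compiled against the same jars shown in the javac classpath above (including the MySQL connector), a job like this could then be packaged and submitted with hadoop jar in the usual way.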
