Open Source Cloud Computing Technology Series (IV): A Hands-On Look at Cloudera


Cloudera's positioning is:

bringing Big Data to the Enterprise with Hadoop

Cloudera standardizes the packaging and configuration of Hadoop; it helps companies install, configure, and run Hadoop to process and analyze large-scale enterprise data.

Since it targets enterprise use, the Cloudera software distribution is not based on the latest Hadoop 0.20; instead it packages Hadoop 0.18.3-12.cloudera.CH0_3 and integrates mature Hadoop-based tools such as Hive, the SQL interface on Hadoop contributed by Facebook, and Pig, contributed by Yahoo, which lowers the cost of installing, configuring, and using this software and standardizes the setup. Beyond the packaging and integration of these mature tools, the more interesting piece of Cloudera's distribution is Sqoop, which is not currently available as a standalone tool. That is why our overall look at Cloudera starts with experiencing the convenience of Sqoop.

Sqoop ("Sql-to-hadoop"), a tool designed to easily import information from SQL databases into your Hadoop Cluster. Through Sqoop, you can easily import data from a traditional RDBMS into a Hadoop cluster, for example, from MySQL and Oracle import data, very convenient, from the export to import a command to fix, and can be screened, compared to the current more mature through the text file or pipeline relay, the development of efficiency and The simplicity of the configuration is the feature of this tool.

Sqoop can:

- Import individual tables or entire databases into files in HDFS
- Generate Java classes that allow you to interact with your imported data
- Import from SQL databases straight into your Hive data warehouse

After setting up an import job in Sqoop, you can get started working with SQL database-backed data from your Hadoop MapReduce cluster in minutes.

Here we first walk through an example to get a feel for Sqoop, and introduce the complete configuration of the cloud computing environment later.

This example demonstrates how, when customer table data needs to be analyzed on the Hadoop cluster, you can export the data from the USERS table, automatically import it into Hive, and then run ad-hoc SQL analysis through Hive. It shows off Hadoop's data processing power without putting any load on the production database.

First, create the test USERS table:

mysql> CREATE TABLE USERS (
    ->   user_id INTEGER NOT NULL PRIMARY KEY,
    ->   first_name VARCHAR(32) NOT NULL,
    ->   last_name VARCHAR(32) NOT NULL,
    ->   join_date DATE NOT NULL,
    ->   zip INTEGER,
    ->   state CHAR(2),
    ->   email VARCHAR(128),
    ->   password_hash CHAR(64));
Query OK, 0 rows affected (0.00 sec)

Insert a row of test data:

mysql> INSERT INTO USERS (user_id, first_name, last_name, join_date, zip, state, email, password_hash)
    ->   VALUES (1, 'A', 'B', '20080808', 330440, 'ha', 'test@test.com', 'xxxx');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT * FROM USERS;
+---------+------------+-----------+------------+--------+-------+---------------+---------------+
| user_id | first_name | last_name | join_date  | zip    | state | email         | password_hash |
+---------+------------+-----------+------------+--------+-------+---------------+---------------+
|       1 | A          | B         | 2008-08-08 | 330440 | ha    | test@test.com | xxxx          |
+---------+------------+-----------+------------+--------+-------+---------------+---------------+
1 row in set (0.00 sec)

Now we use Sqoop to import the USERS table from the MySQL database into Hive. In the command below, --local selects the mysqldump-based fast path for a MySQL server running on the same machine, and --hive-import loads the data into Hive after it lands in HDFS (both steps are visible in the log that follows).

sqoop --connect jdbc:mysql://localhost/test --username root --password xxx --local --table USERS --hive-import


09/06/20 18:43:50 INFO sqoop.Sqoop: Beginning code generation
09/06/20 18:43:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:50 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
09/06/20 18:43:50 INFO orm.CompilationManager: Found Hadoop core jar at: /usr/lib/hadoop/hadoop-0.18.3-12.cloudera.CH0_3-core.jar
09/06/20 18:43:50 INFO orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /etc/hadoop/conf:/home/hadoop/jdk1.6/lib/tools.jar:/usr/lib/hadoop:/usr/lib/hadoop/hadoop-0.18.3-12.cloudera.CH0_3-core.jar:/usr/lib/hadoop/lib/commons-cli-2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/commons-codec-1.3.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/hadoop-0.18.3-12.cloudera.CH0_3-fairscheduler.jar:/usr/lib/hadoop/lib/hadoop-0.18.3-12.cloudera.CH0_3-scribe-log4j.jar:/usr/lib/hadoop/lib/hsqldb.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-5.1.4.jar:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.1.3.jar:/usr/lib/hadoop/lib/libfb303.jar:/usr/lib/hadoop/lib/libthrift.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api.jar:/usr/lib/hadoop/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-ext/commons-el.jar:/usr/lib/hadoop/lib/jetty-ext/jasper-compiler.jar:/usr/lib/hadoop/lib/jetty-ext/jasper-runtime.jar:/usr/lib/hadoop/lib/jetty-ext/jsp-api.jar:/usr/lib/hadoop/hadoop-0.18.3-12.cloudera.CH0_3-core.jar:/usr/lib/hadoop/contrib/sqoop/hadoop-0.18.3-12.cloudera.CH0_3-sqoop.jar ./USERS.java
09/06/20 18:43:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/USERS.jar
09/06/20 18:43:51 INFO manager.LocalMySQLManager: Beginning mysqldump fast path import
09/06/20 18:43:51 INFO manager.LocalMySQLManager: Performing import of table USERS from database test
09/06/20 18:43:52 INFO manager.LocalMySQLManager: Transfer loop complete.
09/06/20 18:43:52 INFO hive.HiveImport: Loading uploaded data into Hive
09/06/20 18:43:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM USERS AS t WHERE 1 = 1
09/06/20 18:43:52 WARN hive.TableDefWriter: Column join_date had to be cast to a less precise type in Hive
09/06/20 18:43:53 INFO hive.HiveImport: Hive history file=/tmp/root/hive_job_log_root_200906201843_1606494848.txt
09/06/20 18:44:00 INFO hive.HiveImport: OK
09/06/20 18:44:00 INFO hive.HiveImport: Time taken: 5.916 seconds
09/06/20 18:44:00 INFO hive.HiveImport: Loading data to table users
09/06/20 18:44:00 INFO hive.HiveImport: OK
09/06/20 18:44:00 INFO hive.HiveImport: Time taken: 0.344 seconds
09/06/20 18:44:01 INFO hive.HiveImport: Hive import complete.

The import succeeded. Now we verify it in Hive.

hive
Hive history file=/tmp/root/hive_job_log_root_200906201844_376630602.txt
hive> select * from USERS;
OK
1    'A'    'B'    2008-08-08    330440    'ha'    test@test.com    'xxxx'
Time taken: 5.019 seconds
hive>

The data is exactly the same as in the MySQL database.

This completes the import from MySQL into HDFS.
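With the table registered in Hive, the ad-hoc SQL analysis mentioned at the start can now run entirely on the Hadoop cluster. The query below is only a sketch of such an analysis and is not part of the original session; note that join_date was imported as a string (see the TableDefWriter warning in the log above), so date values would be compared as text.

hive> SELECT state, COUNT(1) FROM USERS GROUP BY state;

Hive compiles a query like this into a MapReduce job, so the heavy lifting happens on the cluster rather than on the MySQL server.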

Sqoop also provides an automatically generated USERS.java class that can be used for MapReduce analysis.

more USERS.java


// ORM class for USERS
// WARNING: This class is auto-generated. Modify at your own risk.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import org.apache.hadoop.sqoop.lib.JdbcWritableBridge;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;

public class USERS implements DBWritable, Writable {
  public static final int PROTOCOL_VERSION = 1;

  private Integer user_id;
  public Integer get_user_id() {
    return user_id;
  }

  private String first_name;
  public String get_first_name() {
    return first_name;
  }

  private String last_name;
  public String get_last_name() {
    return last_name;
  }

  private java.sql.Date join_date;
  public java.sql.Date get_join_date() {
    return join_date;
  }

  private Integer zip;
  public Integer get_zip() {
    return zip;
  }

  private String state;
  public String get_state() {
    return state;
  }

  private String email;
  public String get_email() {
    return email;
  }

  private String password_hash;
  public String get_password_hash() {
    return password_hash;
  }

  public void readFields(ResultSet __dbResults) throws SQLException {
    this.user_id = JdbcWritableBridge.readInteger(1, __dbResults);
    this.first_name = JdbcWritableBridge.readString(2, __dbResults);
    this.last_name = JdbcWritableBridge.readString(3, __dbResults);
    this.join_date = JdbcWritableBridge.readDate(4, __dbResults);
    this.zip = JdbcWritableBridge.readInteger(5, __dbResults);
    this.state = JdbcWritableBridge.readString(6, __dbResults);
    this.email = JdbcWritableBridge.readString(7, __dbResults);
    this.password_hash = JdbcWritableBridge.readString(8, __dbResults);
  }

  public void write(PreparedStatement __dbStmt) throws SQLException {
    JdbcWritableBridge.writeInteger(user_id, 1, 4, __dbStmt);
    JdbcWritableBridge.writeString(first_name, 2, 12, __dbStmt);
    JdbcWritableBridge.writeString(last_name, 3, 12, __dbStmt);
    JdbcWritableBridge.writeDate(join_date, 4, 91, __dbStmt);
    JdbcWritableBridge.writeInteger(zip, 5, 4, __dbStmt);
    JdbcWritableBridge.writeString(state, 6, 1, __dbStmt);
    JdbcWritableBridge.writeString(email, 7, 12, __dbStmt);
    JdbcWritableBridge.writeString(password_hash, 8, 1, __dbStmt);
  }

  public void readFields(DataInput __dataIn) throws IOException {
    if (__dataIn.readBoolean()) {
      this.user_id = null;
    } else {
      this.user_id = Integer.valueOf(__dataIn.readInt());
    }
    if (__dataIn.readBoolean()) {
      this.first_name = null;
    } else {
      this.first_name = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.last_name = null;
    } else {
      this.last_name = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.join_date = null;
    } else {
      this.join_date = new Date(__dataIn.readLong());
    }
    if (__dataIn.readBoolean()) {
      this.zip = null;
    } else {
      this.zip = Integer.valueOf(__dataIn.readInt());
    }
    if (__dataIn.readBoolean()) {
      this.state = null;
    } else {
      this.state = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.email = null;
    } else {
      this.email = Text.readString(__dataIn);
    }
    if (__dataIn.readBoolean()) {
      this.password_hash = null;
    } else {
      this.password_hash = Text.readString(__dataIn);
    }
  }

  public void write(DataOutput __dataOut) throws IOException {
    if (null == this.user_id) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      __dataOut.writeInt(this.user_id);
    }
    if (null == this.first_name) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, first_name);
    }
    if (null == this.last_name) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, last_name);
    }
    if (null == this.join_date) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      __dataOut.writeLong(this.join_date.getTime());
    }
    if (null == this.zip) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      __dataOut.writeInt(this.zip);
    }
    if (null == this.state) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, state);
    }
    if (null == this.email) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, email);
    }
    if (null == this.password_hash) {
      __dataOut.writeBoolean(true);
    } else {
      __dataOut.writeBoolean(false);
      Text.writeString(__dataOut, password_hash);
    }
  }

  public String toString() {
    StringBuilder sb = new StringBuilder();
    sb.append("" + user_id);
    sb.append(",");
    sb.append(first_name);
    sb.append(",");
    sb.append(last_name);
    sb.append(",");
    sb.append("" + join_date);
    sb.append(",");
    sb.append("" + zip);
    sb.append(",");
    sb.append(state);
    sb.append(",");
    sb.append(email);
    sb.append(",");
    sb.append(password_hash);
    return sb.toString();
  }
}

As you can see, the auto-generated class is very readable and can serve as a starting point for further custom development.
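One way to reuse the generated class in your own MapReduce code is through Hadoop's DBInputFormat, since USERS implements the DBWritable interface it expects (the org.apache.hadoop.mapred.lib.db import in the generated file suggests that package is present in this distribution, but that is an assumption). The sketch below is not from the original article; it is a minimal, hypothetical job that counts users per state by reading the table directly over JDBC, so unlike the Sqoop-to-Hive flow above it does query the source database. Connection parameters, paths, and class names are illustrative.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

// Hypothetical job: count USERS rows per state, using the Sqoop-generated USERS class.
public class UsersByState {

  public static class StateMapper extends MapReduceBase
      implements Mapper<LongWritable, USERS, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    public void map(LongWritable key, USERS value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      // get_state() is one of the accessors in the generated USERS.java shown above.
      String state = value.get_state();
      output.collect(new Text(state == null ? "unknown" : state), ONE);
    }
  }

  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(UsersByState.class);
    conf.setJobName("users-by-state");

    // JDBC connection settings (illustrative; adapt to your own environment).
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://localhost/test", "root", "xxx");
    conf.setInputFormat(DBInputFormat.class);
    // Read the USERS table as (LongWritable row id, USERS record) pairs.
    DBInputFormat.setInput(conf, USERS.class, "USERS", null /* conditions */,
        "user_id" /* order by */, "user_id", "first_name", "last_name",
        "join_date", "zip", "state", "email", "password_hash");

    conf.setMapperClass(StateMapper.class);
    conf.setReducerClass(SumReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(conf, new Path("users_by_state"));

    JobClient.runJob(conf);
  }
}

Compiled against the same jars shown in the javac classpath above (including the MySQL connector), a job like this could then be packaged and submitted with hadoop jar in the usual way.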
