Tags: hadoop HDFS sqoop MySQL
Sqoop is a plug-in for the Hadoop project. It can import content from the HDFS distributed file system into a specified MySQL table, or import content from MySQL into HDFS for subsequent processing.
Test Environment Description:
Hadoop version: hadoop-0.20.2
Sqoop: sqoop-1
Use Sqoop to import data from a MySQL database into HBase.
Prerequisites: install Sqoop and HBase.
Download the JDBC driver: mysql-connector-java-5.1.10.jar
Copy mysql-connector-java-5.1.10.jar to /usr/lib/sqoop/lib/
Command for importing from MySQL into HBase:
sqoop import --connect jdbc:mysql://10.10.97.116:3306/Rsearch --table researchers --hbase-table A --colum
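The command above is cut off mid-option. A fuller sketch of a MySQL-to-HBase import follows; the credentials, the `info` column family, the `id` row key, and the table-creation flag are illustrative assumptions, not values from the original post:

```shell
# Sketch only: username, password, column family and row key are
# assumptions; adjust them to your own schema before running.
sqoop import \
  --connect jdbc:mysql://10.10.97.116:3306/Rsearch \
  --username dbuser \
  --password dbpass \
  --table researchers \
  --hbase-table A \
  --column-family info \
  --hbase-row-key id \
  --hbase-create-table
```

`--hbase-create-table` asks Sqoop to create the target HBase table and column family if they do not already exist; omit it if the table was pre-created in the HBase shell.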
Sqoop is a tool used for data transmission between Hadoop and an RDBMS, and its configuration is relatively simple. Download the latest Sqoop package from the Apache website (www.apache.org/dist/sqoop/1.99.1) and decompress it on the server. The server requires JDK, Hadoop, and Hive. Configuration: conf/sqoop-env.sh #
HDFS to MySQL, CSV/TXT files to HDFS, MySQL to HDFS, and mapping a Hive table onto HDFS:

drop table if exists emp;
CREATE TABLE emp (
  id int COMMENT 'ID',
  emp_name string COMMENT 'name',
  job string
) COMMENT 'Career'
row format delimited
-- stored as rcfile
Location '/user/hive/warehouse/emp';

The STORED AS keyword: Hive currently supports three different storage formats:
1. textfile, the most common: the data is not compressed, so the disk overhead is large and the parsing cost is also large.
2. sequencefile: a binary format provided by the Hadoop API, which
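To try an explicit storage format, the same table can be created from the Hive CLI; a minimal sketch (the `emp_rc` name and the RCFile choice are illustrative assumptions, and a running Hive is required):

```shell
# Sketch: create a variant of the emp table with an explicit
# STORED AS clause. Table name is an illustrative assumption.
hive -e "
CREATE TABLE IF NOT EXISTS emp_rc (
  id INT COMMENT 'ID',
  emp_name STRING COMMENT 'name',
  job STRING
) COMMENT 'Career'
STORED AS RCFILE;"
```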
Recently I have been doing traffic-flow data analysis. The requirement is to take a huge volume of urban traffic data, clean it with MapReduce, import it into HBase for storage, then query and statistically analyze the HBase data through a Hive external table associated with HBase, save the analysis results in a Hive table, and finally use Sqoop to export the data from that table into MySQL. The whole process is roughly as follows:
Below I mainly
Sqoop is a MapReduce-based framework for importing and exporting data between relational databases and Hive/HDFS/HBase.
http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.4-cdh5.1.0/SqoopUserGuide.html
ETL: abbreviation of Extraction-Transformation-Loading, i.e. data extraction, transformation (business processing), and loading.
File data source: Hive LOAD command.
Relational DB data source: Sqoop extraction.
Sqoop imports data to HDFS/Hive/HBase --> business proc
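The extraction step of the ETL flow described above can be sketched as a direct Sqoop import into a Hive staging table; the connection details, database, and table names are illustrative assumptions:

```shell
# Sketch: pull a relational table straight into Hive as the
# "extraction" step. Host, credentials and names are assumptions.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser --password dbpass \
  --table orders \
  --hive-import \
  --hive-table etl_staging.orders \
  --num-mappers 4
```

`--num-mappers` controls how many parallel map tasks split the source table; transformation and loading would then proceed in Hive.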
Operating environment: CentOS 5.6, Hadoop, Hive. Sqoop is a tool developed by Cloudera that enables Hadoop to import and export data between relational databases and HDFS/Hive. Problems you may encounter during use:
Sqoop relies on ZooKeeper, so ZOOKEEPER_HOME must be configured in the
Overview: Sqoop is an Apache top-level project used primarily to transfer data between Hadoop and relational databases. With Sqoop, we can easily import data from a relational database into HDFS, or export data from HDFS to a relational database.
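The export direction can be sketched as follows; the connection details, target table, and HDFS path are illustrative assumptions, and the field terminator matches Hive's default '\001':

```shell
# Sketch: export an HDFS directory back into a MySQL table.
# Host, credentials, table and path are illustrative assumptions.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/reports \
  --username dbuser --password dbpass \
  --table daily_stats \
  --export-dir /user/hive/warehouse/daily_stats \
  --input-fields-terminated-by '\001'
```

The target MySQL table must already exist with columns matching the exported fields; Sqoop export does not create it.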
Sqoop Architecture:
The Sqoop architecture is simple enough to integrate hive, HBase, and Oozie to transmit data through
Sqoop installation is also very simple. After Sqoop is installed, you can test whether it can connect to MySQL (note: the MySQL JDBC jar should be placed under $SQOOP_HOME/lib):
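A common connectivity check is to list the databases visible to the given user; the host and username here are illustrative assumptions, and `-P` prompts interactively for the password:

```shell
# Sketch: verify Sqoop can reach MySQL before attempting imports.
# Host and user are assumptions; -P prompts for the password.
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306 \
  --username root -P
```

If the JDBC jar is missing from $SQOOP_HOME/lib, this command fails with a "Could not load db driver class" error, which makes it a quick sanity test.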
Although Sqoop has run stably in production environments for many years, some of its shortcomings cause inconvenience in actual operation, so Sqoop2 became an object of research. What, then, are the advantages of Sqoop2? First, let us understand how Sqoop is used: with Sqoop, data will not be lost, and
Background: Sqoop is a tool used to transfer data back and forth between Hadoop and relational databases (RDBMS). When using Sqoop, we need to provide the access password for the database. Currently Sqoop supports four ways to enter passwords:
Clear text mode.
Interactive mode.
File mode.
Alias mode.
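Of the four modes, file mode is the easiest to script safely. A minimal sketch follows; the file location, password value, and the commented-out Sqoop invocation are illustrative assumptions (Sqoop resolves `--password-file` through the Hadoop filesystem API, so a `file://` URI points at a local file):

```shell
# File mode: store the password in a file readable only by its owner,
# then point Sqoop at it with --password-file.
printf 'dbpass' > /tmp/mysql.pwd    # no trailing newline
chmod 400 /tmp/mysql.pwd

# Hypothetical import using the password file (requires a cluster):
# sqoop import \
#   --connect jdbc:mysql://dbhost:3306/Rsearch \
#   --username dbuser \
#   --password-file file:///tmp/mysql.pwd \
#   --table researchers
```

This keeps the password out of the shell history and the process list, which clear-text mode cannot do.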
The author uses the Sqoop shipped with CDH 5.10, the vers
+----+------+------+
| id | name | age  |
+----+------+------+
|  7 | a    |    1 |
|  8 | b    |    2 |
|  9 | c    |    3 |
+----+------+------+
3 rows in set (0.00 sec)

2. Authorization for individual users
Note: after Sqoop submits a job, every node accesses the database during the map phase, so prior authorization is required:
mysql> grant [all | select | ...] on {db}.{table} to {user}@{host} identified by {passwd};
mysql> flush privileges;
# I grant privileges for a specific hostname. username: root passwd: root Ac
Sqoop is used to import and export data:
(1) Import data from databases such as MySQL and Oracle into HDFS, Hive, and HBase.
(2) Export data from HDFS, Hive, and HBase to MySQL, Oracle, and other databases.
(3) Import and export transactions are in units of mapper tasks.
1. Sqoop installation steps
1.1 Execute the command: tar -zxvf sqoop-1.4.3.bin__hadoop-1.0.0.tar.gz to decompres
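The installation steps above can be sketched end to end; the install prefix and environment-variable setup are illustrative assumptions beyond what the post states:

```shell
# Sketch of the installation steps; target paths are assumptions.
tar -zxvf sqoop-1.4.3.bin__hadoop-1.0.0.tar.gz -C /usr/local/
export SQOOP_HOME=/usr/local/sqoop-1.4.3.bin__hadoop-1.0.0
export PATH=$PATH:$SQOOP_HOME/bin
# The JDBC driver must be on Sqoop's classpath:
cp mysql-connector-java-5.1.10.jar $SQOOP_HOME/lib/
```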
Example of Sqoop importing data from Oracle into Hive:
sqoop import --connect jdbc:oracle:thin:@oracle-host:port:orcl --username name --password passwd --hive-import --table TableName
If you do not add additional parameters, the default field delimiter for imported data is '\001', and the default line delimiter is '\n'.
The problem is that if there is a '\n' in the imported data, Hive
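A common workaround for embedded '\n' (as well as '\r' and '\001') in string fields is Sqoop's delimiter-stripping option. A sketch follows; the port and credentials are illustrative assumptions:

```shell
# Sketch: strip \n, \r and \001 from string fields during a Hive
# import so rows are not split. Connection details are assumptions.
sqoop import \
  --connect jdbc:oracle:thin:@oracle-host:1521:orcl \
  --username name --password passwd \
  --table TableName \
  --hive-import \
  --hive-drop-import-delims
```

If the embedded characters must be preserved rather than dropped, `--hive-delims-replacement` substitutes a replacement string instead.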
A. Importing Sqoop into Eclipse: download the Sqoop 1.3 tar package and decompress it; opening Build.xml, we find B. Debugging Sqoop: because of the scripts in Sqoop's bin folder, Sqoop starts a Java process, and the Java process is Sqoop
Note: the process of configuring Sqoop 1.99.7 described in this article assumes Hadoop is already configured.
1. Installation environment description
Apache Hadoop2.6.1
Sqoop1.99.7
centos6.5
MySQL Server 5.6
2. Sqoop2 download
Go directly to the Sqoop mirror http://mirrors.hust.edu.cn/apache/sqoop/1.99.7/ and select the bin version of Sqoop2 1.99.7; this version has been com
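Fetching and unpacking the bin distribution can be sketched as below; the exact tarball name and install prefix are assumptions based on the mirror listed above, so verify them against the mirror before running:

```shell
# Sketch: fetch and unpack the Sqoop2 1.99.7 bin distribution.
# Tarball name and target directory are assumptions.
wget http://mirrors.hust.edu.cn/apache/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /usr/local/
```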