Guiding questions:
1. What does the --connect parameter do?
2. Which parameter reads the database password from the console?
3. What are the minimum parameters, and the command, for importing a relational database table into HDFS with Sqoop?
4. What is the default HDFS path for imported data?
5. What does the --columns parameter do?
6. What does the --where parameter do?
1. Key parameters
--connect <jdbc-uri>                JDBC connection string for the relational database
--connection-manager <class-name>   Connection manager class to use
--driver <class-name>               Manually specify the JDBC driver class to use
--hadoop-mapred-home <dir>          Override $HADOOP_MAPRED_HOME
--help                              Print usage help
--password-file                     Path to a file containing the authentication password
-P                                  Read the database password from the console
--password <password>               Set the database authentication password
--username <username>               Set the database access user
--verbose                           Print more information while working
--connection-param-file <filename>  Optional properties file providing connection parameters
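As the table above suggests (and as Sqoop itself warns in the import log further down), passing --password on the command line is insecure; -P or --password-file is preferred. A minimal sketch of the --password-file pattern, assuming the password is hive and using an illustrative local path:

```shell
# Write the password to a file with NO trailing newline (Sqoop reads the file
# contents verbatim, so a stray newline would break authentication) and
# restrict its permissions to the owner.
printf '%s' 'hive' > /tmp/sqoop.pwd
chmod 400 /tmp/sqoop.pwd

# The import would then reference the file instead of an inline password
# (commented out here because it needs the live cluster):
# sqoop import --connect jdbc:mysql://secondmgt:3306/spice \
#     --username hive --password-file file:///tmp/sqoop.pwd --table users
```

With -P, by contrast, Sqoop simply prompts for the password on the console, which is convenient interactively but unusable in scripted jobs.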
2. The source MySQL database
[hadoopUser@secondmgt ~]$ mysql -uhive -phive spice
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 419
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select * from users;
+----+----------+----------+-----+---------+------------+-------+------+
| id | username | password | sex | content | datetime   | vm_id | isad |
+----+----------+----------+-----+---------+------------+-------+------+
| 56 | hua      | hanyun   | 男  | 開通    | 2013-12-02 |     0 |    1 |
| 58 | feng     | 123456   | 男  | 開通    | 2013-11-22 |     0 |    0 |
| 59 | test     | 123456   | 男  | 開通    | 2014-03-05 |    58 |    0 |
| 60 | user1    | 123456   | 男  | 開通    | 2014-06-26 |    66 |    0 |
| 61 | user2    | 123      | 男  | 開通    | 2013-12-13 |    56 |    0 |
| 62 | user3    | 123456   | 男  | 開通    | 2013-12-14 |     0 |    0 |
| 64 | kai.zhou | 123456   | ?   | ??      | 2014-03-05 |    65 |    0 |
+----+----------+----------+-----+---------+------------+-------+------+
7 rows in set (0.00 sec)
3. Importing the users table above into HDFS
At a minimum, the import command must specify the database connection string, the access username, the password, and the table to import. By default the data is written to the /user/hadoopUser/<table name>/ directory under the HDFS root; you can instead choose the destination with the --target-dir parameter. For example:
[hadoopUser@secondmgt ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive --table users --target-dir /output/sqoop/
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/01/17 20:28:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/17 20:28:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/01/17 20:28:16 INFO tool.CodeGenTool: Beginning code generation
15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
15/01/17 20:28:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0
Note: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/17 20:28:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.jar
15/01/17 20:28:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/01/17 20:28:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/01/17 20:28:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/01/17 20:28:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/01/17 20:28:18 INFO mapreduce.ImportJobBase: Beginning import of users
15/01/17 20:28:18 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hbase/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/01/17 20:28:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/17 20:28:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/01/17 20:28:19 INFO client.RMProxy: Connecting to ResourceManager at secondmgt/192.168.2.133:8032
15/01/17 20:28:20 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `users`
15/01/17 20:28:20 INFO mapreduce.JobSubmitter: number of splits:4
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
15/01/17 20:28:20 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/01/17 20:28:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421373857783_0002
15/01/17 20:28:21 INFO impl.YarnClientImpl: Submitted application application_1421373857783_0002 to ResourceManager at secondmgt/192.168.2.133:8032
15/01/17 20:28:21 INFO mapreduce.Job: The url to track the job: http://secondmgt:8088/proxy/application_1421373857783_0002/
15/01/17 20:28:21 INFO mapreduce.Job: Running job: job_1421373857783_0002
15/01/17 20:28:34 INFO mapreduce.Job: Job job_1421373857783_0002 running in uber mode : false
15/01/17 20:28:34 INFO mapreduce.Job:  map 0% reduce 0%
15/01/17 20:28:44 INFO mapreduce.Job:  map 25% reduce 0%
15/01/17 20:28:49 INFO mapreduce.Job:  map 75% reduce 0%
15/01/17 20:28:54 INFO mapreduce.Job:  map 100% reduce 0%
15/01/17 20:28:54 INFO mapreduce.Job: Job job_1421373857783_0002 completed successfully
15/01/17 20:28:54 INFO mapreduce.Job: Counters: 27
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=368040
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=401
                HDFS: Number of bytes written=288
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=174096
                Total time spent by all reduces in occupied slots (ms)=0
        Map-Reduce Framework
                Map input records=7
                Map output records=7
                Input split bytes=401
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=205
                CPU time spent (ms)=10510
                Physical memory (bytes) snapshot=599060480
                Virtual memory (bytes) snapshot=3535720448
                Total committed heap usage (bytes)=335544320
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=288
15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Transferred 288 bytes in 35.2792 seconds (8.1635 bytes/sec)
15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Retrieved 7 records.
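The log above shows how Sqoop parallelizes the import: the BoundingValsQuery fetches MIN(id) and MAX(id), the primary-key range is carved into splits ("number of splits:4"), and one map task handles each split ("Launched map tasks=4"). The mapper count can be changed with -m / --num-mappers; for a 7-row table, -m 1 would do and would produce a single output file. A rough sketch of the even-range splitting idea, using our own arithmetic rather than Sqoop's actual DataDrivenDBInputFormat code (which also handles non-numeric keys and uneven remainders):

```shell
# Divide the id range reported by the BoundingValsQuery (56..64) into 4
# splits, half-open except the last, and print the per-mapper WHERE clause.
min=56; max=64; n=4
step=$(( (max - min) / n ))        # 8 / 4 = 2
lo=$min
for i in 1 2 3 4; do
  hi=$(( lo + step ))
  if [ "$i" -eq "$n" ]; then
    echo "split $i: WHERE id >= $lo AND id <= $max"
  else
    echo "split $i: WHERE id >= $lo AND id < $hi"
  fi
  lo=$hi
done > /tmp/sqoop_splits.txt
cat /tmp/sqoop_splits.txt
```

This also explains why a table needs a primary key (or an explicit --split-by column) for a parallel import to work.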
4. Viewing the imported data on HDFS
[hadoopUser@secondmgt ~]$ hadoop fs -cat /output/sqoop/*
56,hua,hanyun,男,開通,2013-12-02,0,1
58,feng,123456,男,開通,2013-11-22,0,0
59,test,123456,男,開通,2014-03-05,58,0
60,user1,123456,男,開通,2014-06-26,66,0
61,user2,123,男,開通,2013-12-13,56,0
62,user3,123456,男,開通,2013-12-14,0,0
64,kai.zhou,123456,?,??,2014-03-05,65,0
The records match the source database exactly; the import succeeded.
5. Importing only part of the data
(1) Selecting columns with the --columns parameter
By default Sqoop imports every column of every record. When only some columns are needed, use the --columns parameter to list the columns to import, separated by commas. The following imports the username, sex, and datetime columns of the users table into HDFS:
[hadoopUser@secondmgt ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
> --table users --columns "username,sex,datetime" --target-dir /output/sqoop/
View the result:
[hadoopUser@secondmgt ~]$ hadoop fs -cat /output/sqoop/*
hua,男,2013-12-02
feng,男,2013-11-22
test,男,2014-03-05
user1,男,2014-06-26
user2,男,2013-12-13
user3,男,2013-12-14
kai.zhou,?,2014-03-05
(2) Filtering rows with the --where parameter
Another parameter, --where, filters rows, importing only the records that match a condition rather than the whole table. The following imports the rows of users whose id is greater than 60 into HDFS:
[hadoopUser@secondmgt conf]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
> --table users --where "id > 60" --target-dir /output/sqoop/
View the result:
[hadoopUser@secondmgt conf]$ hadoop fs -cat /output/sqoop/*
61,user2,123,男,開通,2013-12-13,56,0
62,user3,123456,男,開通,2013-12-14,0,0
64,kai.zhou,123456,?,??,2014-03-05,65,0
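The two filters are independent options and can be combined in one import, e.g. only username and datetime for the rows with id > 60. A hypothetical command, assembled as a string here because actually running it needs the live cluster (and note that the --target-dir must not already exist, so /output/sqoop/ would have to be removed between the runs above):

```shell
# Build the combined --columns/--where import command; -P is used instead of
# an inline password, as recommended earlier.
CMD='sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive -P'
CMD="$CMD --table users --columns \"username,datetime\" --where \"id > 60\""
CMD="$CMD --target-dir /output/sqoop/"
echo "$CMD"
```

The expected output files would then contain only two fields per line, for the three matching rows.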