Importing MySQL Table Data into HDFS with Sqoop 1.4.4


Questions this article answers:

        1. What does the --connect parameter do?

        2. Which parameter reads the database password from the console?

        3. What are the basic parameters, and the command, for importing a relational database table into HDFS with Sqoop?

        4. What is the default HDFS path that data is imported into?

        5. What does the --columns parameter do?

        6. What does the --where parameter do?

I. Key Parameters

Parameter                              Description
--connect <jdbc-uri>                   Specify the JDBC connection string for the relational database
--connection-manager <class-name>      Specify the connection manager class to use for the database
--driver <class-name>                  Manually specify the JDBC driver class to use
--hadoop-mapred-home <dir>             Override $HADOOP_MAPRED_HOME
--help                                 Print usage help
--password-file                        Set the path to a file containing the authentication password
-P                                     Read the database password from the console
--password <password>                  Set the database authentication password
--username <username>                  Set the database access username
--verbose                              Print more information about what the job is doing
--connection-param-file <filename>     Optional properties file that supplies connection parameters
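The import log later in this article warns that passing --password on the command line is insecure. As a hedged sketch of the two safer alternatives from the table above (the password file path is illustrative, not from the original article):

```shell
# Option 1: -P prompts for the password on the console at runtime,
# so it never appears in the shell history or process list.
sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive -P --table users

# Option 2: --password-file reads the password from a file.
# Illustrative path; restrict permissions so only the owner can read it.
echo -n "hive" > /home/hadoopUser/.mysql.password
chmod 400 /home/hadoopUser/.mysql.password
sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive \
  --password-file file:///home/hadoopUser/.mysql.password --table users
```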
II. The Source MySQL Table

[[email protected] ~]$ mysql -uhive -phive spice
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 419
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select * from users;
+----+----------+----------+-----+---------+------------+-------+------+
| id | username | password | sex | content | datetime   | vm_id | isad |
+----+----------+----------+-----+---------+------------+-------+------+
| 56 | hua      | hanyun   | 男  | 開通    | 2013-12-02 |     0 |    1 |
| 58 | feng     | 123456   | 男  | 開通    | 2013-11-22 |     0 |    0 |
| 59 | test     | 123456   | 男  | 開通    | 2014-03-05 |    58 |    0 |
| 60 | user1    | 123456   | 男  | 開通    | 2014-06-26 |    66 |    0 |
| 61 | user2    | 123      | 男  | 開通    | 2013-12-13 |    56 |    0 |
| 62 | user3    | 123456   | 男  | 開通    | 2013-12-14 |     0 |    0 |
| 64 | kai.zhou | 123456   | ?   | ??      | 2014-03-05 |    65 |    0 |
+----+----------+----------+-----+---------+------------+-------+------+
7 rows in set (0.00 sec)

III. Importing the users Table into HDFS

        To run an import, you must at minimum specify the database connection string, the access username, the access password, and the name of the table to import. By default the data lands in HDFS under /user/hadoopUser/<table-name>/; you can also use the --target-dir parameter to specify the import directory explicitly. For example:

[[email protected] ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive --table users --target-dir /output/sqoop/
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/01/17 20:28:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/17 20:28:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/01/17 20:28:16 INFO tool.CodeGenTool: Beginning code generation
15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
15/01/17 20:28:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0
Note: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/17 20:28:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.jar
15/01/17 20:28:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/01/17 20:28:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/01/17 20:28:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/01/17 20:28:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/01/17 20:28:18 INFO mapreduce.ImportJobBase: Beginning import of users
15/01/17 20:28:18 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hbase/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/01/17 20:28:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/01/17 20:28:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/01/17 20:28:19 INFO client.RMProxy: Connecting to ResourceManager at secondmgt/192.168.2.133:8032
15/01/17 20:28:20 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `users`
15/01/17 20:28:20 INFO mapreduce.JobSubmitter: number of splits:4
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
15/01/17 20:28:20 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/01/17 20:28:20 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/01/17 20:28:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421373857783_0002
15/01/17 20:28:21 INFO impl.YarnClientImpl: Submitted application application_1421373857783_0002 to ResourceManager at secondmgt/192.168.2.133:8032
15/01/17 20:28:21 INFO mapreduce.Job: The url to track the job: http://secondmgt:8088/proxy/application_1421373857783_0002/
15/01/17 20:28:21 INFO mapreduce.Job: Running job: job_1421373857783_0002
15/01/17 20:28:34 INFO mapreduce.Job: Job job_1421373857783_0002 running in uber mode : false
15/01/17 20:28:34 INFO mapreduce.Job:  map 0% reduce 0%
15/01/17 20:28:44 INFO mapreduce.Job:  map 25% reduce 0%
15/01/17 20:28:49 INFO mapreduce.Job:  map 75% reduce 0%
15/01/17 20:28:54 INFO mapreduce.Job:  map 100% reduce 0%
15/01/17 20:28:54 INFO mapreduce.Job: Job job_1421373857783_0002 completed successfully
15/01/17 20:28:54 INFO mapreduce.Job: Counters: 27
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=368040
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=401
                HDFS: Number of bytes written=288
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=174096
                Total time spent by all reduces in occupied slots (ms)=0
        Map-Reduce Framework
                Map input records=7
                Map output records=7
                Input split bytes=401
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=205
                CPU time spent (ms)=10510
                Physical memory (bytes) snapshot=599060480
                Virtual memory (bytes) snapshot=3535720448
                Total committed heap usage (bytes)=335544320
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=288
15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Transferred 288 bytes in 35.2792 seconds (8.1635 bytes/sec)
15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Retrieved 7 records.
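The log shows how Sqoop parallelized the import: it ran the BoundingValsQuery to find MIN(id)=56 and MAX(id)=64, then divided that range into 4 splits, one per map task. A simplified Python sketch of that data-driven split idea (a hypothetical helper, not Sqoop's actual IntegerSplitter):

```python
def compute_splits(lo, hi, num_splits):
    """Divide the [lo, hi] range of the split-by column into num_splits
    contiguous, non-overlapping ranges (simplified sketch)."""
    size = max(1, (hi - lo) // num_splits)
    splits = []
    start = lo
    while start <= hi:
        end = min(start + size - 1, hi)
        if len(splits) == num_splits - 1:
            end = hi  # last split absorbs any remainder
        splits.append((start, end))
        start = end + 1
    return splits

# MIN(id)=56 and MAX(id)=64 from the BoundingValsQuery, 4 map tasks:
print(compute_splits(56, 64, 4))  # → [(56, 57), (58, 59), (60, 61), (62, 64)]
```

Each map task then issues its own bounded query (conceptually `WHERE id >= start AND id <= end`), which is why 4 map tasks were launched for only 7 rows.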
IV. Viewing the Imported Data on HDFS

[[email protected] ~]$ hadoop fs -cat /output/sqoop/*
56,hua,hanyun,男,開通,2013-12-02,0,1
58,feng,123456,男,開通,2013-11-22,0,0
59,test,123456,男,開通,2014-03-05,58,0
60,user1,123456,男,開通,2014-06-26,66,0
61,user2,123,男,開通,2013-12-13,56,0
62,user3,123456,男,開通,2013-12-14,0,0
64,kai.zhou,123456,?,??,2014-03-05,65,0

       The records match the original database table, so the import succeeded.
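Sqoop's default output format is plain text, one record per line with comma-separated fields, which is why the `hadoop fs -cat` output above can be read back trivially. A minimal parsing sketch (the column list mirrors the users table shown earlier; the naive split breaks if a field value itself contains a comma):

```python
# Columns of the users table, in the order Sqoop exported them.
COLUMNS = ["id", "username", "password", "sex", "content", "datetime", "vm_id", "isad"]

def parse_record(line):
    """Parse one line of Sqoop's default comma-delimited text output into a dict."""
    return dict(zip(COLUMNS, line.rstrip("\n").split(",")))

record = parse_record("56,hua,hanyun,男,開通,2013-12-02,0,1")
print(record["username"], record["datetime"])  # → hua 2013-12-02
```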

V. Importing a Subset of the Data

        1. Selecting columns with --columns

             By default, Sqoop imports every column of each record. Sometimes we only need some of the columns; in that case, use the --columns parameter to list the column names to import, separated by commas. The following imports the username, sex, and datetime columns of the users table into HDFS:

[[email protected] ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
> --table users --columns "username,sex,datetime" --target-dir /output/sqoop/
         View the result:

[[email protected] ~]$ hadoop fs -cat /output/sqoop/*
hua,男,2013-12-02
feng,男,2013-11-22
test,男,2014-03-05
user1,男,2014-06-26
user2,男,2013-12-13
user3,男,2013-12-14
kai.zhou,?,2014-03-05
     2. Filtering rows with --where

        Another parameter, --where, filters rows, so that only records matching the condition are imported rather than the whole table. The following imports the rows of the users table whose id is greater than 60:

[[email protected] conf]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
> --table users --where "id > 60" --target-dir /output/sqoop/
        View the result:

[[email protected] conf]$ hadoop fs -cat /output/sqoop/*
61,user2,123,男,開通,2013-12-13,56,0
62,user3,123456,男,開通,2013-12-14,0,0
64,kai.zhou,123456,?,??,2014-03-05,65,0
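The two options are not mutually exclusive: --columns and --where can be combined in a single import. A hedged sketch, reusing the connection details from the commands above:

```shell
# Import only username/sex/datetime for rows with id > 60 (illustrative
# combination; not run in the original article).
sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
  --table users --columns "username,sex,datetime" --where "id > 60" \
  --target-dir /output/sqoop/
```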
