Hive [2]: Basic Introduction

2.3 Hive internals (p. 44)

The jar files under $HIVE_HOME/lib implement Hive's individual functional modules, including the CLI. Among the other components is the Thrift service, which lets other processes access Hive remotely; Hive can also be reached through JDBC and ODBC. Every Hive client requires a metastore service (metadata service), which Hive uses to store table schemas and other metadata; by default the embedded Derby SQL database provides this storage, but only for a single process. HWI, the Hive Web Interface, provides remote access to Hive through a web page. Hive's configuration files are kept in the conf directory.

2.4 Starting Hive and the CLI

  $ $HIVE_HOME/bin/hive

starts the CLI. It prints the commands the user executes and the location in the local file system where log data is stored, along with output such as "OK" and the time taken by each query. Note: keywords in Hive are case-insensitive.

If Derby is used for metadata storage, a metastore_db directory is created by Derby in whatever directory the Hive session is started from. Starting Hive from a different directory creates a new metastore_db there, and forgetting where the earlier one lives effectively loses that data, so it is best to configure metadata storage with a permanent path.

hive.metastore.warehouse.dir specifies where Hive tables are stored; the default is /user/hive/warehouse in HDFS. By giving this property a different value, each user can define a personal data warehouse directory and avoid affecting other users of the system, for example:

  set hive.metastore.warehouse.dir=/user/myname/hive/warehouse;

To avoid typing such a command every time Hive starts, put it in the $HOME/.hiverc file; that file is executed each time Hive starts.

2.5 Using JDBC to connect to the metadata store

Hive requires a metadata storage (metastore) component that Hadoop does not provide; it must be supplied externally. The metastore holds metadata such as table schemas and partition information, which is specified through CREATE TABLE and ALTER TABLE operations. Because a multi-user system may read and write this metadata concurrently, the default embedded database is not suitable for a production environment.

To use MySQL for metadata storage, assume MySQL is running on port 3306 of the server db1.mydomain.pvt and the database name is hive_db. Configure the metastore database in hive-site.xml:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://db1.mydomain.pvt:3306/hive_db?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>911</value>
    <description>password to use against metastore database</description>
  </property>

For Hive to connect to MySQL, the MySQL JDBC driver (Connector/J, http://www.mysql.com/downloads/connector/j/) must be on the classpath: after downloading it, place it under the Hive library path, $HIVE_HOME/lib. Once this configuration is complete, Hive stores its metadata in MySQL.
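As a quick sanity check that the new configuration is in effect, you can create a table in Hive and look for its metadata in MySQL. This is only an illustrative sketch: the table name metastore_check is made up, and the TBLS metastore table and its TBL_NAME column reflect common Hive metastore schemas rather than anything stated above.

  $ hive -e "CREATE TABLE metastore_check (id INT);"
  $ mysql -h db1.mydomain.pvt -u root -p hive_db -e "SELECT TBL_NAME, TBL_TYPE FROM TBLS;"

If metastore_check shows up in the MySQL output, Hive is writing its metadata to hive_db rather than to a local Derby directory.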
2.6 The hive command

$HIVE_HOME/bin/hive is the gateway to several Hive services, including the frequently used CLI. Run

  $ bin/hive --help

to see the list of services the hive command provides; a particular service can be started with --service <name>. The Hive services are:

  cli        The command-line interface, used to define tables, run queries, and so on; this is the default service.
  hiveserver Hive Server, which listens for Thrift connections from other processes.
  hwi        The Hive Web Interface, a simple web UI that can run queries and other commands without logging in to a machine in the cluster to use the CLI.
  jar        An extension of the hadoop jar command, for running applications that also need the Hive environment on their classpath.
  metastore  Starts a standalone Hive metastore service that multiple clients can connect to.
  rcfilecat  Prints the contents of files stored in the RCFile format.

The --auxpath option lets you specify a colon-separated list of auxiliary Java archives containing custom extensions you may need. The --config <directory> option overrides the default attribute configuration in $HIVE_HOME/conf and points Hive at a different configuration directory.

2.7 The command-line interface

  $ hive --help --service cli

displays the list of options the CLI provides:

  usage: hive
   -d,--define <key=value>      Variable substitution to apply to hive commands, e.g. -d A=B or --define A=B
      --database <databasename> Specify the database to use
   -e <quoted-query-string>     SQL from command line
   -f <filename>                SQL from files
   ...

A "one-time use" command looks like this:

  $ hive -e "SELECT * FROM mytable LIMIT 3";

Hive exits after the command has run once.

  $ hive -S -e "SELECT * FROM mytable LIMIT 3" > /tmp/myquery

The -S option enables silent mode, so "OK" and "Time taken" lines are not shown in the output, and the result is saved to the local file /tmp/myquery rather than to HDFS.

  $ hive -S -e "set" | grep warehouse

If you cannot remember a property's exact name, a command like this is a handy way to find it.

Executing Hive queries from a file: the -f <filename> option runs one or more query statements stored in the given file; by convention such Hive query files end in .q or .hql.

  $ hive -f /path/to/file/withqueries.hql

Inside the CLI, the SOURCE command executes the statements in a file:

  hive> SOURCE /path/to/file/withqueries.hql;

  $ hive -e "LOAD DATA LOCAL INPATH '/tmp/myfile' INTO TABLE src";

2.7.5 The .hiverc file

  $ hive -i /tmp/myfile

lets you specify a file that the CLI executes before the prompt appears. Hive also looks for a .hiverc file in your HOME directory and runs the commands it contains automatically. This file is the place for commands you need every session, such as setting system properties and variables. For example, add jar /path/to/custom_hive_extensions.jar adds a jar to the session; set hive.cli.print.current.db=true; displays the current working database in the prompt; and set hive.exec.mode.local.auto=true; encourages Hive to run a query locally when it can be executed in local mode, which speeds up queries over small datasets.
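Putting those settings together, a $HOME/.hiverc might look like the following sketch (the jar path is just the placeholder used above):

  add jar /path/to/custom_hive_extensions.jar;
  set hive.cli.print.current.db=true;
  set hive.exec.mode.local.auto=true;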
Note: be sure to end every line in the file with a semicolon.

2.7.6 More on using the Hive CLI

  1) Auto-completion: press the Tab key while typing a command and the CLI completes the possible keywords.
  2) Command history: use the up and down arrow keys to step back through previous commands, which are recorded in $HOME/.hivehistory; the last 10,000 lines are kept.

2.7.8 Shell commands

You can execute simple bash shell commands without leaving the Hive CLI by prefixing them with !:

  hive> ! /bin/echo "what up dog";
  hive> ! pwd;

Note: shell pipes and filename auto-completion (globbing) do not work here.

2.7.9 Using Hadoop dfs commands inside Hive

Running hadoop dfs commands inside Hive is faster than running them from the bash shell, because they execute within the same Hive process. Simply drop the hadoop keyword from the command and end it with a semicolon:

  hive> dfs -ls /;
  hive> dfs -help;

dfs -help lists all of the functional options dfs provides.
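For example, since table data lives under the warehouse directory, you can inspect a table's files without leaving the CLI. This is an illustrative sketch: the table src and the default warehouse path /user/hive/warehouse are assumptions carried over from the earlier examples.

  hive> dfs -ls /user/hive/warehouse/src;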
2.7.10 Comments in Hive scripts

In a file of saved Hive queries you can use -- to comment out a line, but comments are not supported when typing directly into the CLI.

2.7.11 Displaying column names

  set hive.cli.print.header=true;

makes the CLI print the column (field) names above query results.
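As a small illustration of both features (a sketch: the file name queries.hql and the table src are placeholders), a commented query file run with -f might look like this:

  -- queries.hql: lines starting with -- are comments in a script file
  -- print column names above query results
  set hive.cli.print.header=true;
  SELECT * FROM src LIMIT 3;

and would be executed with:

  $ hive -f queries.hql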
