In our earlier study and practice of Hive we used the CLI or hive -e, which only lets you run HiveQL queries, updates, and so on. That approach is clumsy and limited. Fortunately, Hive provides a lightweight client interface. Through HiveServer or HiveServer2, a client can work with data in Hive without starting the CLI, and remote clients written in various languages, such as Java and Python, can submit requests to Hive and retrieve the results. Both HiveServer and HiveServer2 are based on Thrift, but only HiveServer is commonly called the Thrift server. If HiveServer already exists, why is HiveServer2 needed? The reason is that HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses, and it cannot be removed by modifying the HiveServer code. Therefore, in Hive 0.11.0 the HiveServer code was rewritten as HiveServer2, which solves the problem. HiveServer2 supports multi-client concurrency and authentication, and provides better support for open API clients such as JDBC and ODBC.
Because HiveServer2 is the more capable of the two, it is the focus here, but we first take a quick look at how HiveServer is used. Enter hive --service help at the command line, as shown below. The output shows that hive <parameters> --service serviceName <service parameters> starts a specific service, such as cli, hiveserver, or hiveserver2.
[hadoop@hadoop ~]$ hive --service help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schematool version
Parameters parsed:
  --auxpath : Auxillary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help:  ./hive --debug --help
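When these services are started from a script instead of by hand, the usage line above maps directly onto an argument list. The following is a minimal Python sketch under stated assumptions: the helper name hive_service_argv is hypothetical, the hive launcher is assumed to be on PATH, and only the --config parameter from the help output is modelled. Passing the returned list to subprocess.run avoids shell quoting problems.

```python
import shlex
from typing import List, Optional, Sequence


def hive_service_argv(service: str,
                      service_args: Sequence[str] = (),
                      config_dir: Optional[str] = None,
                      hive_bin: str = "hive") -> List[str]:
    """Build argv for: hive <parameters> --service serviceName <service parameters>."""
    argv = [hive_bin]
    if config_dir is not None:
        # <parameters> come before --service; --config selects a Hive conf directory.
        argv += ["--config", config_dir]
    argv += ["--service", service]
    # <service parameters> follow the service name and are handed to that service.
    argv += list(service_args)
    return argv


if __name__ == "__main__":
    cmd = hive_service_argv("hiveserver", ["--help"])
    print(shlex.join(cmd))  # prints: hive --service hiveserver --help
    # To actually run it: subprocess.run(cmd, check=True)
```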
Enter hive --service hiveserver --help on the command line to view the help information for HiveServer:
[hadoop@hadoop ~]$ hive --service hiveserver --help
Starting Hive Thrift Server
usage: hiveserver
 -h,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --maxWorkerThreads <arg>      Maximum number of worker threads,
                                  default:2147483647
    --minWorkerThreads <arg>      Minimum number of worker threads,
                                  default:100
 -p <port>                        Hive Server port number, default:10000
 -v,--verbose                     Verbose mode
The help output shows that HiveServer listens on port 10000 by default, with a minimum of 100 and a maximum of 2147483647 worker threads. Now start the HiveServer service in verbose mode:
[hadoop@hadoop ~]$ hive --service hiveserver -v
Starting Hive Thrift Server
14/08/01 11:07:09 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Starting hive server on port 10000 with 100 min worker threads and 2147483647 max worker threads
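Once the log line above appears, a quick way for another script or machine to confirm that something is listening on port 10000 is a plain TCP connect. This Python sketch (the function name thrift_port_open is hypothetical) only checks that the port accepts connections. It does not speak Thrift, so it cannot tell HiveServer apart from any other process bound to that port.

```python
import socket


def thrift_port_open(host: str = "localhost", port: int = 10000,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures alike.
        return False


if __name__ == "__main__":
    print("port 10000 open:", thrift_port_open("localhost", 10000))
```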
Next, let's look at the more capable HiveServer2. HiveServer2 can be configured in the hive-site.xml configuration file with the following parameters:
hive.server2.thrift.min.worker.threads: minimum number of worker threads; default 5.
hive.server2.thrift.max.worker.threads: maximum number of worker threads; default 500.
hive.server2.thrift.port: TCP listening port; default 10000.
hive.server2.thrift.bind.host: host the TCP socket binds to; default localhost.
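hive-site.xml uses Hadoop's configuration/property/name/value layout. As a sketch, the parameters above can be rendered with Python's standard xml.etree.ElementTree module. The helper name hive_site_xml and the example values (including the host name hadoop, taken from the shell prompt) are illustrative assumptions, not recommended settings; ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def hive_site_xml(props: Mapping[str, object]) -> str:
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        # Hadoop configuration files spell booleans in lowercase.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = text
    ET.indent(root)  # pretty-print in place; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example values only; "hadoop" is the host name seen in the shell prompt.
    print(hive_site_xml({
        "hive.server2.thrift.min.worker.threads": 5,
        "hive.server2.thrift.max.worker.threads": 500,
        "hive.server2.thrift.port": 10000,
        "hive.server2.thrift.bind.host": "hadoop",
    }))
```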
You can also set the environment variables HIVE_SERVER2_THRIFT_BIND_HOST and HIVE_SERVER2_THRIFT_PORT to override the host and port settings in hive-site.xml. Starting with Hive 0.13.0, HiveServer2 supports sending Thrift messages over HTTP, which is particularly useful when a proxy sits between the client and the server. The HTTP transport parameters are as follows:
hive.server2.transport.mode: default binary (TCP); set it to http to use HTTP transport.
hive.server2.thrift.http.port: HTTP listening port; default 10001.
hive.server2.thrift.http.path: endpoint name of the service; default cliservice.
hive.server2.thrift.http.min.worker.threads: minimum number of worker threads in the service pool; default 5.
hive.server2.thrift.http.max.worker.threads: maximum number of worker threads in the service pool; default 500.
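To see how these settings combine, the following Python sketch computes the JDBC URL a client would use for a given server configuration: in binary mode the two environment variables take precedence over hive-site.xml, and in HTTP mode the HTTP port and path apply. Assumptions: the function name jdbc_url is hypothetical; the environment overrides are modelled for binary mode only, since that is the case described above; and the transportMode/httpPath session-variable spelling follows newer Hive JDBC drivers, so an older driver may expect different keys (check its documentation).

```python
import os
from typing import Mapping, Optional

# Server-side defaults as listed in the parameter lists above.
DEFAULTS = {
    "hive.server2.transport.mode": "binary",
    "hive.server2.thrift.bind.host": "localhost",
    "hive.server2.thrift.port": "10000",
    "hive.server2.thrift.http.port": "10001",
    "hive.server2.thrift.http.path": "cliservice",
}


def jdbc_url(site: Mapping[str, str],
             env: Optional[Mapping[str, str]] = None,
             db: str = "default") -> str:
    """Return the HiveServer2 JDBC URL implied by hive-site.xml values and env vars."""
    env = os.environ if env is None else env
    conf = {**DEFAULTS, **site}  # hive-site.xml entries override the defaults
    host = conf["hive.server2.thrift.bind.host"]
    if conf["hive.server2.transport.mode"].lower() == "http":
        port = conf["hive.server2.thrift.http.port"]
        path = conf["hive.server2.thrift.http.path"]
        # Assumed session-variable names (newer drivers); older drivers may differ.
        return f"jdbc:hive2://{host}:{port}/{db};transportMode=http;httpPath={path}"
    # Binary (TCP) mode: the environment variables beat hive-site.xml.
    host = env.get("HIVE_SERVER2_THRIFT_BIND_HOST", host)
    port = env.get("HIVE_SERVER2_THRIFT_PORT", conf["hive.server2.thrift.port"])
    return f"jdbc:hive2://{host}:{port}/{db}"


if __name__ == "__main__":
    print(jdbc_url({}, env={}))
    # jdbc:hive2://localhost:10000/default
    print(jdbc_url({"hive.server2.transport.mode": "http"}, env={}))
    # jdbc:hive2://localhost:10001/default;transportMode=http;httpPath=cliservice
```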
There are two ways to start HiveServer2: the hive --service hiveserver2 form described above, or the more concise hiveserver2 command. Use hive --service hiveserver2 -h or hive --service hiveserver2 --help to view the help information:
Starting HiveServer2
Unrecognized option: -H
usage: hiveserver2
 -h,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
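Either launcher accepts --hiveconf <property=value>, which overrides a hive-site.xml setting for that run only. Below is a minimal Python sketch that builds such a command; the helper name hiveserver2_argv is hypothetical, and both hiveserver2 and hive are assumed to be on PATH. Booleans are lowercased because Hive configuration values use true/false.

```python
import shlex
from typing import List, Mapping


def hiveserver2_argv(overrides: Mapping[str, object],
                     use_wrapper: bool = False) -> List[str]:
    """Build a HiveServer2 start command with one --hiveconf per property.

    use_wrapper=False -> 'hiveserver2 ...'
    use_wrapper=True  -> 'hive --service hiveserver2 ...'
    """
    argv = ["hive", "--service", "hiveserver2"] if use_wrapper else ["hiveserver2"]
    for key, value in overrides.items():
        # Hive expects lowercase 'true'/'false' for boolean properties.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        argv += ["--hiveconf", f"{key}={text}"]
    return argv


if __name__ == "__main__":
    print(shlex.join(hiveserver2_argv({"hive.server2.thrift.port": 10002})))
    # hiveserver2 --hiveconf hive.server2.thrift.port=10002
```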
By default, HiveServer2 runs a query as the user who submitted it (hive.server2.enable.doAs is true). If hive.server2.enable.doAs is set to false, the query runs as the user that runs the HiveServer2 process. To prevent memory leaks in unencrypted mode, you can disable the file system caches by setting the following parameters to true:
fs.hdfs.impl.disable.cache: disables the HDFS file system cache; default false.
fs.file.impl.disable.cache: disables the local file system cache; default false.
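The doAs rule and the cache switches above can be summarized in a few lines of Python. This is an illustrative sketch only: the function name effective_query_user and the user names in the example (alice, hive) are hypothetical, and the dictionary simply records the two property values you would write into hive-site.xml.

```python
from typing import Dict


def effective_query_user(submitting_user: str, server_process_user: str,
                         enable_do_as: bool = True) -> str:
    """User a HiveServer2 query runs as, following hive.server2.enable.doAs."""
    # true (the default): impersonate the client that submitted the query;
    # false: run as the user that owns the HiveServer2 process.
    return submitting_user if enable_do_as else server_process_user


# The two cache switches from the text; both default to false.
DISABLE_FS_CACHE: Dict[str, str] = {
    "fs.hdfs.impl.disable.cache": "true",
    "fs.file.impl.disable.cache": "true",
}


if __name__ == "__main__":
    print(effective_query_user("alice", "hive"))                      # alice
    print(effective_query_user("alice", "hive", enable_do_as=False))  # hive
```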