Hadoop-Hive initialization

Source: Internet
Author: User
Tags: error handling

Big data, data mining, and artificial intelligence are surrounded by many interrelated buzzwords. People outside the field may treat them as just an advanced branch of programming, but the people who actually work in these areas are often not career programmers at all. They may write Java or Python, yet programming is only a sideline; their main occupation is data science. This creates a problem: because programming is a sideline, many data scientists are not strong programmers, and without programming, large-scale data processing is very difficult. To solve this thorny problem, programmers built a bridge between SQL, which data scientists are usually very proficient in, and the programming needed to process big data. That bridge is Hive.

The design purpose of Hive
Hive was designed so that analysts who are proficient in SQL can run ad-hoc queries over big data, while the underlying algorithms and infrastructure are implemented and maintained by programmers.

A more graceful explanation
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL-like query language that translates statements into MapReduce jobs. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL statements, which makes it very well suited to statistical analysis over a data warehouse without developing dedicated MapReduce applications.
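To illustrate the SQL-to-MapReduce translation, here is a hedged sketch (the `orders` table and its columns are hypothetical, not from this article): a single HiveQL GROUP BY like the one below is compiled by Hive into a MapReduce job, with the map phase emitting keys, the shuffle grouping them, and the reduce phase counting.

```shell
# Hypothetical example: count rows per customer in an "orders" table.
# Hive compiles the GROUP BY into a MapReduce job:
#   map     -> emit (customer_id, 1)
#   shuffle -> group rows by customer_id
#   reduce  -> sum the counts per key
hive -e "
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id;
"
```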

Hive Architecture
* Metastore: Hive's metadata store, the equivalent of a database catalog; it records how tables map to HDFS files.
* CLI, JDBC, Web GUI: the command-line interface, the JDBC development interface, and the browser interface, i.e. the windows through which data is viewed and manipulated.
* Driver: the driver/connector that compiles queries and connects the operations above to HDFS.

Error handling while setting up Hive on a Hadoop cluster
Today, when the Hadoop cluster was started, one of the two NameNodes (the active one) came up but the standby failed. The startup log showed that the error always pointed at a historical job: after a restart, the cluster kept replaying the last failed task from its edit log, and the cached entry could not be skipped. The error looked similar to the following:

ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, inodeid=16828, path=/usr/output/friendsort/_SUCCESS, replication=3, mtime=1525359058429

Online solutions: unable to restart the standby NameNode
There are more detailed write-ups of this error online, but their solutions are more trouble than the steps below.

On the master node, enter safe mode, save the namespace, then leave safe mode:
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace
sudo -u hdfs hdfs dfsadmin -safemode leave

On the auxiliary (standby) node, bootstrap from the fsimage saved above (I did not work out the exact mechanics; it appears to restore the namespace saved by the active node and erase the standby's current state):
sudo -u hdfs hdfs namenode -bootstrapStandby -force
Upload the Hive compression package directly, extract it, add Hive to the environment variables, and configure it.
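A minimal sketch of the environment-variable step, assuming the tarball was extracted to /opt/hive (a hypothetical path; substitute your own). Append these lines to ~/.bashrc or /etc/profile and re-source the file:

```shell
# Hypothetical install location; adjust to where Hive was actually extracted.
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
```

After re-sourcing, the hive command should resolve from any directory.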
Hive can be configured in three ways: the first is the embedded local configuration, the second uses a standalone MySQL instance as the metastore, and the third uses a remote MySQL database. A local MySQL database is used here.
Copy and modify the configuration file hive-site.xml: delete all existing configuration entries inside it and add the following:

```xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive_remote/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
```
* Install MySQL, create the metastore database, and grant the user the necessary privileges (otherwise Hive cannot access it).
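A sketch of the database-creation and grant step, assuming MySQL 5.x and the root/root credentials used in hive-site.xml above (hypothetical; use stronger credentials in practice):

```shell
# Hypothetical: create the metastore database and let 'root' reach it
# from any host, matching ConnectionUserName/Password in hive-site.xml.
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS metastore;
GRANT ALL PRIVILEGES ON metastore.* TO 'root'@'%' IDENTIFIED BY 'root';
FLUSH PRIVILEGES;
SQL
```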
* Start Hive. On success you can launch it by simply typing the hive command. (Many errors can occur at this step; resolutions for some of them follow.) Error record (20% of the time spent studying, 80% of the time chasing errors):
Error: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Fix: start the metastore service: hive --service metastore.

Error: starting the metastore fails with "could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083".
Cause: a Hive metastore process is already running and holds port 9083, so it must be killed. Find it with ps -ef | grep hive, kill the Hive-related process (e.g. kill -9 2545), then restart: hive --service metastore.

Error: starting Hive reports "Access denied for user 'root'@'hadoop01' (using password: YES)".
Cause: a permission or password problem. For a password problem, check the configuration. For a permission problem: first inspect MySQL's privilege table with select host, user, password from mysql.user;, then widen the allowed host with update mysql.user set host = '%' where user = 'root' and host = '127.0.0.1';, and refresh with flush privileges;.

Warning during installation: WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist.
Cause: the hive.metastore.local property is no longer used in Hive 0.10/0.11 and later. Remove the hive.metastore.local entry from hive-site.xml.

Error: "mysql: unrecognized service".
Cause: the CentOS MySQL installation provides three services, and all of them must be installed. Check the installed MySQL packages with rpm -qa | grep -i mysql and reinstall whatever is missing.

Error: hive: command not found (the environment variable did not take effect).
Fix: there are three ways to configure environment variables; the second was chosen here.

More Hive content: an introduction to the Hive interfaces (Web UI/JDBC) and their configuration.

