Hadoop-Hive initialization

Source: Internet
Author: User
Tags: error handling

Big data, data mining, and artificial intelligence are surrounded by many interrelated buzzwords. People outside the field may treat them as just an advanced branch of programming, but the people who actually work in these areas are often not career programmers at all. They may write Java or Python, yet programming is only a sideline; their main occupation is data science. This creates a problem: because programming is a sideline, many data scientists are not strong programmers, and without programming, large-scale data processing is very difficult. To solve this thorny problem, programmers built a bridge between SQL, which data scientists are usually very proficient in, and the programming needed to process big data. That bridge is Hive.

The design purpose of Hive
Hive was designed so that analysts who are proficient in SQL can run ad-hoc queries over big data, while the underlying algorithms and infrastructure are implemented and maintained by programmers.

A more graceful explanation
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL-like query language that translates statements into MapReduce jobs. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL statements, which makes it very well suited to statistical analysis over a data warehouse without developing dedicated MapReduce applications.
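To illustrate the SQL-to-MapReduce translation, here is a hedged sketch (the `orders` table and its columns are hypothetical, not from this article): a single HiveQL GROUP BY like the one below is compiled by Hive into a MapReduce job, with the map phase emitting keys, the shuffle grouping them, and the reduce phase counting.

```shell
# Hypothetical example: count rows per customer in an "orders" table.
# Hive compiles the GROUP BY into a MapReduce job:
#   map     -> emit (customer_id, 1)
#   shuffle -> group rows by customer_id
#   reduce  -> sum the counts per key
hive -e "
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id;
"
```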

Hive Architecture
* Metastore: Hive's metadata store, the equivalent of a database catalog; it records how tables map to HDFS files.
* CLI, JDBC, Web GUI: the command-line interface, the JDBC development interface, and the browser interface, i.e. the windows through which data is viewed and manipulated.
* Driver: the driver/connector that compiles queries and connects the operations above to HDFS.

Error handling while setting up Hive on a Hadoop cluster
Today, when the Hadoop cluster was started, one of the two NameNodes (the active one) came up but the standby failed. The startup log showed that the error always pointed at a historical job: after a restart, the cluster kept replaying the last failed task from its edit log, and the cached entry could not be skipped. The error looked similar to the following:

ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, inodeid=16828, path=/usr/output/friendsort/_SUCCESS, replication=3, mtime=1525359058429

Online solutions: unable to restart the standby NameNode
There are more detailed write-ups of this error online, but their solutions are more trouble than the steps below.

On the master node, enter safe mode, save the namespace, then leave safe mode:
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace
sudo -u hdfs hdfs dfsadmin -safemode leave

On the auxiliary (standby) node, bootstrap from the fsimage saved above (I did not work out the exact mechanics; it appears to restore the namespace saved by the active node and erase the standby's current state):
sudo -u hdfs hdfs namenode -bootstrapStandby -force
Upload the Hive compression package directly, extract it, add Hive to the environment variables, and configure it.
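A minimal sketch of the environment-variable step, assuming the tarball was extracted to /opt/hive (a hypothetical path; substitute your own). Append these lines to ~/.bashrc or /etc/profile and re-source the file:

```shell
# Hypothetical install location; adjust to where Hive was actually extracted.
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
```

After re-sourcing, the hive command should resolve from any directory.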
Hive can be configured in three ways: the first is the embedded local configuration, the second uses a standalone MySQL instance as the metastore, and the third uses a remote MySQL database. A local MySQL database is used here.
Copy and modify the configuration file hive-site.xml: delete all existing configuration entries inside it and add the following:

```xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive_remote/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
```
* Install MySQL, create the metastore database, and grant the user the necessary privileges (otherwise Hive cannot access it).
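A sketch of the database-creation and grant step, assuming MySQL 5.x and the root/root credentials used in hive-site.xml above (hypothetical; use stronger credentials in practice):

```shell
# Hypothetical: create the metastore database and let 'root' reach it
# from any host, matching ConnectionUserName/Password in hive-site.xml.
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS metastore;
GRANT ALL PRIVILEGES ON metastore.* TO 'root'@'%' IDENTIFIED BY 'root';
FLUSH PRIVILEGES;
SQL
```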
* Start Hive. On success you can launch it by simply typing the hive command. (Many errors can occur at this step; resolutions for some of them follow.) Error record (20% of the time spent studying, 80% of the time chasing errors):
Error: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Fix: start the metastore service: hive --service metastore.

Error: starting the metastore fails with "could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083".
Cause: a Hive metastore process is already running and holds port 9083, so it must be killed. Find it with ps -ef | grep hive, kill the Hive-related process (e.g. kill -9 2545), then restart: hive --service metastore.

Error: starting Hive reports "Access denied for user 'root'@'hadoop01' (using password: YES)".
Cause: a permission or password problem. For a password problem, check the configuration. For a permission problem: first inspect MySQL's privilege table with select host, user, password from mysql.user;, then widen the allowed host with update mysql.user set host = '%' where user = 'root' and host = '127.0.0.1';, and refresh with flush privileges;.

Warning during installation: WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist.
Cause: the hive.metastore.local property is no longer used in Hive 0.10/0.11 and later. Remove the hive.metastore.local entry from hive-site.xml.

Error: "mysql: unrecognized service".
Cause: the CentOS MySQL installation provides three services, and all of them must be installed. Check the installed MySQL packages with rpm -qa | grep -i mysql and reinstall whatever is missing.

Error: hive: command not found (the environment variable did not take effect).
Fix: there are three ways to configure environment variables; the second was chosen here.

More Hive content: an introduction to the Hive interfaces (Web UI/JDBC) and their configuration.

