Hue installation and configuration practices
Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop and was contributed to the open-source community by Cloudera, and it is implemented on top of the Python web framework Django. With Hue, we can interact with a Hadoop cluster from a browser-based web console to analyze and process data, such as operating on HDFS data and running MapReduce jobs. I had long heard how convenient and powerful Hue is, but had never tried it myself. First, let's look at the feature set Hue supports, as described on the official website:
- By default, session data is managed in a lightweight SQLite database; user authentication and authorization can be switched to MySQL, PostgreSQL, or Oracle
- Browse HDFS through the File Browser
- Develop and run Hive queries in the Hive editor
- Build Solr-based search applications, with visual data views and dashboards
- Run interactive queries against Impala-based applications
- Use the Spark editor and dashboard
- Edit Pig scripts and submit script tasks
- Edit Oozie definitions, and submit and monitor Workflows, Coordinators, and Bundles through a dashboard
- Browse HBase to visualize data, query data, and modify HBase tables
- Browse the Metastore to access Hive metadata and HCatalog
- Track MapReduce jobs (MR1/MR2-YARN) in the Job Browser
- Create MapReduce, Streaming, and Java jobs with the Job Designer
- Use the Sqoop 2 editor and dashboard
- Browse and edit ZooKeeper
- Query MySQL, PostgreSQL, SQLite, and Oracle databases through query editors
Next, let's verify some of Hue's features through an actual installation.
Environment preparation
The base environment and software versions used here are as follows:
- CentOS-6.6 (Final)
- JDK-1.7.0_25
- Maven-3.2.1
- Git-1.7.1
- Hue-3.7.0 (branch-3.7.1)
- Hadoop-2.2.0
- Hive-0.14.0
- Python-2.6.6
Make sure all of the software above is installed and configured correctly. Note that because we use Hue to run Hive queries, the HiveServer2 service must be started first:
```bash
cd /usr/local/hive
bin/hiveserver2 &
```
Otherwise, Hive queries cannot be executed from the Hue web console.
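As an optional sanity check before involving Hue, you can verify that HiveServer2 accepts connections with Beeline. This is a sketch; the JDBC URL assumes HiveServer2 listens on its default port 10000 on this host, so adjust it to your environment:

```bash
# Connect to HiveServer2 via Beeline and list databases (host/port assumed)
cd /usr/local/hive
bin/beeline -u jdbc:hive2://10.10.4.125:10000 -n hadoop -e 'SHOW DATABASES;'
```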
Installation and configuration
I have created a hadoop user. As the hadoop user, I first use yum to install the software Hue depends on:
```bash
sudo yum install krb5-devel cyrus-sasl-gssapi cyrus-sasl-devel libxml2-devel libxslt-devel mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel
```
Then, download and build Hue with the following commands:
```bash
cd /usr/local/
sudo git clone https://github.com/cloudera/hue.git branch-3.7.1
sudo chown -R hadoop:hadoop branch-3.7.1/
cd branch-3.7.1/
make apps
```
If the above completes without problems, Hue is installed. The Hue configuration file is /usr/local/branch-3.7.1/desktop/conf/pseudo-distributed.ini. The default configuration does not run Hue properly, so it must be modified to match the Hadoop cluster. The file is divided into segments according to the software Hue integrates with, and each segment contains sub-segments to make configuration easier to manage, as shown below (sub-segment names omitted):
- desktop
- libsaml
- libopenid
- liboauth
- librdbms
- hadoop
- filebrowser
- liboozie
- oozie
- beeswax
- impala
- pig
- sqoop
- proxy
- hbase
- search
- indexer
- jobsub
- jobbrowser
- zookeeper
- spark
- useradmin
- libsentry
We can easily configure only what we need. The following table describes the changes I made to the configuration file:
| Hue configuration section | Hue configuration item | Hue configuration value | Description |
| --- | --- | --- | --- |
| desktop | default_hdfs_superuser | hadoop | HDFS superuser for user management |
| desktop | http_host | 10.10.4.125 | Host/IP address of the Hue web server |
| desktop | http_port | 8000 | Hue web server service port |
| desktop | server_user | hadoop | User running the Hue web server process |
| desktop | server_group | hadoop | Group running the Hue web server process |
| desktop | default_user | yanjun | Hue administrator |
| hadoop/hdfs_clusters | fs_defaultfs | hdfs://hadoop6:8020 | Corresponds to the core-site.xml item fs.defaultFS |
| hadoop/hdfs_clusters | hadoop_conf_dir | /usr/local/hadoop/etc/hadoop | Hadoop configuration file directory |
| hadoop/yarn_clusters | resourcemanager_host | hadoop6 | Corresponds to the yarn-site.xml item yarn.resourcemanager.hostname |
| hadoop/yarn_clusters | resourcemanager_port | 8032 | ResourceManager service port |
| hadoop/yarn_clusters | resourcemanager_api_url | http://hadoop6:8088 | Corresponds to the yarn-site.xml item yarn.resourcemanager.webapp.address |
| hadoop/yarn_clusters | proxy_api_url | http://hadoop6:8888 | Corresponds to the yarn-site.xml item yarn.web-proxy.address |
| hadoop/yarn_clusters | history_server_api_url | http://hadoop6:19888 | Corresponds to the mapred-site.xml item mapreduce.jobhistory.webapp.address |
| beeswax | hive_server_host | 10.10.4.125 | Host name/IP address of the Hive node |
| beeswax | hive_server_port | 10000 | HiveServer2 service port |
| beeswax | hive_conf_dir | /usr/local/hive/conf | Hive configuration file directory |
The above configures everything related to the Hadoop cluster and Hive (Hive is configured in the beeswax segment; Hue interacts with Hive through HiveServer2).
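For orientation, here is a sketch of how the table's values land in pseudo-distributed.ini. The nesting follows the template shipped with Hue; substitute your own hosts and paths:

```ini
[desktop]
  http_host=10.10.4.125
  http_port=8000
  server_user=hadoop
  server_group=hadoop
  default_user=yanjun
  default_hdfs_superuser=hadoop

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # Matches fs.defaultFS in core-site.xml
      fs_defaultfs=hdfs://hadoop6:8020
      hadoop_conf_dir=/usr/local/hadoop/etc/hadoop
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=hadoop6
      resourcemanager_port=8032
      resourcemanager_api_url=http://hadoop6:8088
      proxy_api_url=http://hadoop6:8888
      history_server_api_url=http://hadoop6:19888

[beeswax]
  # Hue talks to Hive through HiveServer2
  hive_server_host=10.10.4.125
  hive_server_port=10000
  hive_conf_dir=/usr/local/hive/conf
```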
Finally, start the Hue service by executing the following commands:
```bash
cd /usr/local/branch-3.7.1/
build/env/bin/supervisor &
```
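To confirm the server came up, a simple check (assuming the host and port configured above) is to request the front page and look for an HTTP response:

```bash
# Expect an HTTP 200, or a redirect to the login page, if Hue is up
curl -I http://10.10.4.125:8000/
```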
Hue function verification
We mainly execute Hive queries on the Hue Web Console, so we need to prepare Hive-related tables and data.
First, create a database in Hive (grant the permission first if you do not have it):
```sql
GRANT ALL TO USER hadoop;
CREATE DATABASE user_db;
```
Here, hadoop is Hive's administrative user, so all permissions can be granted to it.
Create an example table. The table creation DDL is as follows:
```sql
CREATE TABLE user_db.daily_user_info (
  device_type int,
  version string,
  channel string,
  udid string)
PARTITIONED BY (stat_date string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```
The format of the prepared data file is as follows:
```
.2.1 C-gbnpk b01b8178b86cebb9fddc035bb238876d
3.0.7 A-wanglouko e2b7a3d8713d51c0215c3a4affacbc95
1.2.7 H-follower 766e7b2d2eedba2996498605fa03ed33
1.2.7 A-shiry d2924e24d9dbc887c3bea5a1682204d9
1.5.1 Z-wammer f880af48ba2567de0f3f9a6bb70fa962
1.2.7 H-clouda aa051d9e2accbae74004d761ec747110
2.2.13 H-clouda 02a32fd61c60dd2c5d9ed8a826c53be4
2.5.9 B-ywsy 04cc447ad65dcea5a131d5a993268edf
```
Each field is separated by a TAB character, and the fields correspond to the columns of the user_db.daily_user_info table created above. Next, we load the test data into the example table's partitions:
```sql
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-05.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-05');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-06.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-06');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-07.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-07');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-08.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-08');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-09.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-09');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-10.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-10');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-11.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-11');
```
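As a quick check (not part of the original steps), you can confirm that the partitions were created as expected:

```sql
-- List the partitions of the example table
SHOW PARTITIONS user_db.daily_user_info;
```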
You can log in through the Hive CLI to check the table data:
```sql
USE user_db;
SELECT COUNT(1) FROM daily_user_info;
```
My test data contains 241,709,545 records.
After the Hue service starts successfully, open http://10.10.4.125:8000/ in a browser to log in. On the first visit you are prompted to enter a username and password; the user created at this first login becomes the Hue administrator, with elevated permissions to add users and manage the permissions of users and their groups.
After logging in successfully, you arrive at the Hue web console home page.
On login, the system first performs some basic environment configuration checks; which checks are run depends on which applications were enabled when we modified the configuration.
Once logged in, select the Hive item under the Query Editors menu.
After submitting a query, since it may run for a long time, you can simply wait for it to finish. The final result is displayed on the Results tab of the current page, and you can also watch Hive's background execution status while the query runs.
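For example, an aggregate over all loaded partitions makes a reasonable long-running test query. This is an illustrative query against the example table, not one from the original walkthrough:

```sql
-- Daily record counts across all loaded partitions
SELECT stat_date, COUNT(1) AS records
FROM user_db.daily_user_info
GROUP BY stat_date
ORDER BY stat_date;
```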
Through the Job Browser (http://10.10.4.125:8000/jobbrowser), you can view jobs on the Hadoop cluster in their various states, including Succeeded, Running, Failed, and Killed.
To view the detailed execution status of a Job, the JobHistoryServer and WebAppProxyServer services of the Hadoop cluster must be correctly configured and started; the relevant data can then be viewed on the web page.
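If those services are not yet running, on Hadoop 2.2 they can be started with the standard daemon scripts; this sketch assumes $HADOOP_HOME points at your Hadoop installation:

```bash
# Start the MapReduce JobHistoryServer (serves history_server_api_url, port 19888)
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# Start the YARN WebAppProxyServer (serves proxy_api_url; requires
# yarn.web-proxy.address to be set in yarn-site.xml)
$HADOOP_HOME/sbin/yarn-daemon.sh start proxyserver
```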
To see how an individual MapTask or ReduceTask of a Job executed, click the corresponding link. This is similar to Hadoop YARN's own job web UI and makes monitoring very convenient.
User management and authorization
After logging in as an authorized administrator, click the username in the upper-right corner (yanjun here); the "Manage Users" menu item appears in the drop-down list. There you can create new users and specify their access permissions.
Above, I created several users and specified the groups they belong to (Hue supports group management). In practice, we can grant different Hue applications to different groups and then assign new users to the relevant groups, thereby controlling which Hue applications each user may access. The users created and authorized above can log in to the Hue web management system with their usernames and passwords and interact with the various enabled applications (such as MySQL and Spark).
Summary
Drawing on the understanding gained above and the problems encountered during installation and configuration, here is a summary:
- Installing and configuring Hue on CentOS can be relatively involved and may not go smoothly. I first tried to configure it on CentOS-5.11 (Final) without success, probably because the Hue versions I used were too new (I tried both branch-3.0 and branch-3.7.1), or because of problems installing some of the packages that release depends on. I recommend a newer CentOS release; here I used CentOS-6.6 (Final) and built Hue from the branch-3.7.1 source, and Python 2.6+ is required.
- With Hue, we likely also care about user management and permission assignment, so consider using another relational database, such as MySQL, as needed, and back it up, to avoid losing the user data of Hue applications and thereby losing access to the Hadoop cluster. To switch, modify the Hue configuration file and change the default sqlite3 storage to a relational database you are familiar with; MySQL, PostgreSQL, and Oracle are currently supported (see the configuration sketch after this list).
- If necessary, the Hadoop cluster's underlying access control mechanisms, such as Kerberos or Hadoop service-level authorization, can be combined with Hue's user management and authorization to restrict and control access permissions more precisely.
- Based on the Hue features described earlier, we can select different Hue applications for our actual scenarios. Through this pluggable configuration, we can enable an application and interact with it through Hue, such as Oozie, Pig, Spark, and HBase.
- If you use a lower version of Hive, such as 0.12, you may encounter problems during verification; choose a Hue version compatible with your Hive version for installation and configuration.
- This installation and configuration practice did not use the CDH packages released by Cloudera; it might go more smoothly with CDH.
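As a sketch of the database switch mentioned in the summary (the connection values below are placeholders, not from the original setup), the change goes in the [[database]] sub-segment of the desktop segment in pseudo-distributed.ini:

```ini
[desktop]
  [[database]]
    # Placeholder MySQL settings; substitute your own host, credentials, and database
    engine=mysql
    host=10.10.4.125
    port=3306
    user=hue
    password=secret
    name=hue
```

Since Hue is Django-based, after switching databases the schema can be re-created with Hue's management command, for example by running build/env/bin/hue syncdb from the Hue installation directory.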