Hue installation and configuration practices

Hue is an open-source Apache Hadoop UI system. It evolved from Cloudera Desktop and was contributed to the open-source community by Cloudera. It is implemented on top of Django, a Python Web framework. With Hue, we can interact with a Hadoop cluster from a browser-based Web console to analyze and process data, for example operating on HDFS data and running MapReduce jobs. I had long heard how convenient and powerful Hue is but had never tried it myself. Let's first look at the feature set Hue supports, as described on the official website:

  • By default, session data is managed in a lightweight SQLite database; user authentication and authorization can be switched to MySQL, PostgreSQL, or Oracle
  • Access HDFS through the File Browser
  • Develop and run Hive queries in the Hive Editor
  • Solr-based search applications, with visual data views and dashboards
  • Interactive queries against Impala-based applications
  • Spark editor and dashboard
  • Pig editor with script task submission
  • Oozie editor; Workflow, Coordinator, and Bundle jobs can be submitted and monitored through a dashboard
  • HBase browser to visualize data, query data, and modify HBase tables
  • Metastore browser to access Hive metadata and HCatalog
  • Job browser with access to MapReduce jobs (MR1/MR2-YARN)
  • Job designer to create MapReduce, Streaming, and Java jobs
  • Sqoop 2 editor and dashboard
  • ZooKeeper browser and editor
  • Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases

Next, we verify some features of Hue through actual installation.

Environment preparation

The basic environment and its configuration are as follows:

  • CentOS-6.6 (Final)
  • JDK-1.7.0_25
  • Maven-3.2.1
  • Git-1.7.1
  • Hue-3.7.0 (branch-3.7.1)
  • Hadoop-2.2.0
  • Hive-0.14.0
  • Python-2.6.6

Make sure all of the software above is correctly installed and configured. Note that because we will use Hue to execute Hive queries, the HiveServer2 service must be started:

cd /usr/local/hive
bin/hiveserver2 &

Otherwise, Hive queries cannot be executed from the Hue Web console.
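
As an optional quick check, you can confirm that HiveServer2 is accepting connections before wiring up Hue. Beeline ships with Hive 0.14; the host and IP below are assumptions matching the configuration used later in this article:

cd /usr/local/hive
# connect to HiveServer2 over JDBC on its default port 10000; -n passes the user name
bin/beeline -u jdbc:hive2://10.10.4.125:10000 -n hadoop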

Installation and configuration

I have created a hadoop user. As the hadoop user, I first used yum to install the dependencies Hue needs:

sudo yum install krb5-devel cyrus-sasl-gssapi cyrus-sasl-devel libxml2-devel libxslt-devel mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel

Then, run the following commands to download and build the Hue package:

cd /usr/local/
sudo git clone https://github.com/cloudera/hue.git branch-3.7.1
sudo chown -R hadoop:hadoop branch-3.7.1/
cd branch-3.7.1/
make apps

If the above steps complete without problems, Hue is installed. The Hue configuration file is /usr/local/branch-3.7.1/desktop/conf/pseudo-distributed.ini. The default configuration does not let Hue run properly, so you need to modify it to match your Hadoop cluster configuration. The file is divided into sections according to the software being integrated, and each section contains subsections to ease configuration management, as shown below (subsection names omitted):

  • desktop
  • libsaml
  • libopenid
  • liboauth
  • librdbms
  • hadoop
  • filebrowser
  • liboozie
  • oozie
  • beeswax
  • impala
  • pig
  • sqoop
  • proxy
  • hbase
  • search
  • indexer
  • jobsub
  • jobbrowser
  • zookeeper
  • spark
  • useradmin
  • libsentry

You can configure just the pieces you need. The following table describes the changes I made to the configuration file:

Hue configuration section | Hue configuration item | Hue configuration value | Description
desktop | default_hdfs_superuser | hadoop | HDFS superuser used for HDFS user management
desktop | http_host | 10.10.4.125 | Host/IP address of the Hue Web Server
desktop | http_port | 8000 | Hue Web Server service port
desktop | server_user | hadoop | Process user running the Hue Web Server
desktop | server_group | hadoop | Process user group running the Hue Web Server
desktop | default_user | yanjun | Hue administrator
hadoop/hdfs_clusters | fs_defaultfs | hdfs://hadoop6:8020 | Corresponds to the core-site.xml item fs.defaultFS
hadoop/hdfs_clusters | hadoop_conf_dir | /usr/local/hadoop/etc/hadoop | Hadoop configuration file directory
hadoop/yarn_clusters | resourcemanager_host | hadoop6 | Corresponds to the yarn-site.xml item yarn.resourcemanager.hostname
hadoop/yarn_clusters | resourcemanager_port | 8032 | ResourceManager service port
hadoop/yarn_clusters | resourcemanager_api_url | http://hadoop6:8088 | Corresponds to the yarn-site.xml item yarn.resourcemanager.webapp.address
hadoop/yarn_clusters | proxy_api_url | http://hadoop6:8888 | Corresponds to the yarn-site.xml item yarn.web-proxy.address
hadoop/yarn_clusters | history_server_api_url | http://hadoop6:19888 | Corresponds to the mapred-site.xml item mapreduce.jobhistory.webapp.address
beeswax | hive_server_host | 10.10.4.125 | Host name/IP address of the HiveServer2 node
beeswax | hive_server_port | 10000 | HiveServer2 service port
beeswax | hive_conf_dir | /usr/local/hive/conf | Hive configuration file directory

The above configures everything related to the Hadoop cluster and Hive (Hive is configured in the beeswax section; Hue interacts with Hive through HiveServer2).
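
For reference, the table above maps onto pseudo-distributed.ini roughly as follows. This is a sketch of only the relevant fragments, not the complete file; Hue marks subsections with [[...]] and [[[...]]]:

[desktop]
  http_host=10.10.4.125
  http_port=8000
  server_user=hadoop
  server_group=hadoop
  default_user=yanjun
  default_hdfs_superuser=hadoop

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      fs_defaultfs=hdfs://hadoop6:8020
      hadoop_conf_dir=/usr/local/hadoop/etc/hadoop
  [[yarn_clusters]]
    [[[default]]]
      resourcemanager_host=hadoop6
      resourcemanager_port=8032
      resourcemanager_api_url=http://hadoop6:8088
      proxy_api_url=http://hadoop6:8888
      history_server_api_url=http://hadoop6:19888

[beeswax]
  hive_server_host=10.10.4.125
  hive_server_port=10000
  hive_conf_dir=/usr/local/hive/conf
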
Finally, start the Hue service with the following command:

cd /usr/local/branch-3.7.1/
build/env/bin/supervisor &
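
As a simple liveness check (assuming the host and port configured above), the web server should answer an HTTP request once supervisor is running:

curl -I http://10.10.4.125:8000/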

Hue function verification

We mainly execute Hive queries on the Hue Web Console, so we need to prepare Hive-related tables and data.

  • Prepare Hive

First, create a database in Hive (grant permissions first if the user does not have them):

GRANT ALL TO USER hadoop;
CREATE DATABASE user_db;

Here, the hadoop user is Hive's administrative user, so all permissions can be granted to it.
Next, create an example table. The DDL is as follows:

CREATE TABLE user_db.daily_user_info (
  device_type int,
  version string,
  channel string,
  udid string)
PARTITIONED BY (
  stat_date string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

The format of the prepared data file is as follows:

0     3.2.1     C-gbnpk     b01b8178b86cebb9fddc035bb238876d
0     3.0.7     A-wanglouko     e2b7a3d8713d51c0215c3a4affacbc95
0     1.2.7     H-follower     766e7b2d2eedba2996498605fa03ed33
0     1.2.7     A-shiry     d2924e24d9dbc887c3bea5a1682204d9
0     1.5.1     Z-wammer     f880af48ba2567de0f3f9a6bb70fa962
0     1.2.7     H-clouda     aa051d9e2accbae74004d761ec747110
0     2.2.13     H-clouda     02a32fd61c60dd2c5d9ed8a826c53be4
0     2.5.9     B-ywsy     04cc447ad65dcea5a131d5a993268edf

Fields are separated by TAB characters and correspond, in order, to the columns of the user_db.daily_user_info table created above. Next, load the test data into the partitions of the example table:

LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-05.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-05');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-06.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-06');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-07.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-07');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-08.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-08');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-09.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-09');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-10.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-10');
LOAD DATA LOCAL INPATH '/home/hadoop/u2014-12-11.log' OVERWRITE INTO TABLE user_db.daily_user_info PARTITION (stat_date='2014-12-11');
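
To confirm that all seven partitions were created, you can list them from the Hive CLI:

SHOW PARTITIONS user_db.daily_user_info;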

You can also check the table data from the Hive CLI:

SELECT COUNT(1) FROM daily_user_info;

In my case the test data set contained 241709545 records.

  • Hue logon page

After the Hue service starts successfully, you can open http://10.10.4.125:8000/ in a browser to log on. On the first visit, you are prompted to enter a user name and password for the default user, after which you can log in.

The user you log on with the first time becomes the Hue administrator, with elevated permissions to add users and manage the operation permissions of users and their groups.

  • Hue user Homepage

After successful logon, you are taken to the Hue Web Console homepage.

After successful logon, the system first runs some basic environment configuration checks; which checks run depends on which applications we enabled when modifying the configuration.

  • Hive query editor page

After logging on, select the Hive menu item under Query Editors.

When you submit a long-running query, you can simply wait for it to finish; the final result is displayed on the Results tab of the current page. During execution, you can also watch the status of the Hive job running in the background.
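
As an illustration of the kind of query that fits this data set, here is one counting distinct devices per channel and day over the loaded partitions; the table and column names come from the DDL above, and the query itself is my example rather than one from the original walkthrough:

SELECT stat_date, channel, COUNT(DISTINCT udid) AS uv
FROM user_db.daily_user_info
WHERE stat_date BETWEEN '2014-12-05' AND '2014-12-11'
GROUP BY stat_date, channel
ORDER BY stat_date;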

  • Job browser page

Through the Job Browser (http://10.10.4.125:8000/jobbrowser) you can view jobs on the Hadoop cluster in various states, including Succeeded, Running, Failed, and Killed.

To view the detailed execution status of a Job, the JobHistoryServer and WebAppProxyServer services of the Hadoop cluster must be correctly configured and started; the relevant data can then be viewed on the Web page.
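
On Hadoop 2.2.0 these two services can be started with the standard daemon scripts; the path below assumes Hadoop is installed under /usr/local/hadoop, as configured above:

cd /usr/local/hadoop
# start the MapReduce JobHistoryServer and the YARN WebAppProxyServer
sbin/mr-jobhistory-daemon.sh start historyserver
sbin/yarn-daemon.sh start proxyserver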

To see how an individual MapTask or ReduceTask of a Job executed, click the corresponding link. This is similar to Hadoop YARN's own Job Web UI and makes monitoring very convenient.

  • User Management and authorization

After logging on as an authorized administrator, click the user name in the upper-right corner (yanjun here); a "Manage Users" menu item appears in the drop-down list, where you can create new users and assign access permissions.

Above, I created several users and specified the groups they belong to (Hue supports group management). In fact, different Hue applications can be assigned to different groups, and new users can then be placed into the relevant groups, thereby controlling which Hue applications each user may access. The users created and authorized above can log on to the Hue Web management system with their user names and passwords and interact with the various configured applications (such as MySQL and Spark).

Summary

Based on the experience above and the problems encountered during installation and configuration, here is a summary:

  • Installing and configuring Hue on CentOS can be relatively involved and may not go smoothly. I first tried on CentOS-5.11 (Final) and could not get it working, probably because the Hue versions I used were too new (I tried branch-3.0 and branch-3.7.1), or because some of the dependent packages failed to install. I recommend a newer CentOS release; here I used CentOS-6.6 (Final) and compiled Hue from the branch-3.7.1 source, and Python 2.6+ is required.
  • When using Hue you probably also care about user management and permission assignment, so consider switching to another relational database such as MySQL as needed, and back it up, to avoid losing Hue's user data and being unable to access the Hadoop cluster. To do this, modify the Hue configuration file and change the default sqlite3 storage to a relational database you are familiar with; MySQL, PostgreSQL, and Oracle are currently supported (see the sketch after this list).
  • If necessary, Hue's user management and authorization can be combined with the Hadoop cluster's underlying access control mechanisms, such as Kerberos or Hadoop SLA, to restrict and control access permissions more tightly.
  • Based on the feature set described earlier, you can enable whichever Hue applications suit your actual scenario; through this pluggable configuration you can turn on an application, such as Oozie, Pig, Spark, or HBase, and interact with it through Hue.
  • If you use an older version of Hive, such as 0.12, you may hit problems during verification; choose a Hue version compatible with your Hive version.
  • This walkthrough does not use the CDH packages released by Cloudera; things would probably go more smoothly with CDH.
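
As a minimal sketch of that database switch, assuming a MySQL instance on the same host and hypothetical credentials (user hue, password huepassword, database hue), the [[database]] subsection under [desktop] in pseudo-distributed.ini would look roughly like this; after changing it, the schema must be initialized, for example with build/env/bin/hue syncdb in Hue 3.x:

[desktop]
  [[database]]
    # assumed MySQL instance; host, credentials, and database name are placeholders
    engine=mysql
    host=10.10.4.125
    port=3306
    user=hue
    password=huepassword
    name=hue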

Reference

  • https://github.com/cloudera/hue
  • https://github.com/cloudera/hue/wiki
  • http://cloudera.github.io/hue/docs-3.5.0/manual.html
  • http://cloudera.github.io/hue/docs-3.5.0/sdk/sdk.html

