Installation Environment
Impala version 2.1.0 corresponds to CDH 5.3.0.
Impala is a CDH component. With the rest of the Hadoop environment (HDFS, YARN, Hive) already in place, Impala can be installed directly through yum; the packages are available from the Impala download address.
Installation content:
The installation user is root.
hdname (the node where the Hive metastore resides):
impala, impala-server, impala-state-store, impala-catalog, impala-shell
Other nodes:
impala-server, impala-shell
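For example, the packages listed above can be installed with yum (a sketch; it assumes the CDH 5.3.0 yum repository is already configured). On hdname:
yum install -y impala impala-server impala-state-store impala-catalog impala-shell
On the other nodes:
yum install -y impala-server impala-shell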
Permissions configuration (required on all machines)
A user and group named impala are created during the Impala installation; do not delete this user or group.
If you want Impala and YARN to work together, you need to add the impala user to the hdfs group; see the related Llama project for details.
When Impala performs a DROP TABLE operation, the files need to be moved to the HDFS trash, so you must create the HDFS directory /user/impala and make it writable by the impala user. Similarly, Impala needs to read the data under the Hive data warehouse, so the impala user must also be added to the hive group. The command to add the supplementary groups is:
usermod -G hive,hdfs,hadoop impala
The result is as shown
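To double-check the group membership after running usermod, the id command can be used (an optional verification, not part of the original steps):
id impala
The output should list hive, hdfs and hadoop among the impala user's groups.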
Create an impala directory on HDFS and set its permissions:
sudo -u hdfs hadoop fs -mkdir /user/impala
sudo -u hdfs hadoop fs -chown impala /user/impala
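To confirm the ownership change took effect, listing the parent directory works (an optional check):
sudo -u hdfs hadoop fs -ls /user
The /user/impala entry should now show impala as its owner.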
Set the socket path
Create /var/run/hadoop-hdfs on each node:
mkdir -p /var/run/hadoop-hdfs
Note: the folder may already exist; confirm that the impala user has read and write permission on it.
If it already exists, add the impala user to the group that owns the directory and grant the group write permission: chmod 775 /var/run/hadoop-hdfs
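A quick way to inspect the current owner, group and mode before changing anything (an optional check, not from the original write-up):
ls -ld /var/run/hadoop-hdfs
If the owning group is one of the groups added with usermod above (for example hdfs), chmod 775 is enough to make the directory writable for the impala user.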
MySQL driver
Driver download address: mysql-connector-java-5.1.30.tar.gz
Copy the downloaded file to /usr/share/java/ and rename it to mysql-connector-java.jar.
This path is used because it is Impala's default; you can check the MYSQL_CONNECTOR_JAR parameter in /etc/default/impala.
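A sketch of the unpack-and-copy steps (it assumes the tarball sits in the current directory and contains the usual mysql-connector-java-5.1.30-bin.jar; adjust the names if your archive differs):
tar -xzf mysql-connector-java-5.1.30.tar.gz
cp mysql-connector-java-5.1.30/mysql-connector-java-5.1.30-bin.jar /usr/share/java/mysql-connector-java.jar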
# Configuration file settings
The configuration files live in two places.
## /etc/default/impala
This file contains Impala's default configuration: the host information and related path settings, including the JDBC driver path and the locations of all Impala configuration files. What needs to be modified is the host information for the two services, state store and catalog. The final result is shown below, with the highlighted parts marking the changes; hdname is the host name of the machine where Impala's catalog and state-store components are installed, i.e. the machine holding the metadata MySQL database.
==IMPALA_CATALOG_SERVICE_HOST=hdname==
==IMPALA_STATE_STORE_HOST=hdname==
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} ==-state_store_host=${IMPALA_STATE_STORE_HOST}== "
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT}"
ENABLE_CORE_DUMPS=true
# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/impala/conf
# HIVE_CONF_DIR=/etc/impala/conf
# HBASE_CONF_DIR=/etc/impala/conf
## /etc/impala/conf
The location of this directory is actually determined by /etc/default/impala; in some versions it may be under /usr/lib/impala/conf.
The three core configuration files, hive-site.xml, hdfs-site.xml and core-site.xml, are copied over from the HDFS and Hive configuration directories; if there is an hbase-site.xml, copy it as well.
The results are as follows:
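A sketch of the copy step, assuming the cluster configuration lives in the usual /etc/hadoop/conf and /etc/hive/conf directories:
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
cp /etc/hive/conf/hive-site.xml /etc/impala/conf/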
Modify the file hdfs-site.xml under /etc/impala/conf and add the following to it:
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout</name>
<value>10000</value>
</property>
Configuration file synchronization
Sync the /etc/default/impala file and the /etc/impala/conf folder to all impala-server nodes.
Launch Impala
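With the packages installed, the Impala daemons run as system services; a minimal start sequence might look like this (a sketch; state store and catalog run only on hdname, impala-server runs on every node):
service impala-state-store start
service impala-catalog start
service impala-server start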
Confirm that all services have been started, including Hive and Impala. To check the Impala processes you can use the command:
ps -ef | grep impala
If everything is normal, the following three services are running on hdname, as shown in the figure.
You can also check the related services through a browser at the following URLs:
hostname:25010 – state store node information
hostname:25000 – server information; any node with impala-server installed can be accessed this way
hostname:25020 – catalog information
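If no browser is available, the pages can also be probed from the command line (an optional check):
curl http://hdname:25000/
A normal response returns the HTML of the impalad web page.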
Log in to the Impala terminal on any node using the impala-shell command, and connect with: connect hdname;
Execute the command invalidate metadata to refresh the metadata.
The result is as shown
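A minimal example of such a session (the prompt strings shown here are indicative and may differ slightly by version):
impala-shell
[Not connected] > connect hdname;
[hdname:21000] > invalidate metadata;
[hdname:21000] > show databases;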
Error resolution reference
Error 1
Path Permission Issues
Error connecting: TTransportException, Could not connect to master:21000
Check impalad.ERROR in the log directory /var/log/impala; the errors are as follows:
Error: short-circuit local reads is disabled because
- dfs.domain.socket.path is not configured.
- dfs.client.read.shortcircuit is not enabled.
Workaround
Check the values of these two parameters, confirm that the configured path exists, and confirm that the impala user has read and write permission on it.
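A few commands that cover those checks (a sketch; adjust the paths if your configuration differs):
grep -A1 dfs.domain.socket.path /etc/impala/conf/hdfs-site.xml
ls -ld /var/run/hadoop-hdfs
sudo -u impala test -w /var/run/hadoop-hdfs && echo writable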
Error 2
invalidate metadata fails to update the metadata
Error message:
Query: invalidate metadata
ERROR: Couldn't open transport for hdname:26000 (connect() failed: Connection refused)
The log contains the following error: the block information on the DataNode cannot be read.
I0507 10:03:36.218281 21562 BlockStorageLocationUtil.java:177] Failed to query block locations on datanode 192.168.73.16:50020: org.apache.hadoop.ipc.RemoteException(java.lang.UnsupportedOperationException): Datanode#getHdfsBlocksMetadata is not enabled in datanode config
    at org.apache.hadoop.hdfs.server.datanode.DataNode.getHdfsBlocksMetadata(DataNode.java:1547)
Workaround
Note: the error mentions port 26000, yet nothing in the entire configuration process sets any port to 26000 (26000 is in fact the catalog service's default port). The error also only appears after logging in to the shell and connecting successfully, which suggests it is related to the metadata (catalog) service.
The fix:
In the IMPALA_CATALOG_ARGS configuration item in /etc/default/impala, add the parameter -state_store_host=${IMPALA_STATE_STORE_HOST}.
The results are as follows
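For reference, the modified line would look roughly like this (a sketch; exact spacing may differ from the original screenshot):
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_host=${IMPALA_STATE_STORE_HOST} "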
This parameter tells the catalog service on which host to find the metadata; in fact, the state_store_host setting also appears in the state-store related configuration.