Hadoop cluster configuration best practices
Deleting a document in ES does not immediately remove it from disk; the document is only marked as deleted, and Lucene produces a .del file. During retrieval, deleted documents are still scanned and only filtered out at the end, which hurts efficiency. We can periodically purge these deleted documents, just as index segments are merged, using curl:
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
The function returns null when an error occurs during host parsing, which causes a NullPointerException to be thrown when the host string is used later. The code that assigns the host is java.net.URI.Parser.parseHostname(int, int); take a look if you are interested.
The relevant comments are reproduced here:
// hostname    = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
// domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
// toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
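The null-host behavior described above can be reproduced with a small self-contained sketch (the class and helper names here are illustrative, not part of any library): a hostname containing an underscore violates the grammar in the comments, so java.net.URI silently yields a null host instead of throwing.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostDemo {
    // Returns the parsed host, or null when host parsing fails --
    // mirroring java.net.URI's behavior of silently yielding a null host.
    static String safeHost(String s) {
        try {
            return new URI(s).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A valid hostname parses normally.
        System.out.println(safeHost("http://example.com/"));   // example.com
        // An underscore is not legal in a hostname per the grammar above,
        // so getHost() returns null even though no exception is thrown --
        // the NullPointerException trap described in the text.
        System.out.println(safeHost("http://exam_ple.com/"));  // null
    }
}
```

Checking the result of getHost() for null before use avoids the NullPointerException.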
Standard: stores a full copy of the data, participates in election voting, and may become the primary node.
Passive: stores a full copy of the data and participates in voting, but cannot become the primary node.
Arbiter: a quorum node that only participates in voting; it does not receive replicated data and cannot become the primary node.
The number of nodes in a replica set is preferably odd.
2. Parameters
There are two parameters for a replica set:
--replSet, the name of the replica set.
--oplogSize, the size of the operation log, in megabytes.
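As a sketch of how the two parameters are used, a replica-set member might be started like this (--replSet and --oplogSize are the standard mongod flags; the set name, data path, port, and oplog size are illustrative only):

```shell
# Start one member of a replica set named "rs0" with a 1024 MB oplog.
# Repeat on each member host, then initiate the set from a mongo shell.
mongod --replSet rs0 --oplogSize 1024 --dbpath /data/rs0-1 --port 27017
```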
The rationale is the same, but here it is a list of hostnames that are forbidden to connect to the NameNode. This is useful for decommissioning DataNodes from the cluster.
dfs.max.objects
0
The maximum number of objects in DFS; each file, directory, and block in HDFS counts as one object. 0 means no limit.
dfs.replication.interval
3
The interval, in seconds, at which the NameNode computes block replication work; usually not written to the configuration file. The default is 3 seconds.
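In hdfs-site.xml, the two properties above would look like this (a sketch; the values shown are the defaults just described, so in practice these entries are usually omitted):

```xml
<!-- hdfs-site.xml sketch: values are the defaults described above -->
<configuration>
  <property>
    <name>dfs.max.objects</name>
    <value>0</value>  <!-- 0 = no limit on files, directories, and blocks -->
  </property>
  <property>
    <name>dfs.replication.interval</name>
    <value>3</value>  <!-- seconds between NameNode replication computations -->
  </property>
</configuration>
```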
SolrCloud + Tomcat + Zookeeper cluster configuration
Overview:
SolrCloud is a distributed search solution based on Solr and Zookeeper. Its main idea is to use Zookeeper as the cluster configuration information center.
It has several special features:
1) centralized configuration
The get(String) method fetches the Properties object of the current Configuration object; if that object is empty, the loadResources(Properties, ArrayList, boolean) method is invoked to load the configuration information. The other get and set methods that follow operate on the configuration information by calling get(String) and set(String, String).
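The lazy-loading pattern described here (get() triggers a resource load only when the Properties object is still empty) can be sketched in plain Java. The class and method names below are simplified stand-ins, not Hadoop's actual API, and the loaded defaults are hard-coded for illustration:

```java
import java.util.Properties;

public class LazyConfig {
    private Properties props;  // loaded on first access, like Hadoop's Configuration

    // Stand-in for loadResources(Properties, ArrayList, boolean):
    // here it just fills in a couple of hard-coded example defaults.
    private Properties loadResources() {
        Properties p = new Properties();
        p.setProperty("fs.default.name", "hdfs://localhost:9000");
        p.setProperty("dfs.replication", "3");
        return p;
    }

    private synchronized Properties getProps() {
        if (props == null) {           // lazy load, as the text describes
            props = loadResources();
        }
        return props;
    }

    // All other get/set methods delegate to the shared Properties object.
    public String get(String name) {
        return getProps().getProperty(name);
    }

    public void set(String name, String value) {
        getProps().setProperty(name, value);
    }
}
```

Every accessor funnels through getProps(), so the configuration files are parsed at most once no matter which get or set method is called first.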
The HDFS super-permission group is supergroup; the user who starts Hadoop is usually the superuser.
dfs.data.dir
/opt/data1/hdfs/data,/opt/data2/hdfs/data,/opt/data3/hdfs/data,...
The real DataNode data storage paths. Multiple hard disks can be used, separated by commas.
dfs.datanode.data.dir.perm
755
The permissions of the local folders used by the DataNode. The default value is 755.
dfs.replication
3
The number of replicas kept for each block. The default value is 3.
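The DataNode-related properties above would appear in hdfs-site.xml like this (a sketch; the disk paths follow the example given in the text):

```xml
<!-- hdfs-site.xml sketch for the DataNode properties described above -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- one directory per physical disk, comma-separated -->
    <value>/opt/data1/hdfs/data,/opt/data2/hdfs/data,/opt/data3/hdfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>755</value>  <!-- permissions on the local storage folders -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>    <!-- replicas per block -->
  </property>
</configuration>
```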
1. dfs.hosts records the list of machines that may join the cluster as DataNodes.
2. mapred.hosts records the list of machines that may join the cluster as TaskTrackers.
3. dfs.hosts.exclude and mapred.hosts.exclude contain the lists of machines to be removed.
4. The masters file records the list of machines that run the secondary NameNode.
5. The slaves file records the list of machines running the DataNode and TaskTracker.
1 Getting the default configuration
To configure Hadoop, you mainly edit three configuration files: core-site.xml, hdfs-site.xml, and mapred-site.xml. By default these files are empty, so it is difficult to know which settings in them take effect, and the configurations found on the Internet may not be
Before installing Hive, you need to install the JDK and set up the Hadoop cluster.
The JDK and Hadoop installation and configuration are not covered here; the versions used are JDK 1.7 and Hadoop 2.6.0.
System environment: three virtual machines opened in VMware.
Node1
Node2
Node3
Download the Hive package from the official website.
Hive is deployed on a CentOS 6.5 cluster environment.

Component    Version
Hadoop       2.6.0
Zookeeper    3.4.6
Hive         1.1.0
Hive Configuration
The decompression is not explained here.
Go to Hive root directory
cd /home/hadoop/development/src/hive-1.1
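A typical next step is to export HIVE_HOME and put Hive's bin directory on the PATH (a sketch; the HIVE_HOME path follows the directory used above, and adding bin/ to PATH is an assumption about this setup):

```shell
# Point HIVE_HOME at the unpacked Hive directory and expose its scripts
export HIVE_HOME=/home/hadoop/development/src/hive-1.1
export PATH="$HIVE_HOME/bin:$PATH"
```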
Brief introduction: Zookeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, domain name services, distributed synchronization, group services, and so on.
The version of Hadoop my cluster uses is hadoop-1.1.2, and the corresponding Eclipse plugin version is hadoop-eclipse-plugin-1.1.2_20131021200005.
(1) Create a hadoop-plugin folder under Eclipse's dropins folder and put the plugin inside. Restart Eclipse, open the views, and the MapReduce view will appear.
(2) Configure
My cluster configuration is similar; briefly: the master node runs Ubuntu 14.04 LTS x64, and the other two nodes are VMs running CentOS 6.4 x64. The JVM is JDK 1.7_80. Hadoop versions 2.7.1 and 2.7.2 have both been tried. The problem is: starting the HDFS system is OK and everything comes up; jps shows the following. Master node: SecondaryNameNode and NameNode. Slave nodes: DataNode. But using the hdfs command dfsa
Service: Hadoop components that can be deployed on the cluster, such as HDFS, YARN, and HBase.
Role: created by Cloudera Manager when a service is configured. For example, the NameNode is a role of the HDFS service.
Role group: to simplify management, roles of the same category (such as DataNodes) can be divided into different role groups, and each role group can have its own set of configurations.
Role instance: a single instance of a role (which can be considered a process
This happens because too many Java objects are stored in memory. If the executor caches too many objects in the JVM, for example 80 GB worth, many full GCs will occur, and you will wonder why the program is not making progress. Check the GC log; it turns out the time was spent in GC.
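To confirm this from the GC log, executor GC logging can be enabled at submit time (a sketch; spark.executor.extraJavaOptions is the standard Spark property for passing JVM flags, while the application class and jar names here are hypothetical):

```shell
# Turn on verbose GC logging in every executor JVM, then inspect the
# executor stderr logs for long full-GC pauses.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --class com.example.MyApp \
  myapp.jar
```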
3. Implement a Fault-Tolerant Tachyon Cluster Based on Zookeeper
3.0 Prerequisites
Hadoop version: 2.2.0.2.0.6.0-101
Zookeeper
Author: those things | The article can be reproduced; please mark the original source and author information in the form of a hyperlink.
Web: http://www.cnblogs.com/panfeng412/archive/2013/03/22/hadoop-capacity-scheduler-configuration.html
Referring to the Capacity Scheduler Guide and drawing on practical experience, the configuration parameters of the Capacity Scheduler are summarized below. Most of the parts marked in red below