1. HBase is a subproject of Hadoop. Download the appropriate HBase version here. Note: Hadoop and HBase versions cannot be combined arbitrarily, so you need to confirm that the two versions work together before deploying the service; otherwise it will be a waste of time. Here we use Hadoop 0.20.2 and HBase 0.20.6. The hadoop co
Version: 0.94-cdh4.2.1. hbase-site.xml configuration: hbase.tmp.dir
The temporary directory on the local file system. It is generally relevant in local (standalone) mode, but it is best to set it explicitly in any case, because many other paths default to locations under it.
Online configuration
<property>
  <name>hbase.tmp.dir</name>
  <value>/mnt/dfs/11/hbase/hbase-tmp</value>
</property>
Default value
Getting started with HBase. I. Basic concepts of HBase
1. Row key
The row key is the primary key of a row, and queries against HBase can rely only on it. HBase does not support conditional queries or the other query methods common in mainstream databases; reading a record depends either on the row key or on a full-table scan, you can
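Since reads hinge on the row key, here is a minimal read sketch, assuming the 0.94-era Java client and a hypothetical table "user_table" with column family "cf" (all names are for illustration only):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyReadDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "user_table"); // hypothetical table

        // Point read: fetch exactly one row by its row key.
        Result row = table.get(new Get(Bytes.toBytes("row-00042")));
        System.out.println("cf:name = "
                + Bytes.toString(row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));

        // Range scan: rows are stored sorted by row key, so a start/stop key
        // pair reads a contiguous slice instead of scanning the whole table.
        Scan scan = new Scan(Bytes.toBytes("row-00000"), Bytes.toBytes("row-00100"));
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}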
#!/usr/bin/env bash
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
When you use MapReduce to process a large number of small files, too many map tasks are generated, and the thread-management overhead lengthens the job. For example, when processing 10,000 MB of data: if each split is 1 MB, there will be 10,000 map tasks and a great deal of thread overhead; if each split is 100 MB, there will be only 100 map tasks, each map task has more to do, and the thread-management overhead is much reduced. Improvement strategies: there are a number of ways to get
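One such strategy is to pack many small files into each input split. A minimal sketch, assuming a Hadoop release newer than the 0.20.2 mentioned earlier (CombineTextInputFormat was added later) and hypothetical input/output paths passed as arguments:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-small-files");
        job.setJarByClass(SmallFilesJob.class);

        // Pack many small files into splits of at most ~100 MB, so one map
        // task handles many files instead of one map task per file.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 100L * 1024 * 1024);

        job.setMapperClass(Mapper.class); // identity mapper, map-only job
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}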
Transferred from: http://www.cnblogs.com/nexiyi/p/hbase_config_94.html
The goal is to look at a production-environment configuration side by side with the default configuration, for example:
hbase.hregion.max.filesize: 100g
hbase.regionserver.hlog.blocksize: 512m
hbase.regionserver.maxlogs: 32
...
Version: 0.94-cdh4.2.1. hbase-site.xml configuration: hbase.tmp.dir
The temporary directory on the local file system, generally relevant in local mode, but best set explicitly, because many other paths will by default be placed under it.
1. Extract data from the data source (for example, databases) and upload it to HDFS
This step is outside the scope of HBase. Whatever the data source is, you only need to upload the data to HDFS before proceeding to the next step.
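As a sketch of this step, the upload can also be done programmatically with the HDFS FileSystem API, the equivalent of hadoop fs -put; both paths below are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToHdfs {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path("/data/export/users.tsv"),  // local source
                             new Path("/staging/users.tsv"));     // HDFS target
        fs.close();
    }
}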
2. Prepare data using a MapReduce job
This step requires a MapReduce job, and in most cases we need to write only the Map function ourselves; the Reduce function does not need to be considered, as it is provided by
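A minimal sketch of such a job against the 0.94-era APIs, assuming a hypothetical TSV input of "rowkey<TAB>value" lines and a pre-created table "my_table" with column family "cf". The single configureIncrementalLoad call wires in the reducer, partitioner, and sort order so the job writes HFiles aligned with the table's region boundaries:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadPrepare {

    // Map each TSV line "rowkey<TAB>value" to a Put keyed by row key.
    static class TsvMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t", 2);
            byte[] rowKey = Bytes.toBytes(fields[0]);
            Put put = new Put(rowKey);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("v"), Bytes.toBytes(fields[1]));
            ctx.write(new ImmutableBytesWritable(rowKey), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulkload-prepare");
        job.setJarByClass(BulkLoadPrepare.class);
        job.setMapperClass(TsvMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Sets the sort reducer, TotalOrderPartitioner, and HFile output
        // format based on the table's current region boundaries.
        HTable table = new HTable(conf, "my_table");
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The generated HFiles are then moved into the table with the completebulkload tool (LoadIncrementalHFiles).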
Hive was born to simplify the writing of MapReduce programs. Anyone who has done data analysis with MapReduce knows that many analysis programs are essentially the same apart from their business logic; in this situation a user-facing interface such as Hive is needed. Hive itself does not store or compute data; it depends entirely on HDFS and MapReduce. Tables in Hive are pure logic: the table definitions and the meta
HBase is a distributed database built on HDFS. The HBase table model: this database actually has many similarities with traditional relational databases; unlike MongoDB, Memcached, and Redis, it does not abandon the table concept entirely. But HBase is a column-oriented database, whereas traditional relational dat
The example above is subject to change; the business generally needs only the latest values, but sometimes it is necessary to query historical values as well.
Very large data volumes
As the data volume grows, an RDBMS can no longer keep up. First comes the read-write separation strategy: a master dedicated to write operations and multiple slaves responsible for reads, multiplying server costs. As the pressure increases, the master can no longer keep up either, and at this point it is time to split the l
the HDFS file system.
An HRegionServer manages a series of HRegion objects;
Each HRegion corresponds to one Region of a Table; an HRegion consists of multiple HStores;
Each HStore corresponds to the storage of one Column Family in a Table;
A Column Family is a centralized storage unit, so it is most efficient to place columns with the same I/O characteristics in the same Column Family (a sketch follows).
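A hedged illustration of that advice with the 0.94-era admin API: a hypothetical table that keeps frequently read metadata and a rarely read blob in separate families, so their I/O patterns do not pollute each other's store files and block cache:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Hypothetical schema: hot, small metadata in one family;
        // cold, large payloads in another.
        HTableDescriptor desc = new HTableDescriptor("user_profile");
        desc.addFamily(new HColumnDescriptor("meta"));
        desc.addFamily(new HColumnDescriptor("blob"));
        admin.createTable(desc);
        admin.close();
    }
}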
HStore: the core of HBase storage
scenarios and operational records
At around 10:50, we received notice from the operations staff that all nodes of HBase cluster B were down. The following records all the operations used to restore the cluster.
Tried to log in to the HBase UI at http://192.168.3.146:60010/ and could not. Logged in to the HBase shell to check:
> status 'simple'
5 dead servers
All the RegionServers really had gone down, and pu
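The check the shell performs with status 'simple' can also be scripted; a sketch against the 0.94-era client API (exact method names vary across HBase versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        ClusterStatus status = admin.getClusterStatus();
        System.out.println("live servers: " + status.getServers().size());
        // List every RegionServer the master currently considers dead.
        for (ServerName dead : status.getDeadServerNames()) {
            System.out.println("dead: " + dead);
        }
        admin.close();
    }
}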
3. If region information cannot be read from the .META. table during assignment;
4. When adding a new HBase cluster to a running HBase cluster, if the ZK /hbase/unassigned node has no data;
5. When using the thread pool to bulk-assign regions, if an uncaught exception occurs, the implementation is as follows:
6. An exception occurred while starting the service thread
HDFS was designed to store large files (such as log files) for batch processing and sequential I/O, while HBase, built on top of HDFS, was designed to serve random reads and writes over massive data. How can two components with opposite design goals be combined? This layered design mainly aims to keep the archi
Problem Description
Using hadoop fs -du -h to view the size of an HDFS directory, we found that the actual size and the replicated size were not in proportion: the replication factor is 3, so in theory the ratio should be 3x, as follows:
Problem Analysis
An fsck check of HDFS found that replicas were missing
Some replicas were found to be missing, so the replication factor of the paths above was forcibly set back to 3;
Problem Solution
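A sketch of that fix, the programmatic equivalent of hadoop fs -setrep -R 3 <path>, assuming a Hadoop 2.x-era client and a hypothetical directory passed as the first argument:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FixReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Walk the tree and force every file back to 3 replicas.
        fixTree(fs, new Path(args[0]), (short) 3);
        fs.close();
    }

    static void fixTree(FileSystem fs, Path dir, short replicas) throws Exception {
        for (FileStatus st : fs.listStatus(dir)) {
            if (st.isDirectory()) {
                fixTree(fs, st.getPath(), replicas);
            } else if (st.getReplication() != replicas) {
                fs.setReplication(st.getPath(), replicas);
            }
        }
    }
}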
My rambling: this article provides sample code, but it does not describe the code-level details of MapReduce over HBase; it mainly records my own partial understanding and experience. Recently I saw Medialets (Ref) share their experience of using MapReduce in their website architecture, with HDFS serving as the base environment for MapReduce distributed computing.
Deploy HBase in the Hadoop cluster and enable Kerberos
System: LXC-CentOS6.3 x86_64
Hadoop version: cdh5.0.1 (manual installation; cloudera-manager not installed)
Existing cluster environment: node * 6; jdk1.7.0_55; ZooKeeper, HDFS (HA), YARN, HistoryServer, and HttpFS installed, with Kerberos enabled (the KDC is deployed on one node of the cluster).
Packages to be installed: all nodes> yum install
any KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
</configuration>