In a Hadoop MapReduce (MR) job, HBase can serve as the input data source. Scan, the iterator used to read an HTable, has a few settings worth tuning for this case.
The methods involved are as follows:
public void setBatch(int batch)
public void setCaching(int caching)
public void setCacheBlocks(boolean cacheBlocks)
public void setBatch(int batch):
Sets the maximum number of columns returned in each Result. The default is unlimited, that is, all columns of a row are returned at once.
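To make the effect of the batch size concrete, here is a minimal plain-Java sketch. It does not call the HBase client; BatchSketch and splitRow are hypothetical names used only for illustration. It assumes that a row with more columns than the batch size comes back as several partial results, each holding at most that many columns.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of batching; it does not use the HBase client. */
final class BatchSketch {

    /** Splits one row's column names into groups of at most `batch` entries. */
    static List<List<String>> splitRow(List<String> columns, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<List<String>> results = new ArrayList<>();
        for (int start = 0; start < columns.size(); start += batch) {
            int end = Math.min(start + batch, columns.size());
            results.add(new ArrayList<>(columns.subList(start, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        // A hypothetical row with 14 columns named c0 .. c13.
        List<String> row = new ArrayList<>();
        for (int i = 0; i < 14; i++) {
            row.add("c" + i);
        }
        // With batch = 6 the row is split into groups of 6, 6 and 2 columns.
        for (List<String> part : splitRow(row, 6)) {
            System.out.println(part.size() + " columns: " + part);
        }
    }
}
```

For a mapper this means the same row key may arrive more than once when a row is wider than the batch size, so the map logic should not assume one call per row.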
public void setCaching(int caching):
Sets the number of rows fetched from the server in each request. The default value comes from the configuration file (the hbase.client.scanner.caching property).
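The trade-off behind the caching value can be worked out with simple arithmetic. The sketch below is a rough estimate only, not HBase code: CachingSketch and estimateFetches are hypothetical names, and the calculation assumes every request returns a full set of rows, ignoring region boundaries, size limits, and the calls that open and close the scanner.

```java
/** Rough estimate of scanner round trips; it does not use the HBase client. */
final class CachingSketch {

    /** Returns ceil(totalRows / caching), the approximate number of fetch requests. */
    static long estimateFetches(long totalRows, int caching) {
        if (totalRows < 0 || caching <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and caching > 0");
        }
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L; // hypothetical table size
        int[] cachingValues = {1, 200, 1000};
        for (int caching : cachingValues) {
            System.out.println("caching=" + caching
                    + " -> about " + estimateFetches(totalRows, caching) + " fetches");
        }
    }
}
```

Larger values cut the number of round trips but make the client hold more rows in memory at a time, so the right value depends on row size.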
public void setCacheBlocks(boolean cacheBlocks):
Controls whether the blocks read by this scan are cached on the server. Caching is enabled by default. Data can be served from three places: memory, cache, and disk, and a read generally checks them in that order (memory -> cache -> disk). An MR job usually reads data that is not hot, so there is no benefit in caching it.
Therefore, for an MR job it is best to configure the Scan as follows:
scan.setCacheBlocks(false);
scan.setCaching(200); // larger values use more memory but need fewer RPCs
scan.setBatch(6); // the number of columns you need
Original article: Tips for using Hbase Scan in MR. Thanks to the original author for sharing.