Hadoop image storage options: preparation

Last updated: 2018-12-06. Source: Internet

Uploaded by: User


1: The HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can.

2：There are two basic ways of serving image files: storing the image in HBase itself, or storing a path to the image. HBase has successfully been used by a large-scale commercial photo sharing site for storing and retrieving images -- although they have had to carefully tune and monitor their system (see the HBase mailing list for details).

If you store your images on HDFS and only keep a path in HBase you will have to ensure that you will not have too many images as HDFS does not deal well with a lot of files (depends on the size of RAM allocated to your namenode, but there is still an upper limit).

Unless you plan on storing metadata along with each image, you may be able to get away with a very simple schema for storing either the data or the path to the image. I am imagining something like a single column family with two column qualifiers: data and type. The data column could store either the path or the actual image bytes. The type column would store the image type (png, jpg, tiff, etc.). This would be useful for sending the correct MIME type over the wire when returning the image.
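The use of the type qualifier for picking a MIME type can be sketched as follows. This is a minimal illustration, not part of any HBase API: the class name ImageMimeTypes, the method mimeTypeFor, and the fallback to application/octet-stream for unknown types are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps the value stored in the "type" qualifier to a MIME type. */
final class ImageMimeTypes {
    private static final String FALLBACK = "application/octet-stream";
    private static final Map<String, String> MIME_BY_TYPE = new HashMap<>();

    static {
        MIME_BY_TYPE.put("png", "image/png");
        MIME_BY_TYPE.put("jpg", "image/jpeg");
        MIME_BY_TYPE.put("jpeg", "image/jpeg");
        MIME_BY_TYPE.put("tiff", "image/tiff");
        MIME_BY_TYPE.put("gif", "image/gif");
    }

    /** Returns the MIME type for a stored image type, or a generic binary type if unknown. */
    static String mimeTypeFor(String storedType) {
        if (storedType == null) {
            return FALLBACK;
        }
        String key = storedType.trim().toLowerCase(Locale.ROOT);
        return MIME_BY_TYPE.getOrDefault(key, FALLBACK);
    }
}
```

A serving layer would read the type cell alongside the data cell and pass mimeTypeFor(type) as the Content-Type header.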

3: HDFS is a distributed file system that is well suited for the storage of large files. Its documentation states that it is not, however, a general-purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups.

4: echo ruok | nc localhost 2181; to check ZooKeeper. A healthy server answers imok.

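5: At first I ran ZooKeeper on its own, and then start-hbase reported that it failed to bind the ZooKeeper server to port 2181. I shut ZooKeeper down (the process holding 2181 was Java, so I ran killall java) and ran start-hbase again; the bind error was gone. (This error shows up only in the logs; the command line prints nothing.)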

 Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop.

6：Will not attempt to authenticate using SASL (unknown error)

/etc/hosts should look something like this:

    127.0.0.1 localhost
    127.0.0.1 ubuntu.ubuntu-domain ubuntu

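I had actually read this hint on the official site, but on a whim I decided to try without making the change. My localhost was 127.0.0.1, but the PC name was mapped to 127.0.1.1. I then ran into the problem above and wasted a lot of time. It worked once I changed the IP mapped to the PC name.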
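One way to spot this situation is to list which addresses a hosts file assigns to the machine name. The sketch below parses hosts-file text passed in as a string; the class HostsFileCheck and the method addressesFor are hypothetical names for this example, and the parsing covers only the common "address name [alias...]" layout with # comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds the addresses a hosts file maps to a given host name. */
final class HostsFileCheck {
    /** Returns every IP address that the hosts-file text maps to hostName, in file order. */
    static List<String> addressesFor(String hostsFileText, String hostName) {
        List<String> addresses = new ArrayList<>();
        for (String line : hostsFileText.split("\\R")) {
            // Drop trailing comments and surrounding whitespace.
            String entry = line.replaceFirst("#.*", "").trim();
            if (entry.isEmpty()) {
                continue;
            }
            // fields[0] is the address; the remaining fields are names and aliases.
            String[] fields = entry.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equals(hostName)) {
                    addresses.add(fields[0]);
                }
            }
        }
        return addresses;
    }
}
```

If the result for the machine name contains 127.0.1.1 rather than 127.0.0.1 or the real LAN address, the mapping matches the situation described in this note.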

7: Clocks out of sync between machines (HBase). (Taken from someone else's website; logged here for future reference.)

FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server suc-pc,60020,1363269953286 has been rejected; Reported time is too far out of sync with master.  Time difference of 39375ms > max allowed of 30000ms

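A minor problem: the log makes the cause obvious. HBase tolerates a small clock difference between machines, but the 39-second difference above is too large. If the machine is online, you can synchronize with ntpdate 219.158.14.130. 219.158.14.130 is a China Netcom time server in Beijing; if it does not work, use another time server.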
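The comparison reported in the log can be reproduced with a few lines of arithmetic. This is only a sketch of the check as the log message describes it (difference strictly greater than the allowed maximum), not HBase's actual code; the class ClockSkewCheck and its method names are hypothetical, and the 30000 ms limit is taken from the log above.

```java
/** Hypothetical sketch of the clock-skew comparison described by the log message. */
final class ClockSkewCheck {
    /** Maximum allowed difference in milliseconds, as reported in the log above. */
    static final long MAX_ALLOWED_SKEW_MS = 30_000L;

    /** Absolute difference between the master's clock and the region server's clock. */
    static long skewMs(long masterTimeMs, long regionServerTimeMs) {
        return Math.abs(masterTimeMs - regionServerTimeMs);
    }

    /** True when the difference exceeds the allowed maximum, as in "39375ms > 30000ms". */
    static boolean wouldBeRejected(long masterTimeMs, long regionServerTimeMs) {
        return skewMs(masterTimeMs, regionServerTimeMs) > MAX_ALLOWED_SKEW_MS;
    }
}
```

With the values from the log, a region server whose clock is 39375 ms away from the master's is rejected, while one exactly 30000 ms away is not.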

8：https://github.com/dhardy92/thumbor_hbase

https://github.com/globocom/thumbor/wiki

Thumbor is a smart imaging service. It enables on-demand crop, resizing and flipping of images.

HBase is a column oriented database from the Hadoop ecosystem.

This module provides support for Hadoop HBase as a large, auto-replicating key/value backend storage for images in Thumbor.

9：http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS

There is a good discussion here. I am using HBase to store a few things: one is the meta information on the data that is stored (PDFs, images, movies, etc.), and another is the location of the binary. I am writing the files as they are uploaded directly to HDFS, in separate files or into one file if indicated by the user. I use an implicit batch number for the upload. A user can ask for a new one explicitly and then use that ID to upload many objects and in the end call commit(batchId). In this mode I am writing the objects into one HDFS file.

10：http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-td4036184.html

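There is already a discussion here. Jack has configured HBase to store images and has run it for 2 years with almost no errors. Worth reading carefully.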

We stored about 1 billion images in HBase with file sizes up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them.

I have a better idea for you: copy your image files into a single file on HDFS, and when a new image comes in, append it to the existing file, and keep and update the metadata and the offset in HBase. If you put bigger images in HBase it will lead to some issues.

HDFS reads are faster than HBase, but it would require first hitting the index in HBase, which points to the file, and then fetching the file.

It could be faster... we found storing binary data in a sequence file indexed in HBase to be faster than HBase alone. However, YMMV, and HBase has been improved since we did that project.
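The packing idea from this thread can be sketched with a local file standing in for the HDFS file and an in-memory map standing in for the HBase index table. Everything named here (PackedImageStore, Location, append, read) is hypothetical and exists only for illustration; a real version would use the HDFS client for the pack file and HBase for the offsets. The read path is a positioned read of a byte range, which is the capability item 1 attributes to the HDFS API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: many images appended to one pack file, located through an offset index. */
final class PackedImageStore {
    /** Where one image lives inside the pack file. */
    static final class Location {
        final long offset;
        final int length;

        Location(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final Path packFile;                                  // stand-in for the single HDFS file
    private final Map<String, Location> index = new HashMap<>();  // stand-in for the HBase index

    PackedImageStore(Path packFile) {
        this.packFile = packFile;
    }

    /** Appends the image bytes to the end of the pack file and records offset and length. */
    Location append(String imageId, byte[] imageBytes) {
        try (RandomAccessFile file = new RandomAccessFile(packFile.toFile(), "rw")) {
            long offset = file.length();
            file.seek(offset);
            file.write(imageBytes);
            Location location = new Location(offset, imageBytes.length);
            index.put(imageId, location);
            return location;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up the offset in the index, then reads only that byte range; null if unknown. */
    byte[] read(String imageId) {
        Location location = index.get(imageId);
        if (location == null) {
            return null;
        }
        try (RandomAccessFile file = new RandomAccessFile(packFile.toFile(), "r")) {
            byte[] buffer = new byte[location.length];
            file.seek(location.offset);
            file.readFully(buffer);
            return buffer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each read costs one index lookup plus one seek, which is the trade-off the replies above describe.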

11: Developing Hadoop Java programs feels rather cumbersome because a JAR has to be built each time, so I use the Eclipse plugin Fat Jar.

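Running hadoop -jar test.jar hdfs://localhost/user/root/hello.txt failed: the program was trying to connect to localhost/127.0.0.1:8020 ("Already tried 1 time(s)."). The fs port configured in my core-site.xml is 9000, however, so I changed the command to hadoop -jar test.jar hdfs://localhost:9000/user/root/hello.txt and it ran successfully.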
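The behaviour above is consistent with the URI carrying no port, so that the client falls back to a default. The sketch below uses only java.net.URI to show that the first URI has no explicit port; the class HdfsUriPort and the method effectivePort are hypothetical, and the 8020 fallback is taken from the error message in this note rather than looked up from Hadoop.

```java
import java.net.URI;

/** Hypothetical sketch: which port an hdfs:// URI would end up using. */
final class HdfsUriPort {
    /** Fallback port, as seen in the connection error quoted above (an assumption here). */
    static final int DEFAULT_NAMENODE_PORT = 8020;

    /** Returns the explicit port of the URI, or the fallback when none is given. */
    static int effectivePort(String hdfsUri) {
        int port = URI.create(hdfsUri).getPort(); // -1 when the URI has no port
        return port == -1 ? DEFAULT_NAMENODE_PORT : port;
    }
}
```

effectivePort("hdfs://localhost/user/root/hello.txt") falls back to 8020, while adding :9000 to the authority yields 9000, matching the port configured in core-site.xml.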

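12: How to run programs on a local machine (I develop Hadoop programs locally) against Hadoop and HBase running on a server.

To run a Hadoop program directly on the local machine and have it operate on the server's HDFS and HBase:

　　1: Install Hadoop and HBase locally. "Install" here just means downloading the Hadoop and HBase distributions and unpacking them.

　　When developing Hadoop and HBase programs in Eclipse, import the JARs from the Hadoop and HBase lib directories.

　　2: Modify the Hadoop and HBase configuration files, for example:

　　　　core-site.xml

　　Here hadoopinokpc:9000 matches the server's core-site.xml setting; in /etc/hosts I mapped hadoopinokpc to the server's IP.

It feels as if the unpacked Hadoop is just a toolkit: when a local Hadoop program runs, it reads this configuration file to connect to HDFS.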

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://hadoopinokpc:9000</value>
        </property>
</configuration>

　　mapred-site.xml

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>hadoopinokpc:9001</value>
        </property>
</configuration>

　　hdfs-site.xml

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>

　　hbase-site.xml

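Alternatively, the settings can be overridden directly in code. With something like the following, it no longer matters what the local configuration files say.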

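        // Point the HBase client at the server's ZooKeeper, regardless of local config files.
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "hadoopinokpc");
        config.set("hbase.zookeeper.property.clientPort", "2181");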

13: java.net.ConnectException: Connection refused: no further information

org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000

The master address registered on the server does not match localhost/127.0.0.1:60000. Check the actual master address at http://192.168.3.206:60010/master-status.

14: How to set the master and regionserver IP addresses. The master and regionserver addresses must be resolved through DNS.

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoopinokpc.inoknok.com:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>hdfs://hadoopinokpc.inoknok.com:9000/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.0.29</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>eth0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>eth0</value>
  </property>
  <property>
    <name>hbase.master.dns.nameserver</name>
    <value>192.168.0.254</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.nameserver</name>
    <value>192.168.0.254</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.config.read.zookeeper.config</name>
    <value>false</value>
  </property>

15: ERROR: org.apache.hadoop.hbase.exceptions.MasterNotRunningException: java.io.IOException: Can't get master address from ZooKeeper; znode data == null

This article was originally written in Chinese and published on aliyun.com.
