Storing Images with Hadoop: Preparation


1: The HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can.
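To illustrate what a positioned read of a partial byte range means: in HDFS it is exposed through FSDataInputStream, which implements PositionedReadable (for example read(long position, byte[] buffer, int offset, int length)). The sketch below is only an analogy on a local file, using java.nio's FileChannel as a stand-in for HDFS; it reads just the first 8 bytes of a file to check for the PNG signature. RangeRead is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class RangeRead {
    // PNG files start with these 8 bytes: 89 50 4E 47 0D 0A 1A 0A.
    private static final byte[] PNG_SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Reads up to `length` bytes starting at `position`; fewer if the file ends first.
    static byte[] readRange(Path file, long position, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long pos = position;
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, pos); // positioned read: the channel's own position is untouched
                if (n < 0) {
                    break; // end of file
                }
                pos += n;
            }
            return Arrays.copyOf(buffer.array(), buffer.position());
        }
    }

    // Only the first 8 bytes are read, however large the image is.
    static boolean looksLikePng(Path file) throws IOException {
        return Arrays.equals(readRange(file, 0, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }
}
```

The same idea lets a server answer an HTTP range request or sniff a file type without pulling the whole object, which is what the HBase API cannot do for a stored cell.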

2: There are two basic ways of serving image files: storing the image in HBase itself, or storing a path to the image. HBase has been used successfully by a large-scale commercial photo-sharing site for storing and retrieving images, although they had to tune and monitor their system carefully (see the HBase mailing list for details).

If you store your images on HDFS and keep only a path in HBase, you have to make sure you do not end up with too many images, because HDFS does not deal well with large numbers of files: the NameNode keeps the metadata of every file in RAM, so the limit depends on how much RAM is allocated to it, but there is always an upper limit.

Unless you plan on storing metadata along with each image, you may be able to get away with a very simple schema for storing either the data or the path to the image. I am imagining something like a single column family with two column qualifiers: data and type. The data column could store either the path or the actual image bytes. The type column would store the image type (png, jpg, tiff, etc.), which is useful for sending the correct MIME type over the wire when returning the image.
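A dependency-free sketch of that schema, assuming the "store the bytes" variant and a column family named "img" (the family name, the ImageRow class and the MIME table are illustrative assumptions). With the real HBase Java client the row would be written with a Put, calling addColumn(family, qualifier, value) once per qualifier (HBase 1.0 and later); here a plain map stands in for the row so the example runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of one image row: family "img" with qualifiers "data" and "type".
final class ImageRow {
    static final String FAMILY = "img";
    static final String DATA = "data"; // the image bytes
    static final String TYPE = "type"; // png, jpg, tiff, ...

    // Image type as stored in "type" -> MIME type for the HTTP response.
    private static final Map<String, String> MIME_BY_TYPE = new HashMap<>();
    static {
        MIME_BY_TYPE.put("png", "image/png");
        MIME_BY_TYPE.put("jpg", "image/jpeg");
        MIME_BY_TYPE.put("jpeg", "image/jpeg");
        MIME_BY_TYPE.put("gif", "image/gif");
        MIME_BY_TYPE.put("tiff", "image/tiff");
    }

    // qualifier -> cell value inside FAMILY; HBase stores every value as byte[].
    private final Map<String, byte[]> cells = new HashMap<>();

    static ImageRow of(byte[] imageBytes, String type) {
        ImageRow row = new ImageRow();
        row.cells.put(DATA, imageBytes);
        row.cells.put(TYPE, type.getBytes(StandardCharsets.UTF_8));
        return row;
    }

    byte[] data() {
        return cells.get(DATA);
    }

    // Content-Type to send with the image; unknown types fall back to generic binary.
    String contentType() {
        String type = new String(cells.get(TYPE), StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
        return MIME_BY_TYPE.getOrDefault(type, "application/octet-stream");
    }
}
```

For the "store the path" variant, the data cell would hold the UTF-8 bytes of the HDFS path instead; mixing both variants in one table would need an extra marker, which this simple schema does not have.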

3: HDFS is a distributed file system that is well suited to storing large files. Its documentation states, however, that it is not a general-purpose file system and does not provide fast lookups of individual records in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can be a point of conceptual confusion: HBase internally puts your data into indexed "StoreFiles" that live on HDFS, which is what makes the high-speed lookups possible.

4: echo ruok | nc localhost 2181 checks whether ZooKeeper is running; a healthy server answers imok.
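ruok is one of ZooKeeper's four-letter-word commands: the client sends the command, the server replies imok and closes the connection. A minimal Java equivalent of the nc one-liner, assuming the default client port 2181 (ZkRuok is a hypothetical helper name). Note that newer ZooKeeper releases (3.5.3 and later) only answer four-letter words listed in 4lw.commands.whitelist, so an empty reply there does not necessarily mean the server is down.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ZkRuok {
    // Same idea as "echo ruok | nc host 2181": send "ruok", then read the
    // server's answer until it closes the connection.
    static String ruok(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // do not hang forever on a silent server
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            socket.shutdownOutput(); // like nc reaching EOF on stdin
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[64];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // True only when the server answered "imok"; connection errors count as unhealthy.
    static boolean isHealthy(String host, int port) {
        try {
            return "imok".equals(ruok(host, port, 2000));
        } catch (IOException e) {
            return false;
        }
    }
}
```

Usage: ZkRuok.isHealthy("localhost", 2181) returns true only when a ZooKeeper server on that port replies imok.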

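5: At first I ran ZooKeeper on its own, and then start-hbase.sh failed to bind the ZooKeeper server to port 2181. So I shut ZooKeeper down (the process holding port 2181 was Java, so I ran killall java). After running start-hbase.sh again, the bind error was gone. (The error only shows up in the logs; the command line printed nothing.)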

 Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop.
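The bind failure in item 5 is two processes wanting port 2181: with HBASE_MANAGES_ZK at its default of true, HBase starts its own ZooKeeper while the standalone one still holds the port. Either stop the standalone ZooKeeper, as above, or set HBASE_MANAGES_ZK=false in conf/hbase-env.sh and point HBase at it. A minimal pre-flight sketch, assuming you want to detect the conflict before running start-hbase.sh (PortCheck is a hypothetical helper):

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortCheck {
    // Tries to bind the port briefly; failure means another process (for example
    // a standalone ZooKeeper) is already listening there.
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Usage: if PortCheck.isPortFree(2181) is false while HBASE_MANAGES_ZK is true, expect the same bind error in the HBase logs.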

 

6:Will not attempt to authenticate using SASL (unknown error)

/etc/hosts should look something like this:

    127.0.0.1 localhost
    127.0.0.1 ubuntu.ubuntu-domain ubuntu

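I had actually read this tip on the official site, but on a whim I wanted to see what would happen without changing anything. My localhost was already 127.0.0.1, but the PC name was mapped to 127.0.1.1. I then ran into the problem above and wasted a lot of time; everything worked once I changed the IP for the PC name.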
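Ubuntu's default /etc/hosts maps the machine's own name to 127.0.1.1, which is exactly the mismatch described above. A minimal sketch that reports what the local host name resolves to from a plain Java process and flags 127.0.1.1 (HostsCheck is a hypothetical helper; HBase's own lookup may differ in detail):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostsCheck {
    // True for 127.0.1.1, the address Ubuntu's default /etc/hosts gives the machine's own name.
    static boolean isUbuntuDefaultAlias(InetAddress addr) {
        byte[] b = addr.getAddress();
        return b.length == 4 && (b[0] & 0xff) == 127 && b[1] == 0 && b[2] == 1 && b[3] == 1;
    }

    // Resolves this machine's host name the way a plain Java process would.
    static String describeLocalHost() throws UnknownHostException {
        InetAddress self = InetAddress.getLocalHost();
        String verdict = isUbuntuDefaultAlias(self)
                ? "resolves to 127.0.1.1; fix /etc/hosts"
                : "not 127.0.1.1";
        return self.getHostName() + " -> " + self.getHostAddress() + " (" + verdict + ")";
    }
}
```

Running HostsCheck.describeLocalHost() on the HBase machine before and after editing /etc/hosts shows whether the change took effect.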

7: Clocks out of sync between PCs (HBase). (Found on someone else's website; noted here for future reference.)

FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server suc-pc,60020,1363269953286 has been rejected; Reported time is too far out of sync with master.  Time difference of 39375ms > max allowed of 30000ms

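  A small problem, and the log makes the cause obvious. HBase tolerates a small clock difference between servers, but the 39-second skew above is too large. If the machines have Internet access, you can synchronize them with ntpdate 219.158.14.130. 219.158.14.130 is China Netcom's Beijing time server; if it does not work, use another time server.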
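The log compares the regionserver's clock with the master's and rejects the server when the difference exceeds the limit (as far as I know, the limit comes from the hbase.master.maxclockskew setting, 30000 ms by default). A minimal sketch of both halves, assuming millisecond timestamps: the rejection rule from the log, and the standard NTP offset estimate that a tool like ntpdate uses to correct such a skew. ClockSkew and its methods are hypothetical names.

```java
final class ClockSkew {
    // NTP-style offset of the local clock relative to the server, all values in ms:
    // t0 = request sent (local clock), t1 = request received (server clock),
    // t2 = reply sent (server clock),  t3 = reply received (local clock).
    static long offsetMs(long t0, long t1, long t2, long t3) {
        return ((t1 - t0) + (t2 - t3)) / 2;
    }

    // The rule from the log: reject when |regionserver time - master time| > max allowed.
    static boolean rejected(long regionServerMs, long masterMs, long maxAllowedMs) {
        return Math.abs(regionServerMs - masterMs) > maxAllowedMs;
    }
}
```

With the numbers from the log, ClockSkew.rejected(39375, 0, 30000) is true; after syncing both machines to the same time server the difference should drop well below the limit.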

 

8:https://github.com/dhardy92/thumbor_hbase

https://github.com/globocom/thumbor/wiki

Thumbor is a smart imaging service. It enables on-demand crop, resizing and flipping of images.

HBase is a column oriented database from the Hadoop ecosystem.

This module provides support for Hadoop HBase as a large, automatically replicated key/value storage backend for images in Thumbor.

9:http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS

There is a pretty good discussion here. From one reply: I am using HBase to store a few things: one is the metadata of the stored data (PDFs, images, movies, etc.), the other is the binary location. As files are uploaded, I write them directly to HDFS, either into separate files or into one file if the user asks for that. I use an implicit batch number for each upload. A user can also explicitly ask for a new batch ID, use that ID to upload many objects, and at the end call commit(batchId). In this mode I write all the objects into one HDFS file.
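The batch workflow in this reply can be sketched as a small state machine. This is a hypothetical in-memory model, not the poster's code: BatchUploads and its methods are invented names, and plain maps stand in for the single HDFS file per batch and for the HBase metadata rows.

```java
import java.util.HashMap;
import java.util.Map;

final class BatchUploads {
    private long nextBatchId = 1;
    private final Map<Long, Map<String, Integer>> pending = new HashMap<>(); // batch id -> (name -> size)
    private final Map<String, Integer> committed = new HashMap<>();          // name -> size, visible to readers

    synchronized long newBatch() {
        long id = nextBatchId++;
        pending.put(id, new HashMap<>());
        return id;
    }

    synchronized void upload(long batchId, String name, byte[] data) {
        Map<String, Integer> batch = pending.get(batchId);
        if (batch == null) {
            throw new IllegalStateException("unknown or already committed batch " + batchId);
        }
        // In the real system the bytes would be appended to this batch's HDFS file here.
        batch.put(name, data.length);
    }

    // Publishes every object of the batch at once; returns how many became visible.
    synchronized int commit(long batchId) {
        Map<String, Integer> batch = pending.remove(batchId);
        if (batch == null) {
            throw new IllegalStateException("unknown or already committed batch " + batchId);
        }
        committed.putAll(batch);
        return batch.size();
    }

    // The implicit-batch path: a single upload is its own batch.
    synchronized void uploadSingle(String name, byte[] data) {
        long id = newBatch();
        upload(id, name, data);
        commit(id);
    }

    synchronized boolean isVisible(String name) {
        return committed.containsKey(name);
    }

    synchronized Integer sizeOf(String name) {
        return committed.get(name);
    }
}
```

Keeping uploads invisible until commit means a reader never sees half of a batch, which matters when all objects of the batch share one HDFS file.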

10:http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-td4036184.html

There is already a discussion here: Jack set up HBase to store images and has run it for two years with almost no errors. Worth reading carefully.

 

We stored about 1 billion images in HBase, with file sizes up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them.

 

I have a better idea for you: copy your image files into a single file on HDFS, append each new image to that file, and keep and update the metadata and the offset in HBase, because putting bigger images in HBase will lead to some issues. HDFS reads are faster than HBase, but this requires first hitting the index in HBase, which points to the file, and then fetching the file. It could be faster... we found storing binary data in a sequence file indexed on HBase to be faster than HBase alone; however, YMMV, and HBase has been improved since we did that project....
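The "one big file plus an index in HBase" idea can be sketched as follows, assuming a local file stands in for the HDFS file and a HashMap stands in for the HBase index rows (ImageContainer is a hypothetical name). On real HDFS the append would go through FileSystem.append(...), which depending on the Hadoop version may need to be enabled, and the read would be a positioned read on FSDataInputStream.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one append-only container file plus an (offset, length) index per image.
final class ImageContainer implements AutoCloseable {
    private final RandomAccessFile file;
    private final Map<String, long[]> index = new HashMap<>(); // name -> {offset, length}

    ImageContainer(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    // Appends the image at the current end of the file and records where it landed.
    synchronized void append(String name, byte[] image) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(image);
        index.put(name, new long[] {offset, image.length});
    }

    // Index lookup first, then a read of exactly that byte range; null if the name is unknown.
    synchronized byte[] read(String name) throws IOException {
        long[] location = index.get(name);
        if (location == null) {
            return null;
        }
        byte[] image = new byte[(int) location[1]];
        file.seek(location[0]);
        file.readFully(image);
        return image;
    }

    @Override
    public synchronized void close() throws IOException {
        file.close();
    }
}
```

This is the two-step read the reply describes: the index lookup costs one extra round trip, in exchange for keeping large blobs out of HBase and keeping the HDFS file count low.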

 

 

11: Developing Hadoop programs in Java feels cumbersome because you have to package a JAR, so I used the Eclipse plugin Fat Jar.

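Running hadoop jar test.jar hdfs://localhost/user/root/hello.txt failed: the program kept trying to connect to localhost/127.0.0.1:8020 (Already tried 1 time(s).). Because the URI names no port, the client used the default NameNode port 8020, but the fs port in my core-site.xml is 9000. Changing the command to hadoop jar test.jar hdfs://localhost:9000/user/root/hello.txt made it run successfully.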
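Why 8020: when an hdfs:// URI has no explicit port, the client falls back to the NameNode's default RPC port, 8020, regardless of which port your NameNode actually listens on. A small sketch of that fallback with java.net.URI (HdfsUriPort is a hypothetical helper, not Hadoop's own code):

```java
import java.net.URI;

final class HdfsUriPort {
    // Default NameNode RPC port, used when an hdfs:// URI does not name a port.
    static final int DEFAULT_NAMENODE_PORT = 8020;

    static int effectivePort(String hdfsUri) {
        int port = URI.create(hdfsUri).getPort(); // -1 when the URI has no explicit port
        return port == -1 ? DEFAULT_NAMENODE_PORT : port;
    }
}
```

So hdfs://localhost/user/root/hello.txt ends up at port 8020, while hdfs://localhost:9000/user/root/hello.txt matches the 9000 configured in core-site.xml.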

12: How to work from a local machine (I develop Hadoop programs locally) while Hadoop and HBase run on a server.

To run a Hadoop program directly on the local machine and have it operate on the server's HDFS and HBase:

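  1: Install Hadoop and HBase locally. "Installing" just means downloading the Hadoop and HBase distributions and unpacking them.

     When developing Hadoop and HBase programs in Eclipse, import the JAR files from the Hadoop and HBase lib directories.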

  2: Edit the configuration files in Hadoop and HBase, for example:

    core-site.xml

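  Here hadoopinokpc:9000 is the server's core-site.xml setting; I mapped hadoopinokpc to the server's IP in /etc/hosts.

It feels as if the unpacked Hadoop is just a tool: when a local Hadoop program runs, it reads this configuration file to connect to HDFS.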

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopinokpc:9000</value>
  </property>
</configuration>

  mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopinokpc:9001</value>
  </property>
</configuration>

  hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

  hbase-site.xml

 

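Silly me: you can set the configuration directly in code, like below, and then it doesn't matter what the local configuration says.
        Configuration config = HBaseConfiguration.create(); // org.apache.hadoop.conf.Configuration, org.apache.hadoop.hbase.HBaseConfiguration
        config.set("hbase.zookeeper.quorum", "hadoopinokpc");
        config.set("hbase.zookeeper.property.clientPort", "2181");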
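The remark that the unpacked Hadoop is "just a tool" fits how the client works: it reads the *-site.xml files it finds on its classpath, and values set in code take precedence. A dependency-free sketch of that idea using only the JDK's XML parser (SiteXml is a hypothetical class; Hadoop's real Configuration class does much more, such as defaults, variable expansion and final properties):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SiteXml {
    // Reads <property><name>..</name><value>..</value></property> pairs from a *-site.xml string.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                props.put(name, childText(property, "value"));
            }
        }
        return props;
    }

    // Values set in code win over values read from the file.
    static Map<String, String> withOverrides(Map<String, String> fromFile, Map<String, String> fromCode) {
        Map<String, String> effective = new LinkedHashMap<>(fromFile);
        effective.putAll(fromCode);
        return effective;
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Parsing the core-site.xml from step 2 yields fs.default.name = hdfs://hadoopinokpc:9000; an override supplied in code replaces it, just as the config.set calls do for the ZooKeeper settings.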
13: java.net.ConnectException: Connection refused: no further information

org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000
The master address on the server did not match localhost/127.0.0.1:60000.
Check the master address shown at http://192.168.3.206:60010/master-status.

14: How to set the master and regionserver IP addresses. The master and regionserver addresses have to be resolved through DNS.
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoopinokpc.inoknok.com:9000/hbase</value>
  </property>
  <!-- ZooKeeper's dataDir is a directory on local disk, not an HDFS URI; an hdfs:// value here is likely a mistake. -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>hdfs://hadoopinokpc.inoknok.com:9000/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.0.29</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>eth0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>eth0</value>
  </property>
  <property>
    <name>hbase.master.dns.nameserver</name>
    <value>192.168.0.254</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.nameserver</name>
    <value>192.168.0.254</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.config.read.zookeeper.config</name>
    <value>false</value>
  </property>
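With the dns.interface properties set to eth0, HBase roughly takes the address of that interface and reverse-resolves it through the configured nameserver to get the name it registers under. To check beforehand what eth0 actually carries, a small sketch with the JDK's NetworkInterface API (InterfaceAddrs is a hypothetical helper; its reverse lookup goes through the system resolver, not the nameserver from the config):

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InterfaceAddrs {
    // IPv4 addresses bound to an interface such as "eth0"; empty if the interface does not exist.
    static List<String> ipv4Of(String interfaceName) throws SocketException {
        List<String> result = new ArrayList<>();
        NetworkInterface nif = NetworkInterface.getByName(interfaceName);
        if (nif == null) {
            return result;
        }
        for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
            if (addr instanceof Inet4Address) {
                result.add(addr.getHostAddress());
            }
        }
        return result;
    }

    // Reverse lookup through the system resolver; returns the IP itself if no name is found.
    static String reverseName(String ip) throws UnknownHostException {
        return InetAddress.getByName(ip).getCanonicalHostName();
    }
}
```

If ipv4Of("eth0") returns the expected LAN address but reverseName gives back only the IP, the DNS side (the nameserver at 192.168.0.254 in this config) is the likely problem.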

 

15: ERROR: org.apache.hadoop.hbase.exceptions.MasterNotRunningException: java.io.IOException: Can't get master address from ZooKeeper; znode data == null

 
