Hadoop Source Code Analysis: Client open, seek, and read Operations

Last updated: 2018-12-04. Source: Internet

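Although Hadoop does not offer a POSIX-style interface, the basic file operations it does provide (open, create, delete, write, seek, and read) still make working with files convenient. Below is a typical piece of Hadoop code that opens a file and reads from it: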

    hdfs = hdfsPath.getFileSystem(conf);
    inFsData = hdfs.open(p);
    inFsData.seek(place);
    inFsData.readLong();

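Here hdfs is an instance of FileSystem, which is an abstract class. Depending on the URI involved (the scheme of the path, or the default file system configured in conf when the path has no scheme), the returned hdfs may be an instance of the local file system or of the distributed file system. For files stored in HDFS, the class that actually performs the operations is DistributedFileSystem.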
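To make the dispatch described above concrete, here is a minimal sketch of choosing an implementation from a URI scheme. It is not Hadoop's code: `SchemeResolver`, `FsKind`, and `resolve` are hypothetical names, `defaultUri` merely stands in for the default file system setting, and the configuration lookup and caching that the real FileSystem performs are not modeled.

```java
import java.net.URI;

/** Illustrative only: picks an implementation kind from a path's URI scheme. */
final class SchemeResolver {
    /** Hypothetical stand-in for the concrete FileSystem implementations. */
    enum FsKind { LOCAL, DISTRIBUTED }

    /**
     * @param path       the path being opened, e.g. "hdfs://namenode:9000/user/a.dat"
     * @param defaultUri assumed stand-in for the configured default file system URI
     */
    static FsKind resolve(String path, String defaultUri) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            // The path carries no scheme, so fall back to the configured default.
            scheme = URI.create(defaultUri).getScheme();
        }
        if ("hdfs".equals(scheme)) {
            return FsKind.DISTRIBUTED;
        }
        if ("file".equals(scheme)) {
            return FsKind.LOCAL;
        }
        throw new IllegalArgumentException("Unsupported scheme: " + scheme);
    }

    public static void main(String[] args) {
        System.out.println(resolve("hdfs://namenode:9000/user/a.dat", "file:///"));
        System.out.println(resolve("/tmp/a.dat", "file:///"));
        System.out.println(resolve("/user/a.dat", "hdfs://namenode:9000"));
    }
}
```

The point of the sketch is only that the caller codes against one abstract type while the scheme decides which concrete class does the work.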

Let us look at the open operation of DistributedFileSystem:

    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
      statistics.incrementReadOps(1);
      return new DFSClient.DFSDataInputStream(
          dfs.open(getPathName(f), bufferSize, verifyChecksum, statistics));
    }

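As the code shows, open returns an FSDataInputStream. Inside open, an object of DFSDataInputStream (an inner class of DFSClient) is created, and its constructor argument is the return value of DFSClient's own open function. DFSClient's open function is shown below: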

    public DFSInputStream open(String src, int buffersize, boolean verifyChecksum,
                               FileSystem.Statistics stats) throws IOException {
      checkOpen();
      // Get block info from namenode
      return new DFSInputStream(src, buffersize, verifyChecksum);
    }
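The layering seen so far (an outer data stream that adds typed reads on top of an inner seekable stream) can be imitated with JDK classes alone. The sketch below is an analogy under stated assumptions, not Hadoop code: `SeekableBytes` and `DataView` are hypothetical stand-ins for DFSInputStream and DFSDataInputStream, and the data lives in memory rather than on datanodes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WrappedStreamSketch {
    /** Hypothetical inner stream: in-memory bytes with an absolute seek. */
    static final class SeekableBytes extends ByteArrayInputStream {
        SeekableBytes(byte[] data) { super(data); }

        void seek(int target) throws IOException {
            if (target < 0 || target > count) {
                throw new IOException("Cannot seek outside the data");
            }
            pos = target; // protected field inherited from ByteArrayInputStream
        }
    }

    /** Hypothetical outer stream: typed reads, with seek delegated to the inner stream. */
    static final class DataView extends DataInputStream {
        private final SeekableBytes inner;

        DataView(SeekableBytes inner) { super(inner); this.inner = inner; }

        void seek(int target) throws IOException { inner.seek(target); }
    }

    /** Mirrors the open / seek / readLong sequence from the usage snippet. */
    static long readLongAt(byte[] data, int offset) {
        try {
            DataView in = new DataView(new SeekableBytes(data));
            in.seek(offset);
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds 24 bytes holding the longs 11, 22, 33 in big-endian order. */
    static byte[] sample() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(11L);
            out.writeLong(22L);
            out.writeLong(33L);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readLongAt(sample(), 8)); // second long in the sample
    }
}
```

Keeping seek on the inner stream and typed reads on the wrapper is the same division of labor the two open functions set up.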

DFSClient's open function returns a DFSInputStream object. The constructor of DFSInputStream is shown below:

    DFSInputStream(String src, int buffersize, boolean verifyChecksum
                   ) throws IOException {
      this.verifyChecksum = verifyChecksum;
      this.buffersize = buffersize;
      this.src = src;
      prefetchSize = conf.getLong("dfs.read.prefetch.size", prefetchSize);
      openInfo();
    }

Below is DFSInputStream's openInfo function, which is the core of the whole open sequence.

    synchronized void openInfo() throws IOException {
      LocatedBlocks newInfo = callGetBlockLocations(namenode, src, 0, prefetchSize);
      if (newInfo == null) {
        throw new FileNotFoundException("File does not exist: " + src);
      }

      // I think this check is not correct. A file could have been appended to
      // between two calls to openInfo().
      if (locatedBlocks != null && !locatedBlocks.isUnderConstruction() &&
          !newInfo.isUnderConstruction()) {
        Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
        Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
        while (oldIter.hasNext() && newIter.hasNext()) {
          if (! oldIter.next().getBlock().equals(newIter.next().getBlock())) {
            throw new IOException("Blocklist for " + src + " has changed!");
          }
        }
      }
      updateBlockInfo(newInfo);
      this.locatedBlocks = newInfo;
      this.currentNode = null;
    }

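Here callGetBlockLocations talks to the namenode over RPC to fetch the locations of the blocks at the start of the file. The range is given by prefetchSize, which is measured in bytes; it can be set through dfs.read.prefetch.size in the configuration and defaults to 10 times the default block size, so by default roughly the first 10 blocks are covered. The locations of these blocks are stored in the stream. The updateBlockInfo call that follows takes the information about the last block from a datanode and compares it with what the namenode reported; if the two disagree, the datanode's information wins, because the namenode and the datanodes can be temporarily out of sync.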
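Two pieces of the openInfo logic lend themselves to a small standalone illustration: how many blocks a byte-sized prefetch range covers, and the pairwise check that a refreshed block list still agrees with the cached one. The sketch below rests on assumptions that are not from Hadoop: `OpenInfoSketch` and its methods are hypothetical names, blocks are assumed to have a fixed size, and blocks are represented by plain numeric IDs.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class OpenInfoSketch {
    /** Number of blocks overlapping the first prefetchSize bytes, assuming fixed-size blocks. */
    static long blocksCovered(long fileLength, long blockSize, long prefetchSize) {
        long range = Math.min(fileLength, prefetchSize);
        return (range + blockSize - 1) / blockSize; // ceiling division
    }

    /**
     * True when both lists agree at every index they have in common.
     * As in the listing, a longer new list is accepted so long as the shared part matches.
     */
    static boolean sameCommonPrefix(List<Long> oldIds, List<Long> newIds) {
        Iterator<Long> oldIter = oldIds.iterator();
        Iterator<Long> newIter = newIds.iterator();
        while (oldIter.hasNext() && newIter.hasNext()) {
            if (!oldIter.next().equals(newIter.next())) {
                return false; // the listing throws "Blocklist ... has changed!" here
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Example values only: a 1 GB file, 64 MB blocks, a prefetch range of 10 blocks.
        System.out.println(blocksCovered(1024 * mb, 64 * mb, 640 * mb));
        System.out.println(sameCommonPrefix(Arrays.asList(1L, 2L), Arrays.asList(1L, 2L, 3L)));
        System.out.println(sameCommonPrefix(Arrays.asList(1L, 2L), Arrays.asList(1L, 9L)));
    }
}
```

With these example values the program prints 10, true, and false. Note that the check only walks the shared prefix, which is why the source comment in openInfo questions whether it is sufficient when a file is appended to between calls.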

The subsequent seek and read calls both operate on this stream. Here is DFSInputStream's seek function:

    public synchronized void seek(long targetPos) throws IOException {
      if (targetPos > getFileLength()) {
        throw new IOException("Cannot seek after EOF");
      }
      boolean done = false;
      if (pos <= targetPos && targetPos <= blockEnd) {
        //
        // If this seek is to a positive position in the current
        // block, and this piece of data might already be lying in
        // the TCP buffer, then just eat up the intervening data.
        //
        int diff = (int)(targetPos - pos);
        if (diff <= TCP_WINDOW_SIZE) {
          try {
            pos += blockReader.skip(diff);
            if (pos == targetPos) {
              done = true;
            }
          } catch (IOException e) { // make following read to retry
            LOG.debug("Exception while seek to " + targetPos + " from "
                      + currentBlock + " of " + src + " from " + currentNode +
                      ": " + StringUtils.stringifyException(e));
          }
        }
      }
      if (!done) {
        pos = targetPos;
        blockEnd = -1;
      }
    }
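The decision made in seek can be reduced to a small state machine: a short forward move inside the current block is served by skipping, while anything else only records the new position and invalidates blockEnd so that a later read has to locate the block again. The following sketch models just that decision. `LazySeekSketch` is a hypothetical class, the `WINDOW` value is an assumption standing in for TCP_WINDOW_SIZE, and the skip is assumed to succeed in full, which the real blockReader.skip does not guarantee.

```java
/** Illustrative model of the seek decision; no network or block reader is involved. */
final class LazySeekSketch {
    static final long WINDOW = 128 * 1024; // assumed stand-in for TCP_WINDOW_SIZE

    private final long fileLength;
    private long pos;
    private long blockEnd; // last offset of the block being read, or -1 if none is open

    LazySeekSketch(long fileLength, long pos, long blockEnd) {
        this.fileLength = fileLength;
        this.pos = pos;
        this.blockEnd = blockEnd;
    }

    /** Returns true if the seek was served by skipping forward inside the open block. */
    boolean seek(long target) {
        if (target > fileLength) {
            throw new IllegalArgumentException("Cannot seek after EOF");
        }
        boolean forwardInBlock = pos <= target && target <= blockEnd;
        if (forwardInBlock && target - pos <= WINDOW) {
            pos = target; // models a skip that consumed every intervening byte
            return true;
        }
        pos = target;
        blockEnd = -1; // marks the current block as unusable for the next read
        return false;
    }

    long position() { return pos; }

    long blockEnd() { return blockEnd; }

    public static void main(String[] args) {
        LazySeekSketch s = new LazySeekSketch(1_000_000L, 100L, 500_000L);
        System.out.println(s.seek(200L)); // short forward move inside the block
        System.out.println(s.seek(50L));  // backward move
        System.out.println(s.blockEnd());
    }
}
```

The cheap path exists because data just ahead of the current position may already be buffered, so discarding it is faster than opening a new connection.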
