Basic types and storage structures of the Web graph in Nutch

Source: Internet
Author: User

The class Node represents a node in the Web graph. Its basic information includes the number of inlinks, the number of outlinks, the inlink score, and the metadata. The outlink score is obtained by dividing the inlink score by the number of outlinks.
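The score rule above (a node's outgoing score is its incoming score divided by its number of outgoing links) can be sketched in plain Java. This is a minimal illustration, not the Nutch class itself: `NodeSketch` and its field names are hypothetical, and the handling of a node with zero outlinks is an assumption made here to avoid a division by zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a Web graph node; not the real Nutch Node class.
final class NodeSketch {
    final int numInlinks;      // number of links pointing to this page
    final int numOutlinks;     // number of links leaving this page
    final float inlinkScore;   // score received through the inlinks
    final Map<String, String> metadata = new HashMap<>();

    NodeSketch(int numInlinks, int numOutlinks, float inlinkScore) {
        this.numInlinks = numInlinks;
        this.numOutlinks = numOutlinks;
        this.inlinkScore = inlinkScore;
    }

    // Outlink score = inlink score / number of outlinks.
    // Assumption: with no outlinks, the inlink score is returned unchanged.
    float outlinkScore() {
        return numOutlinks > 0 ? inlinkScore / numOutlinks : inlinkScore;
    }
}

class NodeSketchDemo {
    public static void main(String[] args) {
        NodeSketch node = new NodeSketch(3, 4, 2.0f);
        System.out.println(node.outlinkScore()); // prints 0.5
    }
}
```

Each outgoing link of the example node carries a quarter of the node's incoming score, which is the share a link-analysis pass would hand on to the linked pages.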

The class LinkDatum represents a link in the Web graph.

The class LinkNode consists of two parts: a URL and a Node.

The class LoopSet represents the set of URLs that form a loop.

The Web graph is generated from the segments (parse-data and crawl-fetch) and consists of three parts: the outlink database, the inlink database, and the node database.
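The relationship between the three parts can be illustrated with a small in-memory sketch. It assumes that the inlink side is the inversion of the outlink side and that the per-node counts are derived from both; the class `WebGraphSketch` and its method names are hypothetical, and the real tool works on segment data with Hadoop jobs rather than on Java collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the three parts of a Web graph.
final class WebGraphSketch {
    // source URL -> URLs it links to
    final Map<String, List<String>> outlinks = new TreeMap<>();
    // target URL -> URLs that link to it (the inverted outlinks)
    final Map<String, List<String>> inlinks = new TreeMap<>();
    // URL -> {number of inlinks, number of outlinks}
    final Map<String, int[]> nodes = new TreeMap<>();

    // Records one link and updates all three structures.
    void addLink(String from, String to) {
        outlinks.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        inlinks.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
        nodes.computeIfAbsent(from, k -> new int[2])[1]++;
        nodes.computeIfAbsent(to, k -> new int[2])[0]++;
    }
}

class WebGraphSketchDemo {
    public static void main(String[] args) {
        WebGraphSketch graph = new WebGraphSketch();
        graph.addLink("http://a.example/", "http://b.example/");
        graph.addLink("http://a.example/", "http://c.example/");
        graph.addLink("http://b.example/", "http://c.example/");
        System.out.println(graph.inlinks.get("http://c.example/").size()); // prints 2
    }
}
```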

Let W be the directory of the Web graph. Then:

The current outlink database is located in the directory W/outlinks/current;

The old outlink database is located in the directory W/outlinks/old;

The inlink database is located in the directory W/inlinks;

The node database is located in the directory W/nodes;

The loop database is located in the directory W/loops;

The routes are located in the directory W/routes;

The link dump database is located in the directory W/linkdump.
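The directory layout listed above can be captured in a small helper that resolves each database path from the Web graph root. This is only a sketch: `WebGraphLayout` and its method names are hypothetical, and it uses local `java.nio.file` paths for illustration, whereas a real deployment would resolve them on the Hadoop file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that maps a Web graph root directory W to its sub-directories.
final class WebGraphLayout {
    private final Path root;

    WebGraphLayout(String root) {
        this.root = Paths.get(root);
    }

    Path currentOutlinks() { return root.resolve("outlinks").resolve("current"); }
    Path oldOutlinks()     { return root.resolve("outlinks").resolve("old"); }
    Path inlinks()         { return root.resolve("inlinks"); }
    Path nodes()           { return root.resolve("nodes"); }
    Path loops()           { return root.resolve("loops"); }
    Path routes()          { return root.resolve("routes"); }
    Path linkDump()        { return root.resolve("linkdump"); }
}

class WebGraphLayoutDemo {
    public static void main(String[] args) {
        WebGraphLayout layout = new WebGraphLayout("W");
        // Prints the resolved location of the current outlink database.
        System.out.println(layout.currentOutlinks());
    }
}
```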

The outlink database is a MapFile; the key is the link URL as Text, and the value is a LinkDatum.

The inlink database is a MapFile; the key is the link URL as Text, and the value is a LinkDatum.

The node database is a MapFile; the key is the link URL as Text, and the value is a Node.

The loop database is a MapFile; the key is the link URL as Text, and the value is a LoopSet.

The link dump database is a MapFile; the key is the link URL as Text, and the value is a LinkNodes that represents the inlink information for each link.
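The key/value shape of the link dump database can be modeled in memory as follows. A `TreeMap` stands in for the MapFile here because both keep their keys sorted and support lookup by key; `InlinkEntry` and `LinkDumpSketch` are hypothetical simplifications and do not reproduce the actual LinkNode or LinkNodes classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical simplification of one inlink record: the linking URL and its score.
final class InlinkEntry {
    final String url;
    final float score;

    InlinkEntry(String url, float score) {
        this.url = url;
        this.score = score;
    }
}

// Hypothetical in-memory stand-in for the link dump database (URL -> inlink records).
final class LinkDumpSketch {
    // Sorted by key, as the entries of a MapFile are.
    final TreeMap<String, List<InlinkEntry>> entries = new TreeMap<>();

    void addInlink(String targetUrl, String fromUrl, float fromScore) {
        entries.computeIfAbsent(targetUrl, k -> new ArrayList<>())
               .add(new InlinkEntry(fromUrl, fromScore));
    }

    // Returns the inlink records of a URL, or an empty list if the URL is unknown.
    List<InlinkEntry> inlinksOf(String url) {
        return entries.getOrDefault(url, Collections.emptyList());
    }
}

class LinkDumpSketchDemo {
    public static void main(String[] args) {
        LinkDumpSketch dump = new LinkDumpSketch();
        dump.addInlink("http://c.example/", "http://a.example/", 0.5f);
        dump.addInlink("http://c.example/", "http://b.example/", 1.0f);
        System.out.println(dump.inlinksOf("http://c.example/").size()); // prints 2
    }
}
```

Looking up a single URL returns all of its inlink records at once, which is the access pattern such a database is meant to serve.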
