A BT torrent backend manager for DHT magnet search, written in C#, with database design (open source)

Source: Internet
Author: User

Here is the program interface first, to give an idea of the overall workflow. To run it on a server for testing you need a fixed IP address and about 10 GB of disk space (mainly because the BT torrent files take up so much room). SQL Server is the preferred storage database. An Access database is used for testing at the moment, but once the data grows past a million rows, queries from the website are likely to become very slow.

If the program runs long enough, it will collect essentially every torrent circulating on the network, which makes it a powerful search tool.

Open Source Address: https://github.com/h31h31/H31DHTMgr

Program Download: H31DHT download

An ASP website is also provided as a way to query the data:

The program while running:

This article mainly describes how the program processes the hash files used for DHT magnet search.

The background program reads these files with C#'s file-reading classes. There are currently two file formats. The first is the files downloaded from http://torrage.com/sync.

The second is a custom file format written by a C++ program that collects hashes from the DHT network.

The custom files contain each hash value together with a timestamp and IP information, and these IP addresses can be used to analyze which BT files the computer at the other end holds. The hashes in these files correspond to files that are currently active on the network, unlike the fixed file lists downloaded from the site above.

With these IP addresses you can also do some behavioral analysis, such as which types of files are being downloaded from IPs in which city.

The database design is described below:

1. At first I was concerned that SQL Server would slow down my own computer, so I did not install it and used Access instead;

2. The size of each Access table is best kept under 4G, so the design allows no more than 100 million rows per table;

3. After a BT torrent file is parsed, the file name is extracted and stored in a different table depending on the file type. There are 6 categories: movies, music, pictures, books, programs, and other;

4. Torrents come in different languages: Chinese, English, Japanese, Korean, and others. Storing everything in one place would work for the search interface but would slow queries down. Chinese users generally search in Chinese, and anyone who wants Japanese results can pick that option. Splitting by language greatly speeds up queries, because each table then only holds files in a few languages;

5. The file lists inside torrents are stored at 100 million rows per table. When a table exceeds that, storage moves on to the second table, and the main table has a key field recording which table holds each torrent's file list;

6. The stored data is also filtered fairly strictly, because torrents contain a lot of advertising. Video torrents, for example, often include website links as url, txt, mht and similar files. After a preliminary check the program does not store these in the file table, since they take up database space, slow down queries, and clutter the displayed file list.

7. Torrents that contain more than 200 files are not stored at all. Keeping that many files wastes space, and such torrents are rarely worth keeping, so they are skipped;

8. Website information embedded in file names is filtered out, which helps a lot with queries.
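The storage rules in items 3, 6 and 7 can be sketched roughly as follows. This is an illustrative Python sketch, not the program's actual C# code: the extension lists and the names `category_of` and `files_to_store` are assumptions made for the example. Only the six categories, the url/txt/mht ad files, and the 200-file limit come from the list above.

```python
import os

# Extension lists are illustrative assumptions, not taken from the original program.
CATEGORY_BY_EXT = {
    "movie": {".avi", ".mkv", ".mp4", ".rmvb", ".wmv"},
    "music": {".mp3", ".flac", ".ape", ".wav"},
    "picture": {".jpg", ".jpeg", ".png", ".gif", ".bmp"},
    "book": {".pdf", ".epub", ".chm", ".mobi"},
    "program": {".exe", ".iso", ".msi", ".zip", ".rar"},
}
AD_EXTS = {".url", ".txt", ".mht"}  # ad-style files named in item 6
MAX_FILES = 200                     # threshold from item 7


def category_of(filename):
    """Map a file name to one of the six categories by its extension."""
    ext = os.path.splitext(filename)[1].lower()
    for category, exts in CATEGORY_BY_EXT.items():
        if ext in exts:
            return category
    return "other"


def files_to_store(files):
    """Return the file names worth storing, or None to skip the whole torrent."""
    if len(files) > MAX_FILES:
        return None
    return [f for f in files if os.path.splitext(f)[1].lower() not in AD_EXTS]
```

For example, `files_to_store(["a.avi", "ad.url", "readme.txt"])` keeps only `a.avi`, and `category_of("a.avi")` would route it to the movie table.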

Database table design:

The table that stores DHT file names:

The table that stores torrent file lists:

--------------------------let's introduce how DHT works--------------------------------

The DHT network is essentially a lookup network: it is used to find out which computers are downloading a given resource. Each resource is identified by a 20-byte ID called the infohash. When a program joins the network as a DHT node, other nodes will query it, and once it responds they record it. Those nodes also query other nodes, and when one of them starts downloading the resource for an infohash, it announces this to every node it has queried, including yours. At that point you can be sure the resource behind that infohash is live on the network.
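To make the query side concrete, here is a minimal Python sketch of a `get_peers` query, the DHT message that asks which peers are downloading a given infohash. The crawler discussed in this article is written in C++ and its code is not shown here, so this is only an illustration. The message layout follows the BitTorrent DHT specification (BEP 5); the function names `bencode` and `get_peers_query` and the default transaction ID are made up for the example.

```python
def bencode(value):
    """Encode ints, bytes, lists and dicts (with bytes keys) as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries store their keys in sorted order.
        body = b"".join(bencode(k) + bencode(value[k]) for k in sorted(value))
        return b"d" + body + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def get_peers_query(node_id, infohash, transaction_id=b"aa"):
    """Build the UDP payload of a KRPC get_peers query."""
    if len(node_id) != 20 or len(infohash) != 20:
        raise ValueError("node IDs and infohashes are 20 bytes long")
    return bencode({
        b"t": transaction_id,   # echoed back in the response
        b"y": b"q",             # message type: query
        b"q": b"get_peers",     # query name
        b"a": {b"id": node_id, b"info_hash": infohash},
    })
```

A crawler mostly works the other way around: it answers such queries from other nodes and records the infohashes they ask about or announce.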

For more on how this network works, see Kevin's articles on the DHT web crawler and on building a magnet search web page.

What can you do once you have an infohash? The key point is that the magnet links (magnet URLs) in use today correspond to infohashes, so obtaining an infohash amounts to obtaining a magnet link. The crawler still needs information about the resource, though, and that comes from the torrent file. Each torrent file corresponds to one resource and contains the resource name, description, file list, file sizes, and so on. Along with the infohash you also get the address of the computer that announced it, and the torrent file can be downloaded from those computers.
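The link between an infohash and a magnet link is simple enough to show directly. The Python sketch below builds a magnet URL from the 20-byte infohash using the standard `urn:btih:` form; the function name `magnet_from_infohash` is made up for illustration.

```python
def magnet_from_infohash(infohash: bytes) -> str:
    """Turn a raw 20-byte infohash into a magnet URL (hex-encoded form)."""
    if len(infohash) != 20:
        raise ValueError("an infohash is exactly 20 bytes")
    return "magnet:?xt=urn:btih:" + infohash.hex()
```

The resulting link carries no file names or sizes, which is why the torrent file still has to be fetched to build a searchable index.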

After an infohash is acquired, the program downloads the corresponding torrent directly from websites that map magnet links to torrent files:

http://torrage.com
https://zoink.it
http://bt.box.n0808.com

-------------------------Our background program downloads torrents from the sites above---------------------------

The module design of the program is as follows:

1. The h31sql class is a basic wrapper around Access and SQL Server database operations;

2. h31debug is the log output class;

3. h31down is the class that downloads BT torrent files;

4. The TorrentFile class parses BT torrent files. Because some torrents may not be in the correct format, some files are not parsed correctly;

5. MainForm is the main program interface.
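The kind of parsing the TorrentFile class has to do can be sketched as follows. This is not the class's actual C# code, only a minimal Python illustration of decoding the bencode format that torrent files use and pulling out the name and file list. The function names `bdecode` and `torrent_summary` are assumptions made for the example; the dictionary keys (`info`, `name`, `files`, `path`, `length`) are the standard torrent metainfo keys.

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                                  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                                  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if c == b"d":                                  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    if c.isdigit():                                # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at offset %d" % i)


def torrent_summary(data):
    """Return (name, [(path, size), ...]) for single-file and multi-file torrents."""
    meta, _ = bdecode(data)
    info = meta[b"info"]
    name = info[b"name"].decode("utf-8", "replace")
    if b"files" in info:                           # multi-file torrent
        files = [(b"/".join(f[b"path"]).decode("utf-8", "replace"), f[b"length"])
                 for f in info[b"files"]]
    else:                                          # single-file torrent
        files = [(name, info[b"length"])]
    return name, files
```

Malformed input raises `ValueError` or `KeyError` here, so a caller can catch those and skip the torrent, in the spirit of the note in item 4 that some files cannot be parsed.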

------------------------------------

The main steps the program goes through to process data are:

1. Read local files and extract data using regular expressions

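Step 1 might look something like the Python sketch below (the program itself is written in C#, and its code is not reproduced here). It assumes that each infohash appears in the input as a 40-character hexadecimal string, which is an assumption about the file layout rather than something stated in this article; the name `extract_infohashes` is also made up.

```python
import re

# Assumption: infohashes are written as 40 hexadecimal characters.
INFOHASH_RE = re.compile(r"\b[0-9A-Fa-f]{40}\b")


def extract_infohashes(text):
    """Return the unique infohashes found in text, uppercased, in first-seen order."""
    seen = set()
    hashes = []
    for match in INFOHASH_RE.finditer(text):
        infohash = match.group(0).upper()
        if infohash not in seen:
            seen.add(infohash)
            hashes.append(infohash)
    return hashes
```

Because the pattern only looks for the hash itself, the same function works on lines that also carry a timestamp or IP address next to the hash.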

2. Download the BT torrent file from the website for processing

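A download step like this can be sketched in Python as trying a list of mirror sites in turn. The URL templates passed in are placeholders: the actual URL layout of each site is not given in this article, so the caller has to supply it. The names `candidate_urls`, `http_get` and `fetch_torrent` are made up for the example.

```python
from urllib.request import urlopen


def candidate_urls(infohash, templates):
    """Fill each URL template (containing '{infohash}') with the given hash."""
    return [template.format(infohash=infohash) for template in templates]


def http_get(url, timeout=10):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_torrent(infohash, templates, get=http_get):
    """Try each mirror in order; return the torrent bytes or None if all fail."""
    for url in candidate_urls(infohash, templates):
        try:
            data = get(url)
        except OSError:          # covers URLError, HTTPError and timeouts
            continue
        if data.startswith(b"d"):  # a torrent file is a bencoded dictionary
            return data
    return None
```

The `get` parameter exists so the network call can be swapped out, for example with a stub during testing or with a downloader that handles compressed responses.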

3. Read the file name and file list through the BT file parsing class

4. Use regular expressions to filter out website information such as www. and bbs. addresses, since it pollutes search results.

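One possible shape for this filter is shown in the Python sketch below. The regular expression is an assumption written for this example, not the expression used by the program: it removes addresses that start with `www.` or `bbs.` and end in one of a small, illustrative set of top-level domains, along with surrounding brackets.

```python
import re

# Assumed pattern: optional bracket, www./bbs. host name, a known TLD, optional bracket.
SITE_RE = re.compile(
    r"[\[\(【]?\s*(?:www|bbs)\.[\w-]+(?:\.[\w-]+)*?"
    r"\.(?:com|net|org|cn|cc|info|tv)(?![a-z0-9])\s*[\]\)】]?",
    re.IGNORECASE,
)


def strip_site_info(name):
    """Remove embedded website addresses from a file name and tidy what is left."""
    cleaned = SITE_RE.sub(" ", name)
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned.strip(" -_.@")
```

For instance, `strip_site_info("[www.example.com]Movie.2013.avi")` leaves `Movie.2013.avi`, while a name without a site address passes through unchanged.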

5. Determine whether the file name is in English, Japanese, or another language, and store it in the corresponding table.

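A simple way to make this decision is to look at the Unicode ranges of the characters in the name. The Python sketch below is one such heuristic, not the program's own logic; the function name `detect_language` and the returned labels are assumptions for the example.

```python
def detect_language(name):
    """Guess the language of a file name from the scripts it contains."""
    has_han = False
    for ch in name:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:                            # Hiragana and Katakana
            return "japanese"
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:  # Hangul
            return "korean"
        if 0x4E00 <= cp <= 0x9FFF:                            # CJK unified ideographs
            has_han = True
    if has_han:
        return "chinese"
    if all(ord(ch) < 128 for ch in name):
        return "english"
    return "other"
```

The heuristic has a known limitation: a Japanese title written only in kanji has no kana to detect and is classified as Chinese.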


The next article will look at DHT itself. If this post gets enough recommendations, I will open-source the H31DHT data crawler within the next two days. It is written in C++ using VS2005.

The first time you run H31DHTMgr there may be no data. You can download a txt file from http://torrage.com/sync and process that first;

The first time you run the H31DHT data crawler it may take a long time before any data comes back. The DHT network seems to favor fixed IP addresses and returns more information to them, so crawling over ADSL will not be fast.

Because much of the torrent content obtained through DHT is adult video (AV), the ASP website query code is not provided. If someone offers an overseas server for testing, the ASP query code can be provided.

Interested friends are welcome to work together on improving the background management program.
