eMule protocol topics

Source: Internet
Author: User
Document directory
  • High-ID login process
  • Logon denied Process
  • Information exchange at the beginning of the Connection
  • File Search
  • Callback mechanism

Source: http://www.tinydust.net/prog/diary/2005/11/emule.html
Author: tinyfool

Recently I have become very interested in P2P. P2P is not only a vehicle for spreading pirated material; it also has broad prospects for many reasonable and legitimate uses, including at least redundant information storage, content distribution, and getting around information censorship. As mentioned in the Google File System paper, when a new version of an application is distributed inside Google, a large number of clients need to connect to the server to download it, which seriously affects system operation. They plan to solve this problem with P2P technology. For some time now, online game client upgrades have also fully or partially used P2P solutions. These are all applications of P2P in the field of software distribution.

eMule is the most widely used P2P application, so my research starts with eMule.

The most important reference is eMule Protocol Version 1.0 (linked at the original address). This analysis is a brief translation plus my own thoughts; a complete translation is not my goal. My thoughts and comments are marked in red.

--------
eMule is based on the eDonkey protocol. The eMule network consists of several hundred eMule servers and millions of eMule clients. A client must connect to a server to obtain network services, and the connection must be maintained until the client shuts down. The servers provide a centralized indexing service (similar to Napster) and do not communicate with one another.

Each client is preconfigured with a server list and a list of locally shared files. The client logs on to the network over a single TCP connection to a server and obtains information about the files it wants and the clients that have them. (So neither eMule nor eDonkey can be fully decentralized, even though the files themselves are stored on the clients.)
An eMule client uses several hundred TCP connections to other clients to upload and download files. Each client maintains an upload queue for its shared files. A downloading client joins the queue at the bottom and gradually advances until it reaches the top, at which point it starts downloading the file. A client may download the same file from several other clients, obtaining different parts from each. A client can also upload parts of a file that it has not finished downloading. (Transferring a file in parts greatly improves efficiency but also causes problems; for example, if the original source leaves early, every client may be left with incomplete data.) Finally, eMule extends the eDonkey protocol so that clients can exchange information about servers, other clients, and files. (This capability begins to reduce the importance of the central servers.) Note that communication between client and server is based on TCP.

The server uses an internal database to store client and file information. An eMule server does not store any files; it is a central index of file location information. A further, controversial function of the server is to act as a bridge between clients that sit behind firewalls and therefore cannot accept incoming connections. The bridging function greatly increases the server load. (This function overloads the server and greatly reduces its capacity, so it should have been left out of the design. Most servers in use today have disabled it, which means two Low-ID clients cannot transfer data to each other.) eMule uses UDP to enhance a client's communication with servers and with other clients. However, the ability to send and receive UDP messages is not required for normal client operation: even if a firewall blocks UDP, the client still works perfectly.

Client-to-server connection
Figure 1: eMule network diagram

On startup, the client uses TCP to connect to one eMule server. The server gives the client a client ID, which is valid only for the lifetime of the client-server connection (note that a High-ID client receives the same ID from every server until its IP address changes). After the connection is established, the client sends the server the list of files it shares. The server stores this list in its internal database, which usually contains hundreds of thousands of available files and active clients. The client also sends its download list, containing the files it wants to download. Details of the TCP message format between eMule client and server are given later.

After the connection is established, the eMule server sends the client a list of other clients that have the files it wants (these clients are called sources). The client then starts establishing connections with those sources.

Note that the client-server TCP connection is kept open for the whole client session. After the initial handshake, transactions are triggered mainly by user activity: from time to time the client sends a file search request to the server, and the server replies with the search results. A search transaction is usually followed by a source query for a specific file, and the reply to that query is a list of sources from which the file can be downloaded.

The client uses UDP to communicate with servers other than the one it is logged on to. The purpose of the UDP messages is to improve file search and source search, and to keep connections alive.

Connection between client and client

A client connects to other clients (sources) in order to download a file. A file is divided into parts. The client can download the same file from several sources, fetching different parts from different sources (so different parts can be downloaded at the same time; with many sources, download efficiency is very high).

After two clients connect, they exchange capability information and then negotiate the start of a download (or an upload, depending on your point of view). Each client has an upload queue that holds the list of clients waiting to download from it. When the queue is empty, a download request is accepted immediately (unless the requester has been banned). If the queue is not empty, the new request is placed in the queue. No attempt is made to serve more than a few clients at a time, so that each downloading client gets at least a little over 2 KB/s. A downloading client may be preempted by a waiting client with a higher queue ranking. To avoid thrashing, the queue ranking of a downloading client is boosted for the first 15 minutes of its download. (Thrashing here means that a client keeps switching from downloading to waiting and back again; this frequent switching wastes resources, so it should be avoided.)

When a downloading client reaches the top of the queue, the uploading client initiates a connection to it in order to send it the file parts it needs. A client may be waiting in the queues of several source clients, registered on each for the same file parts. When the waiting client actually finishes downloading those parts from one of the sources, it does not notify the other sources to remove it from their queues; it simply rejects their upload attempts when it reaches the top of their queues.
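The queueing behaviour described above can be sketched as a small model. This is only an illustration under simplifying assumptions: the class name `UploadQueue`, the `slots` parameter, and the plain first-come-first-served order are hypothetical, and the sketch ignores queue ranking, preemption, and the 15-minute boost.

```python
from collections import deque


class UploadQueue:
    """Toy model of one client's upload queue (hypothetical names throughout)."""

    def __init__(self, slots=1):
        self.slots = slots      # assumed number of simultaneous uploads
        self.active = []        # clients currently downloading from us
        self.waiting = deque()  # clients waiting, head of the queue first
        self.banned = set()     # requesters we refuse to serve

    def request(self, client):
        """Handle a download request and return the resulting state."""
        if client in self.banned:
            return "rejected"
        if client in self.active:
            return "downloading"
        if client in self.waiting:
            return "queued"
        # Empty queue and a free slot: the request is accepted immediately.
        if not self.waiting and len(self.active) < self.slots:
            self.active.append(client)
            return "downloading"
        # Otherwise the requester joins the queue at the bottom.
        self.waiting.append(client)
        return "queued"

    def finish(self, client):
        """End an upload and promote the client at the head of the queue."""
        self.active.remove(client)
        if self.waiting:
            promoted = self.waiting.popleft()
            self.active.append(promoted)
            return promoted
        return None

    def position(self, client):
        """1-based position of a waiting client."""
        return list(self.waiting).index(client) + 1
```

A real client would order the waiting list by rank rather than by arrival time; the model only shows the accept, enqueue, and promote steps.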

eMule uses a credit (reputation) system to encourage uploads. To prevent impersonation, eMule protects the credit system with RSA public-key cryptography.

Client connections also use a set of messages that the eDonkey protocol does not define. These messages are called the extended protocol. The extended protocol is used to implement the credit system, to exchange general information (for example, server list updates and source updates), and to improve transfer efficiency by sending and receiving compressed file parts. An eMule client also uses UDP to periodically check status with other clients.

Client ID

The client ID is a 4-byte identifier that the server gives the client during the handshake. It is valid only for the lifetime of the client-server TCP connection, although a High-ID client is assigned the same client ID by every server until its IP address changes. Client IDs are divided into Low IDs and High IDs. An eMule server usually assigns a Low ID to a client that cannot accept incoming connections. A Low ID restricts what the client can do on the eMule network and may even cause a server to reject its connection. A High ID is calculated from the client's IP address. This section describes how client IDs are assigned and what they mean from the point of view of the eMule protocol. A High-ID client allows other clients to connect freely to its TCP port (4662 by default). High-ID clients are not subject to any restrictions on the eMule network. When the server cannot open a connection to the client's eMule port, it gives the client a Low ID. This mainly happens when the client is behind a firewall that blocks incoming connections. A client also gets a Low ID in the following cases:

  • When the client accesses the Internet through NAT or a proxy server.
  • When the server is busy (its connection timer expires, so it fails to connect to the client).

A High ID is calculated as follows: if the host's IP address is X.Y.Z.W, the client ID is X + 2^8 * Y + 2^16 * Z + 2^24 * W (big-endian representation). A Low ID is always less than 16777216 (0x1000000). I don't know how it is calculated. (Judging from the original protocol document, the Low-ID algorithm does not seem to matter as long as this condition is met.) Note that the Low IDs obtained from different servers are different.
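As a worked example of the High-ID calculation, here is a minimal sketch. It reads the garbled coefficients in the formula as the powers 2^8, 2^16 and 2^24, and uses 0x1000000 as the Low-ID threshold; the function names are hypothetical and are not taken from any eMule source code.

```python
import ipaddress

LOW_ID_LIMIT = 0x1000000  # 16777216: client IDs below this value are Low IDs


def high_id_from_ip(ip):
    """Return X + 2^8*Y + 2^16*Z + 2^24*W for the IPv4 address X.Y.Z.W."""
    x, y, z, w = ipaddress.IPv4Address(ip).packed
    return x + (y << 8) + (z << 16) + (w << 24)


def ip_from_high_id(client_id):
    """Invert high_id_from_ip: the lowest byte of the ID is the first octet."""
    return str(ipaddress.IPv4Address(client_id.to_bytes(4, "little")))


def is_low_id(client_id):
    """A client ID counts as a Low ID when it is below the threshold."""
    return client_id < LOW_ID_LIMIT
```

For the address 1.2.3.4 this gives 1 + 512 + 196608 + 67108864 = 67305985 (0x04030201). Because the last octet ends up in the most significant byte, any address whose last octet is non-zero yields an ID at or above the Low-ID threshold.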

A Low-ID client has no public IP address that other clients can connect to, so all of its communication must go through the eMule server. This increases the server load, which makes servers reluctant to accept Low-ID clients. It also means that a Low-ID client cannot connect to a Low-ID client on another server, because eMule does not support bridging between servers.

To support Low-ID clients, the eMule protocol introduces a callback mechanism. With this mechanism, a High-ID client can ask (through the server) a Low-ID client to connect to it for file exchange.

(At present most servers do not reject Low-ID clients, because they essentially no longer relay file transfers for them. As a result, two Low-ID clients cannot transfer files to each other.)

User ID

eMule supports a credit system to encourage file sharing. The more a user uploads to other clients, the more credit the user earns and the faster the user advances in their waiting queues. The user ID is a 128-bit (16-byte) GUID created by concatenating random numbers. The 6th and 15th bytes are not random; their values are 14 and 111 respectively. Whereas the client ID is valid only within a session with a particular server, the user ID is unique and identifies the client across sessions. User IDs play a major role in the credit system, which gives attackers a motive to impersonate other users in order to obtain their credit. eMule provides an encryption scheme to prevent such impersonation, based on RSA-encrypted message exchange.
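The user ID layout can be illustrated with a short sketch. It assumes the two fixed positions are the 6th and 15th bytes counted from 1 (the values 14 and 111 cannot fit in a single bit), and it uses `os.urandom` as the random source; the function names are hypothetical.

```python
import os


def new_user_id():
    """Build a 16-byte user ID: random bytes with two fixed marker bytes."""
    raw = bytearray(os.urandom(16))
    raw[5] = 14     # 6th byte (1-based) is fixed
    raw[14] = 111   # 15th byte (1-based) is fixed
    return bytes(raw)


def has_user_id_markers(user_id):
    """Check the length and the two marker bytes of a candidate user ID."""
    return len(user_id) == 16 and user_id[5] == 14 and user_id[14] == 111
```

The marker check only recognises the layout. It proves nothing about who owns the ID, which is why the credit system needs the RSA-based protection described above.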

File ID

The file ID is used both to uniquely identify a file on the network and to detect and repair file corruption. Note that eMule identifies and catalogs a file independently of its file name: a globally unique ID computed by hashing the file's contents identifies the file. There are two kinds of file ID: one is used to generate the unique identifier and the other to detect and repair file corruption.

File hash

A file is identified by a 128-bit GUID that the client computes by hashing the file's contents with the MD4 algorithm. To compute the file ID, the file is divided into parts of 9.28 MB. A hash is computed for each part separately, and these part hashes are then combined into the unique file ID. After downloading a part, the client computes its hash and compares it with the part hash sent by the uploader. If they differ, the part is corrupt. The client then re-downloads the part piece by piece (a fixed number of KB at a time) until the hash shows that the part has been repaired.
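The part-hash scheme can be sketched as follows. Several things here are assumptions rather than statements from the text: the exact part size of 9,728,000 bytes behind "9.28 MB", the rule that part hashes are combined by hashing their concatenation (with a single-part file keeping its own hash), and all function names. Because many current Python builds no longer ship MD4 in `hashlib`, the digest function is passed in as a parameter.

```python
import hashlib

PART_SIZE = 9_728_000  # assumed exact byte count behind "9.28 MB"


def md4(data):
    """MD4 digest; raises ValueError where the OpenSSL build lacks MD4."""
    return hashlib.new("md4", data).digest()


def part_hashes(data, digest, part_size=PART_SIZE):
    """Hash each fixed-size part of the file separately."""
    return [digest(data[i:i + part_size])
            for i in range(0, max(len(data), 1), part_size)]


def file_id(data, digest, part_size=PART_SIZE):
    """Combine the part hashes into one ID (combination rule is assumed)."""
    hashes = part_hashes(data, digest, part_size)
    if len(hashes) == 1:
        return hashes[0]
    return digest(b"".join(hashes))


def corrupt_parts(data, expected, digest, part_size=PART_SIZE):
    """Indexes of parts whose hash differs from the uploader's part hash."""
    actual = part_hashes(data, digest, part_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

With MD4 available, `file_id(data, md4)` follows the scheme in the text. Any other digest can stand in to exercise the splitting and comparison logic.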

Root hash

The root hash is calculated for each file part using the SHA-1 algorithm, over fixed-size blocks measured in KB. It provides a higher level of reliability and error recovery.

eMule protocol extensions

Although eMule is fully compatible with eDonkey, it implements some extensions to enhance its functionality. The extensions focus on client-to-client communication, especially security and the use of UDP.

Soft and hard limits

A server is configured with two limits on the number of active users, a soft limit and a hard limit. The hard limit is greater than or equal to the soft limit. When the number of active users reaches the soft limit, the server stops accepting new Low-ID client connections. When it reaches the hard limit, the server accepts no connections at all.
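The two-limit admission rule can be written as a small decision function. This is a sketch of the rule as stated, not server code; the function and parameter names are hypothetical.

```python
def admits(is_low_id, active_users, soft_limit, hard_limit):
    """Decide whether a server accepts a new client under its two limits."""
    if hard_limit < soft_limit:
        raise ValueError("the hard limit must be >= the soft limit")
    if active_users >= hard_limit:
        return False  # at the hard limit nobody is accepted
    if active_users >= soft_limit and is_low_id:
        return False  # past the soft limit only High-ID clients get in
    return True
```

For example, with a soft limit of 1000 and a hard limit of 1200, a server with 1100 active users still accepts a High-ID client but turns away a Low-ID one.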

Client-server TCP connection

Each client uses TCP to connect to exactly one server. The server assigns the client an ID that identifies it for the rest of the session (a High ID is always derived from the client's IP address). The client can operate only after a server connection is established. A client cannot be connected to several servers at once, and it cannot switch servers dynamically without user intervention.

Establish a connection

When establishing a connection, the client may try several servers at the same time. Once one login succeeds, that connection is used and the other attempts are abandoned.

There are three possible outcomes of connection establishment:

  1. High ID: the server assigns the client a High ID.
  2. Low ID: the server assigns the client a Low ID.
  3. Rejection: the server rejects the client's connection.

Of course, there is also the case where the server is down and cannot be reached.

High-ID login process

The figure for this process shows the message exchange of a High-ID login. In this case the client opens a connection to the server and sends it a login message. The server then uses a second TCP connection to connect to the client and performs a client-to-client handshake, to confirm that the client can accept connections from other clients. After the client-to-client handshake completes, the server closes the second connection and sends the client an ID change message, which ends the client-server handshake. You may notice that the eMule info message in the figure is gray; this is because the message is part of the eMule protocol extension.

Low-ID login process

The figure for this process shows a login that ends with a Low ID. In this case the server fails to connect to the client (the client-to-client handshake cannot take place), so it assigns the client a Low ID. The server message usually contains this warning: "WARNING [server details] - You have a lowid. Please review your network config and/or your settings." Whether the client gets a High ID or a Low ID, the handshake ends with the ID change message, which gives the client its client ID for the rest of its session with the server.
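The difference between the two login outcomes comes down to whether the server can open a TCP connection back to the client. The sketch below shows only that reachability probe, using the standard `socket` module; it does not perform the client-to-client handshake, and the function names and the two-second timeout are hypothetical.

```python
import socket


def can_connect_back(host, port, timeout=2.0):
    """Return True if a TCP connection to the client's port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def id_kind(host, port):
    """Classify a client as 'high' or 'low' from the probe result alone."""
    return "high" if can_connect_back(host, port) else "low"
```

A client behind NAT or a firewall that blocks incoming connections fails this probe, which is the situation that leads to the Low-ID warning quoted above.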

Logon denied Process

The figure for this process shows a rejected login. The server may reject a login when the client has a Low ID or when the server has reached its hard capacity limit. The server message contains a short description of the reason for the rejection.

Information exchange at the beginning of the Connection

After the connection is established, the client and server exchange several setup messages that bring both sides up to date. The client first offers its list of shared files to the server and then asks the server to update its server list. The server sends its status and version, then its list of known eMule servers, and then some details identifying itself. Finally, the client asks for sources (from which it can download the files on its download list), and the server replies with a series of messages until the client has received all the source lists.

File Search

A file search is triggered by the user. The operation is simple: the client sends a search request to the server, and the server replies with the search results. When there are many results, the result message is compressed. The user then selects one or more files to download, the client requests sources for the selected files, and the server replies with a list of sources for each requested file. An optional server status message may be sent to the client before the found-sources message. This status message contains the current number of users and files on the server. Note that there is a complementary UDP message sequence that enhances the client's ability to locate sources. Once eMule has confirmed that the sources are new, it adds them to its source list and connects to them in the order in which they were received. There is no priority mechanism for deciding which source to contact first. However, when one source holds several of the files on the download list, a selection mechanism decides which file to request from it (note that eMule allows only one transfer connection between any two clients). The selection is based on the priority the user assigned to each file; if no priority is set, alphabetical order is used.
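The file selection rule at the end of this section can be sketched as a sort key. The representation is an assumption: priorities are modelled as integers where a larger number is more urgent, `None` means the user set no priority, and files with a priority are preferred over files without one. The function name is hypothetical.

```python
from typing import Dict, Optional


def pick_file(wanted: Dict[str, Optional[int]]) -> str:
    """Choose which wanted file to request from a source that has several."""
    if not wanted:
        raise ValueError("the source has none of the wanted files")

    def key(name):
        priority = wanted[name]
        if priority is None:
            return (1, 0, name.lower())       # unprioritised files come last
        return (0, -priority, name.lower())   # higher priority first, then A-Z

    return min(wanted, key=key)
```

When no file has a priority, the key reduces to the lower-cased file name, so the choice falls back to alphabetical order as described.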

Callback mechanism

The callback mechanism is designed to overcome the inability of Low-ID clients to accept incoming connections, so that they can still share files with other clients. The mechanism is simple: suppose clients A and B are connected to the same eMule server, A needs a file that B has, and B has a Low ID. A can send a callback request to the server, asking the server to tell B to connect back to A. The server, which already has an open TCP connection to B, sends B a callback-requested message containing A's IP address and port. B can then connect to A and send the file without further server involvement. Obviously, only a High-ID client can request a callback from a Low-ID client (a Low-ID client cannot accept the incoming connection). (This is why a High-ID client can connect to any source, while a Low-ID client can connect only to High-ID clients.) The protocol also lets two Low-ID clients exchange files through the server, with the server acting as a relay, but most servers no longer support this.
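The rules in this section reduce to a small decision table. The sketch below encodes them under the assumption that server relaying between two Low-ID clients is disabled, as the text says is the case on most servers; the function name and the returned labels are hypothetical.

```python
def connection_strategy(requester_low, source_low, same_server):
    """How a requester can reach a source, given the ID kinds of both."""
    if not source_low:
        return "direct"       # a High-ID source accepts incoming connections
    if requester_low:
        return "unavailable"  # neither side can accept a connection
    if same_server:
        return "callback"     # the server asks the source to connect back
    return "unavailable"      # callbacks are not forwarded between servers
```

For example, `connection_strategy(False, True, True)` is the A-and-B case from the text: A has a High ID, B has a Low ID, and both are logged on to the same server, so A uses a callback.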
