Research on JXTA CMS Search Technology

Last Update: 2018-12-08 Source: Internet

Author: User



1 Introduction

Traditional information retrieval technology cannot keep up with the massive, exponentially growing volume of information on the Internet, so P2P (Peer-to-Peer) applications use new techniques to retrieve files and information on the Internet. JXTA is an open network platform for P2P computing [1 ~ 4]. The JXTA Content Management Service (CMS) searches for the desired resources in real time, so that Peer nodes can share and download content. Research on JXTA CMS-based search is still rare in the literature. This article introduces three CMS search techniques: simple search, Metadata-based search, and distributed search.

2 CMS Simple Search

Any data to be shared is called Content. Each piece of shared content has a unique content ID and a content advertisement. The content ID is the 128-bit MD5 checksum computed from the binary data of the content itself, so comparing checksums makes it easy to tell whether two files shared by two different Peer nodes are identical. Content advertisements are stored in XML format and describe the metadata of the content, including its name, length, MIME type, content identifier, and description. In a simple search, CMS sends a query string to each Peer. On receiving the query string, a Peer compares it with the file name and description of each piece of content it shares; if the content meets the requirements, the Peer returns that content's advertisement to the Peer that sent the search request. Note that a simple search only accesses the other party's CMS and searches for qualifying content stored locally on that Peer; there is no need to ask the remote Peer to start the CMS service.
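The sketch below illustrates the two ideas in this paragraph, assuming Java 16+ for records: a content ID derived as the hex-encoded MD5 digest of the content bytes, and the case-insensitive substring match a remote Peer might apply to a content's name and description. `ContentAd`, `share`, and `handleQuery` are hypothetical names, not the JXTA CMS API, and the XML serialization of the advertisement is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SimpleSearchSketch {

    /** Hypothetical stand-in for a CMS content advertisement (not the JXTA class). */
    record ContentAd(String name, long length, String mimeType,
                     String contentId, String description) {

        /** Simple search: case-insensitive substring match on name or description. */
        boolean matches(String query) {
            String q = query.toLowerCase(Locale.ROOT);
            return name.toLowerCase(Locale.ROOT).contains(q)
                    || description.toLowerCase(Locale.ROOT).contains(q);
        }
    }

    /** Content ID: hex encoding of the 128-bit MD5 digest of the content's bytes. */
    static String contentId(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the advertisement for a piece of content about to be shared. */
    static ContentAd share(String name, String mimeType, byte[] data, String description) {
        return new ContentAd(name, data.length, mimeType, contentId(data), description);
    }

    /** Remote Peer side: return the advertisements of local content matching the query. */
    static List<ContentAd> handleQuery(List<ContentAd> localShares, String query) {
        List<ContentAd> hits = new ArrayList<>();
        for (ContentAd ad : localShares) {
            if (ad.matches(query)) {
                hits.add(ad);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        byte[] a = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] b = "hello".getBytes(StandardCharsets.UTF_8);
        // Identical bytes shared by two different Peers yield identical content IDs.
        System.out.println(contentId(a).equals(contentId(b)));   // true
        List<ContentAd> shares = List.of(
                share("jxta-guide.pdf", "application/pdf", a, "JXTA programming guide"),
                share("song.mp3", "audio/mpeg", "la".getBytes(StandardCharsets.UTF_8), "music"));
        System.out.println(handleQuery(shares, "JXTA").size());   // 1
    }
}
```

Because the ID depends only on the bytes, two Peers that share the same file under different names still advertise the same content ID.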

The most important class in a JXTA CMS simple search program is ListContentRequest [5]. Its main function is to send a query string to remote Peers and then listen on an input pipe for the returned results. Each time results arrive, JXTA calls the notifyMoreResults method of the ListContentRequest class; the getResults method of the same class can then be called to obtain all content advertisements that meet the conditions.
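The callback style described above can be sketched without the JXTA libraries. In this sketch the input pipe is simulated by calling `onPipeMessage` directly, and advertisements are plain strings. `ListRequestSketch`, `onPipeMessage`, `moreResultsArrived`, and `PrintingRequest` are hypothetical names chosen for illustration; the sketch's `getResults` only mirrors the method named in the text and is not the JXTA CMS API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the ListContentRequest pattern (no JXTA dependency). */
abstract class ListRequestSketch {
    private final String query;
    private final List<String> results = new ArrayList<>();  // advertisements received so far

    ListRequestSketch(String query) {
        this.query = query;
    }

    String query() {
        return query;
    }

    /** Simulates a batch of responses arriving on the input pipe. */
    final synchronized void onPipeMessage(List<String> advertisements) {
        results.addAll(advertisements);
        moreResultsArrived();  // notify the application that new results are available
    }

    /** Snapshot of every advertisement received so far. */
    synchronized List<String> getResults() {
        return List.copyOf(results);
    }

    /** Application hook invoked each time new results arrive. */
    protected abstract void moreResultsArrived();
}

/** Example subclass that counts callbacks and prints the accumulated results. */
class PrintingRequest extends ListRequestSketch {
    int callbacks = 0;

    PrintingRequest(String query) {
        super(query);
    }

    @Override
    protected void moreResultsArrived() {
        callbacks++;
        System.out.println(query() + " -> " + getResults());
    }
}
```

Calling `onPipeMessage` twice on a `PrintingRequest` triggers two callbacks, and `getResults()` then returns the advertisements from both batches, which mirrors how results accumulate as more Peers respond.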

3 Metadata-Based Search

To let users find the desired content quickly, Metadata can be attached to shared files. The purpose of Metadata is to let users search on Metadata attributes rather than on substrings of the file name and file description. A Metadata search is actually a local search, and the results it returns are content advertisements rather than the content itself, so even when there are many results the actual network traffic remains small. Moreover, because the returned result set is cached locally, subsequent searches are much faster.

Currently, JXTA's ListContentRequest class does not support Metadata search. To search on Metadata, therefore, you must first use ListContentRequest's substring matching to obtain all advertisements that initially qualify, wait for ListContentRequest to return this result set, and then apply the Metadata search to it. The procedure works as follows: ① the requesting Peer (Peer1) sends a query string to a remote Peer; ② the remote Peer returns the advertisements that initially meet the query to Peer1; ③ after receiving them, Peer1 starts the Metadata search, checking each advertisement's Metadata to filter out the content that matches the query specified by the user. The search principle is shown in Figure 1.
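A minimal sketch of this two-stage procedure follows, assuming Java 16+. Stage 1 (the remote substring match) is simulated locally, the prefiltered advertisements are cached per query string, and stage 2 filters the cached advertisements on one Metadata key/value pair. The `Ad` record, the metadata keys, and the per-query cache are illustrative assumptions, since the text notes that ListContentRequest itself offers no Metadata support.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class MetadataSearchSketch {

    /** Hypothetical advertisement carrying name, description and free-form Metadata. */
    record Ad(String name, String description, Map<String, String> metadata) {}

    /** Local cache of stage-1 results, keyed by query string (an assumption of this sketch). */
    private final Map<String, List<Ad>> cache = new HashMap<>();

    /** Stage 1: substring prefilter, as the remote Peer would apply it (simulated here). */
    static List<Ad> substringFilter(List<Ad> remoteShares, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<Ad> hits = new ArrayList<>();
        for (Ad ad : remoteShares) {
            if (ad.name().toLowerCase(Locale.ROOT).contains(q)
                    || ad.description().toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(ad);
            }
        }
        return hits;
    }

    /** Runs stage 1 at most once per query string and caches the returned advertisements. */
    List<Ad> prefilter(List<Ad> remoteShares, String query) {
        return cache.computeIfAbsent(query, q -> substringFilter(remoteShares, q));
    }

    /** Stage 2: local Metadata filter over the cached stage-1 advertisements. */
    List<Ad> metadataSearch(List<Ad> remoteShares, String query, String key, String value) {
        List<Ad> result = new ArrayList<>();
        for (Ad ad : prefilter(remoteShares, query)) {
            if (value.equalsIgnoreCase(ad.metadata().get(key))) {
                result.add(ad);
            }
        }
        return result;
    }

    int cachedQueries() {
        return cache.size();
    }
}
```

A second Metadata search with the same query string reuses the cached advertisements instead of contacting the remote Peer again, which is the speed-up the previous paragraph attributes to local caching.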

4 CMS Distributed Search Technology

4.1 Working Principle of CMS Distributed Search

Distributed search is one of the development goals of CMS ~ 7]. In the distributed search model there is no dedicated server; every Peer has similar functions, and Peers locate one another through the P2P network itself.

In this model, Peers are organized in a grid, and search requests are passed from Peer to Peer. In fact, CMS already supports most of the functions this model requires: every LIST_REQ and GET_REQ message carries an advertisement for the input pipe on which the response should be sent. A Peer can therefore forward requests from other Peers, because the pipe ID still resolves to the pipe of the Peer that originally sent the request. To implement the model, a TTL (time-to-live) or similar mechanism must be added to limit the maximum number of hops over which a search request can be forwarded.
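The forwarding rule can be sketched as an immutable message whose forwarded copy keeps the original reply pipe ID and spends one unit of TTL. `ListReq` and its fields are hypothetical; a real LIST_REQ carries a full pipe advertisement rather than a bare ID string.

```java
import java.util.Optional;

class ForwardingSketch {

    /**
     * Hypothetical LIST_REQ: the query, the ID of the original requester's input pipe
     * (where LIST_RES must be sent), and the remaining hop budget (TTL).
     */
    record ListReq(String query, String replyPipeId, int ttl) {

        /**
         * Copy to send to the next Peer: same query, same reply pipe, TTL reduced by one.
         * Empty when the reduced TTL would reach 0, i.e. the request is not forwarded further.
         */
        Optional<ListReq> forwarded() {
            return ttl > 1
                    ? Optional.of(new ListReq(query, replyPipeId, ttl - 1))
                    : Optional.empty();
        }
    }

    public static void main(String[] args) {
        ListReq req = new ListReq("jxta", "pipe-of-peer1", 3);
        // Walk the hop chain: every copy still points responses at the original requester.
        for (Optional<ListReq> m = Optional.of(req); m.isPresent(); m = m.get().forwarded()) {
            System.out.println(m.get());
        }
    }
}
```

Keeping the reply pipe unchanged is what lets an intermediate Peer forward a request without becoming the recipient of the responses.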

A Peer discovers its neighboring Peers on the network and sends them a LIST_REQ request message. Each Peer that receives the message checks whether its local content meets the search requirements. If it does, the Peer sends a LIST_RES response message back along the path of the request, returning the advertisements of all matching content. Whether or not its local content matches, each Peer keeps propagating the request through the network by flooding until the TTL field reaches 0.
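The flooding procedure can be simulated on a small adjacency-list graph. This is an illustrative sketch, not CMS code: the graph, the `hasMatch` predicate standing in for the local content check, and the duplicate suppression (each Peer handles a given request only once) are assumptions added here; the article itself only specifies forwarding until the TTL reaches 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class FloodSearchSketch {

    /** Result of one simulated flood: Peers that would answer with LIST_RES, and LIST_REQs sent. */
    record Outcome(List<String> responders, int messagesSent) {}

    /**
     * Simulates a TTL-limited flood of LIST_REQ from origin: the request travels at most
     * ttl hops. Each Peer checks its local content the first time it sees the request and,
     * match or not, forwards it to all of its neighbours while hops remain.
     */
    static Outcome flood(Map<String, List<String>> neighbours, String origin, int ttl,
                         Predicate<String> hasMatch) {
        Set<String> seen = new HashSet<>(Set.of(origin));
        List<String> responders = new ArrayList<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(origin));
        int messages = 0;
        for (int hop = 0; hop < ttl && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String peer : frontier) {
                for (String n : neighbours.getOrDefault(peer, List.of())) {
                    messages++;  // counts every copy sent, even to Peers that already saw it
                    if (seen.add(n)) {
                        if (hasMatch.test(n)) {
                            responders.add(n);  // this Peer would send LIST_RES back
                        }
                        next.add(n);  // forwards regardless of whether it matched
                    }
                }
            }
            frontier = next;
        }
        return new Outcome(responders, messages);
    }
}
```

On the chain A-B-C-D with matching content on C and D, a flood from A with TTL 2 finds only C, while TTL 3 also finds D at the cost of more LIST_REQ messages; this is the reach-versus-traffic trade-off discussed in Section 4.2.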

Once a Peer holding the requested content has been located through its response, the requesting Peer connects to that Peer to download the content: it first sends a GET_REQ message requesting the content, and the responding Peer replies with a GET_RES message containing the data. JXTA uses pipes so that applications do not have to deal with underlying protocol issues when establishing connections and transferring data. The principle of distributed search is shown in Figure 2.
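The GET_REQ/GET_RES exchange can be sketched with direct method calls in place of pipes. The records and the `Responder` class are hypothetical, and the check that the received data hashes to the requested content ID is an added assumption that builds on the MD5-based content ID described in Section 2, not a step the article specifies.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class DownloadSketch {

    record GetReq(String contentId) {}               // hypothetical GET_REQ
    record GetRes(String contentId, byte[] data) {}  // hypothetical GET_RES carrying the data

    /** Hex-encoded MD5 digest, used here as the content ID. */
    static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // MD5 is mandatory on every Java platform
        }
    }

    /** Responding Peer: its shared content indexed by content ID. */
    static final class Responder {
        private final Map<String, byte[]> shares = new HashMap<>();

        String share(byte[] data) {
            String id = md5Hex(data);
            shares.put(id, data.clone());
            return id;
        }

        /** Answers a GET_REQ with a GET_RES, or nothing if the content is not shared here. */
        Optional<GetRes> handle(GetReq req) {
            byte[] data = shares.get(req.contentId());
            return data == null
                    ? Optional.empty()
                    : Optional.of(new GetRes(req.contentId(), data.clone()));
        }
    }

    /** Requesting Peer: send GET_REQ and accept the data only if it hashes to the requested ID. */
    static Optional<byte[]> download(Responder peer, String contentId) {
        return peer.handle(new GetReq(contentId))
                .filter(res -> md5Hex(res.data()).equals(contentId))
                .map(GetRes::data);
    }
}
```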

4.2 Advantages and Disadvantages of CMS Distributed Search

Distributed search offers a search depth that traditional CMS search cannot match [6] and meets the needs of Internet users. Through the flooding mechanism, every node on the network not only searches locally but also propagates messages automatically. A forwarding node connects directly to other nodes by sending them the message, so the original requester is indirectly connected to a large number of nodes. The search range can therefore grow exponentially within a few seconds, and information resources on millions of PCs can be searched within several minutes. In theory, distributed search can reach all openly shared information resources on the network: the larger the network, the more resources are available and the more clearly distributed search shows its advantages.
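A back-of-the-envelope calculation shows why the search range grows exponentially. Assume, purely for illustration, that every Peer has exactly $d$ neighbours, forwards a request to all of them except the one it came from, and that no Peer is reached twice; the article does not specify these parameters. The number of Peers reached within TTL $t$ is then

```latex
N(t) = \sum_{k=1}^{t} d\,(d-1)^{k-1} = \frac{d\left[(d-1)^{t} - 1\right]}{d-2}, \qquad d > 2
```

For example, with $d = 4$ and $t = 7$, $N(7) = 4(3^{7} - 1)/2 = 4372$. In real networks, cycles cause some Peers to be reached more than once, so $N(t)$ is an upper bound on reach. Since each newly reached Peer receives at least one LIST_REQ, at least $N(t)$ messages are sent per query, which is also why flooding traffic rises sharply as the network grows.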

In addition, with CMS distributed search any network user can scan active nodes, search for the required information, and download it directly from the node that holds it. Replicating information across multiple nodes improves its availability and lets it serve more users, so the network can quickly accumulate a wealth of information. Because information resources are distributed and redundant, the network has no single point of failure, and denial-of-service attacks against a single server are no longer effective. A distributed network thus improves fault tolerance and robustness.

However, CMS distributed search also has shortcomings. Peers in the network are searched for and located by flooding, and as the network grows, locating Peers this way causes network traffic to rise sharply. Like other P2P network models, the method is also vulnerable to malicious attacks; for example, an attacker can send junk queries that congest the network.

5 Conclusion

The author has written a CMS-based distributed search program in Java and has run initial tests of it in a simulated environment. Future research will further improve the CMS distributed search model and algorithm and explore related security mechanisms. Because CMS is new and still at an early, exploratory stage, CMS distributed search still faces many specific problems in particular application environments, and research and breakthroughs in this area are worth looking forward to.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
