As analyzed in the previous article, the back-end systems for full-text search, data mining, and the recommendation engine provide three types of services: synchronous services, asynchronous services, and background services. Synchronous services can be implemented as Web Services, XML over HTTP, or RESTful services. I used JSON over HTTP in this project, mainly because JavaScript parses JSON efficiently, but the choice depends on your preferences. For asynchronous services, JMS is the natural choice when Java is the programming language. Background services are mainly scheduled tasks, which can use the Timer service in newer Java EE versions or a plain Timer (for example java.util.Timer) directly.
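To make the background-service option concrete, here is a minimal sketch of a scheduled task built on java.util.Timer. The class name, the task body, and the intervals are assumptions chosen for illustration and are not taken from the project; a Java EE deployment would more likely use the container's Timer service instead.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical background job; names and intervals are illustrative only.
class BackgroundJobSketch {

    // Schedules a periodic task, waits until it has run at least `times`
    // times (or a safety timeout expires), then cancels the timer.
    static int runPeriodically(int times, long periodMillis) {
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);
        Timer timer = new Timer("background-job", true); // daemon thread

        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                // A real task might rebuild an index or refresh recommendations.
                runs.incrementAndGet();
                done.countDown();
            }
        }, 0L, periodMillis);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        } finally {
            timer.cancel(); // stop scheduling further runs
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("task ran " + runPeriodically(3, 50) + " times");
    }
}
```

The returned count can be slightly higher than requested, because one more run may start before the timer is cancelled.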
The simplest way to implement asynchronous services with JMS is to use a message-driven bean (MDB). JMS offers two mechanisms, Queue and Topic, so which one is better? Systems that require loose coupling, where the business logic is complex, multiple systems and functions must be integrated, and new systems and functions may be added at runtime, generally use a message bus such as OpenESB. My system needs this capability too, but I did not want to introduce heavyweight ESB technology, so I chose the JMS Topic mode for my message-driven beans. With a Topic, every subscriber receives its own copy of each message. Suppose a new feature is needed after the system goes online, such as adding a new user's credit card information to the cache at registration (when originally only the login name and password were cached). In that case, you can dynamically deploy a new message-driven bean that subscribes to the same Topic. The system gains the feature without any modification to existing code, which approximates a message bus.
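The fan-out behavior that makes a Topic suitable here can be sketched in plain Java. This is an in-memory stand-in, not the JMS API: the class TopicSketch, the subscriber names, and the message strings are all hypothetical. It only shows that a subscriber added later starts receiving messages without any change to the publisher or to existing subscribers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-memory stand-in for a JMS Topic; it illustrates fan-out only.
class TopicSketch {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    // Register a new subscriber; existing subscribers are untouched.
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver a copy of the message to every current subscriber.
    void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        TopicSketch topic = new TopicSketch();

        // Original feature: cache login name and password.
        topic.subscribe(m -> log.add("login-cache:" + m));
        topic.publish("user-registered:42");

        // Feature added after go-live: cache credit card information.
        topic.subscribe(m -> log.add("card-cache:" + m));
        topic.publish("user-registered:43");
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With a Queue, by contrast, each message would go to only one consumer, so the second feature would compete with the first for messages instead of seeing all of them.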
To this end, I created a message-driven bean, MainMdb, in the system to listen to the Topic jms/MainTopic.
For example, when a user posts a blog, the Web front-end system first saves the blog to the database and then sends a message to jms/MainTopic. The message type is "blog added" and the parameter is the blog ID. The message-driven bean in the background receives the message. In its onMessage method, it looks up the blog details by blog ID and indexes the text fields (title, tags, keywords, abstract, and body) for full-text search. After word segmentation, it calculates the frequency of each word in the blog and its relevance (TF/IDF, which will be introduced in subsequent articles in this series), builds a term vector space, and runs an automatic clustering algorithm to find similar blogs based on content. As needed, it also clusters the blog against users to find users who may like it. Creating a full-text index and running content-based recommendation computations generally takes a long time. With a synchronous service, the user would have to wait for all of this and the user experience would be poor. With an asynchronous service, the success message can be displayed to the user immediately while the time-consuming work completes in the background. By the time the user goes on to other operations, such as a full-text search or viewing other blogs or other users, the full-text indexing and content-based recommendation work has usually finished, so the results can be presented in near real time.
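The flow described above can be sketched in plain Java as follows. This is not the actual MainMdb and does not use the JMS or EJB APIs: a single-threaded executor stands in for the Topic and the container, and the class name, the "BLOG_ADDED" type string, and the step labels are assumptions made for illustration. The point is only that postBlog returns right after saving and publishing, while the expensive steps run later in onMessage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BlogPostFlowSketch {

    // Hypothetical message shape: a type plus the blog ID as parameter.
    static final class Message {
        final String type;
        final long blogId;

        Message(String type, long blogId) {
            this.type = type;
            this.blogId = blogId;
        }
    }

    // Stand-in for asynchronous delivery through the Topic.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Records the steps in the order they happen.
    final List<String> steps = Collections.synchronizedList(new ArrayList<>());

    // Front end: save the blog, publish the message, return immediately.
    String postBlog(long blogId) {
        steps.add("saved:" + blogId);
        worker.submit(() -> onMessage(new Message("BLOG_ADDED", blogId)));
        return "success";
    }

    // Back end: dispatch on the message type, then do the slow work.
    void onMessage(Message message) {
        if (!"BLOG_ADDED".equals(message.type)) {
            return; // other message types would be handled elsewhere
        }
        steps.add("indexed:" + message.blogId);      // full-text index
        steps.add("term-weights:" + message.blogId); // word frequencies, TF/IDF
        steps.add("clustered:" + message.blogId);    // similar blogs and users
    }

    // Waits for queued background work; returns false on timeout or interrupt.
    boolean awaitBackgroundWork() {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlogPostFlowSketch flow = new BlogPostFlowSketch();
        System.out.println(flow.postBlog(7L)); // the user sees this at once
        flow.awaitBackgroundWork();
        System.out.println(flow.steps);
    }
}
```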