Background: I recently worked on a large warehouse management platform. Its predecessor was a Windows desktop program built on a client/server (C/S) architecture that had gone through countless versions over the years. Each customer usually needed special, personalized features on top of it. So the routine looked like this: a core R&D team spends a year or two building a core version of the software; sales takes it to market; when a sale succeeds, the pre-sales colleagues come back with a long list of customization requirements; a customer-specific extension of the core functionality is developed; once development is complete, the system is implemented at the customer site; finally, the maintenance colleagues add this customer's system to their daily work list. Then the cycle repeats. Admittedly, this approach can be seen in most software companies, but my company is a company with ideas, so how could we let such a routine go on forever? I will skip the explanation of why we chose to build a platform. In short: we build a platform, we offer SaaS, and we sell accounts instead of software.

This raises a user experience problem. With the C/S program, each client cached a large amount of data locally (usually a few GB), so customers could retrieve information very quickly. If we switch to a browser/server (B/S) architecture, the client has no place to store that much data, and under SaaS all customers' data is stored on the platform. At that point the project seemed doomed to fail. While everyone was scratching their heads in frustration, we came across a technology that was thriving out in the wider world: search engines. We had never used one, but we had at least heard of them! So part of the work began to revolve around search engines.
Search engine: Wikipedia defines a search engine as a system that automatically collects information from the Internet and, after organizing it, makes it available for users to query.
Working principle:
1. Collect information: A search engine collects information more or less automatically. It uses automated programs called web spiders (or robots) that follow the hyperlinks on every page. Much like word of mouth in everyday life ("one tells ten, ten tell a hundred"), the robot starts from a few pages and follows their links to other pages, and from those to still more pages. In theory, as long as pages are properly linked, the robot can traverse most of the web.
2. Organize information: The process by which a search engine organizes information is called "indexing". The search engine must not only save the information it collects but also arrange it according to certain rules. That way it can find what it needs quickly without re-checking everything it has saved. Imagine the information were simply piled up at random in the search engine's database: every lookup would have to scan the whole database, and even the fastest computer system would be of no use.
3. Accept queries: The user sends a query to the search engine, and the search engine accepts it and returns information to the user. At every moment a search engine receives queries from a large number of users almost simultaneously. For each one it checks its own index, finds the information the user needs in a very short time, and returns it. At present, results are mainly returned as web links, through which users can reach the pages that contain the information they need. Search engines usually show a short summary under each link to help users judge whether the page contains what they want.
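The "organize" and "accept queries" steps above can be illustrated with a toy inverted index. This is only a minimal sketch, not how any particular engine is implemented: the sample documents and the function names `tokenize`, `build_index` and `search` are made up for illustration, and the "collect" step is replaced by a hard-coded dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Organize: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Query: return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical "collected" documents; a real engine would gather them with a spider.
documents = {
    1: "Pallet of bolts stored in warehouse A",
    2: "Pallet of nuts stored in warehouse B",
    3: "Invoice for bolts shipped last week",
}
index = build_index(documents)
print(search(index, "pallet bolts"))
```

Because each word points directly at the documents that contain it, a query only touches the entries for its own words and never scans the full document set, which is the point made in step 2.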
The right direction: From the background and the working principle described above, we can make a preliminary judgment that the direction is correct.

First, collect information. Our information comes from the daily business of the warehouse management platform, and the data itself is stored in the platform database. To do its collecting, the search engine should be able to find some way to gather the business system's data. Whether the method is active or passive, the information has to be collected either way; only the scope shrinks from the entire Internet to our self-built platform.

Second, organize information. The platform stores its data in an RDBMS, and that form has its own inherent characteristics. Whether it lends itself to retrieval is debatable, at least at this early stage before we have seriously worked on search. On that assumption, we must arrange and organize the information the search engine collects.

Third, accept queries. Without doubt this is our ultimate goal, and it is also the only function the user actually experiences: querying, and querying efficiently.
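As a rough illustration of the "active" style of collection mentioned above, the sketch below polls a relational database for rows changed since the last run and flattens each joined row into one self-contained document. Everything here is an assumption made for the example: the `item` and `warehouse` tables, their columns, the `updated_at` convention and the function name `collect_documents` are hypothetical, and an in-memory SQLite database stands in for the platform database.

```python
import sqlite3


def collect_documents(conn, since):
    """Pull rows changed after `since` and flatten each into one document."""
    rows = conn.execute(
        """
        SELECT i.id, i.sku, i.name, w.name, i.updated_at
        FROM item AS i JOIN warehouse AS w ON w.id = i.warehouse_id
        WHERE i.updated_at > ?
        ORDER BY i.id
        """,
        (since,),
    ).fetchall()
    return [
        {"id": f"item-{item_id}", "sku": sku, "name": name,
         "warehouse": warehouse, "updated_at": updated_at}
        for item_id, sku, name, warehouse, updated_at in rows
    ]


# Hypothetical schema and data, held in memory only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY, sku TEXT, name TEXT,
        warehouse_id INTEGER REFERENCES warehouse(id), updated_at TEXT
    );
    INSERT INTO warehouse VALUES (1, 'North'), (2, 'South');
    INSERT INTO item VALUES
        (1, 'B-100', 'Hex bolt', 1, '2017-01-01'),
        (2, 'N-200', 'Lock nut', 2, '2017-01-05');
    """
)
docs = collect_documents(conn, since="2017-01-02")
print(docs)
```

The join is resolved at collection time, so each document already carries the warehouse name and the search side does not need to know the relational layout. A passive variant would have the business system push changes instead of being polled.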
How to choose? With the development of Internet technology, countless open-source search engines have appeared, such as Lucene, Sphinx, Xapian, Nutch, DataparkSearch, Zettair, Indri, Terrier, Galago, Zebra, Solr, Elasticsearch, Whoosh and so on. Our selection was basically limited to this list. Taking the company's actual situation into account, we applied the following principles:
1. Widely used (others have already stepped into most of the pitfalls)
2. Easy to use (saves time and learning cost)
3. Active community (no need to worry that it will vanish in a few days)
4. A technology gap that is not large (reduces learning cost)
All things considered, we made a preliminary choice of Solr as the first candidate for the project, mainly because it is built on Lucene, is based on full-text indexing, provides a RESTful API, has a low barrier to entry, and is already widely used in the market.
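To give a feel for the HTTP interface mentioned above, here is a minimal sketch that only builds the URL of a Solr `/select` request. The core name `inventory` and the field `name` are hypothetical, and the base URL assumes a local Solr instance on its default port 8983; the helper `solr_select_url` is not part of Solr.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build the URL of a Solr /select request that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# Hypothetical core and field names; nothing is sent over the network here.
url = solr_select_url("http://localhost:8983/solr", "inventory", "name:bolt")
print(url)
```

Actually sending the request, for example with `urllib.request.urlopen(url)`, requires a running Solr instance that has such a core.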
Vision: When we started learning Solr, I looked at a lot of websites, but there was very little material in Chinese, and most of it only explained how to install Solr, which was not very useful. I hope to record this experience based on the actual use in my project and to enrich the Chinese-language material as much as I can. The posts that follow are based on Apache Solr 6.3.
Apache Solr: Enhancing the search experience (why Solr)