I have been reading the book "This Is the Search Engine: The Core of Technical Details", and the following passage in chapter three got me thinking:
The complete rebuild strategy is better suited to small document collections, because fully rebuilding the index is expensive. Even so, today's mainstream commercial search engines generally use it to keep their indexes up to date, which has to do with the characteristics of the internet itself.
There are currently four index update strategies: the complete rebuild strategy, the re-merge strategy, the in-place update strategy, and the hybrid strategy.
If you want to know more about these four index update strategies, see the post on the blog of the author of "This Is the Search Engine: The Core of Technical Details": Search engine index update strategy
After comparing the four strategies, I think I have found the main reason:
Although the complete rebuild strategy is expensive, it is the only one of the four that guarantees the old index stays in service while the rebuild is running, and a commercial search engine has to make sure the system works properly at all times.
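To make the point concrete, here is a minimal sketch of a complete rebuild in which the old index keeps answering queries until the new one is ready. It is only an illustration of the idea: `build_index`, `SearchService`, and the toy documents are hypothetical names of my own, not anything from the book, and a real engine would build the new index on disk across many machines.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> sorted list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


class SearchService:
    """Hypothetical service that always answers from a complete index."""

    def __init__(self, docs):
        self._index = build_index(docs)

    def search(self, term):
        return self._index.get(term.lower(), [])

    def rebuild(self, docs):
        # Build the replacement off to the side. self._index is not touched,
        # so the old index keeps answering queries while this runs.
        new_index = build_index(docs)
        # Put the finished index into service with one reference assignment.
        self._index = new_index


service = SearchService({1: "index update strategy", 2: "search engine"})
before = service.search("index")

service.rebuild({1: "index update strategy", 2: "search engine index", 3: "merge"})
after = service.search("index")
```

The old index is never modified in this sketch. It is simply dropped once the new one is complete, which is why queries can be served the whole time.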
The re-merge strategy and the in-place update strategy both combine the incremental index with the old index once the incremental index has been built, and the old index cannot serve queries while that merge is in progress. The so-called hybrid strategy merely chooses between the re-merge strategy and the in-place update strategy depending on the situation, so it still cannot guarantee that the index stays in service.
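For comparison, the merge step that the re-merge strategy relies on can be sketched as follows. This is a simplified in-memory version under my own assumptions: `merge_indexes` and the sample data are hypothetical, and the indexes are plain dictionaries mapping a term to a sorted list of document ids.

```python
def merge_indexes(old_index, incremental_index):
    """Merge two toy inverted indexes (term -> sorted doc ids) into a new one."""
    merged = {}
    # Walk the terms of both indexes in dictionary order.
    for term in sorted(set(old_index) | set(incremental_index)):
        old_ids = old_index.get(term, [])
        new_ids = incremental_index.get(term, [])
        # Combine the posting lists, dropping duplicates and keeping them sorted.
        merged[term] = sorted(set(old_ids) | set(new_ids))
    return merged


old_index = {"engine": [2], "index": [1]}
incremental_index = {"index": [3], "merge": [3]}
merged_index = merge_indexes(old_index, incremental_index)
```

Every term that appears in the incremental index forces the matching posting list of the old index to be read and combined, and that is the step the availability concern above is about.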