This article is a short interlude in the series. Before I started writing, I only wanted to describe my recent experience building a search engine in Golang, and I planned about three or four articles. Once I started, though, I could not stop, and the later articles ended up having little to do with Golang. They are mainly about search engine architecture and data structures. I think that is more useful than writing code commentary, and writing it all down deepened my own understanding of these data structures.
In one month I wrote 14 articles. Fourteen articles are not enough to fully describe a search engine implementation, but I think I have covered the important parts. The search engine itself is actually fairly simple, and since we leave out the crawler, it is not very complicated. The complexity lies in the peripheral algorithms, such as word segmentation and ranking.
Let's review what we have covered so far.
After the opening article, I covered the basic concepts of a search engine, including concepts such as the docid, and the layering of the whole engine.
Next came inverted index technology, the most basic and most central concept in a search engine.
The inverted index has two parts: the dictionary and the inverted file. I described several implementations of the dictionary, including the hash table and the B+ tree, and briefly introduced the skip list.
After the inverted index techniques came the steps for building an inverted index, including full and incremental builds and the various construction methods, ending with my own construction method.
With the principle, data structures, and construction of the inverted index covered, one article introduced the forward index.
With the two basic data structures (inverted and forward indexes) out of the way, I covered the segmentation strategy for indexes, a technique that makes indexes more flexible to use.
After segments came the index layer, where I first described how data is deleted.
Finally, I covered data retrieval, including computing intersections and unions of results, which actually belongs to the engine layer above the index layer.
Along the way I also interspersed some peripheral search technologies, including text relevance, ranking, machine learning, and some problems with long-tail terms.
With that, the first part is essentially finished: the material above is enough to fully implement a single-machine search engine. You are also welcome to look at my GitHub. What is open-sourced so far is a simple stand-alone search engine whose main structure is very simple, and bug reports are welcome. Because it is my own toy project, I did not write unit tests; the code in the _test.go files is functional tests.
However, the series is far from over. There is a lot more to say about search engines, and index sharding and distribution have not even been started. With the stand-alone first part done, I will use another series to cover a distributed implementation, and the code will of course be updated along with it.
Future articles will still revolve around search, recommendation, and advertising. I think the three are really one thing: they use the same technology, and although the algorithms look different, the underlying layer does not differ much. Advertising in particular is simply recommendation plus search. So the articles will cover both architecture and algorithms, and of course there will be more miscellaneous pieces like today's.
Finally, since this is a search engine written in Golang, let's talk about Golang.
I have used Golang for almost a year. I came over from C and C++, which I had used for nearly 10 years; besides those, I have used Python, Objective-C, C#, Lua, and Erlang. Sorry, I have not used PHP, the world's best language, nor Java, the world's hottest language (writing Hadoop jobs should not count as using Java). I personally think Golang is designed entirely as an engineering language: it is strongly engineering-oriented and very well suited to back-end development.
This is not a language war, just some of my impressions. Overall, I personally think Golang is very good.
Let's start with the good.
Development efficiency is many times higher than with C. The basic data structures are all there, and in practice the one that matters most is map.
Even better, it feels much like C to use, and it has pointers, which matters a lot to a C programmer.
If you use interfaces well, you can write good Golang code. Whether to apply the full set of design patterns is up to you, though I think the ideas behind design patterns are very useful.
Error return values: this may be personal taste, but I like them a lot. There are no exceptions to catch. In Java especially (I have not written it, but I have read a lot of Java code), an exception can be thrown up through several layers like the dream levels in Inception, which is a nightmare. An error is just an error; return it and be done, with nothing exotic involved.
I won't go into goroutines and channels; they have already been discussed plenty.
Higher-order functions and closures are freely available, so many newer functional programming ideas are comfortable to use in Golang.
Debugging is good: you can use GDB, and the performance profiling tool is also very good. Those two plus printf can handle almost all bugs and performance bottlenecks.
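As a small illustration of the error-return style praised above, here is a sketch in which a failure is just another return value that the caller checks on the spot. The function parseDocID and the sentinel ErrEmptyInput are hypothetical names invented for this example, and the %w wrapping verb assumes Go 1.13 or later.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ErrEmptyInput is a hypothetical sentinel error used only in this sketch.
var ErrEmptyInput = errors.New("empty input")

// parseDocID converts a decimal string to a docid. Any failure is reported
// through the second return value; nothing is thrown.
func parseDocID(s string) (uint32, error) {
	if s == "" {
		return 0, ErrEmptyInput
	}
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		// Add context and hand the error back to the caller.
		return 0, fmt.Errorf("parse docid %q: %w", s, err)
	}
	return uint32(n), nil
}

func main() {
	for _, in := range []string{"42", "", "abc"} {
		id, err := parseDocID(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("docid:", id)
	}
}
```

Each call site decides right there whether to handle the error or pass it up one level, so the control flow stays visible in the code.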
Now the bad.
Garbage collection should count as an advantage, since you do not have to care when a newly allocated object gets freed. But as an old programmer, sometimes (in fact, often) you do not know what your program's memory looks like. A search engine in particular is memory-hungry to begin with, and not being sure how much memory is used, or where, is quite uncomfortable. I would rather call new and delete myself. Managing memory yourself can cause leaks, but a leak is just a bug: fix it until it stops leaking. Of course, you can name ten thousand benefits of languages with automatic garbage collection, but I can name one benefit of managing memory yourself: freedom.
No generics, no generics, no generics! I no longer have the energy to complain. Reportedly there are no plans to add them.
Dynamic loading of .so files is not supported. This is actually very important, especially for services that must not stop, such as search engines. If you could load a .so dynamically, you could change an algorithm or a strategy and go live in minutes without stopping the service. It seems 1.6 might support this? I have not been following and do not know; I am still using 1.4.2.
Overall, I still highly recommend Golang. Next I will try Rust and see how it goes, but I am personally not too optimistic about it. There are so many programming languages now, including an ace like PHP, that a brand-new language without a strong company behind it can only earn a "hehe" in the end, no matter how good it is.
Finally, my open source project (https://github.com/wyh267/FalconEngine): I will keep working on it and turn it into a distributed search engine, in the hope that its performance can reach the level of ES. You are welcome to follow it. Earlier, in order to get familiar with search engine data structures, I wrote a lot of the basic code myself. To let more people use it, and for stability, I will bring in some open-source components rather than reinvent the wheel.
If you want to read all the previous articles, you can follow my public account and then tap the account menu :) or go straight to the SF column.
Please follow my public account, where articles are published first :) Scan the code, or search for XJJ267 or for the account's Chinese name ("West Plus language").