Poseidon is an open-source log search platform that has been used in production. It can quickly analyze trillions of log records, hundreds of PB of data, and retrieve specific strings from them. Because of Go's distinctive support for concurrent programming, Poseidon's core search engine, transmitter, and query agent are all written in Go, and goroutines and channels are used heavily in the core engine's queries, in multi-day queries, and in asynchronous multi-day data downloads.
Good morning, everyone. I am Guo Jun, and I am very happy to share with you here today. My topic is the application of Go in a trillion-scale search engine. Poseidon is the name of the Greek god of the sea, and here he rules over a massive sea of data.
Most of my past work was on user-facing products; it was only last year that I started working with big-data and massive-data scenarios. Today's talk covers the following aspects:
Design goals
Go application scenarios and challenges encountered
How to deal with it?
Changes in open source
Summary
Design goals
First of all, why build this system? We are a security company and we deal with APT (advanced persistent threat) events. When tracing an APT event, we usually take a sample and look for what it did during a given time window. Finding that information in a massive volume of logs used to mean running batch jobs: with luck and no congestion, a result came back in two or three hours; with bad luck, when too many tasks were queued up, it could take a day or two before the data came out. Clearly that kind of efficiency is not acceptable.
Our design goals: retain three years of historical data, about 100 trillion records in total, 3 PB in size. Second-level interactive search response: a front-end request over one day of data should return within seconds. We originally set the target at 60 seconds; in testing, results actually come back in 3 to 5 seconds, and 90% of requests finish within 10 seconds. Support ingesting 200 billion records per day. Keep only one copy of the raw data, and do not disturb the existing MR tasks. ES keeps an extra copy of the raw data beside the original; at our data volume, the cost of storing and maintaining such a copy is enormous. ES also cannot support a trillion-record volume: the industry reaches around 100 billion, and we ourselves only got to a bit over 300 billion with it. We also need custom tokenization policies, because every business writes logs in a different format, so tokenization has to be very flexible; plus failover, node load balancing, automatic recovery, and batch download of the raw logs.
Figure 1
Figure 1 shows our overall pipeline. It is fairly complex, and colleagues of mine have shared this architecture before. There would not be enough time to walk through the full architecture today, so Figure 2 is a very rough simplification of it.
Figure 2
Go application scenarios and challenges encountered
Start with the raw logs. During conversion we take every 128 lines of raw log as one document, and many documents are joined together into one file. People ask why 128 lines. Our daily log volume is about 70 billion lines; with one line per document we would have 70 billion documents, which occupies far too much space and makes the data balloon. We chose 128 lines because, first, 70 billion divided by 128 is roughly 546 million documents, which is within a manageable range; and second, our IDs are numeric, and to compress numbers well we use delta (interpolation) encoding here, which works much better over groups of 128 numbers. Compressing 128 lines of log at a time also gives a much better ratio than compressing one line at a time. Our raw logs for one business day are around 60 TB, and after compression we can get them down to about 10 TB.
When we write the output, these are the raw logs that every search ultimately has to come back to, and on top of them we build the index data. Because each document is compressed independently and the compressed documents are simply appended one after another, the resulting file has a very useful property: if I know the offset and length of one document inside the big file, I can pull out just that segment and decompress it on its own, getting back exactly those 128 lines of log (a small Go sketch of this follows below). That is what makes it possible to read a single log line: I know which compressed file to read, where in it to read, and that what comes out is a 128-line document. So we store the whole raw data this way, and build the index plus the raw data; that is basically the process.
Now look at the offline engine. Clients report logs, including PC Defender, network logs, browser logs and so on; this is the equivalent of a traditional search engine's crawler. In detail, the offline side generates Docgz and Docgzmeta files and then builds the raw data. On the online side, the web front end is a simple page that talks to the proxy cluster, which sends requests to the searcher cluster, which in turn goes to Readhdfs. Readhdfs is written in Java; Java development has plenty of pitfalls, but we have to use it, because Java is still the most suitable language for working with Hadoop.
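Here is a minimal, self-contained sketch of the "many independently compressed documents concatenated into one file" idea described above: each document (up to 128 log lines) is gzipped on its own, appended to a shared file, and its offset and length recorded, so one document can later be read and decompressed without touching the rest of the file. All names are illustrative; this is not taken from the Poseidon source.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

type docMeta struct {
	Offset int64 // where this gzip member starts in the big file
	Length int64 // how many compressed bytes it occupies
}

// appendDoc compresses one document's lines as an independent gzip member
// and appends it to the shared buffer, returning its position.
func appendDoc(buf *bytes.Buffer, lines []string) docMeta {
	start := int64(buf.Len())
	zw := gzip.NewWriter(buf)
	zw.Write([]byte(strings.Join(lines, "\n")))
	zw.Close()
	return docMeta{Offset: start, Length: int64(buf.Len()) - start}
}

// readDoc decompresses exactly one document out of the shared file.
func readDoc(file []byte, m docMeta) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(file[m.Offset : m.Offset+m.Length]))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	return string(raw), err
}

func main() {
	var file bytes.Buffer
	m0 := appendDoc(&file, []string{"log line 1", "log line 2"}) // document 0
	m1 := appendDoc(&file, []string{"log line 3", "log line 4"}) // document 1
	_ = m0
	doc, _ := readDoc(file.Bytes(), m1) // extract only the second document
	fmt.Println(doc)
}
```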
Now the data structures. We describe the core structures with Protocol Buffers. Each ID is split into two parts: the DocID, which numbers the document, and the RowIndex, which says which line of the multi-line (here, 128-line) document we mean; together they pinpoint one log line. The inverted index is described as a map: each term corresponds to a DocID list. There is also a map from DocID to the document's location as strings, so to fetch a line I first look up the map and then take the data out. A rough sketch of these structures follows after the figure. As shown in Figure 3.
Figure 3
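A rough Go rendering of the ID layout described above. The real definitions live in Protocol Buffers inside Poseidon; the field and type names here are illustrative only.

```go
package poseidon

// DocGzId locates one log line: DocId picks the 128-line document,
// RowIndex picks the line inside that document.
type DocGzId struct {
	DocId    uint64
	RowIndex uint8 // 0..127
}

// InvertedIndex is conceptually a map from a token's hash to the
// list of positions (docidlist) where the token appears.
type InvertedIndex map[uint64][]DocGzId
```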
Now the core of the search engine itself. First, the inverted index; its docidlists tend to be very long. For every token we first compute a hash. At query time we need to say which business we are querying on the platform, for example network logs, so we prepend the business abbreviation to the hash, and also which day: our engine is not real-time, because the data volume is too large, so today we can only query yesterday's data. We then parse the InvertedIndex to get the document information inside it, locate the positions, and fetch the raw data we need. In other words, for a given token we get its docidlist; for each DocID we look up the map to find where the document lives, assemble a path, and pull the raw data out. What comes out is a file holding 128 lines of log; the RowIndex tells us which line of that document we want, so we just filter down to it. To put it very simply: because DocIDs are long, we store a position for them, and because each DocID in a docidlist corresponds to a fairly large document, we also store a position for reading the raw document. In computing, almost any hard problem can be solved by adding another layer of indirection, and that saying has served us well in this system, and not only here. A sketch of this lookup flow follows the figure. As shown in Figure 4.
Figure 4
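Continuing the illustrative sketch above (same hypothetical package, reusing the DocGzId type), this is roughly the single-day lookup flow just described: take the token's docidlist from the inverted index, resolve each DocId to its decompressed 128-line document, and keep only the rows named by RowIndex. The loadDoc callback stands in for the real path-stitching and HDFS read, which are not shown.

```go
// searchOneDay walks the docidlist for one token hash within one
// business/day index and returns the matching log lines.
func searchOneDay(
	index map[uint64][]DocGzId, // token hash -> docidlist
	tokenHash uint64,
	loadDoc func(docID uint64) ([]string, error), // resolve path, read, decompress
) ([]string, error) {
	var out []string
	for _, id := range index[tokenHash] {
		lines, err := loadDoc(id.DocId)
		if err != nil {
			return nil, err
		}
		if int(id.RowIndex) < len(lines) {
			out = append(out, lines[id.RowIndex]) // keep only the matching row
		}
	}
	return out, nil
}
```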
Let's look at the idgenerator. Calculated at 2.77 trillion lines of business logs per day, tokenization yields about 10 billion terms, so on average each term corresponds to 277 log lines; that means roughly 2.77 trillion doc IDs per day. At 4 bytes each, the doc ID numbers alone would take nearly 11 TB. We handle this with segmented, interval-based allocation, which keeps the QPS on the generator low, and the IDs are re-allocated from 0 every day. With that, our daily inverted index of doc IDs comes to about 2.4 TB. 2.77 trillion per day scared us a little, so we use the business name plus the date as the key and restart the IDs from zero every day; that way the volume under each key stays manageable and the DocIDs themselves do not grow too large, because larger numbers make the data expand more. When I build the index I already know which business, which day, and which time segment I am dealing with, so I request a whole range at a time: if I ask for the range 1 to 100, the idgenerator reserves that 1-to-100 interval for me in advance.
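A toy, in-memory version of the segmented allocation just described: the key is "business_date", IDs restart from 0 each day, and a caller reserves a whole range in one call instead of one ID per request, which is what keeps the QPS on the generator low. The real idgenerator is a separate service; this only illustrates the interface.

```go
package main

import (
	"fmt"
	"sync"
)

type IdGenerator struct {
	mu   sync.Mutex
	next map[string]uint64 // "business_date" -> next free DocId
}

func NewIdGenerator() *IdGenerator {
	return &IdGenerator{next: make(map[string]uint64)}
}

// Alloc reserves `count` consecutive DocIds for the given business and date
// and returns the first one; the caller then hands them out locally.
func (g *IdGenerator) Alloc(business, date string, count uint64) uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	key := business + "_" + date
	start := g.next[key]
	g.next[key] = start + count
	return start
}

func main() {
	g := NewIdGenerator()
	first := g.Alloc("dns", "2017-04-15", 100) // reserve DocIds [first, first+100)
	fmt.Println(first)                         // 0 on the first call of the day
}
```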
The detailed design of the proxy and searcher. The searcher's core engine walks the four levels of index; it also handles things like filtering and fuzzy queries, which are not the main path, so I will not go into them. It takes the map data out and then fetches the raw data. Some of the raw results are very large; if we pushed them straight to the front end, the front end would simply die. So for businesses whose raw data is large, the page only shows a "view raw data" link, and clicking it issues another request. It is a very simple architecture. As shown in Figure 5.
Figure 5
The searcher's concurrency model. Reading the four index levels and reading the DocIDs follow exactly the same pattern, so I will use reading DocIDs as the example: when I get a docidlist, I assign one goroutine to each DocID; each goroutine stitches together the doc path, reads the raw log, runs the filter, and the results finally go back to the front end. A sketch of this fan-out follows the figure. As shown in Figure 6.
Figure 6
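A sketch of the fan-out just described, continuing the same illustrative package: one goroutine per DocId, each one loading and filtering its 128-line document and sending the surviving lines back over a channel. loadAndFilter stands in for the real path-stitching, HDFS read, and filter steps.

```go
// fetchAll spawns one goroutine per DocGzId and collects the filtered lines.
func fetchAll(ids []DocGzId, loadAndFilter func(DocGzId) ([]string, error)) []string {
	type result struct {
		lines []string
		err   error
	}
	ch := make(chan result, len(ids))
	for _, id := range ids {
		go func(id DocGzId) {
			lines, err := loadAndFilter(id)
			ch <- result{lines, err}
		}(id)
	}
	var out []string
	for range ids {
		r := <-ch
		if r.err != nil {
			continue // one bad doc does not fail the whole query
		}
		out = append(out, r.lines...)
	}
	return out
}
```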
How to deal with it
The first bottleneck. Our team's basic components are all C++, and our core online engine is also written in C++. To save effort at first, we wrapped the C++ component behind a C interface and called it from Go through cgo. Then, whenever it read DocIDs and opened files, our application kept dying for no apparent reason. We finally looked at how cgo executes: the process received an exit signal, and when we debugged with GDB it complained that there were too many processes (threads), because cgo can create a new M (OS thread) each time it executes a blocking call.
Solution: re-implement the component in Go, expose it as an HTTP service, and have the Go clients call that service, so the processing is centralized.
The second bottleneck. We use a large number of goroutines in the system, and a panic inside a worker goroutine cannot be handled by the main goroutine.
Solution: define a struct as the channel's element type that wraps both the normal data and an error; the main goroutine takes the struct off the channel and handles everything in one place.
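A minimal sketch of that pattern: the worker goroutine recovers its own panic, converts it into an error, and ships both data and error through one struct on the channel, so the main goroutine handles everything in a single place. The struct and function names are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

type searchResult struct {
	Data []string
	Err  error
}

func worker(out chan<- searchResult) {
	defer func() {
		if r := recover(); r != nil {
			// recover must run inside the goroutine that panicked;
			// the main goroutine can never catch this panic itself.
			out <- searchResult{Err: errors.New(fmt.Sprint("worker panic: ", r))}
		}
	}()
	panic("something went wrong while reading a doc") // simulated failure
}

func main() {
	out := make(chan searchResult, 1)
	go worker(out)
	res := <-out
	if res.Err != nil {
		fmt.Println("handled in main:", res.Err)
		return
	}
	fmt.Println(res.Data)
}
```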
Experience summary.
Even if you are proficient in many languages, it is best not to mix them, and to be very cautious about introducing solutions written in other languages.
Do not fully trust recover: it cannot recover from some runtime panics, and never from a panic in another goroutine.
Now look at the proxy's multi-day concurrent query design, shown in Figure 7. There were two options for querying multiple days. The first was to add multi-day support directly to the core query engine, which would have made it very bloated; so, following the indirection principle again, we added a middle layer: multi-day queries are turned into single-day queries, and the proxy gathers all the single-day results and assembles the multi-day answer.
Figure 7
We also have another project that requests Poseidon data, and we considered two caching options: either that third-party system does its own caching, or we do the caching. We chose the latter. If a third-party system caches, the cached results can only be used by that system; if we cache, then when it sends us a request, all the other third parties can potentially benefit from the cache too. So this is how it works: the proxy first requests the searcher for single-day data; for a one-month query, each goroutine queries one day, gets that day's data from the searcher, unpacks it, and checks whether the data is bad. If one day's data is bad, we return an error for that day only, not an error for the whole request, because when you query 30 days of data, failing the entire request just because one day is bad makes the system look unavailable. If the data is not bad, we build a cache key from the request parameters, of which there are many, plus the queried time, cache the result, and return it to the front end. A sketch of this fan-out and caching follows below.
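A sketch of that proxy-side multi-day fan-out: one goroutine per day, each day's result or its own error returned independently, so one broken day does not invalidate the other 29, and only good days are cached. The cache-key layout ("params|day") and the in-memory map are assumptions for illustration; the real cache and key format are not shown in the talk.

```go
type DayResult struct {
	Day   string
	Lines []string
	Err   error // only this day is marked bad if something goes wrong
}

// queryDays fans a multi-day query out into per-day searcher requests.
// The cache map is only touched from the calling goroutine, so no lock is needed here.
func queryDays(days []string, params string,
	queryOneDay func(day string) ([]string, error),
	cache map[string]DayResult) []DayResult {

	ch := make(chan DayResult, len(days))
	pending := 0
	out := make([]DayResult, 0, len(days))
	for _, day := range days {
		key := params + "|" + day // assumed cache key: request params + day
		if hit, ok := cache[key]; ok {
			out = append(out, hit) // served from cache, no goroutine needed
			continue
		}
		pending++
		go func(day string) {
			lines, err := queryOneDay(day)
			ch <- DayResult{Day: day, Lines: lines, Err: err}
		}(day)
	}
	for ; pending > 0; pending-- {
		r := <-ch
		if r.Err == nil {
			cache[params+"|"+r.Day] = r // only cache good data
		}
		out = append(out, r)
	}
	return out
}
```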
We then ran into a problem. Previously, every user query exercised the whole lookup pipeline, which effectively meant users were testing it for us in real time. With caching, the same data no longer goes through the whole ReadHDFS process again within the cache window. For the index-building pipeline we do have monitoring: if the program hangs, an alert tells us. But bad data is silent, and we had not built monitoring for it yet; when data is bad without us knowing, repairing the index costs a lot of time, since we have to replay the logs, regenerate the DocIDs, and fix the underlying bug as well.
Our solutions: the first is to shorten the cache time to something within which bad data is tolerable, so that user queries surface problems promptly; recovering one or two days of data is manageable, whereas if you cache for 30 days or a month or two, the accumulated bad data only grows. The second solution borrows from NSQ: use the nondeterminism of for+select to split traffic, randomly sending requests either to the channel (the cache path) or to HDFS, so a fraction of traffic keeps exercising the real path as a live check. Its drawback is a slightly higher development cost than the first option, though note it is not actually very high, because select can only work over channels.
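A sketch of the NSQ-inspired trick just mentioned: when more than one case of a select is ready at the same time, Go picks one of them uniformly at random, so offering the same request to two ready channels lets part of the traffic "leak" to the slower HDFS path with no extra randomness code. The channel names and the 50/50 split here are illustrative; consumers of the two channels are not shown.

```go
package main

import "fmt"

// dispatch pushes each request into whichever of the two channels the select
// happens to pick; when both are ready, Go chooses uniformly at random,
// which is exactly the uncertainty being exploited.
func dispatch(reqs <-chan string, cacheCh, hdfsCh chan<- string) {
	for req := range reqs {
		select {
		case cacheCh <- req:
		case hdfsCh <- req:
		}
	}
	close(cacheCh)
	close(hdfsCh)
}

func main() {
	reqs := make(chan string)
	cacheCh := make(chan string, 100)
	hdfsCh := make(chan string, 100)
	go dispatch(reqs, cacheCh, hdfsCh)
	for i := 0; i < 10; i++ {
		reqs <- fmt.Sprintf("query-%d", i)
	}
	close(reqs)
	// drain and see how the ten requests were split between the two paths
	for req := range cacheCh {
		fmt.Println("cache path:", req)
	}
	for req := range hdfsCh {
		fmt.Println("hdfs path:", req)
	}
}
```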
A second round of lessons. Do not pick technology just because it sounds impressive, the so-called black tech; something simple and effective that solves the problem is perfectly fine. Goroutines make it very convenient to write concurrent programs, but you must keep the concurrency model under control. Our Gopher group used to circulate blog posts full of animated diagrams showing how goroutines and channels interact, and they are dazzling to watch; when we write our own business code, we should likewise have a clear picture of how our goroutines and channels link together. I cannot find a precise established term for this idea, and I am not sure whether the phrase I use carries some other meaning.
Proxy multi-day asynchronous download, shown in Figure 8 (a sketch follows the figure). The front end initiates the request, choosing how many days and how much data to download; the server receives it, immediately replies to the client that the request has been accepted, and writes the job into a channel. As I said at the start, Readhdfs is written in Java, and with too many goroutines hitting it the bottom layer falls over. Two searchers talk to HDFS; one token can correspond to hundreds of DocIDs, which may be spread over hundreds of files, because the DocIDs are not necessarily in the same file. So inside the searcher, what looks like one incoming request actually fans out, and in the end it can grow by orders of magnitude, snowballing as it goes.
Figure 8
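A sketch of the asynchronous download path just described: the HTTP handler only enqueues the job onto a buffered channel and answers "accepted" right away, while a small, fixed pool of workers drains the channel, so the fan-out towards Readhdfs stays bounded. The handler path, job fields, and worker count are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

type downloadJob struct {
	Days  []string
	Query string
}

var jobs = make(chan downloadJob, 1024) // pending download requests

func downloadHandler(w http.ResponseWriter, r *http.Request) {
	job := downloadJob{Days: r.URL.Query()["day"], Query: r.URL.Query().Get("q")}
	select {
	case jobs <- job:
		fmt.Fprintln(w, "accepted") // client gets an immediate acknowledgement
	default:
		http.Error(w, "too many pending downloads", http.StatusTooManyRequests)
	}
}

func worker(id int) {
	for job := range jobs {
		// the real system reads the per-day data from HDFS here and
		// assembles the downloadable file; this sketch only logs the job
		fmt.Printf("worker %d handling %v %q\n", id, job.Days, job.Query)
	}
}

func main() {
	for i := 0; i < 4; i++ { // a fixed worker pool keeps the HDFS fan-out bounded
		go worker(i)
	}
	http.HandleFunc("/download", downloadHandler)
	http.ListenAndServe(":8080", nil)
}
```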
The Java side was first given a simple connection pool, and then a fuse (circuit-breaker) mechanism: if the number of connections exceeds a threshold, it returns an error immediately. It is like the fuses we had at home long ago: when the current gets too high, the fuse wire melts and cuts the circuit.
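The actual fix above lives in the Java Readhdfs service, but the idea is language-neutral; as a hedged Go sketch of the same "fuse", a buffered channel can act as a counting semaphore, and once the limit is reached new calls fail fast instead of piling more connections onto HDFS.

```go
package main

import (
	"errors"
	"fmt"
)

var errOverloaded = errors.New("fuse blown: too many concurrent requests")

type fuse struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

func newFuse(limit int) *fuse {
	return &fuse{slots: make(chan struct{}, limit)}
}

// Do runs work if a slot is free, otherwise returns immediately with an error,
// like a blown fuse protecting the backend.
func (f *fuse) Do(work func() error) error {
	select {
	case f.slots <- struct{}{}:
		defer func() { <-f.slots }() // release the slot when done
		return work()
	default:
		return errOverloaded
	}
}

func main() {
	f := newFuse(2)
	err := f.Do(func() error { fmt.Println("reading from HDFS"); return nil })
	fmt.Println("err:", err)
}
```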
Next, the GC. First of all, GC has never been a bottleneck in our system. What follows are a few observations from simple tests we ran after upgrading, shared here for discussion; if others have done more thorough testing than we have, we would be glad to compare notes.
Go 1.7: we had been on Go 1.5, and after upgrading to 1.7 our GC overhead dropped to about one third.
On the question of an Nginx proxy: at an earlier talk a student asked me whether to put Nginx in front of a Go front end. The system I was working on then served a very large number of users, and we simply packaged the Go server into a single binary, requested port 80 on LVS, and forwarded to the Go server on 8080; very simple. In this project we did use Nginx, and for specific reasons.
Those reasons are access control and load balancing. Load balancing we could do with LVS, but this project has a scenario with very few users. It is an internal project, so there is a permissions issue: our front-end port should only be reachable from certain machines, and besides our own front end, some other teams also write scripts that request our data directly. Nginx gives us both things directly, so we do not need to implement them in Go: the front can use Nginx for simple load balancing and for the IP restrictions. Whether to use Nginx depends entirely on your business scenario. In this scenario, adding Nginx only adds a small operations burden, while the IP restriction and load balancing require no extra development. We did not use it before because it would have served no purpose: the earlier service needed no restrictions at all, and anyone could request it.
Changes in open source
We considered open source, and last November we open-sourced the system; 66% of its code is written in Go. We had two problems to solve. The first is third-party dependencies: the open-source main program cannot use our internal dependency packages, so how should those third-party components be maintained? I discussed this with a lot of people; there are many approaches, each with its own pros and cons, and there is almost no perfect solution for dependencies, let alone multi-level dependencies, at least I have not found one. Since there is none, we picked the most popular and simplest option and solved it that way.
Across the whole system we developed several services, five in total. We reasoned that if we asked users to deploy five services, then even with deployment scripts, across different user operating systems, CPU word sizes and so on, all kinds of problems would come up, and when something went wrong they would not know which service to look at; even we developers troubleshoot by going through the logs service by service. So we considered packaging all the services into a single binary. In actual exchanges with users during trials, however, we learned that many people did not choose the all-in-one mode and instead deployed the five services independently.
We have been open source for five months, and many people have asked us to open-source the fuzzy query and the filtering. Fuzzy query we implement very simply: we use a database with enough concurrency capability. We first tokenize the terms needed for fuzzy matching and put them into the database, where we can operate on them; the keywords typically used for fuzzy queries amount to something on the order of billions of entries, and running an operation over that is really quite simple. After the search we know the keywords, and once we have them the next scenario is a multi-keyword query over multiple days; multiple keywords work the same way as a single keyword and the same way as multi-day queries: each keyword gets its own goroutine to run the query, and that solves the problem.
Summary
First, the Go development experience is good, performance is high, and the service is stable; apart from one incident after going live, there have been essentially none. We do the online monitoring with something we wrote ourselves that automatically restarts the process if it dies; that is admittedly a fairly crude approach, but the process does occasionally die. For most demand scenarios, Go program development requires balancing code readability against performance, and the application's concurrency model must stay under control.
On connection pools and object pools (we have many people discussing this in our group): connection pools I will not dwell on, since many clients already implement pooling; consider the object pool instead. Its core benefit is reusing objects to relieve GC pressure. But it hurts code readability: the object pool has nothing to do with the business logic, so anyone reading the code has to go figure out how the pool works, and readability suffers badly. Object pools were a great win back in Go 1.2, but from 1.4 through 1.7 the GC has improved so much that the gain is far less obvious; unless you are in a truly extreme situation, you probably do not need such an extreme approach, and I think most companies are unlikely to hit that problem (a minimal example follows below).
We know Go is also being used to develop for Android; what we combine it with most is C++ and C, brought in through cgo, and you should be cautious about mixing in other languages. Even if you know the other language very well, you do not know what problems the combination of the two may cause, and you may never be able to solve them. Introduce third-party solutions only for good reason, balanced against the system's operations and maintenance cost.
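For completeness, a minimal example of the object-pool idea discussed above, using the standard library's sync.Pool to reuse buffers instead of allocating new ones; as noted, with the modern Go GC this is only worth the readability cost in genuinely allocation-heavy hot paths.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func render(line string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()      // make the buffer clean for the next user
		bufPool.Put(buf) // hand it back instead of leaving it for the GC
	}()
	buf.WriteString("[poseidon] ")
	buf.WriteString(line)
	return buf.String()
}

func main() {
	fmt.Println(render("hello"))
}
```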