Recently busy to transfer a project from MySQL to MongoDB, in the process of importing old data, encountered some twists and turns, made a lot of mistakes, but also learned a lot of knowledge, so recorded.
The company specially equipped with several high-performance server for this project, exclusively dual quad core Hyper-threading CPU, plus 32G of memory, operation and maintenance personnel after the installation of MongoDB, it is my turn, I am accustomed to the use of new servers before the relevant log, to understand the basic situation, when I browse the MongoDB log, Find some warning messages:
Warning:you is running on a NUMA machine. We suggest launching mongod like this-avoid performance problems:numactl--interleave=all Mongod [Other options]
At that time I was not very clear about what NUMA is, so did not deal with it, just to report the problem to the OPS personnel, the fact that the OPS personnel also did not deal with, so the prelude of the problem is so opened ...
Migration works by first importing old data. It was all right at first, but a few hours later, I stumbled upon the fact that I didn't know when the data import was going down, and my PHP script started throwing Exceptions:
Cursor timed out (timeout:30000, Time left:0:0, status:0)
I have a moment to judge the problem, think of the PHP script to increase the value of timeout to cope with:
Mongocursor:: $timeout =-1;
Unfortunately, this does not solve the problem, but the error is changing the appearance of the pattern:
Max number of retries exhausted, couldn ' t send querycouldn ' t send Query:broken pipe
Helpless under the strace followed by the PHP script:
Shell> strace-p <PID>
The discovery process is stuck on the recvfrom operation:
Recvfrom (<FD>,
The following command queries the meaning of the recvfrom operation: Receive a message from a socket
Shell> Apropos Recvfrom
You can also confirm it as follows:
shell> lsof-p <PID>shell> ls-l/proc/<pid>/fd/<fd>
The current operation of MongoDB is queried at this time, and it is found that almost every operation consumes a lot of time:
shell> echo "Db.currentop ()" | /path/to/mongo
Running Mongostat at the same time displays a high locked value.
...
Repeated do a lot of work, but always can not find the crux of the problem where, had to turn to the official forum, where the technical support are very enthusiastic, after I described the problem, not long after the reply, suggested I check whether the index is not poor, in order to verify this possibility, I activated the profiler record slow operation:
mongo> use <DB>mongo> db.setprofilinglevel (1);
However, the result is basically an insert operation (because I am importing data primarily) and I do not need an index on its own:
mongo> use <DB>mongo> db.system.profile.find (). Sort ({$natural:-1})
...
The problem here, seems to have cornered, in order to die as a live horse medicine, I repeated a few times the process of migrating old data, the result is naturally a problem, but fortunately I found that whenever there is a problem, in the results of the top command, there is always a process called irqbalance, Search for a bit, results many introduced Irqbalance 's article mentioned NUMA, let me remember in the log before the warning message, so as described in the information, restarted MongoDB:
Shell> Numactl--interleave=all/path/to/mongod
Everything's fine. In order to solve this problem, wasted a lot of spirit, really do not have the strength to explain what NUMA is exactly what is, have to understand the Netizen can refer to the article of the foreigner, the introduction is very informative.
Original link: huoding.com
For the culprit, the author left everyone to learn, Nosqlfan here can give you a simple description, first explain a few concepts
Numa: Numa is one of the multicore core CPU architectures, all referred to as non-uniform memory Access, which simply means that in a multi-core CPU, the machine's physical memory is allocated to each core, and the architecture diagram is as follows:
Each kernel access is allocated to its own memory faster than accessing the memory allocated to other cores, with the following access control policies:
- 1. Default: Always on the local node assignment (allocated on the node running on the current process);
- 2. Bind (BIND): Force assignment to the specified node;
- 3. Crossover (Interleave): interleaving on all nodes or designated nodes;
- 4. Priority (preferred): Allocated on the specified node, and failure is allocated on the other node.
The last use of the Numactl–interleave command in the previous article is to designate it as a cross-sharing mode.
irqbalance: This is a CPU-intensive process that the author mentions above, and the role of this process is to distribute the system interrupt signal in the multi-core CPU operating system. See also: irqbalance.org
The concept is over, here is a brief description of the above question:
We know that the virtual memory mechanism is using an interrupt signal to notify the virtual memory system of memory swap, so this irqbalance process is busy and is a dangerous signal, here is due to the frequent memory exchange. This frequent switching phenomenon, known as swap insanity, is frequently mentioned in MySQL, in the NUMA framework, where inappropriate policies are used, causing the core to allocate memory only from the specified memory block nodes, even if total memory is surplus, resulting in a large number of swap operations due to insufficient current node memory.
For Numa, learn more: NUMA and Intel next-generation Xeon processors and MySQL standalone multi-instance scenarios two articles
Copyright NOTICE: This article for Bo Master original article, without Bo Master permission not reproduced.
MongoDB performance Problem and principle analysis