The MongoDB PHP client has a MongoCursor class that returns a handle (a cursor) to a query's result set. Fetching data through it looks simple, but the internal implementation is not. This article examines several MongoCursor operations to uncover details of how the MongoDB client and server communicate.
getNext and network requests
Typically, every find operation returns a MongoCursor object, and calling the object's getNext method returns one document of the result. Calling getNext in a loop retrieves multiple documents. Let's look at how this fetching works internally.
First, we create a MongoCursor object in the simplest possible way:
$m = new Mongo();
$collection = $m->demodb->democollection;
$cursor = $collection->find();
Calling find creates a MongoCursor object, but only in memory: the find query is not sent to the server yet. After the cursor is created we may still call methods such as sort or limit on it, which change the query, so nothing is sent until the query is final.
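To make this lazy behavior concrete, here is a minimal PHP sketch of a cursor that only records options until the first getNext call. LazyCursorModel and its request log are hypothetical illustrations, not the driver's MongoCursor: result handling is left out, and refusing modifiers after the query has gone out is an assumption of this model. It is standalone and needs no MongoDB server (PHP 7.4 or later).

```php
<?php
// Hypothetical model of a lazy cursor; NOT the MongoCursor implementation.
// Modifier calls only record options; the first getNext() "sends" the query.
final class LazyCursorModel
{
    private array $options = [];
    private bool $started = false;
    /** @var array[] log of simulated network requests */
    public array $requests = [];

    public function sort(array $spec): self
    {
        $this->assertNotStarted();
        $this->options['sort'] = $spec;
        return $this;
    }

    public function limit(int $n): self
    {
        $this->assertNotStarted();
        $this->options['limit'] = $n;
        return $this;
    }

    public function getNext(): void
    {
        if (!$this->started) {
            // First call: all accumulated options go out in a single query.
            $this->requests[] = ['op' => 'query'] + $this->options;
            $this->started = true;
        }
        // Returning documents is omitted in this sketch.
    }

    private function assertNotStarted(): void
    {
        if ($this->started) {
            throw new LogicException('cannot modify the cursor after the query was sent');
        }
    }
}

$cursor = (new LazyCursorModel())->sort(['name' => 1]);
echo count($cursor->requests), "\n"; // 0: nothing sent yet
$cursor->getNext();
echo count($cursor->requests), "\n"; // 1: the query went out on the first getNext
```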
So when does PHP actually send the find request to MongoDB? When getNext is first called on the MongoCursor. For example, we call sort and then getNext on the cursor created above:
$cursor->sort(array('name' => 1));
$result = $cursor->getNext();
The second line triggers the find network request. The figure below shows this request, with the binary protocol decoded into its fields:
The figure shows that the numberToReturn field is 0. In the MongoDB protocol, 0 means the client does not specify how many documents it wants, so it looks as if this find will fetch every document in the collection. Yet one getNext call yields only one document. Does that mean PHP makes a network request for every getNext call, one document at a time? Of course not; that would be far too inefficient. Then does PHP fetch all the data on the first getNext call, keep it in memory, and serve later getNext calls from local memory? Again, no: with a large amount of data, PHP could easily run out of memory.
So how does it actually work? Look at the reply shown in the following figure:
The number of documents returned in the figure is 101: MongoDB sent us 101 documents. 101 is the server's default batch size, used when the client does not specify how many documents to return. These 101 documents are held in PHP's memory, so the next 100 getNext calls make no network request and return data directly from memory.
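The buffering described above can be sketched as a small simulation that counts how many network round trips a sequence of getNext calls costs. countRequests is a hypothetical function, not part of the driver; it assumes a first reply of 101 documents (the default observed here), a fixed number of documents per later getMore reply, and a collection large enough never to run out.

```php
<?php
// Hypothetical simulation of the client-side buffer behind getNext().
// Assumptions: the first reply carries $firstBatch documents (101 by default,
// as observed above), each later getMore reply carries $moreBatch documents,
// and the collection never runs out of documents.
function countRequests(int $calls, int $moreBatch, int $firstBatch = 101): int
{
    if ($moreBatch < 1 || $firstBatch < 1) {
        throw new InvalidArgumentException('batch sizes must be positive');
    }
    $buffered = 0; // documents waiting in client memory
    $requests = 0; // network round trips made so far
    for ($i = 0; $i < $calls; $i++) {
        if ($buffered === 0) {
            // Buffer is empty: the first trip is the query, later ones are getMore.
            $buffered = ($requests === 0) ? $firstBatch : $moreBatch;
            $requests++;
        }
        $buffered--; // getNext() hands out one buffered document
    }
    return $requests;
}

echo countRequests(101, 1000), "\n"; // 1: all 101 calls served by the first reply
echo countRequests(102, 1000), "\n"; // 2: call 102 needs a getMore
```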
Suppose we continue with the following calls after the getNext above:
// Skip the other documents
for ($i = 0; $i < 100; $i++) {
    $cursor->getNext();
}
// Request document 102
$result = $cursor->getNext();
The loop calls getNext 100 times, which uses up the 101 documents held in memory. When we call getNext again to get document 102, PHP has no buffered data left, so it sends another request to the MongoDB server for more data. This time the client sends the following request:
This time the request's opcode is GET_MORE: fetch more data, continuing from where the previous request left off. For this request MongoDB no longer returns a fixed number of documents but a fixed amount of data, currently 4 MB; that is, this reply carries roughly 4 MB at most. For the request above, MongoDB returns the following:
The reply reports that 34,673 documents were returned, starting from offset 101 (that is, the 102nd document). Its size is 4,194,378 bytes, just over 4 MB (4 × 1024 × 1024 = 4,194,304 bytes).
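The size-based cutoff can be modeled by adding documents to a reply until the running total reaches the cap. fillReply is a hypothetical sketch, not a driver or server function; it assumes the server checks the cap only before adding each document, so the last document may push the reply slightly past 4 MB, which would fit the 4,194,378-byte reply above. It ignores the reply header and uses uniform 121-byte documents purely for illustration.

```php
<?php
// Hypothetical model of a size-capped getMore reply.
// Assumption: a document is added while the running total is still below
// the cap, so the final document may overshoot it slightly.
function fillReply(array $docSizes, int $capBytes): array
{
    $count = 0; // documents placed in the reply
    $bytes = 0; // total document bytes (reply header ignored)
    foreach ($docSizes as $size) {
        if ($bytes >= $capBytes) {
            break; // cap reached: the rest waits for the next getMore
        }
        $bytes += $size;
        $count++;
    }
    return [$count, $bytes];
}

// 50,000 illustrative documents of 121 bytes each, 4 MB cap
[$n, $bytes] = fillReply(array_fill(0, 50000, 121), 4 * 1024 * 1024);
echo "$n documents, $bytes bytes\n"; // 34664 documents, 4194344 bytes
```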
Setting batchSize
As mentioned above, MongoDB's default batch size is 101, but the client can change it. In PHP this is done with the batchSize method. For example, the following code sets the batch size to 25:
$cursor = $collection->find()->sort(array('name' => 1));
$cursor->batchSize(25);
$result = $cursor->getNext();
As described above, this single getNext call fetches a whole batch of documents to the client at once. Running the code generates the following network request:
As we can see, numberToReturn is set to 25.
If we then call getNext 25 more times in a loop, getNext runs 26 times in total. Because the first reply returned only 25 documents, the 26th call triggers another network request. The request body is as follows:
Because the batch size is 25, this getMore request also asks for only 25 documents, and the server returns 25.
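A small sketch can show which getNext calls reach the network for a given batch size. networkCalls is a hypothetical helper; it assumes every reply returns exactly the batch size (the collection is large enough and the 4 MB cap is never reached).

```php
<?php
// Hypothetical model: with batchSize($n), which getNext() calls hit the network?
// Assumes every reply carries exactly $batchSize documents.
function networkCalls(int $calls, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be positive');
    }
    $hits = [];
    for ($call = 1; $call <= $calls; $call++) {
        // The buffer is empty before calls 1, 1 + n, 1 + 2n, ...
        if (($call - 1) % $batchSize === 0) {
            $hits[] = $call;
        }
    }
    return $hits;
}

echo implode(', ', networkCalls(26, 25)), "\n"; // 1, 26
```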
Using limit
Besides batchSize, there is another way to control how many documents each network request returns: MongoCursor's limit method, which directly sets the number of documents to fetch.
For example, the following code limits the query to the first 50,000 documents:
$cursor = $collection->find()->sort(array('name' => 1));
$cursor->limit(50000);
$res = $cursor->getNext();
This code sends the following request:
The request asks for 50,000 documents (numberToReturn is 50,000). Does the MongoDB server actually return 50,000 documents? Let's look directly at the reply packet:
Unfortunately, the server returned only 34,678 documents, not the 50,000 we wanted. The reason is simple, as the Message Length value shows: the reply packet has already reached 4 MB, a limit it cannot exceed, so only 34,678 documents could be returned.
When the client receives the reply, it finds only 34,678 documents, short of the 50,000 it needs by 50,000 − 34,678 = 15,322. It therefore sends another request asking the server for the remaining 15,322 documents, as follows:
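The top-up logic can be sketched as a loop that keeps asking for whatever is still missing. limitRoundTrips is a hypothetical function; it assumes each reply holds at most a fixed number of documents (the 4 MB cap expressed as a document count, 34,678 in the run above) and that the collection contains enough documents.

```php
<?php
// Hypothetical model: how a client tops up a limit() over several round trips.
// Assumption: a reply holds at most $maxPerReply documents.
function limitRoundTrips(int $limit, int $maxPerReply): array
{
    if ($maxPerReply < 1) {
        throw new InvalidArgumentException('maxPerReply must be positive');
    }
    $trips = [];
    $remaining = $limit;
    while ($remaining > 0) {
        $returned = min($remaining, $maxPerReply);
        $trips[] = ['requested' => $remaining, 'returned' => $returned];
        $remaining -= $returned; // next request asks only for what is still missing
    }
    return $trips;
}

foreach (limitRoundTrips(50000, 34678) as $trip) {
    echo "requested {$trip['requested']}, returned {$trip['returned']}\n";
}
// requested 50000, returned 34678
// requested 15322, returned 15322
```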
Combining batchSize and limit
Sometimes we need to fetch a lot of data, such as the 50,000 documents requested with limit(50000) above. Fetching all of them may exceed the MongoCursor timeout we configured and throw a cursor timeout exception. In that case we can set batchSize together with limit to control how much each request to the server fetches, which keeps every round trip short and avoids MongoCursor timeouts caused by fetching large amounts of data at once.
For example, suppose we want 128 documents but use batchSize to fetch at most 50 from the server at a time. The subsequent getNext calls then make three network requests, asking for 50, 50, and 28 documents respectively.
$cursor = $collection->find()->sort(array('name' => 1));
$cursor->limit(128)->batchSize(50);
$res = $cursor->getNext();
// Retrieve the other 127 documents that we still want
for ($i = 0; $i < 127; $i++) {
    $cursor->getNext();
}
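The 50, 50, 28 split is simple arithmetic: each request asks for the smaller of the batch size and what remains of the limit. requestSizes is a hypothetical helper that reproduces this split, assuming every reply is full (enough documents and no 4 MB cap reached).

```php
<?php
// Hypothetical helper: split a limit() into per-request sizes capped by batchSize().
// Assumes every reply is full: enough documents and no 4 MB cap reached.
function requestSizes(int $limit, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be positive');
    }
    $sizes = [];
    for ($left = $limit; $left > 0; $left -= $batchSize) {
        $sizes[] = min($left, $batchSize); // the last request asks only for the remainder
    }
    return $sizes;
}

echo implode(', ', requestSizes(128, 50)), "\n"; // 50, 50, 28
```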
A quirk of the batchSize method
Above, we controlled the data exchanged between the client and the MongoDB server by setting batchSize. There is a special case, however: when batchSize is 1 or negative, MongoDB returns only the first reply and then closes the cursor. In other words, suppose we run the following:
$cursor = $collection->find()->sort(array('name' => 1));
$cursor->batchSize(1);
$cursor->getNext();
var_dump($cursor->getNext());
The final var_dump always prints NULL: with a batch size of 1, the server returns a single document and then closes the cursor, so the second getNext has nothing left to fetch.
With one small change, setting batchSize to 2, the result is very different:
$cursor = $collection->find()->sort(array('name' => 1));
$cursor->batchSize(2);
$cursor->getNext(); // item 1
$cursor->getNext(); // item 2
var_dump($cursor->getNext()); // item 3
Although the first reply contains only two documents, the third getNext call still returns data: the cursor stayed open, so the client fetched the third document from the server with a second request.
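The rule behind this quirk is documented for the legacy OP_QUERY message of the MongoDB Wire Protocol: 0 means the server's default batch size, a negative numberToReturn asks for that many documents and then closes the cursor, and a value of 1 is treated as -1. The sketch below encodes that rule in a hypothetical function, numberToReturnRule, assuming the PHP driver sends the batchSize value as numberToReturn, which matches the behavior observed above.

```php
<?php
// Sketch of the legacy OP_QUERY numberToReturn rule from the MongoDB Wire
// Protocol documentation. 'batch' => null stands for the server default.
// Assumption: the PHP driver sends the batchSize() value as numberToReturn.
function numberToReturnRule(int $n): array
{
    if ($n === 0) {
        // 0: the server picks its default first-batch size
        return ['batch' => null, 'singleBatch' => false];
    }
    if ($n < 0 || $n === 1) {
        // Negative: return |n| documents and close the cursor; 1 is treated as -1
        return ['batch' => abs($n), 'singleBatch' => true];
    }
    // n > 1: first batch of n; the cursor stays open if more results remain
    return ['batch' => $n, 'singleBatch' => false];
}

foreach ([1, 2, -5] as $n) {
    $rule = numberToReturnRule($n);
    echo "$n: batch {$rule['batch']}, ", $rule['singleBatch'] ? 'cursor closed' : 'getMore possible', "\n";
}
```

Under this rule, batchSize(1) can never be followed by a getMore, while batchSize(2) leaves the cursor open for the third document.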
These experiments give a rough picture of the communication protocol between the MongoDB client and server. More details can be found in the official MongoDB documentation on the MongoDB Wire Protocol.
Viewing the MongoDB Communication Protocol from the PHP Client (repost)