View MongoDB communication protocol from the PHP Client

Last Update:2018-12-05 Source: Internet

Author: User

Tags mongodb client mongodb server

Document directory

getNext and network requests
Set batchSize
Use limit
Combination of batchSize and limit
Minor problems with the batchSize function

http://onepiece.me/blog/tag/mongodb

The MongoDB PHP client provides the MongoCursor class, which serves as the handle (or cursor) to a query's result set. The internal implementation of this seemingly simple data fetch operation is actually not that simple. This article uses a few operations on the MongoCursor class to reveal some internal details of the communication between the MongoDB client and server.

getNext and network requests

Every find operation returns a cursor object. Calling the getNext method on this object returns one document, and calling it in a loop returns multiple documents. Let's look at what actually happens when data is retrieved.

First, we use the simplest method to generate a cursor object:

$m = new Mongo();
$collection = $m->demoDb->demoCollection;
$cursor = $collection->find();

Calling the find method creates a MongoCursor object. At this point the object exists only in memory and the find query has not been sent to the server. This is because we may still call other methods on the MongoCursor object after it is created, such as sort and limit, and these change the query.

So when does PHP actually send the find request to MongoDB? When getNext is called on the MongoCursor. For example, building on the code above, we can call sort and then getNext:

$cursor->sort( array( 'name' => 1 ) );
$result = $cursor->getNext();

Here the second line of code triggers the find network request.

In the captured request, the number to return field is 0. In the MongoDB protocol, 0 means the client imposes no limit of its own, so this find operation covers every document in the collection. Yet a single getNext call gives us only one document. Does that mean PHP makes a network request for one document every time we call getNext? Of course not, that would be far too inefficient. Then does PHP fetch all the data on the first getNext call, keep it in memory, and serve every later getNext call from local memory? Also no: with a large data set, PHP would run out of memory.
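To make the number to return field more concrete, here is a minimal sketch of how a legacy OP_QUERY message could be assembled by hand. The helper name buildOpQuery is hypothetical and is not part of the PHP driver; the query document is left empty for brevity (no sort), and the field layout is assumed to follow the legacy Mongo Wire Protocol.

```php
<?php
// Sketch only: assemble the bytes of a legacy OP_QUERY message (opcode 2004).
// buildOpQuery() is a hypothetical helper, not a driver function.
function buildOpQuery(string $namespace, int $numberToSkip, int $numberToReturn, int $requestId = 1): string
{
    $emptyQuery = "\x05\x00\x00\x00\x00";   // BSON for {}: int32 length (5) plus terminating 0x00
    $body = pack('V', 0)                     // flags
          . $namespace . "\x00"              // fullCollectionName as a C string
          . pack('V', $numberToSkip)
          . pack('V', $numberToReturn)       // 0 = let the server choose the first batch size
          . $emptyQuery;
    // Standard header: messageLength, requestID, responseTo, opCode (all little-endian int32)
    $header = pack('V4', 16 + strlen($body), $requestId, 0, 2004);
    return $header . $body;
}

$message = buildOpQuery('demoDb.demoCollection', 0, 0);
$fields  = unpack('VmessageLength/VrequestId/VresponseTo/VopCode', $message);
echo $fields['opCode'], ' ', $fields['messageLength'], "\n";
```

The last int32 before the query document is the number to return value discussed above.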

So how does it actually work? In the server's reply, the value of number returned is 101. In other words, MongoDB returned 101 documents. This 101 is the server's default batch size: if the client does not specify how many results to return, the first reply contains 101 documents. These 101 documents are kept in PHP's memory, so the next 100 getNext calls do not touch the network and are served directly from memory.

Suppose we then run the following after the first getNext call:

// skip the other 100 docs
for ($i = 0; $i < 100; $i++) { $cursor->getNext(); }
// request document 102:
$result = $cursor->getNext();

The loop calls getNext 100 times, which uses up all 101 documents held in memory. When we then call getNext again to fetch the 102nd document, PHP has nothing left in memory, so it sends another request to the MongoDB server to obtain more data.

This time the request's operation code is get more, that is, fetch more data continuing from the previous reply. For this request MongoDB no longer limits the reply by document count but by size, which is currently 4 MB. In other words, this reply carries at most about 4 MB of data.

In the reply, the documents start from position 101 and 34673 documents are returned. The message length is 4194378 bytes, which is just over 4 MB (4 × 1024 × 1024 = 4194304).
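The get more request described above can be sketched in the same spirit. The helper buildOpGetMore is hypothetical, the cursor ID value is made up for illustration, and the layout is assumed to follow the legacy OP_GET_MORE message (opcode 2005). The 'P' format code needs a 64-bit PHP build.

```php
<?php
// Sketch only: assemble a legacy OP_GET_MORE message (opcode 2005).
// buildOpGetMore() is a hypothetical helper, not a driver function.
function buildOpGetMore(string $namespace, int $numberToReturn, int $cursorId, int $requestId = 2): string
{
    $body = pack('V', 0)                  // reserved field, always 0
          . $namespace . "\x00"           // fullCollectionName as a C string
          . pack('V', $numberToReturn)    // 0 = no count limit, the server caps the reply by size
          . pack('P', $cursorId);         // cursor ID from the previous reply, 64-bit little-endian
    return pack('V4', 16 + strlen($body), $requestId, 0, 2005) . $body;
}

// 123456789 is an illustrative cursor ID, not a value from the capture.
$message  = buildOpGetMore('demoDb.demoCollection', 0, 123456789);
$opCode   = unpack('V', substr($message, 12, 4))[1];
$cursorId = unpack('P', substr($message, -8))[1];
echo $opCode, ' ', $cursorId, ' ', strlen($message), "\n";
```

The cursor ID is what ties this request to the earlier find: without it the server would not know which result set to continue.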

Set batchSize

As mentioned above, MongoDB's default batch size is 101, and the client can override it. In PHP this is done with the batchSize method. For example, the following code sets the batch size to 25:

$cursor = $collection->find()->sort( array( 'name' => 1 ) );
$cursor->batchSize(25);
$result = $cursor->getNext();

The above code calls getNext once. As described earlier, the server returns results to the client in batches of N documents. In the network request generated by this code, number to return is set to 25.

If we then call getNext another 25 times, for 26 calls in total, the first reply only held 25 documents, so the 26th call triggers another network request.

Because we set the batch size to 25, this request also asks for only 25 results, and the server returns only 25 documents.
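The buffering behaviour can be imitated without a server. The following is a minimal simulation, not the driver's implementation: SimulatedCursor is a hypothetical class that serves documents from a local buffer and counts one "network request" each time the buffer has to be refilled.

```php
<?php
// Hypothetical stand-in for a cursor. Documents are just integers here.
class SimulatedCursor
{
    public $requests = 0;     // how many times we had to "go to the server"
    private $buffer = [];     // documents already fetched but not yet consumed
    private $next = 0;        // index of the next document still on the "server"
    private $total;
    private $batchSize;

    public function __construct(int $total, int $batchSize)
    {
        $this->total = $total;
        $this->batchSize = $batchSize;
    }

    public function getNext()
    {
        // Refill the buffer only when it is empty and the server has more documents.
        if (empty($this->buffer) && $this->next < $this->total) {
            $this->requests++;
            $count = min($this->batchSize, $this->total - $this->next);
            $this->buffer = range($this->next, $this->next + $count - 1);
            $this->next += $count;
        }
        return array_shift($this->buffer);   // null once everything is consumed
    }
}

$cursor = new SimulatedCursor(1000, 25);
for ($i = 0; $i < 25; $i++) { $cursor->getNext(); }
$afterFirstBatch = $cursor->requests;   // still served by the first batch
$cursor->getNext();                     // the 26th call needs a second batch
echo $afterFirstBatch, ' ', $cursor->requests, "\n";   // prints "1 2"
```

Replacing 25 with 101 reproduces the default case from the previous section: the 102nd call is the first one that needs a second request.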

Use limit

Besides the batchSize method, there is another way to influence how many documents each network request returns: call the limit method on the MongoCursor and directly set the number of documents you want.

For example, the following code sets limit to query the first 50000 records:

$cursor = $collection->find()->sort( array( 'name' => 1 ) );
$cursor->limit( 50000 );
$res = $cursor->getNext();

In the request sent by this code, number to return is 50000. So does the MongoDB server actually return 50000 documents? Let's look at the reply packet.

Unfortunately, the MongoDB server returns only 34678 documents, not the hoped-for 50000. The reason is simple and visible in the message length field: the reply has reached the 4 MB size limit, which cannot be exceeded. Therefore only 34678 documents fit into this reply.

When the client receives the reply, it finds only 34678 documents, fewer than the 50000 it asked for. The difference is 50000 - 34678 = 15322, so the client sends another request asking the server for the remaining 15322 documents.
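The bookkeeping behind those follow-up requests can be sketched as below. The function numberToReturnSequence is hypothetical and simply mirrors the arithmetic in the text: each new request asks for whatever is still missing from the limit, given the document counts the server actually returned.

```php
<?php
// Sketch only: which "number to return" values a client would send,
// given the limit and the document counts of the replies it receives.
// numberToReturnSequence() is a hypothetical helper.
function numberToReturnSequence(int $limit, array $replyCounts): array
{
    $requested = [];
    $remaining = $limit;
    foreach ($replyCounts as $received) {
        if ($remaining <= 0) {
            break;                    // limit already satisfied, no further request
        }
        $requested[] = $remaining;    // ask for everything that is still missing
        $remaining -= $received;      // the reply may contain fewer documents than asked for
    }
    return $requested;
}

// Reply counts taken from the example above: 34678 first, then the remaining 15322.
echo implode(', ', numberToReturnSequence(50000, [34678, 15322])), "\n";   // prints "50000, 15322"
```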

Combination of batchSize and limit

Sometimes we need to retrieve a lot of data. For example, we set limit to 50000 to get 50000 documents, but fetching them may take longer than the cursor timeout we configured, and a cursor timeout exception is thrown. In that case we can set both limit and batchSize, which keeps each round trip to the server small and avoids the cursor timeout caused by a large amount of data.

In the following example we want 128 documents in total but set batchSize so that only 50 documents are fetched from the server at a time. The subsequent getNext calls then cause three network requests, fetching 50, 50, and 28 documents respectively.

$cursor = $collection->find()->sort( array( 'name' => 1 ) );
$cursor->limit( 128 )->batchSize( 50 );
$res = $cursor->getNext();
// retrieve the other 127 documents that we still want
for ($i = 0; $i < 127; $i++) { $cursor->getNext(); }
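The split into 50, 50, and 28 follows directly from the two settings. A minimal sketch, using a hypothetical helper batchPlan and assuming a batch size of at least 2 (smaller values behave differently, as the next section explains):

```php
<?php
// Sketch only: how many documents each request fetches when both
// limit and batchSize are set. batchPlan() is a hypothetical helper.
function batchPlan(int $limit, int $batchSize): array
{
    if ($batchSize < 2) {
        throw new InvalidArgumentException('this sketch assumes batchSize >= 2');
    }
    $plan = [];
    for ($remaining = $limit; $remaining > 0; $remaining -= $batchSize) {
        $plan[] = min($batchSize, $remaining);   // the last batch may be smaller
    }
    return $plan;
}

echo implode(', ', batchPlan(128, 50)), "\n";   // prints "50, 50, 28"
```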

Minor problems with the batchSize function

As shown above, batchSize controls how data is exchanged between the client and the MongoDB server. There is one special case, however. When batchSize is set to 1 or to a negative number, MongoDB returns only the reply to the first request and then closes the cursor. That is to say, if we run the following:

$cursor = $collection->find()->sort( array( 'name' => 1 ) );
$cursor->batchSize( 1 )->limit( 10 );
$cursor->getNext();
var_dump( $cursor->getNext() );

The last var_dump always prints NULL, because with a batch size of 1 only one document is returned and the cursor is then closed.

However, a slight modification, changing batchSize to 2, gives a very different result.

$cursor = $collection->find()->sort( array( 'name' => 1 ) );
$cursor->batchSize( 2 )->limit( 10 );
$cursor->getNext(); // item 1
$cursor->getNext(); // item 2
var_dump( $cursor->getNext() ); // item 3

Although the first reply from the server contains only two documents, the third getNext call still returns data, which means the client fetched from the server a second time.
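The difference between a batch size of 1 and 2 comes from how the server interprets the number to return field. The sketch below restates that rule as commonly documented for the legacy wire protocol; describeNumberToReturn is a hypothetical helper, and the returned array keys are made up for illustration.

```php
<?php
// Sketch only: how the server treats "number to return" in a legacy query.
// describeNumberToReturn() and its array keys are hypothetical.
function describeNumberToReturn(int $n): array
{
    if ($n === 0) {
        // No value given: the server picks its default first batch size.
        return ['maxDocuments' => 'server default', 'cursorStaysOpen' => true];
    }
    if ($n === 1 || $n < 0) {
        // 1 is treated like -1: return up to |n| documents, then close the cursor.
        return ['maxDocuments' => abs($n), 'cursorStaysOpen' => false];
    }
    // Any value of 2 or more only sizes the batch; the cursor remains usable.
    return ['maxDocuments' => $n, 'cursorStaysOpen' => true];
}

foreach ([0, 1, 2, -2] as $n) {
    $info = describeNumberToReturn($n);
    echo $n, ': ', $info['maxDocuments'], ', cursor ', $info['cursorStaysOpen'] ? 'open' : 'closed', "\n";
}
```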

-- 20121102 update --

Let's see what happens when batchSize is -2.

The request that is sent contains -2 as is, which shows that the real meaning of a negative number is interpreted by the server.

In the reply, the cursor ID is 0. A cursor ID of 0 means no cursor is available, so no further data can be fetched through the cursor.

These experiments give a general picture of the communication protocol between the MongoDB client and server. For more details, see the official MongoDB documentation (Mongo Wire Protocol).
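For readers who want to inspect such a reply themselves, here is a minimal sketch that decodes the fixed-size fields at the start of a legacy OP_REPLY message. The helper parseReplyHeader is hypothetical, the sample bytes are built by hand rather than captured, and the 'P' format code needs a 64-bit PHP build.

```php
<?php
// Sketch only: decode the fixed fields at the start of a legacy OP_REPLY (opcode 1).
// parseReplyHeader() is a hypothetical helper, not a driver function.
function parseReplyHeader(string $bytes): array
{
    $fields = unpack(
        'VmessageLength/VrequestId/VresponseTo/VopCode/'
        . 'VresponseFlags/PcursorId/VstartingFrom/VnumberReturned',
        $bytes
    );
    // A cursor ID of 0 means the server holds no cursor, so nothing more can be fetched.
    $fields['hasMore'] = ($fields['cursorId'] !== 0);
    return $fields;
}

// Hand-built sample: a 36-byte reply header with cursor ID 0 and two documents reported.
$sample = pack('V5', 36, 7, 1, 1, 0) . pack('P', 0) . pack('V2', 0, 2);
$reply  = parseReplyHeader($sample);
echo $reply['numberReturned'], ' ', $reply['hasMore'] ? 'more available' : 'cursor closed', "\n";
```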

Summary:
1. batchSize is the number of documents the server returns in each reply packet.
2. The documents returned in a single packet cannot exceed 4 MB.
3. limit is the total number of documents the client wants (tries) to obtain.
4. By default, if no limit is set, the first batch contains 101 documents.
5. If batchSize is negative or 1, the cursor is destroyed after the first reply, and without the cursor no further data can be fetched.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
