Research on the memory cost of huge data objects in PHP

Tags: apc
First of all, please don't misunderstand: it's not that I have research results to publish on this question; rather, I'd like to ask you all to help me analyze it together :)

A simplified description of the background: in a PHP-implemented web site, every program file includes a common file, common.php, at the top. Due to business needs, a "huge" data object (an array containing roughly 500k int values) now has to be defined in common.php, e.g. $huge_array = array(1, 2, 3, ..., 500000), and the entire system only ever has "read" access to $huge_array. Assume the system must keep running stably under 100 concurrent requests.

Question 1: $huge_array takes up roughly 10M in the common.php source file (that's not a problem), and loaded into memory it might occupy about 4M (just an estimate; measuring it precisely is not the point of this article). The problem is that PHP handles each HTTP request in a separate process (or thread), so doesn't each of them have to load this roughly 4M block into memory all over again? If that memory cannot be shared, nearly 400M of physical memory could be in use at the same time, which looks bad both for memory consumption and for memory-access efficiency.
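To put numbers on Question 1, here is a minimal measurement sketch (assuming common.php defines $huge_array as described; the figures are whatever your setup reports):

<?php
// How much memory does including common.php cost one PHP process?
$before = memory_get_usage();

require 'common.php';            // assumed to define $huge_array

$after = memory_get_usage();
printf("Loading cost: %.1f MB\n", ($after - $before) / 1048576);
printf("Peak usage:   %.1f MB\n", memory_get_peak_usage() / 1048576);

Running this under concurrent load (e.g. ab -c 100) while watching the web server's resident memory would show whether the cost really multiplies per worker.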

Question 2: when a caching mechanism such as APC or XCache is enabled, we know it can cache opcodes, but that only seems to avoid recompiling the script repeatedly. Can it also share memory for the variables loaded at runtime? I hope it can; after all, that pile of int values surely exists as part of the opcodes too.

Question 3: if using XCache as an opcode cache as above doesn't achieve the goal, would operating XCache directly be effective? That is, instead of writing $huge_array into common.php, write it into the cache engine itself.
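For concreteness, here is a sketch of what Question 3 proposes, using APC's user cache (apc_fetch/apc_store; XCache's xcache_get/xcache_set are analogous). build_huge_array() is a hypothetical stand-in for however the array gets produced:

<?php
// Keep $huge_array in the cache engine instead of in common.php.
$huge_array = apc_fetch('huge_array', $hit);
if (!$hit) {
    $huge_array = build_huge_array();      // hypothetical builder
    apc_store('huge_array', $huge_array);  // persists across requests
}

One caveat: apc_fetch still copies the stored value into the current request's memory, so whether this avoids the per-process 4M is exactly what a test would have to measure.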

This post aims to find out whether this usage leads to a memory bottleneck, and how to optimize it if it does. Of course, trying to shrink the huge data object itself should be the very first consideration, but data structures and algorithms are not the topic here. You're welcome to post your own analysis of the problem; even better if you can design a runnable test to verify your idea. If you don't have time to write code, a reasonable-looking plan is enough and I'm happy to write the test code ^_^


------Solution--------------------
Since it's more than 500k entries, shouldn't you index it and process it in segments? Why load it all at once???
------Solution--------------------
1. Yes, each process loads it into memory again.
2. Since every program file includes it, it should be cached, because it is part of the opcodes too.
3. That would be a very tedious exercise; better to optimize the data structure and algorithm instead.
------Solution--------------------
1. As the reply above said.
2. It is part of the opcodes, and of course worth optimizing.
3. If compiling takes longer than fetching the data by index, why compile at all? You only compile so you can look values up by key; once you have another way to resolve data by key, why must the data live in a PHP array at all? Isn't it a bit like choosing between reading the data from a TXT file and reading it from an included PHP file?

For example, the contents of 1.txt are as follows
1111
2222
3333
4444
......

The contents of 1.php are as follows

$a = array(
    1111,
    2222,
    3333,
    4444,
    ......
);

If every process includes 1.php, each one initializes $a and then reads $a[3].
If it's 1.txt instead, then we only need to optimize how to quickly fetch the 4th (3+1) line, and all processes can share that file.

But there is a problem here: how do you handle the concurrency reasonably?

There is no universal way around it: either trade space for time (400M, but no waiting under concurrency), or trade time for space (4M, but with waiting time to fetch the data).
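To make the 1.txt idea concrete, here is a minimal sketch assuming each line is padded to a fixed width (8 bytes is an arbitrary choice): line $i can then be read in O(1) with fseek(), and the OS page cache keeps a single copy of the file that all processes share:

<?php
// Fixed-width records: O(1) line lookup, file pages shared by the OS.
const REC_WIDTH = 8;             // assumed bytes per line, "\n" included
$fp = fopen('1.txt', 'rb');
fseek($fp, 3 * REC_WIDTH);       // jump straight to line index 3
$value = (int) fgets($fp);       // e.g. 4444
fclose($fp);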


------Solution--------------------
I think shared memory is necessary for handling big data; otherwise every HTTP process adds memory pressure because of your big array. XCache and APC are opcode caches, right? That means even if PHP skips lexing and parsing and goes straight to the opcode-execution phase, it still has to allocate memory for the large array; the point is that every HTTP request allocates that memory once. If you put the large array into memcache, multiple HTTP requests share one piece of memory. Since there are no write operations, 100 concurrent reads are no problem for memcache, and even when read/write concurrency does come up, memcached supports optimistic locking.
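A minimal sketch of this suggestion, assuming the pecl/memcached extension and a memcached daemon on 127.0.0.1:11211 (build_huge_array() is again a hypothetical stand-in):

<?php
// One copy of the array in memcached, shared by every PHP worker.
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);

$huge_array = $m->get('huge_array');
if ($m->getResultCode() === Memcached::RES_NOTFOUND) {
    $huge_array = build_huge_array();    // hypothetical builder
    $m->set('huge_array', $huge_array);  // populated once, read by all
}

Two caveats: memcached's default item-size limit is 1MB, so an array this large may have to be split across keys (or stored per index so a request fetches only the entries it needs), and every get() still deserializes what it fetches into the request's own memory.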
------Solution--------------------
First, put the thing into memcache.

Then fetch it wherever you need it; that's enough.
------Solution--------------------
Reference:
The array's initial values, that mass of int values, are constant data and certainly part of the opcodes. What I'm not sure about is whether, when the program assigns them to a variable, a new piece of memory is allocated to hold them. If none is allocated and the variable points directly at the memory where the opcodes are stored, then the question comes back: is the memory block storing the opcodes shared between processes?


Don't overthink it; an opcode cache merely skips the compilation steps. For example:
$a = 1 + 1;
becomes an opcode roughly like:
ZEND_ADD ~0 1 1
with ~0 standing in for $a. When it actually runs, ~0 still needs its own separate space; otherwise where would PHP put the data? And if another process could modify that value, wouldn't it affect this one? The logical correctness of each process could no longer be guaranteed: every process runs the same code, so I would never know whether my $a is still in its freshly initialized state.

So the opcode cache does not cache data; it only caches the code (script) in another format! The actual data still has to be allocated memory anew at run time!
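One part of the quoted question can be probed directly: whether assigning an array allocates a fresh copy. PHP uses copy-on-write between variables, which a minimal sketch shows (exact figures vary by PHP version):

<?php
// Copy-on-write probe: assignment is cheap; the first write pays.
$a = range(1, 500000);          // stand-in for $huge_array
$before = memory_get_usage();
$b = $a;                        // no real copy yet, only a refcount bump
$mid = memory_get_usage();
$b[0] = -1;                     // the first write triggers the actual copy
$after = memory_get_usage();
printf("assign: %d bytes, first write: %d bytes\n",
       $mid - $before, $after - $mid);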

For more about opcodes, click here.

Beyond the example I gave (and of course 1.txt can be replaced by any tool: memcache, SQL, and so on), the point is that you can design the interface yourself, for example reading into one memory slot and, when you want the next item, reusing that same slot to fetch it, so that each process's memory footprint stays small.

Pseudocode example:

while ($data = get()) {     // get() fetches one item from the shared source
    doSomething($data);
}

compared with:

foreach ($datas as $data) {
    doSomething($data);
}

Reduce memory consumption by accepting a higher cost for fetching each single piece of data.
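A concrete version of that pseudocode, assuming the data sits in 1.txt and doSomething() stands in for the per-item work:

<?php
// Stream one item at a time instead of materializing the whole array.
$fp = fopen('1.txt', 'rb');
while (($line = fgets($fp)) !== false) {
    doSomething((int) $line);   // hypothetical per-item handler
}
fclose($fp);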

------Solution--------------------
I'd really like to join this topic; unfortunately my personal level is limited and I don't understand caching, so I'll say nothing about that.


Let me offer my personal opinion from another angle.

Web programming has to take concurrency into account, including the server's connection limits, rather than just solving the computational problem.
This is what many desktop developers fail to understand, which leads to remarks like "PHP sucks."

PHP's advantage is solving a simple problem quickly: finish, send the result to the client, end the connection, and free the slot for the next request.
Complex computation should really be done in a more low-level language on higher-quality servers.

For complex computation, the resources consumed are worth it for both the core language and the high-performance server, because the main purpose is to get the result right.
For example, NASA's servers and the software running on them should strive for perfection; a decimal miscalculated in the billionth digit could produce a "Mars crashes into the Earth" outcome.

But for a network whose job is spreading information, that kind of loss is huge, because in every second you burn, your information could have reached one or even dozens more people.
Memory consumption works like time consumption: if memory usage is too high, TCP connections also become unstable.


So while piles of parameters may make a program very flexible and able to adapt to more users, for ordinary web access they are a losing choice.
Those parameters should be cut and the user base segmented; letting each user group reach its goal with fewer parameters is enough.
For example, a site with a global audience doesn't cram all languages into one parameter-driven adaptive program; it lets the user pick a language and serves code written only for that language.

If you have complex calculations, you should use other languages or components to complete those steps and keep your web program responsive.
What #6 described is one such case; of course it may not suit your needs, but his approach is consistent with the business logic of an information-spreading website.


Having said all that, I probably still haven't solved your problem; I'm honored if you read this far...
------Solution--------------------
4M per request isn't that big a deal.

Simplify it to a single value, then measure the memory peaks of your currently running PHP and compare....


------Solution--------------------
See whether this code helps: it avoids loading the whole file into memory.

 
<?php
/*
111
222
333
444
555
*/

$file = new SplFileObject(__FILE__);
$file->seek(3);                  // seek() takes the line number, counting from 0
echo $file->current(), '<br>';   // prints the 4th line
?>


Tested on an XML file of about 50M: it took less than 0.01 seconds (the larger the target line number, the more time it takes).