Use Hugepage and PGO to improve the execution performance of PHP 7

This article describes how to use Hugepage and PGO to improve the execution performance of PHP 7. It is based on research by Laruence, a member of the PHP development team.

Hugepage

PHP 7 RC4 has just been released. Besides some bug fixes, it includes our latest performance improvement, "HugePageFy php text segment". With this feature enabled, PHP 7 moves its text segment (the executable code) onto huge pages. In our earlier tests, this gave a 2% to 3% increase in QPS.

What is Hugepage? Simply put, memory is paged in 4 KB pages by default, and every virtual address must be translated to a physical address. The translation is a page-table lookup, which the CPU accelerates with a built-in cache called the TLB (Translation Lookaside Buffer). The smaller the page, the more page-table entries are needed to map the same amount of memory; because the TLB can only hold a limited number of entries, more entries mean more TLB misses. Therefore, by using larger memory pages we can indirectly reduce TLB misses. There is plenty of detail on this online, so I will not go into it here; I will mainly explain how to enable this new feature to get its performance benefit.
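To make the TLB argument concrete, the short sketch below counts how many pages are needed to map a region and how much memory a TLB can cover at each page size. The 64-entry TLB is a hypothetical figure chosen for illustration (real TLB sizes vary by CPU); the region size is the php-fpm text segment size reported by `size` in this article.

```python
import math

PAGE_4K = 4 * 1024          # default page size: 4 KB
PAGE_2M = 2 * 1024 * 1024   # huge page size (Hugepagesize: 2048 kB)

def pages_needed(region_bytes: int, page_size: int) -> int:
    """Number of pages (one page-table entry each) needed to map a region."""
    return math.ceil(region_bytes / page_size)

def tlb_reach(tlb_entries: int, page_size: int) -> int:
    """Bytes of memory a TLB with `tlb_entries` entries can cover without a miss."""
    return tlb_entries * page_size

if __name__ == "__main__":
    text_bytes = 10114565   # php-fpm text segment size, as reported by `size`
    entries = 64            # hypothetical TLB size, for illustration only
    print(pages_needed(text_bytes, PAGE_4K))             # 2470 entries with 4 KB pages
    print(pages_needed(text_bytes, PAGE_2M))             # 5 entries with 2 MB pages
    print(tlb_reach(entries, PAGE_4K) // 1024)           # 256 (KB covered)
    print(tlb_reach(entries, PAGE_2M) // (1024 * 1024))  # 128 (MB covered)
```

With the same number of TLB entries, 2 MB pages cover 512 times as much memory as 4 KB pages, which is why hot code on huge pages misses the TLB less often.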

Enabling Hugepage on a recent kernel is very easy. Take my development virtual machine as an example (Ubuntu Server 14.04, Kernel 3.20.- 45). First, let's look at the memory information:

$ cat /proc/meminfo | grep Huge

AnonHugePages:    444416 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

The huge page size is 2 MB, but no huge pages are allocated yet (HugePages_Total is 0). Now let's compile PHP 7 RC4. Remember not to pass --disable-huge-code-pages to configure: this new feature is enabled by default, and that option disables it.
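Reading these fields by eye works, but a small parser makes the check repeatable. This is a minimal sketch that assumes the `Key: value [kB]` line format shown above; the sample text is the output from this machine, and reading the live `/proc/meminfo` is shown only as a comment.

```python
def parse_hugepage_info(text: str) -> dict:
    """Parse the Huge* lines of /proc/meminfo into {field: int}.

    Fields with a 'kB' suffix (AnonHugePages, Hugepagesize) stay in kB;
    the HugePages_* counters are page counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if "Huge" not in key:
            continue
        info[key] = int(value.split()[0])  # drop the optional "kB" unit
    return info

SAMPLE = """\
AnonHugePages:    444416 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    info = parse_hugepage_info(SAMPLE)
    # On a live Linux system you could parse the real file instead:
    # info = parse_hugepage_info(open("/proc/meminfo").read())
    print(info["Hugepagesize"] // 1024, "MB huge pages")        # 2 MB huge pages
    print("huge pages allocated:", info["HugePages_Total"] > 0)  # False
```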

Next, configure Opcache. Since PHP 5.5, Opcache is built by default, but as a shared library, so we need to load it in php.ini:

zend_extension=opcache.so

This new feature is implemented in Opcache, so it must also be enabled through Opcache by setting opcache.huge_code_pages=1. The configuration is as follows:

opcache.huge_code_pages=1

Now let's configure the OS and allocate some Hugepages:

$ sudo sysctl vm.nr_hugepages=128
vm.nr_hugepages = 128
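For reference, `sysctl vm.nr_hugepages` corresponds to the file `/proc/sys/vm/nr_hugepages`, and pages allocated this way are set aside for huge page use only. The sketch below, a minimal illustration rather than anything from PHP itself, shows the key-to-path mapping for simple keys like this one and how much memory the 128 pages reserve; the write to `/proc` is left as a comment because it needs root.

```python
def sysctl_to_proc_path(name: str) -> str:
    """Map a simple sysctl key to its /proc/sys file (dots become slashes)."""
    return "/proc/sys/" + name.replace(".", "/")

def reserved_mb(nr_hugepages: int, hugepagesize_kb: int) -> int:
    """Memory set aside for the huge page pool, in MB."""
    return nr_hugepages * hugepagesize_kb // 1024

if __name__ == "__main__":
    print(sysctl_to_proc_path("vm.nr_hugepages"))  # /proc/sys/vm/nr_hugepages
    print(reserved_mb(128, 2048), "MB")             # 256 MB
    # Same effect as `sudo sysctl vm.nr_hugepages=128` (requires root):
    # with open(sysctl_to_proc_path("vm.nr_hugepages"), "w") as f:
    #     f.write("128")
```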

Now let's check the memory information again:

$ cat /proc/meminfo | grep Huge

AnonHugePages:    444416 kB
HugePages_Total:     128
HugePages_Free:      128
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

The 128 huge pages we allocated are now available. Next, start php-fpm:

$ /home/huixinchen/local/php7/sbin/php-fpm

[01-Oct-2015 09:33:27] NOTICE: [pool www] 'user' directive is ignored when FPM is not running as root
[01-Oct-2015 09:33:27] NOTICE: [pool www] 'group' directive is ignored when FPM is not running as root

Now, check the memory again:

$ cat /proc/meminfo | grep Huge

AnonHugePages:    411648 kB
HugePages_Total:     128
HugePages_Free:      113
HugePages_Rsvd:       27
HugePages_Surp:        0
Hugepagesize:       2048 kB

If huge pages are available, Opcache also uses them to store its opcode cache. So, to verify that opcache.huge_code_pages really takes effect, let's disable opcache.huge_code_pages, restart php-fpm, and check the memory information again:

$ cat /proc/meminfo | grep Huge

AnonHugePages:    436224 kB
HugePages_Total:     128
HugePages_Free:      117
HugePages_Rsvd:       27
HugePages_Surp:        0
Hugepagesize:       2048 kB
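Since only the huge_code_pages setting differs between the two runs, the pages it uses show up as the difference in HugePages_Free. A minimal sketch of that comparison, using the numbers from the two meminfo snapshots above:

```python
def free_pages_delta(before: dict, after: dict) -> int:
    """How many more huge pages are in use in `after` than in `before`."""
    return before["HugePages_Free"] - after["HugePages_Free"]

# Fields copied from the two meminfo snapshots above.
with_huge_code = {"HugePages_Total": 128, "HugePages_Free": 113, "HugePages_Rsvd": 27}
without_huge_code = {"HugePages_Total": 128, "HugePages_Free": 117, "HugePages_Rsvd": 27}

if __name__ == "__main__":
    # HugePages_Rsvd is identical in both runs, so the difference in free
    # pages is what opcache.huge_code_pages itself consumed.
    print(free_pages_delta(without_huge_code, with_huge_code))  # 4
```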

Comparing the two snapshots, enabling huge_code_pages uses four more huge pages once php-fpm has started (113 free instead of 117). Now let's check the size of php-fpm's text segment:

$ size /home/huixinchen/local/php7/sbin/php-fpm

   text    data     bss      dec     hex filename
10114565  695200  131528 10941293  a6f36d /home/huixinchen/local/php7/sbin/php-fpm

The text segment is 10114565 bytes, or about 4.8 pages of 2 MB. Because of alignment, only whole 2 MB pages are moved (the partial page at the end stays where it is), so 4 pages are requested, which matches exactly what we observed.
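The alignment rule can be checked with a little arithmetic. This sketch assumes that the region moved is the set of whole, 2 MB-aligned pages lying entirely inside the text segment; the start addresses are hypothetical (0x400000 is a common load address for non-PIE x86-64 executables), since the article does not show the actual address. It illustrates the arithmetic only, not PHP's implementation.

```python
HUGE_PAGE = 2 * 1024 * 1024  # Hugepagesize: 2048 kB

def movable_huge_pages(start: int, size: int, page: int = HUGE_PAGE) -> int:
    """Count whole, page-aligned huge pages inside [start, start + size).

    The start is rounded up and the end rounded down to a page boundary;
    the partial pieces at either end stay on ordinary 4 KB pages.
    """
    first = -(-start // page)        # ceil(start / page)
    last = (start + size) // page    # floor(end / page)
    return max(0, last - first)

TEXT_SIZE = 10114565  # php-fpm text segment size reported by `size`

if __name__ == "__main__":
    print(TEXT_SIZE / HUGE_PAGE)                    # about 4.82 pages in total
    print(movable_huge_pages(0x400000, TEXT_SIZE))  # 4: start already 2 MB aligned
    print(movable_huge_pages(0x401000, TEXT_SIZE))  # 3: start 4 KB past a boundary
```

The second call shows that the count depends on where the segment starts as well as on its size: a misaligned start can cost one more page.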

The configuration is successful! Enjoy :)

However, if you try to profile with perf report or perf annotate, you will find that the symbols are missing (valgrind and gdb are not affected). This is because perf listens for mmap events, records the address range of each mapping, and uses those ranges to translate instruction pointer (IP) addresses into symbols. HugeTLB currently only supports anonymous mappings (MAP_ANON), so perf treats this address range as having no symbol information. We hope later kernel versions will lift this restriction.

GCC PGO
PGO (Profile Guided Optimization; search for it if you are interested), as the name suggests, needs some use cases to provide feedback. In other words, this optimization is tied to a specific scenario.

An optimization for one scenario may be counterproductive in another; it is not a general optimization. Therefore, we cannot simply include these optimizations in PHP 7, or ship a PGO-compiled PHP 7 directly.

Of course, we are trying to identify general optimizations from what PGO does and apply them to PHP 7 by hand, but this clearly cannot match the effect of optimizing specifically for one scenario. So I decided to write this article to briefly introduce how to compile PHP 7 with PGO, so that your own build of PHP 7 can make your particular application faster.

First, we need to decide which scenario to use as feedback for GCC. Within the application you want to optimize, we generally choose the page that is the most visited, the most time-consuming, or the most resource-consuming.

Take WordPress as an example: we choose the WordPress homepage, because the homepage is usually the most visited page.
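If no single page is obviously both the most visited and the most time-consuming, one simple way to combine the two criteria, an assumption of this sketch rather than the article's rule, is to rank pages by total server time (visits multiplied by mean response time). The page statistics below are hypothetical; in practice they would come from your own access logs.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    path: str
    hits: int        # requests in the sampled period
    mean_ms: float   # mean response time in milliseconds

    @property
    def total_ms(self) -> float:
        # Total server time spent on this page: visits x time per visit.
        return self.hits * self.mean_ms

def pick_training_page(stats: list) -> PageStats:
    """Pick the page that costs the server the most time overall."""
    return max(stats, key=lambda s: s.total_ms)

if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    stats = [
        PageStats("/wordpress/", hits=90_000, mean_ms=0.9),
        PageStats("/wordpress/?p=1", hits=5_000, mean_ms=1.2),
        PageStats("/wordpress/wp-admin/", hits=300, mean_ms=8.0),
    ]
    print(pick_training_page(stats).path)  # /wordpress/
```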

Take my machine as an example:

Intel(R) Xeon(R) CPU X5687 @ 3.60GHz x 16 (hyper-threading)
48 GB memory
php-fpm with 32 fixed workers; Opcache with the default configuration (remember to load Opcache)

WordPress 4.1 is the optimization scenario.

First, let's measure the current performance of WordPress on PHP 7 (ab -n 10000 -c 100):

$ ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inf-dev-maybach.weibo.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.7.12
Server Hostname:        inf-dev-maybach.weibo.com
Server Port:            8000

Document Path:          /wordpress/
Document Length:        9048 bytes

Concurrency Level:      100
Time taken for tests:   8.957 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      92860000 bytes
HTML transferred:       90480000 bytes
Requests per second:    1116.48 [#/sec] (mean)
Time per request:       89.567 [ms] (mean)
Time per request:       0.896 [ms] (mean, across all concurrent requests)
Transfer rate:          10124.65 [Kbytes/sec] received

So on this machine, the WordPress 4.1 homepage currently reaches 1116.48 QPS; that is, the server can handle that many homepage requests per second.
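When comparing builds, it helps to pull the headline number out of the ab report automatically instead of copying it by hand. A minimal sketch, assuming the `Requests per second:` line format shown in the report above:

```python
import re

def requests_per_second(ab_output: str) -> float:
    """Extract the mean 'Requests per second' figure from ApacheBench output."""
    match = re.search(r"^Requests per second:\s+([\d.]+)", ab_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))

SAMPLE = """\
Complete requests:      10000
Failed requests:        0
Requests per second:    1116.48 [#/sec] (mean)
Time per request:       89.567 [ms] (mean)
"""

if __name__ == "__main__":
    print(requests_per_second(SAMPLE))  # 1116.48
```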

Now let's start training GCC so that it compiles a PHP 7 that runs WordPress 4.1 faster. You need GCC 4.0 or later, but I suggest GCC 4.8 or above (GCC 5.1 is already out).

The first step is to download the PHP 7 source code and run ./configure, exactly as usual.

The next step is where things differ: we first build PHP 7 as an executable that generates profile data:

$ make prof-gen

Note the prof-gen target: it is specific to PHP 7's Makefile, so don't try it on other projects :)

Then, let's start training GCC:

$ sapi/cgi/php-cgi -T 100 /home/huixinchen/local/www/htdocs/wordpress/index.php >/dev/null

That is, php-cgi runs the WordPress homepage 100 times, generating profile information as it goes.

Then we compile PHP 7 a second time:

$ make prof-clean
$ make prof-use && make install
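The whole PGO sequence can be scripted so it is easy to rerun after each upgrade. This is a minimal sketch, assuming it runs from the top of a PHP 7 source tree where ./configure has already been run; the training script path is hypothetical. It defaults to a dry run that only prints the commands, and with `check=True` a failing step stops the sequence, like `&&` in the shell version.

```python
import subprocess

def pgo_build_steps(training_script: str, runs: int = 100) -> list:
    """The PGO build steps from this article, as argv lists."""
    return [
        ["make", "prof-gen"],                                    # build binaries that record profile data
        ["sapi/cgi/php-cgi", "-T", str(runs), training_script],  # training run
        ["make", "prof-clean"],                                  # remove build outputs before rebuilding
        ["make", "prof-use"],                                    # rebuild using the recorded profile
        ["make", "install"],
    ]

def run_pgo_build(training_script: str, runs: int = 100, dry_run: bool = True) -> None:
    for argv in pgo_build_steps(training_script, runs):
        print("$", " ".join(argv))
        if dry_run:
            continue
        if argv[0] == "sapi/cgi/php-cgi":
            # Discard the rendered page, like `>/dev/null` in the shell version.
            subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        else:
            subprocess.run(argv, check=True)

if __name__ == "__main__":
    # Hypothetical path: point it at your own application's hottest page.
    run_pgo_build("/path/to/wordpress/index.php")
```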

That's all there is to it: the PGO build is complete. Now let's look at the performance of the PGO-compiled PHP 7:

$ ab -n10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inf-dev-maybach.weibo.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.7.12
Server Hostname:        inf-dev-maybach.weibo.com
Server Port:            8000

Document Path:          /wordpress/
Document Length:        9048 bytes

Concurrency Level:      100
Time taken for tests:   8.391 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      92860000 bytes
HTML transferred:       90480000 bytes
Requests per second:    1191.78 [#/sec] (mean)
Time per request:       83.908 [ms] (mean)
Time per request:       0.839 [ms] (mean, across all concurrent requests)
Transfer rate:          10807.45 [Kbytes/sec] received

Now we can handle 1191.78 requests per second, an increase of about 7%. Not bad. (Wait, didn't I say 10%? Why only 7%? As I mentioned earlier, we have been analyzing what PGO does and applying the more general optimizations to PHP 7 by hand. So roughly 3% worth of those more general optimizations is already included in PHP 7, and this work is still ongoing.)

It's that simple: train GCC on your product's typical scenario, and a few steps will improve your performance. Why not?
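The "about 7%" figure follows directly from the two ab runs. A short sketch of the calculation, using only the numbers reported above:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

if __name__ == "__main__":
    qps_before, qps_after = 1116.48, 1191.78  # Requests per second from the two ab runs
    print(f"{percent_change(qps_before, qps_after):.1f}%")  # 6.7%
    # Total time for the 10000 requests fell from 8.957 s to 8.391 s.
    print(f"{percent_change(8.957, 8.391):.1f}%")           # -6.3%
```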
