Use Hugepage and PGO to improve PHP7 performance


Hugepage
PHP7 has just released RC4, which includes some bug fixes and one of our latest performance improvements: "hugepagefy PHP text segment". With this feature enabled, PHP7 "moves" its own text segment (the executable code) onto huge pages. In our earlier tests we saw a stable 2%~3% QPS improvement on WordPress.

So what is a hugepage? Put simply, memory is paged in 4KB units by default, and virtual addresses have to be translated to physical addresses by looking up the page table. To speed up this lookup, the CPU has a built-in TLB (Translation Lookaside Buffer). Obviously, the smaller the virtual pages, the more entries the table has, and since the TLB size is limited, more entries mean a higher TLB miss rate. So enabling large memory pages indirectly reduces TLB misses. There is plenty of detailed material a Google search away, so I won't repeat it here. This article mainly describes how to enable the new feature and get a noticeable performance improvement.
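To make the page-count argument concrete, here is a small sketch of my own (not taken from PHP or the kernel) that counts how many pages, and therefore how many page-table mappings competing for TLB entries, are needed to cover a memory region. The helper name `pages_to_cover` and the 10 MiB region are assumptions chosen purely for illustration.

```shell
#!/bin/sh
# pages_to_cover BYTES PAGE_SIZE
# Prints how many pages of PAGE_SIZE bytes are needed to cover BYTES,
# rounded up. Hypothetical helper, for illustration only.
pages_to_cover() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

region=$((10 * 1024 * 1024))                    # an assumed 10 MiB region
pages_to_cover "$region" 4096                   # 4KB pages      -> 2560
pages_to_cover "$region" $((2 * 1024 * 1024))   # 2MB huge pages -> 5
```

The same region needs 2560 mappings with 4KB pages but only 5 with 2MB pages, which is why far fewer TLB entries are required.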

With a recent kernel, enabling hugepages has become very easy. Taking my virtual machine as an example (Ubuntu Server 14.04, kernel 3.13.0-45), let's look at the memory information:

$ cat /proc/meminfo | grep Huge
AnonHugePages:    444416 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

As you can see, the size of one hugepage is 2MB, and hugepages are not currently enabled. Now let's compile PHP RC4. Remember not to add --disable-huge-code-pages (the new feature is enabled by default, and adding this option turns it off).

Then configure Opcache. Since PHP 5.5, Opcache has been compiled by default, but it is built as a shared library, so we still have to load it in php.ini:

zend_extension=opcache.so

The new feature is implemented in Opcache, so it also has to be enabled through Opcache by setting opcache.huge_code_pages=1. The specific configuration:

opcache.huge_code_pages=1

Now let's configure the OS and assign some hugepages:

$ sudo sysctl vm.nr_hugepages=128
vm.nr_hugepages = 128

Now let's check the memory information again:

$ cat /proc/meminfo | grep Huge
AnonHugePages:    444416 kB
HugePages_Total:     128
HugePages_Free:      128
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
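As a quick sanity check on how much memory this reserves, here is a minimal sketch. The helper name `hugepage_reserved_mb` is my own; the page count and page size are the values shown above. Keep in mind that pages reserved this way are set aside from ordinary allocations, and that a value set with sysctl on the command line does not survive a reboot unless it is also written to a sysctl configuration file such as /etc/sysctl.conf.

```shell
#!/bin/sh
# hugepage_reserved_mb NR_PAGES PAGE_SIZE_KB
# Prints the memory reserved by NR_PAGES huge pages, in MB.
# Hypothetical helper, for illustration only.
hugepage_reserved_mb() {
    echo $(( $1 * $2 / 1024 ))
}

hugepage_reserved_mb 128 2048   # 128 pages of 2048 kB each -> 256
```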

We can see that the 128 hugepages we've allocated are ready, and then we'll start php-fpm:

$ /home/huixinchen/local/php7/sbin/php-fpm
[01-Oct-2015 09:33:27] NOTICE: [pool www] 'user' directive is ignored when FPM is not running as root
[01-Oct-2015 09:33:27] NOTICE: [pool www] 'group' directive is ignored when FPM is not running as root

Now check the memory information again:

$ cat /proc/meminfo | grep Huge
AnonHugePages:    411648 kB
HugePages_Total:     128
HugePages_Free:      113
HugePages_Rsvd:        -
HugePages_Surp:        0
Hugepagesize:       2048 kB

Speaking of which, if hugepages are available, Opcache will also use them to store the opcode cache. So, to verify that opcache.huge_code_pages really takes effect, we might as well turn off opcache.huge_code_pages, start again, and look at the memory information:

$ cat /proc/meminfo | grep Huge
AnonHugePages:    436224 kB
HugePages_Total:     128
HugePages_Free:      117
HugePages_Rsvd:
HugePages_Surp:        0
Hugepagesize:       2048 kB
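Comparing the two snapshots by eye is easy to get wrong, so here is a small sketch that does the subtraction. The function name `hugepages_in_use` is an assumption of mine. It reads /proc/meminfo-style text on standard input and simply prints HugePages_Total minus HugePages_Free, ignoring HugePages_Rsvd. The sample values fed in are the ones from the two runs above.

```shell
#!/bin/sh
# hugepages_in_use: reads /proc/meminfo-style text on stdin and prints
# HugePages_Total - HugePages_Free. Hypothetical helper, for illustration.
# On a live system it could be called as: hugepages_in_use < /proc/meminfo
hugepages_in_use() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         END { print total - free }'
}

# Values taken from the two snapshots in this article.
with_flag=$(printf 'HugePages_Total: 128\nHugePages_Free: 113\n' | hugepages_in_use)
without_flag=$(printf 'HugePages_Total: 128\nHugePages_Free: 117\n' | hugepages_in_use)
echo "extra pages used with huge_code_pages: $((with_flag - without_flag))"
```

With the numbers above this reports 15 pages in use with the option on, 11 with it off, and therefore 4 extra pages.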

As you can see, with huge_code_pages enabled, FPM uses 4 more pages after startup. Now let's check the text size of php-fpm:

$ size /home/huixinchen/local/php7/sbin/php-fpm
    text     data      bss       dec      hex   filename
10114565   695200   131528  10941293   a6f36d   /home/huixinchen/local/php7/sbin/php-fpm

The text segment is 10,114,565 bytes, which would occupy about 4.8 pages of 2MB each. Taking alignment into account (the tail that does not fill a whole 2MB page is not moved), 4 pages are requested, exactly as we observed.
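The arithmetic can be checked with a short sketch. This is a simplified model of my own, not the actual Opcache logic: it only drops the partial page at the tail and ignores where the segment starts in memory. The helper name `whole_huge_pages` is hypothetical.

```shell
#!/bin/sh
# whole_huge_pages TEXT_BYTES HUGE_PAGE_BYTES
# Integer division: counts only complete huge pages and drops the tail.
# Simplified model for illustration; ignores the segment's start alignment.
whole_huge_pages() {
    echo $(( $1 / $2 ))
}

huge=$((2 * 1024 * 1024))
# Fractional page count, for comparison with the whole-page count.
LC_ALL=C awk -v text=10114565 -v page="$huge" 'BEGIN { printf "%.2f\n", text / page }'   # 4.82
whole_huge_pages 10114565 "$huge"                                                        # 4
```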

This shows the configuration succeeded! Enjoy :)

But one caveat up front: after enabling this feature, if you try to profile with perf report/annotate, you will find that the symbols are missing (valgrind and gdb are not affected). This is mainly because perf is designed to listen for mmap events, record the address ranges, and then convert IPs to symbols, but hugetlb currently only supports MAP_ANON, so perf thinks this address range has no symbol information. I hope a later kernel version can remove this limitation.

GCC PGO
PGO, as the name says (Profile Guided Optimization; Google it if you are interested), needs some use cases to obtain feedback, which means the optimization has to be tied to a particular scenario.

An optimization for one scenario may backfire in another. It is not a generic optimization, so we can't simply build these optimizations in, and we can't directly release a PGO-compiled PHP7.

Of course, we are trying to identify some generic optimizations from PGO and then apply them to PHP7 by hand, but this obviously cannot match the effect of optimizing specifically for one scenario. So I decided to write this article to briefly describe how to compile PHP7 with PGO, so that the PHP7 you compile makes your own application faster.

The first thing to decide is what to feed back to GCC. We generally choose, within the scenario you want to optimize, the pages that are visited most, take the most time, and consume the most resources.

Take WordPress as an example: we choose the WordPress home page (because the home page usually gets the most visits).

Let's take my machine for example:

Intel(R) Xeon(R) CPU X5687 @ 3.60GHz x 16 (Hyper-Threading)
48GB memory
PHP-FPM uses a fixed 32 workers; Opcache uses the default configuration (be sure to remember to load Opcache)

The scenario to optimize is WordPress 4.1.

First, let's measure the current WP performance on PHP7 (ab -n 10000 -c 100):

$ ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inf-dev-maybach.weibo.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.7.12
Server Hostname:        inf-dev-maybach.weibo.com
Server Port:            8000

Document Path:          /wordpress/
Document Length:        9048 bytes

Concurrency Level:      100
Time taken for tests:   8.957 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      92860000 bytes
HTML transferred:       90480000 bytes
Requests per second:    1116.48 [#/sec] (mean)
Time per request:       89.567 [ms] (mean)
Time per request:       0.896 [ms] (mean, across all concurrent requests)
Transfer rate:          10124.65 [Kbytes/sec] received

As you can see, with WordPress 4.1 on this machine the home page currently reaches 1116.48 QPS. That is, this many requests to the home page can be handled every second.

Now let's start teaching GCC so that it compiles a PHP7 that runs WordPress 4.1 faster. First, the GCC version must be above 4.0, but I recommend GCC 4.8 or later (GCC 5.1 is already out).

The first step, naturally, is to download the PHP7 source code and run ./configure. None of this is any different.

Then comes the difference. We first compile PHP7 once so that it produces executables that generate profile data:

$ make prof-gen

Note that we use the prof-gen target (this is specific to PHP7's Makefile; don't try it on other projects :))

And then, let's start training gcc:

$ sapi/cgi/php-cgi -T 100 /home/huixinchen/local/www/htdocs/wordpress/index.php >/dev/null

That is, let php-cgi run the WordPress home page 100 times, generating profile information in the process.

Then we start the second compilation of PHP7.

$ make prof-clean
$ make prof-use && make install
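The whole two-pass procedure can be collected into one small script. The following is only a sketch under my own assumptions: the function names `run` and `pgo_build` and the `DRY_RUN` variable are hypothetical, and the script expects to be run from the top of an already configured PHP7 source tree. The make targets and the php-cgi invocation are the ones used in this article.

```shell
#!/bin/sh
# run CMD...: print the command to stderr, then execute it unless
# DRY_RUN=1 is set (hypothetical switch, handy for reviewing the steps).
run() {
    echo "+ $*" >&2
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# pgo_build TRAIN_SCRIPT [TRAIN_RUNS]
# Sketch of the PGO build: instrumented build, training, optimized rebuild.
pgo_build() {
    train_script=$1
    train_runs=${2:-100}
    run make prof-gen || return 1       # pass 1: build with profiling enabled
    run sapi/cgi/php-cgi -T "$train_runs" "$train_script" >/dev/null || return 1
    run make prof-clean || return 1     # clean the build, as in the steps above
    run make prof-use || return 1       # pass 2: rebuild using the profile
    run make install
}

# Example (not executed here):
#   DRY_RUN=1 pgo_build /home/huixinchen/local/www/htdocs/wordpress/index.php 100
```

Stopping at the first failing step matters here, because a prof-use build without usable profile data would silently give you an ordinary build.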

OK, it's that simple. The PGO compilation is complete. Now let's look at the performance of the PGO-compiled PHP7:

$ ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inf-dev-maybach.weibo.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.7.12
Server Hostname:        inf-dev-maybach.weibo.com
Server Port:            8000

Document Path:          /wordpress/
Document Length:        9048 bytes

Concurrency Level:      100
Time taken for tests:   8.391 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      92860000 bytes
HTML transferred:       90480000 bytes
Requests per second:    1191.78 [#/sec] (mean)
Time per request:       83.908 [ms] (mean)
Time per request:       0.839 [ms] (mean, across all concurrent requests)
Transfer rate:          10807.45 [Kbytes/sec] received

Now it can handle 1191.78 requests per second, an improvement of ~7%. Not bad, huh? (Eh, didn't you say 10%? How did it become 7%? Well, as I said before, we try to analyze what optimizations PGO has made and then apply the generic ones to PHP7 by hand. So that means the ~3% of more generic optimizations are already included in PHP7. Of course, this work is still going on.)
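For the record, the improvement can be computed directly from the two "Requests per second" figures reported by ab. A minimal sketch, where the helper name `pct_gain` is my own:

```shell
#!/bin/sh
# pct_gain BEFORE AFTER
# Prints the relative improvement of AFTER over BEFORE, in percent.
# Hypothetical helper, for illustration only.
pct_gain() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.2f\n", (after - before) / before * 100 }'
}

pct_gain 1116.48 1191.78   # QPS before and after PGO -> 6.74
```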

So it's that simple: you can train GCC with the classic scenarios of your own product and get a speedup in a few simple steps. Why not?
