Hugepage
PHP7 has just released RC4, which includes some bug fixes and one of our latest performance improvements: "hugepagefy the PHP text segment". With this feature enabled, PHP7 "moves" its own text segment (its executable code) onto hugepages. In our tests we saw a stable 2%~3% QPS increase in WordPress.
What is a hugepage? In short: by default, memory is paged in 4KB units, and virtual addresses must be translated to physical addresses, which means walking the page tables. To speed up this lookup the CPU keeps a TLB (Translation Lookaside Buffer). Obviously, the smaller the page, the more entries the page tables need, and since the TLB is limited in size, more entries mean a higher TLB miss rate. So if we can enable large memory pages, we can indirectly reduce TLB misses. For a detailed introduction, a quick search will turn up plenty, so I won't repeat it here; this post mainly describes how to enable this new feature, since it brings a significant performance improvement.
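To get an intuitive feel for why page size matters, consider TLB reach, i.e. how much address space a fully populated TLB can cover. The entry count below is an illustrative assumption; real CPUs differ:

```shell
# TLB reach = number of entries * page size
# (64 entries is a made-up example; real iTLB/dTLB sizes vary by CPU)
entries=64
reach_4k=$(( entries * 4 * 1024 ))          # with 4 KB pages
reach_2m=$(( entries * 2 * 1024 * 1024 ))   # with 2 MB hugepages
echo "4KB pages cover:  $reach_4k bytes"
echo "2MB pages cover:  $reach_2m bytes"
```

With 2MB pages the same TLB covers 512 times more address space, which is why a large text segment on hugepages suffers far fewer iTLB misses.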
Enabling hugepages on a recent kernel has become very easy. Take my virtual machine as an example (Ubuntu Server 14.04, kernel 3.13.0-45). If we look at the memory information:
$ cat /proc/meminfo | grep Huge
AnonHugePages:    444416 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
We can see that one hugepage is 2MB, and that hugepages are not currently enabled (HugePages_Total is 0). Now let's compile PHP7 RC4; just remember NOT to add --disable-huge-code-pages (this new feature is enabled by default; adding that flag turns it off).
Then configure Opcache. Opcache has been enabled by default since PHP 5.5, but it is built as a shared library, so we still have to load it in php.ini:
zend_extension=opcache.so
This new feature is implemented in Opcache, so it also has to be switched on through an Opcache setting. The specific configuration:

opcache.huge_code_pages=1
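Putting the two settings together, here is the minimal php.ini fragment this walkthrough needs (printed from a variable so it is easy to copy; the extension path may need to be absolute depending on your extension_dir):

```shell
# Minimal php.ini fragment for hugepage-backed PHP text and opcode cache
ini_fragment='zend_extension=opcache.so
opcache.huge_code_pages=1'
echo "$ini_fragment"
```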
Now let's configure the OS and reserve some hugepages:
$ sudo sysctl vm.nr_hugepages=128
vm.nr_hugepages = 128
Now let's check the memory information again:
$ cat /proc/meminfo | grep Huge
AnonHugePages:    444416 kB
HugePages_Total:     128
HugePages_Free:      128
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
We can see that the 128 hugepages we allocated are ready. Now let's start php-fpm:
$ /home/huixinchen/local/php7/sbin/php-fpm
[01-Oct-2015 09:33:27] NOTICE: [pool www] 'user' directive is ignored when FPM is not running as root
[01-Oct-2015 09:33:27] NOTICE: [pool www] 'group' directive is ignored when FPM is not running as root
Now check the memory information again:
$ cat /proc/meminfo | grep Huge
AnonHugePages:    411648 kB
HugePages_Total:     128
HugePages_Free:      113
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
One thing to note here: once hugepages are available, Opcache will also use them to store the opcode cache. So to verify that opcache.huge_code_pages actually takes effect, let's turn opcache.huge_code_pages off, restart, and look at the memory information again:
$ cat /proc/meminfo | grep Huge
AnonHugePages:    436224 kB
HugePages_Total:     128
HugePages_Free:      117
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
So with huge_code_pages enabled, php-fpm uses 4 more hugepages after startup (113 free versus 117). Now let's check the size of php-fpm's text segment:
$ size /home/huixinchen/local/php7/sbin/php-fpm
   text    data     bss      dec     hex filename
10114565  695200  131528 10941293  a6f36d /home/huixinchen/local/php7/sbin/php-fpm
The text segment is 10,114,565 bytes, which would span roughly 4.8 pages of 2MB. Taking alignment into account (the unaligned tail is not moved onto 2MB pages), 4 pages are requested, which matches exactly what we observed.
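The page counts can be double-checked with a little arithmetic, using the numbers copied from the size and meminfo output above:

```shell
text=10114565                  # text segment size in bytes, from `size`
page=$(( 2 * 1024 * 1024 ))    # one 2 MB hugepage
full_pages=$(( text / page ))  # only whole 2 MB pages get remapped
tail=$(( text % page ))        # the unaligned tail stays on 4 KB pages
echo "full 2MB pages: $full_pages"
echo "tail bytes:     $tail"
# Cross-check against meminfo: HugePages_Free went from 117 to 113
echo "pages consumed: $(( 117 - 113 ))"
```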
That confirms the configuration works. Enjoy :)
One caveat in advance: with this feature enabled, if you try profiling with perf report/annotate you will find the symbols missing (Valgrind and GDB are not affected). This is mainly because perf is designed to listen for mmap events and record address ranges, which it later uses to map instruction pointers back to symbols; but hugetlb currently only supports MAP_ANON, so perf assumes those addresses carry no symbol information. Hopefully a later kernel version will lift this limitation.
GCC PGO
PGO, as the name says (profile-guided optimization; search for it if you are interested), needs some training use cases for feedback, which means the optimization is tied to a specific scenario.
An optimization for one scenario may backfire in another; it is not a generic optimization. So we cannot simply bake these optimizations in, nor can we directly release a PGO-compiled PHP7.
Of course, we are trying to identify some generic optimizations from PGO and apply them to PHP7 by hand, but that obviously cannot match the effect of optimizing for one specific scenario. So I decided to write this article briefly describing how to compile PHP7 with PGO, so that the PHP7 you build yourself can make your own application faster.
The first thing to decide is what to feed GCC as feedback. Within the scenario you want to optimize, we generally choose the pages that are visited the most, take the most time, and consume the most resources.
Take WordPress for example: we choose the WordPress home page (because the home page usually gets the most visits).
Let's take my machine for example:
Intel(R) Xeon(R) CPU X5687 @ 3.60GHz x 16 (hyper-threading)
48GB memory
php-fpm uses a fixed 32 workers; Opcache uses the default configuration (remember to load Opcache)
The optimization scenario is WordPress 4.1.
First, let's measure the current WordPress performance on PHP7 (ab -n 10000 -c 100):
$ ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inf-dev-maybach.weibo.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.7.12
Server Hostname:        inf-dev-maybach.weibo.com
Server Port:            8000

Document Path:          /wordpress/
Document Length:        9048 bytes

Concurrency Level:      100
Time taken for tests:   8.957 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      92860000 bytes
HTML transferred:       90480000 bytes
Requests per second:    1116.48 [#/sec] (mean)
Time per request:       89.567 [ms] (mean)
Time per request:       0.896 [ms] (mean, across all concurrent requests)
Transfer rate:          10124.65 [Kbytes/sec] received
So on this machine, the WordPress 4.1 home page currently reaches 1116.48 QPS, that is, it can handle that many home-page requests per second.
Now let's start teaching GCC to compile a PHP7 that runs WordPress 4.1 faster. GCC 4.0 or above is required, but I suggest using GCC 4.8 or later (current is GCC 5.1).
The first step, naturally, is to download the PHP7 source and run ./configure; nothing is different so far.
Then comes the difference: first we build a PHP7 that generates profile data when run:

$ make prof-gen

Note that we use the prof-gen target (this is specific to PHP7's Makefile; don't try it on other projects :))
Then let's start training GCC:
$ sapi/cgi/php-cgi -T 100 /home/huixinchen/local/www/htdocs/wordpress/index.php > /dev/null
That is, we let php-cgi run the WordPress home page 100 times, generating profile information in the process.
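If one page is not representative enough, the same idea extends to a small training loop over several hot pages. This is just a sketch: the page list is hypothetical, and the helper below echoes instead of actually invoking php-cgi, so the loop structure can be tried anywhere:

```shell
# Stand-in for: sapi/cgi/php-cgi -T 100 "$1" > /dev/null
train() { echo "trained on: $1"; }

# Hypothetical list of representative pages to profile
for page in index.php wp-login.php wp-admin/index.php; do
    train "/home/huixinchen/local/www/htdocs/wordpress/$page"
done
```

The profile data from every run accumulates, so training on a few scenarios simply blends their feedback.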
Then we start the second compilation of PHP7.
$ make prof-clean
$ make prof-use && make install
OK, it's that simple; the PGO build is done. Now let's look at PHP7's performance after the PGO build:
$ ab -n 10000 -c 100 http://inf-dev-maybach.weibo.com:8000/wordpress/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inf-dev-maybach.weibo.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.7.12
Server Hostname:        inf-dev-maybach.weibo.com
Server Port:            8000

Document Path:          /wordpress/
Document Length:        9048 bytes

Concurrency Level:      100
Time taken for tests:   8.391 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      92860000 bytes
HTML transferred:       90480000 bytes
Requests per second:    1191.78 [#/sec] (mean)
Time per request:       83.908 [ms] (mean)
Time per request:       0.839 [ms] (mean, across all concurrent requests)
Transfer rate:          10807.45 [Kbytes/sec] received
Now it handles 1191.78 QPS, an increase of about 7%. Not bad, eh? (You say it was 10% before? How did it become 7%? Well, as I said, we keep analyzing what PGO does and manually applying the generic parts to PHP7, which means roughly 3% of the more generic optimizations are already included in PHP7. That work is still ongoing.)
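For reference, the improvement can be computed directly from the two ab runs (QPS numbers copied from the output above; awk is used because shell arithmetic is integer-only):

```shell
before=1116.48   # QPS before the PGO build
after=1191.78    # QPS after the PGO build
awk -v b="$before" -v a="$after" \
    'BEGIN { printf "improvement: %.1f%%\n", (a - b) / b * 100 }'
```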
So it really is that simple: use the classic scenario of your own product to train GCC, and a few easy steps get you a speedup. Why not?