How does HHVM improve PHP performance?


HHVM is a high-performance PHP virtual machine developed by Facebook that claims to be 9 times faster than the official implementation. I was curious, so I took a quick look at it and put together this article, hoping to answer two questions:

    • Is HHVM really reliable? Can it be used in production?
    • Why is it so much faster than official PHP? How exactly is it optimized?

What would you do?

Before discussing how HHVM is implemented, let's put ourselves in Facebook's position first: suppose you have a website written in PHP that has a performance problem, and after analysis you find that a large share of the resources is spent in PHP. How would you optimize PHP performance?

For example, there are several options:

    • Scenario 1: migrate to a better-performing language such as Java, C++, or Go.
    • Scenario 2: use RPC to move functionality into other languages so that PHP does less. Twitter, for example, put a lot of business logic in Scala, with the Rails front end responsible only for presentation.
    • Scenario 3: write PHP extensions, rewriting the performance bottlenecks in C++.
    • Scenario 4: optimize the performance of PHP itself.

Scenario 1 is almost impossible. A decade ago Joel used Netscape as an example to warn that a rewrite means giving up years of accumulated experience. This is especially true for business logic as complex as Facebook's, where there is a huge amount of PHP code, reportedly 20 million lines (quoted from [PHP on the Metal with HHVM]). The cost of such a change is probably greater than that of writing a virtual machine, and for a team of thousands, relearning from scratch is unacceptable.

Scenario 2 is the safest option because the migration can be gradual. In fact Facebook is working on this as well and has developed Thrift as its RPC solution. The other language Facebook mainly uses is C++, which can be seen from the early Thrift code: the implementations for other languages were too crude to be used in production.

The PHP:C++ ratio at Facebook is now said to have gone from 9:1 to 7:3, and with Andrei Alexandrescu on board, C++ is becoming more popular there. But this solves only part of the problem: after all, development in C++ costs more than in PHP, so it is not suitable for code that changes frequently, and too many RPC calls can seriously hurt performance.

Scenario 3 looks good but is hard to carry out in practice. Generally speaking, performance bottlenecks are not concentrated in a few obvious spots; most are the cumulative result of many small costs. Coupled with the high cost of developing PHP extensions, this approach is generally used only for shared base libraries that rarely change, so it cannot solve many problems.

As you can see, none of the first 3 scenarios solves the problem well, so Facebook had no choice but to consider optimizing PHP itself.

Faster PHP

If you want to optimize PHP, how do you go about it? In my opinion, there are several approaches:

    • Scenario 1: optimize at the PHP language level.
    • Scenario 2: optimize the official PHP implementation (i.e. Zend).
    • Scenario 3: compile PHP into the bytecode of another language and run it on that language's virtual machine (such as the JVM).
    • Scenario 4: convert PHP to C++ and then compile it to native code.
    • Scenario 5: develop a faster PHP virtual machine.

Optimization at the PHP language level is the simplest and most feasible approach. Facebook has of course done this, and it also developed xhprof, a performance analysis tool that is very helpful for locating bottlenecks.

But xhprof still could not solve Facebook's problem, so let's keep looking. Next is Scenario 2. In brief, Zend's execution can be divided into two parts: compiling PHP into opcodes, and executing the opcodes. Optimizing Zend can therefore be approached from both sides.

Optimizing at the opcode level is common practice: it avoids parsing PHP repeatedly and can also apply some static compilation optimizations, as Zend Optimizer Plus does. But because PHP is so dynamic, this kind of optimization is limited; an optimistic estimate is a performance improvement of only about 20%. Another idea is to optimize the opcode architecture itself, for example by moving to a register-based design, but this requires too many changes and the gain is not particularly significant (maybe 30%?), so the return on investment is low.

Another approach is to optimize the execution of the opcodes. First, a brief word on how Zend executes them: after reading an opcode, the Zend interpreter calls a different function depending on the opcode (in fact some are handled by a switch, but I have simplified for ease of description), and that function then performs the various language-level operations (see the book "In-depth Understanding of the PHP Kernel" for details). So there is no complex encapsulation or indirection in Zend, and as an interpreter it already does quite well.
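To make the dispatch idea concrete, here is a minimal sketch of an interpreter loop for a toy stack machine. The opcodes, the Instr and Vm types and the handler names are all made up for illustration; they are not Zend's real opcode set or data structures. The point is only the shape: read an opcode, pick a handler, run it, repeat.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy stack machine (not Zend's real opcodes).
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

struct Instr {
    Op op;
    std::int64_t operand;  // only used by Push
};

struct Vm {
    std::vector<std::int64_t> stack;
};

// One handler per opcode; each does the actual language-level work.
static void do_push(Vm& vm, const Instr& in) { vm.stack.push_back(in.operand); }

static void do_add(Vm& vm, const Instr&) {
    assert(vm.stack.size() >= 2);
    std::int64_t b = vm.stack.back(); vm.stack.pop_back();
    std::int64_t a = vm.stack.back(); vm.stack.pop_back();
    vm.stack.push_back(a + b);
}

static void do_mul(Vm& vm, const Instr&) {
    assert(vm.stack.size() >= 2);
    std::int64_t b = vm.stack.back(); vm.stack.pop_back();
    std::int64_t a = vm.stack.back(); vm.stack.pop_back();
    vm.stack.push_back(a * b);
}

// The dispatch loop: every instruction costs one switch plus one call.
std::int64_t run(const std::vector<Instr>& code) {
    Vm vm;
    for (const Instr& in : code) {
        switch (in.op) {
            case Op::Push: do_push(vm, in); break;
            case Op::Add:  do_add(vm, in);  break;
            case Op::Mul:  do_mul(vm, in);  break;
            case Op::Halt: return vm.stack.empty() ? 0 : vm.stack.back();
        }
    }
    return vm.stack.empty() ? 0 : vm.stack.back();
}
```

Running the program Push 2, Push 3, Add, Push 4, Mul, Halt through run() evaluates (2 + 3) * 4. The per-instruction switch and call are exactly the overhead that the techniques discussed next try to remove.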

To improve Zend's execution performance further, you need to understand how programs execute at a low level. A function call, for example, has a cost, which can be optimized away with inline threading. It is like the inline keyword in C, except that the relevant functions are expanded at run time and then executed in sequence (this is only an analogy; the actual implementation is different). It also avoids the waste caused by CPU branch mispredictions.

You can also implement the interpreter in assembly, as JavaScriptCore and LuaJIT do; for details I suggest reading Mike's explanation.

But both approaches are too expensive to implement, perhaps even harder than rewriting from scratch, especially since backward compatibility must be preserved. You will see why when PHP's characteristics are discussed below.

Developing a high-performance virtual machine is no simple matter; the JVM took more than 10 years to reach its current performance. So could one of these high-performance virtual machines be used directly to speed up PHP? This is the idea behind Scenario 3.

In fact, this approach was tried long ago, for example by Quercus and IBM's P8. Almost nobody uses Quercus, and P8 is dead. Facebook also investigated this approach, and there were even some unreliable rumors about it, but Facebook abandoned it in 2011.

Scenario 3 looks good, but the actual results are not ideal. According to many experts (such as Mike), a VM is always optimized for one particular language, and other languages implemented on top of it run into many bottlenecks, such as dynamic method calls; this is described in the Dart documentation. Quercus is also said to perform not much better than Zend+APC (from [The HipHop Compiler for PHP]), so it doesn't make much sense.

But OpenJDK has been working hard on this in recent years, and the recent Graal project looks pretty good; some languages have achieved significant results on it. I haven't had time to study Graal, though, so I can't judge.

Next is Scenario 4, the approach taken by HPHPc (HHVM's predecessor). The principle is to convert the PHP code into C++ and then compile it into a native binary, which can be considered AOT (ahead-of-time) compilation. For the technical details of the code conversion, see the paper The HipHop Compiler for PHP. The following example from the paper gives a rough idea:

[Figure: example from the paper]

The biggest advantage of this approach is that it is simple to implement (relative to a VM) and can apply many compile-time optimizations (since compilation happens offline, it is fine for it to be slow); in the example above, for instance, the - 1 is optimized away. However, it struggles to support many of PHP's dynamic features, such as eval() and create_function(), because supporting them would require embedding an interpreter, which is no small cost. So HPHPc simply does not support these constructs.

Besides HPHPc, there are two similar projects, Roadsend and phc. phc's approach is to convert PHP into C and then compile it. The following example shows file_get_contents($f) converted to C code:

    static php_fcall_info fgc_info;
    php_fcall_info_init ("file_get_contents", &fgc_info);
    php_hash_find (LOCAL_ST, "f", 5863275, &fgc_info.params);
    php_call_function (&fgc_info);

The phc author once lamented on his blog that he had gone to Facebook two years earlier to demonstrate phc and had talked with the engineers there. Then Facebook's project became a hit as soon as it was released, while his own 4 years of hard work remained unknown, and its future now looks slim...

Roadsend is also no longer maintained. For a dynamic language like PHP, this approach has many limitations. Because files cannot be included dynamically, Facebook compiled all its files together, and the deployed binary incredibly reached 1 GB, which became increasingly unacceptable.

There is also a project called PHP QB, which I have not looked at for lack of time; I suspect it is something similar.

So only one road is left: writing a faster PHP virtual machine, and following that road to the end. Perhaps, like me, you thought it outrageous when you first heard that Facebook was going to build a virtual machine, but careful analysis shows that it really was the only way.

Faster virtual machines

Why is HHVM faster? News reports all mention JIT as the key technology, but in fact it is far from that simple. A JIT is not a magic wand that improves performance with a gentle flick. JIT compilation itself takes time, and for simple programs it can even be slower than an interpreter. The most extreme example is that the LuaJIT 2 interpreter is slightly faster than the V8 JIT. So nothing is absolute; more depends on how the details are handled. The history of HHVM's development is a history of continuous optimization, and you can see how it overtook HPHPc bit by bit:

[Figure: HHVM performance over time compared with HPHPc]

It is worth mentioning that ART, the new virtual machine in Android 4.4, uses the AOT approach (remember? HPHPc, mentioned above, does the same), and the result is faster than Dalvik, which used a JIT. So a JIT is not necessarily faster than AOT.

So a project like this is very risky; without a strong heart and perseverance, it is likely to be abandoned halfway. Google once wanted to use a JIT to improve Python performance but eventually failed. For Google, Python is not really used where performance matters (well, Google did once write its crawler in Python [see In the Plex], but that was in 1996).

Facebook obviously has more motivation and determination than Google, since PHP is Facebook's most important language. Let's look at the experts it has put on this project (not a complete list):

    • Andrei Alexandrescu, author of "Modern C++ Design" and co-author of "C++ Coding Standards", an undisputed master of the C++ field
    • Keith Adams, who was responsible for VMware's core architecture; the company sent him to work with Intel on technical cooperation, which shows how well he knows the VMM field
    • Drew Paroski, who worked on the .NET virtual machine at Microsoft and improved its JIT
    • Jason Evans, who developed jemalloc, which reduced Firefox's memory consumption by half
    • Sara Golemon, author of "Extending and Embedding PHP" and a PHP internals expert; probably every PHP expert has read this book, though you may not know that its author is a woman

While the team has no top virtual machine experts of the caliber of Lars Bak or Mike Pall, if these people work together, writing a virtual machine should not be a big problem. So what challenges will they face? We'll discuss them one by one below.

What is the specification?

The first problem in writing a PHP virtual machine is that PHP has no language specification, and the syntax is not always compatible between versions (even between minor versions such as 5.2.1 and 5.2.3). So how is the PHP language specification defined? Take a look at a statement from the IEEE:

The PHP group claim that they have the final say in the specification of (the language) PHP. This group's specification is an implementation, and there is no prose specification or agreed validation suite.

So the only way is to study Zend's implementation honestly. Fortunately, HPHPc had already gone through this pain once, so HHVM could use its work directly, and this problem was not too serious.

Language or extension?

Implementing the PHP language is not just a matter of implementing a virtual machine. PHP itself includes all kinds of extensions that are tightly integrated with the language, and Zend tirelessly implements every feature you might use. If you analyze the PHP source, you will find that its C code, with blank lines and comments removed, still runs to more than 800,000 lines. And guess how much of that is the Zend engine? Less than 100,000 lines.

This is no bad thing for developers, but it is tragic for anyone implementing an engine. Compare Java: to write a Java virtual machine you only need to implement bytecode interpretation and some basic JNI calls, because most of Java's built-in libraries are implemented in Java. So, leaving performance optimization aside, implementing a PHP VM is much more work than implementing a JVM; Doppio, for example, is a JVM implemented in 8,000 lines of TypeScript.

HHVM's solution to this problem is simple: implement only what Facebook uses. It could also start from what had already been written for HPHPc, so this problem was not too serious either.

Implementing the interpreter

Next is the implementation of the interpreter. After parsing PHP, HHVM generates bytecode of its own design, which is stored in ~/.hhvm.hhbc (an SQLite file) for reuse. Bytecode execution is similar to Zend's: different bytecodes are dispatched to different functions (in the virtual machine world this approach has a special name: subroutine threading).

The main body of the interpreter is in bytecode.cpp, in methods such as VMExecutionContext::iopAdd. The final execution branches on the operand types; the add operation, for example, is implemented in tv-arith.cpp. Here is a short excerpt:

    if (c2.m_type == KindOfInt64) return o(c1.m_data.num, c2.m_data.num);
    if (c2.m_type == KindOfDouble) return o(c1.m_data.num, c2.m_data.dbl);

It is thanks to the interpreter that HHVM supports PHP syntax much better than HPHPc and is, in theory, fully compatible with official PHP. But the interpreter alone does not perform much better than Zend. Because variable types cannot be determined in advance, conditional checks like those above are needed, and such code is unfriendly to the execution optimizations of modern CPUs. Another problem is that values are boxed, so every read has to go indirectly through something like m_data.num or m_data.dbl.
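The cost of those checks is easier to see in a small sketch. The field and enum names below (m_type, m_data, KindOfInt64, KindOfDouble) are borrowed from the excerpt above, but the struct itself, make_int, make_dbl and generic_add are simplified stand-ins of my own, not HHVM's actual code, and integer overflow handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a boxed value: a type tag plus a union payload.
enum DataType { KindOfInt64, KindOfDouble };

struct TypedValue {
    union {
        std::int64_t num;
        double dbl;
    } m_data;
    DataType m_type;
};

// Hypothetical helpers for building boxed values.
TypedValue make_int(std::int64_t v) {
    TypedValue tv;
    tv.m_data.num = v;
    tv.m_type = KindOfInt64;
    return tv;
}

TypedValue make_dbl(double v) {
    TypedValue tv;
    tv.m_data.dbl = v;
    tv.m_type = KindOfDouble;
    return tv;
}

// Generic add: every call pays for the type branches and for unboxing
// through m_data before any arithmetic happens.
// Assumption: int + int does not overflow (a real VM must handle that).
TypedValue generic_add(const TypedValue& c1, const TypedValue& c2) {
    if (c1.m_type == KindOfInt64) {
        if (c2.m_type == KindOfInt64) {
            return make_int(c1.m_data.num + c2.m_data.num);
        }
        return make_dbl(static_cast<double>(c1.m_data.num) + c2.m_data.dbl);
    }
    if (c2.m_type == KindOfInt64) {
        return make_dbl(c1.m_data.dbl + static_cast<double>(c2.m_data.num));
    }
    return make_dbl(c1.m_data.dbl + c2.m_data.dbl);
}
```

Even for the common int + int case, the function goes through two tag comparisons and three accesses to boxed data before and after a single add instruction.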

Problems like these have to be solved by the JIT.

Implementing and optimizing the JIT

First of all, it is worth mentioning that others had tried JIT for PHP before:

    • 2008: someone experimented with LLVM, and the result was 21 times slower than the original...
    • 2010: IBM's research lab in Japan developed P9 based on the code of their JVM; its performance is 2.5 to 9.5 times that of official PHP. See their paper Evaluation of a Just-in-Time Compiler Retrofitted for PHP.
    • 2011: Andrei Homescu developed one based on RPython and wrote the paper HappyJIT: A Tracing JIT Compiler for PHP, but the test results were mixed and not ideal.

So what exactly is a JIT? How is one implemented?

Dynamic languages almost always have an eval function that takes a string and executes it. A JIT does something similar, except that what it assembles is not a string but machine code for the target platform, which it then executes. How can this be done in C? You can refer to the introductory example written by Eli; the following is a piece of code from that article:

    // x86-64 machine code for a function that returns its argument plus 4
    unsigned char code[] = {
        0x48, 0x89, 0xf8,        // mov %rdi, %rax
        0x48, 0x83, 0xc0, 0x04,  // add $4, %rax
        0xc3                     // ret
    };
    // m points to the memory the code will be executed from
    memcpy(m, code, sizeof(code));

However, writing machine code by hand is very error-prone, so it is best to use a helper library, such as Mozilla's NanoJIT or LuaJIT's DynASM. HHVM did not use these, however; it implemented its own, which supports only x64 (it is also experimenting with VIXL to generate 64-bit ARM code), and uses mprotect to make the code executable.
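For readers who want to run the idea end to end, here is a minimal sketch built around the same 8 bytes of machine code. It assumes x86-64 Linux and the System V calling convention (argument in %rdi, result in %rax); on any other platform, or if the OS refuses executable memory, it simply reports failure. The function name jit_add4 and the JIT_SKETCH_SUPPORTED macro are my own, and this is neither Eli's complete program nor how HHVM manages its code cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__x86_64__) && defined(__linux__)
#include <sys/mman.h>
#define JIT_SKETCH_SUPPORTED 1
#else
#define JIT_SKETCH_SUPPORTED 0
#endif

// Tries to generate and run machine code for "return x + 4".
// Returns true and stores the result in *out on success; returns false if
// the platform is unsupported or executable memory cannot be obtained.
bool jit_add4(long x, long* out) {
    assert(out != nullptr);
#if JIT_SKETCH_SUPPORTED
    const unsigned char code[] = {
        0x48, 0x89, 0xf8,        // mov %rdi, %rax
        0x48, 0x83, 0xc0, 0x04,  // add $4, %rax
        0xc3                     // ret
    };
    const std::size_t size = 4096;  // one page is more than enough here

    // 1. Get writable (not yet executable) memory from the OS.
    void* m = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return false;

    // 2. Emit the machine code into it.
    std::memcpy(m, code, sizeof(code));

    // 3. Flip the page from writable to executable.
    if (mprotect(m, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(m, size);
        return false;
    }

    // 4. Call the buffer as if it were an ordinary function.
    using Fn = long (*)(long);
    Fn fn = reinterpret_cast<Fn>(m);
    *out = fn(x);

    munmap(m, size);
    return true;
#else
    (void)x;
    return false;
#endif
}
```

Writing the code while the page is writable and only then switching it to executable keeps the page from ever being writable and executable at the same time, which is the usual reason for reaching for mprotect.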

But why is JIT-generated code faster? Consider that code written in C++ is also eventually compiled into machine code; if the JIT merely translated the same code into machine code by hand, how would the result differ from what GCC generates? Although we mentioned some optimization techniques based on how CPUs work, the more important optimization in a JIT is generating specialized instructions based on types, which drastically reduces the number of instructions and conditional checks. The following figure from TraceMonkey shows the comparison very plainly, and a concrete HHVM example appears later:

[Figure: comparison from TraceMonkey]

HHVM starts out executing through the interpreter, so when does it use the JIT? There are 2 common JIT triggering conditions:

    • Trace: count how many times a loop executes, and JIT the code if the count exceeds a threshold
    • Method: count how many times a function executes, and JIT the entire function if the count exceeds a threshold, possibly even inlining it directly
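Both triggers boil down to a counter and a threshold. The sketch below shows the method-style variant; the HotnessCounter class, its method names and the threshold value are hypothetical and only illustrate the bookkeeping, not any particular VM's profiler. A trace-style trigger would count loop back-edges instead of function entries, but the logic is the same.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical profiler: counts executions per function name and reports
// the moment a function becomes "hot".
class HotnessCounter {
public:
    explicit HotnessCounter(std::uint32_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Records one execution. Returns true exactly once per function: on the
    // call that reaches the threshold, which is when a VM would hand the
    // function over to its JIT compiler.
    bool record(const std::string& name) {
        std::uint32_t& count = counts_[name];
        if (count >= threshold_) return false;  // already handed to the JIT
        ++count;
        return count == threshold_;
    }

    // True once the function has reached the threshold.
    bool is_hot(const std::string& name) const {
        auto it = counts_.find(name);
        return it != counts_.end() && it->second >= threshold_;
    }

private:
    std::uint32_t threshold_;
    std::unordered_map<std::string, std::uint32_t> counts_;
};
```

With a threshold of 3, the first two calls to record("f") return false, the third returns true, and later calls return false again because the function is already considered compiled.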

As for which of the two is better, a post on Lambda the Ultimate attracted discussion from various experts; Mike Pall (LuaJIT author), Andreas Gal (Mozilla VP) and Brendan Eich (Mozilla CTO) in particular shared many of their views. I recommend reading it; I won't embarrass myself by weighing in here.

They differ not only in compilation scope but also in many details, such as the handling of local variables, which I won't go into here.

HHVM, however, adopts neither of these two methods; instead it created its own approach called the tracelet, which divides code according to types. See the following figure:

[Figure: a function divided into tracelets]

You can see that it divides a function into 3 parts: the top 2 handle the two cases where $k is an integer or a string, and the bottom part returns the value. So it appears that JIT regions are delimited mainly by where types change. For the details of how tracelets are analyzed and split, see the Translator::analyze method in translator.cpp; I have not had time to read it, so I won't discuss it here.
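One way to picture this is as a chain of translations for the same piece of code, each specialized for one type and protected by a type guard. The sketch below is my own simplification under that assumption: Value, Translation and TranslationChain are hypothetical names, and the "compiled" bodies are ordinary C++ functions standing in for generated machine code. It is not HHVM's actual data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Type { Int, Str };

// Toy value: only the field matching `type` is meaningful.
struct Value {
    Type type;
    std::int64_t i;
    std::string s;
};

// A translation is code specialized for one input type, entered via a guard.
struct Translation {
    Type guard;                         // type this code was compiled for
    std::string (*body)(const Value&);  // specialized body, no checks inside
};

static std::string echo_int(const Value& v) { return std::to_string(v.i); }
static std::string echo_str(const Value& v) { return v.s; }

// Chain of translations for one source location.
class TranslationChain {
public:
    std::string run(const Value& v) {
        for (const Translation& t : chain_) {
            if (t.guard == v.type) return t.body(v);  // guard hit
        }
        // Guard miss: "compile" a translation for the newly seen type and
        // append it at the back of the chain.
        Translation t{v.type, v.type == Type::Int ? &echo_int : &echo_str};
        chain_.push_back(t);
        return t.body(v);
    }

    std::size_t size() const { return chain_.size(); }

private:
    std::vector<Translation> chain_;
};
```

Running the chain first with an integer and then with a string leaves two translations in it; a later integer hits the first guard immediately and goes straight into the specialized body.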

Of course, achieving a high-performance JIT also takes many experiments and optimizations. For example, HHVM originally placed a newly generated tracelet at the front, i.e. with A and C in the figure above swapped. Later the team tried placing new tracelets at the back instead, and performance improved by 14%, because tests showed that this makes it more likely to hit the matching type early.

The JIT works by first converting HHBC to SSA form (hhbc-translator.cpp), then optimizing the SSA (for example with copy propagation), and finally generating native machine code; on x64 this is implemented in translator-x64.cpp.
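Copy propagation is easy to show on a toy IR. The sketch below assumes straight-line code in SSA form, where every name is assigned exactly once; the Inst struct and the string-based operands are invented for illustration and have nothing to do with HHVM's real IR.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy SSA instruction: dst = op(lhs, rhs). For op == "copy" only lhs is used.
struct Inst {
    std::string dst;
    std::string op;   // "copy", "add", ...
    std::string lhs;
    std::string rhs;
};

// Copy propagation: after "t2 = copy t1", later uses of t2 become t1.
// Because each SSA name is assigned once, a single forward pass over
// straight-line code is enough.
std::vector<Inst> propagate_copies(const std::vector<Inst>& in) {
    std::unordered_map<std::string, std::string> alias;  // copy dst -> source

    auto resolve = [&alias](const std::string& name) -> std::string {
        auto it = alias.find(name);
        return it == alias.end() ? name : it->second;
    };

    std::vector<Inst> out;
    for (const Inst& inst : in) {
        Inst rewritten{inst.dst, inst.op, resolve(inst.lhs), resolve(inst.rhs)};
        if (rewritten.op == "copy") {
            // lhs is already resolved, so chains of copies collapse.
            alias[rewritten.dst] = rewritten.lhs;
            continue;  // the copy is now dead and is dropped
        }
        out.push_back(rewritten);
    }
    return out;
}
```

Given t1 = add b, 2; t2 = copy t1; t3 = copy t2; t4 = add t3, t2, the pass drops both copies and rewrites the last instruction to t4 = add t1, t1, so later stages have fewer names and instructions to deal with.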

Let's look at a simple example of what HHVM eventually generates, using the following PHP function:

    function a($b){
      echo $b + 2;
    }

This is what it looks like after compiling:

    mov rcx,0x7200000
    mov rdx,0x20
    call 0x2651dfb
    cmp BYTE PTR [rbp-0x8],0xa
    jne 0xae00306
    ; the code above checks whether the parameter is valid
    mov rcx,QWORD PTR [rbp-0x10]  ; load the argument into %rcx (the value 1 here)
    mov edi,0x2                   ; put 2 into %edi (the low 32 bits of %rdi)
    add rdi,rcx                   ; add %rcx
    call 0x2131f1b                ; call print_int; its first argument %rdi is now 3
    ; the rest is not discussed here
    mov BYTE PTR [rbp+0x28],0x8
    lea rbx,[rbp+0x20]
    test BYTE PTR [r12],0xff
    jne 0xae0032a
    push QWORD PTR [rbp+0x8]
    mov rbp,QWORD PTR [rbp+0x0]
    mov rdx,QWORD PTR [rsp]
    call 0x236b70e
    ret

And the implementation of the HPHP::print_int function is this:

    void print_int(int64_t i) {
      char buf[256];
      snprintf(buf, 256, "%" PRId64, i);
      echo(buf);
      TRACE(1, "t-x64 output(int): %" PRId64 "\n", i);
    }

As you can see, the code compiled by HHVM uses int64_t directly, avoiding the parameter type checks and indirect data access required in the interpreter. This significantly improves performance, to the point where the result differs little from code compiled from C.

Note: in server mode, HHVM triggers the JIT only after more than 12 requests. You can start HHVM with the following parameter to make it use the JIT from the first request:

    -v Eval.JitWarmupRequests=0

So take care when testing performance: comparing after only one or two runs will not show the effect.

Type inference is too much trouble, so make programmers write types explicitly

The key to a JIT is guessing types, so a variable whose type keeps changing is hard to optimize. HHVM's engineers therefore began to consider modifying PHP syntax to add type support, and launched a new language, Hack (a complaint: this name is really bad for SEO). It looks like this:

    class Point2 {
      public float $x, $y;
      function __construct(float $x, float $y) {
        $this->x = $x;
        $this->y = $y;
      }
    }
    // Source:

Did you notice the float keyword? Static types let HHVM optimize performance better, but they also mean the code is incompatible with PHP syntax and can run only on HHVM.

In fact, I personally think the greatest benefit is that it makes code easier to understand and reduces careless mistakes, which was also the original motivation for optional types in Dart. It also makes things easier for IDEs. Facebook is reportedly also developing a web-based IDE that supports collaborative code editing, which is something to look forward to.

Can you use HHVM?

Overall, compared with the earlier HPHPc, I think HHVM is worth a try. It is a real virtual machine with better support for PHP's various syntax, so the cost of migrating code will not be high. And because you can switch back to official PHP seamlessly, you can keep FPM running alongside as a standby. HHVM also has a FastCGI interface that is easy to call. As long as you have a good contingency plan, the risk is controllable, and in the long run it is very promising.

How much performance will improve I cannot say for sure. I would need to test with my own business code to really see how much benefit HHVM brings, especially in terms of overall performance; only with that data can a decision be made.

Finally, here is a summary of problems you may run into, for reference if you plan to use it:

    • Extension problem: any PHP extension you use must be rewritten, but HHVM extensions are much simpler to write than Zend ones; see the wiki for details.
    • HHVM Server stability: this multi-threaded architecture may leak memory after running for a while, or a single badly written piece of PHP may bring down the entire process, so pay attention to testing and disaster recovery measures in this area.
    • Difficulty of fixing problems: problems in HHVM will be harder to fix than in Zend, especially in JIT code; one can only hope it becomes more stable.

P.S. In fact, I understand only the basics of virtual machines and have not written many lines of PHP. Much of this article came from material I looked up while writing it. Since time was limited, there are bound to be mistakes; comments and corrections are welcome :)

January 2014 update: HHVM currently has good momentum at my company, and I recommend that everyone give it a try in 2014, especially now that the compatibility test pass rate has reached 98.58%, which further reduces the cost of migration.


    • Andrei Alexandrescu on AMA
    • Traces of Keith Adams on HN
    • How Three Guys Rebuilt the Foundation of Facebook
    • PHP on the Metal with HHVM
    • Making HPHPi Faster
    • HHVM Optimization Tips
    • The HipHop Virtual Machine (HHVM): PHP Execution at the Speed of JIT
    • Julien Verlaguet, Facebook: Analyzing PHP Statically
    • Speeding Up PHP-based Development with HHVM
    • Adding an Opcode to HHBC