Facebook's HPHP: Initial Comments
By cj on Feb 02, 2010
Facebook have announced a significant project around PHP that I saw previewed at a small tech summit last month. No, Facebook did not announce PHPVille.
Facebook announced HPHP, pronounced hip-hop. "HipHop programmatically transforms your PHP source code into highly optimized C++".
Have you hovered over the word 'Facebook' in the copyright banner of any facebook.com page? Chances are that it says "HPHP", indicating that the server handling your request was running it. Ninety percent of their web traffic is delivered by it and rollout across their servers continues.
What do we know about HPHP? Cherry picking just a few facts from my notes:
- Takes a complete PHP application and generates equivalent C++ code
- For Facebook's (huge) application, they end up with a 1 GB binary
- But this includes a built-in multi-threaded webserver
- No SSL yet
- Has XHProf gatherer built in. Facebook run this in production with a configurable sampling rate
- All extensions are currently statically linked
- Runs as a single multi-threaded process
- Has its own configuration file syntax: no .htaccess equivalent, but does support virtual hosts
- The performance benefit of HPHP being quoted is "on average it's about a 50% CPU saving". This is with g++ and the Gold linker. Database-heavy pages aren't going to benefit as much as PHP-centric pages
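The caveat about database-heavy pages comes down to simple arithmetic. Here is a rough sketch, assuming the quoted 50% saving applies only to the time a request spends executing PHP and that time spent waiting on the database is unchanged. The millisecond figures are made up for illustration and are not Facebook's numbers.

```python
def relative_gain(cpu_ms, wait_ms, cpu_saving=0.5):
    """Fraction of total request time saved when only the CPU part shrinks.

    cpu_ms     -- time spent executing PHP code (hypothetical figure)
    wait_ms    -- time spent waiting on the database (hypothetical figure)
    cpu_saving -- fraction of CPU time removed, 0.5 for the quoted "about 50%"
    """
    before = cpu_ms + wait_ms
    after = cpu_ms * (1.0 - cpu_saving) + wait_ms
    return (before - after) / before


# Two imaginary 100 ms pages with different CPU/database splits.
php_centric = relative_gain(cpu_ms=90.0, wait_ms=10.0)
db_heavy = relative_gain(cpu_ms=20.0, wait_ms=80.0)

print(f"PHP-centric page:    {php_centric:.0%} of request time saved")
print(f"Database-heavy page: {db_heavy:.0%} of request time saved")
```

Under these assumptions the page that is mostly PHP saves close to half its request time, while the page that mostly waits on the database barely changes.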
Although HPHP is being called Beta, in the best Silicon Valley spirit, I'd say it is at the end of Alpha (or it was when I saw it). However, it would only take tidy-ups to the build infrastructure to stabilize the experience.
Writing extensions involves starting with an IDL file describing the extension API that PHP scripts would call. A template C++ file with stubs is automatically generated from the IDL file. After that it's all your own C++ code. When HPHP becomes generally available I'll be interested in digging into this more.
A number of core extensions have been ported. Some way to use other existing PHP extensions was high on the summit attendees' wish-lists. There are some technical issues with globals and I don't know if Facebook will spend time on this, since they don't have a business requirement for it.
Quality and compatibility of HPHP appear high. I understood that a lot of development at Facebook still uses PHP even though the code is then deployed with HPHP. Facebook's HPHP testing included comparisons with PHP:
- comparing the number of functions being called
- comparing network I/O
- comparing Error Logs
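I don't know how Facebook's tooling does these comparisons, but the first one is easy to picture. A minimal sketch, assuming each runtime can produce a list of the function names it called while serving the same request (the trace format and function names here are hypothetical):

```python
from collections import Counter


def diff_call_counts(php_calls, hphp_calls):
    """Return {function: (php_count, hphp_count)} for every mismatch."""
    php_counts = Counter(php_calls)
    hphp_counts = Counter(hphp_calls)
    names = set(php_counts) | set(hphp_counts)
    # Counter returns 0 for names it has never seen, so a function that is
    # missing from one trace shows up as a mismatch against zero.
    return {
        name: (php_counts[name], hphp_counts[name])
        for name in sorted(names)
        if php_counts[name] != hphp_counts[name]
    }


# Hypothetical traces of the same request served by each runtime.
php_trace = ["render_page", "load_user", "load_user", "format_date"]
hphp_trace = ["render_page", "load_user", "format_date"]

mismatches = diff_call_counts(php_trace, hphp_trace)
print(mismatches)
```

An empty result means both runtimes called the same functions the same number of times. Anything else points at a behavioural difference worth investigating.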
Details about which PHP version HPHP is compatible with (PHP 5.2.5) and which extensions are supported could soon be irrelevant if Facebook continue working on HPHP and a community forms around it. My notes on Facebook's stated roadmap say:
- Make it work with Apache
- Catch up with PHP 5.3
- Minimize differences between HPHP and PHP
These indicate recognition that the crux of building a community is getting adoption. There have been other re-implementations of PHP in the past. Where have they got to? This new implementation has some clout behind it and there is vigor to create a community. Also I don't doubt Facebook will continue to use and improve HPHP and this might make it enticing to sites that have issues with capacity. The rest is up to you.
Finally, what else was discussed at the summit? Discussions ranged from specific implementation details about facebook.com to how Facebook could work with the PHP community. There was a significant exchange of information in both directions.
I'll briefly mention Memcached, which was of particular interest on the day. Facebook faced a lot of questions about its use and the improvements they've made (64-bit, UDP, multi-threading, Ethernet drivers, compression, locking, batching, hashing). Facebook's UDP support has been mentioned at PHP conferences and is something people latch onto as if they had been dreaming of it all their lives. However, it brings complexity and has its own problems that had to be solved, including issues with handling large objects. Facebook also found UDP worked well when their datacenters' latencies were all the same, which is no longer always true for them. Facebook stated they hope to contribute their useful Memcached changes back to the open source community.
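To see why large objects are awkward over UDP: a value that doesn't fit in one datagram has to be split, and the receiver has to cope with pieces arriving out of order or not at all. The sketch below uses the 8-byte frame header described in memcached's protocol documentation (request ID, sequence number, total datagrams, and a reserved field, each a 16-bit big-endian integer). The 1400-byte payload limit and the helper names are my own assumptions, and this is not Facebook's code.

```python
import struct

# Four 16-bit big-endian fields: request ID, sequence number,
# total datagrams in the message, reserved (must be 0).
HEADER = struct.Struct("!HHHH")
MAX_PAYLOAD = 1400  # assumed per-datagram payload limit, chosen for illustration


def split_into_datagrams(request_id, payload, max_payload=MAX_PAYLOAD):
    """Split one message into framed datagrams (hypothetical helper)."""
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    total = len(chunks)
    return [HEADER.pack(request_id, seq, total, 0) + chunk
            for seq, chunk in enumerate(chunks)]


def reassemble(datagrams):
    """Rebuild the message, or return None if any datagram is missing."""
    parts = {}
    total = None
    for datagram in datagrams:
        _request_id, seq, total, _reserved = HEADER.unpack(datagram[:HEADER.size])
        parts[seq] = datagram[HEADER.size:]
    if total is None or len(parts) != total:
        return None  # UDP gives no delivery guarantee; the caller must retry
    return b"".join(parts[seq] for seq in range(total))


value = bytes(range(256)) * 20              # a 5120-byte "large object"
datagrams = split_into_datagrams(7, value)
print(len(datagrams), "datagrams")
print(reassemble(reversed(datagrams)) == value)   # order does not matter
print(reassemble(datagrams[:-1]))                 # one lost datagram
```

A single lost datagram means the whole value has to be fetched again, so the bigger the object, the more likely a retry becomes.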