PHP "require" Performance (gazehawk.com)
81 points by bkrausz on May 17, 2011 | 37 comments


Always use Autoload if you can for large projects. Requiring files and classes you never use during a page load can cost you a DRAMATIC amount of performance. Rasmus' quote was saying that PHP/APC can't save the opcode for a file WITH all of its includes if there is any conditional associated with the require. This means that each include's opcode will be pulled from cache each time (not a big deal!!!). [2006, http://pooteeweet.org/blog/538/]

Things in favor of Autoload:

1) Ease-of-use of not worrying about if your class is available

2) If you don't use Autoloading and require classes, you have to load all of those from the opcode cache (and make APC keep track of them) even though you may not be using many of those classes at all.

3) It helps encourage putting things in classes (whether they are static functions or not).

4) If you ever need to conditionally require a class, you are likely going to run into the same opcode cache hits (instead of NOPs) as Autoload. You'll have to include ALL of your classes to avoid the problem.


I'm not sure #3 is really a benefit.


Keep in mind those numbers are microseconds. If you are looking for ways to improve your performance, there are probably lots of other places to focus your attention that will have a much greater impact. For example, optimizing your frontend will give you far more bang for the buck and in fact I'd argue that sometimes it's worth the extra 300 microseconds for the sanity of your development team.


I don't believe he provided units, since it was over a large number of runs, and the units would be relative and meaningless due to hardware and setup differences (he said as much in the comments, as well).


There should really be no reason why __autoload() should be significantly slower than require_once(). It's even trivially easy to ensure your classes are autoloaded from absolute paths to make opcode caches happy.


My initial thought was "surely it's just a quick hash lookup", but then require_once is probably using the canonical path for that lookup, which involves calling realpath() or abspath(). At that point all bets are probably off, especially in shared hosting environments involving things like NFS.

Edit: I looked at the implementation. require_once/include_once unconditionally opens the file, whether it has been loaded already or not. Attempting to follow that path through about 10 levels of indirection (seriously), at some point the PHP implementation passes its notion of the path back to Zend, which suggests possible abspath() etc. Also somewhere in PHP's streams.c I see it unconditionally seek()ing the new file too.

If nothing else, then the layers of crap I just trawled through might cause some slowness, if not some heavy library call like abspath().

I just remembered why I stopped using PHP. :)


Ugh, you're bringing up bad memories for me. I was once bit badly by the require_once/realpath() problem.

require_once("/foo/bar/baz/random/crap/foo.php");

The above requires 5 lstat(2) system calls thanks to realpath(), and a typical WordPress-MU page load for us was calling lstat over 5k times. Over NFS that's 5k round trips to the server for every page load. Thankfully there is a knob you can tune in PHP 5.1.x (realpath_cache_size) that caches all those lstats, but as usual, the default (16K) is too small to be of any use.
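For anyone wanting to check this on their own box, here is a rough sketch of how the realpath cache could be inspected from PHP. It assumes PHP 5.3.2 or later, where `realpath_cache_size()` and `realpath_cache_get()` are available; the variable names are just for illustration. Note that `realpath_cache_size` is a system-level ini setting, so it has to be raised in php.ini rather than with `ini_set()`.

```php
<?php
// Configured limits, as strings straight from the ini settings.
$configured = ini_get('realpath_cache_size'); // maximum cache size, e.g. "16K"
$ttl        = ini_get('realpath_cache_ttl');  // seconds an entry stays valid

// Resolve one path so the cache has a chance to hold at least one entry.
realpath(__FILE__);

// Runtime view of the cache.
$usedBytes = realpath_cache_size();           // bytes currently in use
$entries   = count(realpath_cache_get());     // number of cached path entries

printf(
    "limit=%s ttl=%s used=%d bytes entries=%d\n",
    $configured,
    $ttl,
    $usedBytes,
    $entries
);
```

If the used size sits right at the configured limit on a busy page, that is a hint the cache is too small for the number of paths being resolved.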


Even if autoload is slower, it's such a simple concept that even if you use it early on and need to take it out later, you wouldn't have to restructure anything. If you know performance is that big of a deal upfront, anything but vanilla PHP might not be the best choice since the interpreter will probably become a bottleneck.


Hmm, PHP 5.2+ is still that slow with require_once ?

I thought they fixed that.

Anyway, just use function_exists instead.
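A minimal sketch of what that guard might look like, assuming the file defines a single known function. The function name `demo_format_price` and the temp file are made up so the snippet runs on its own.

```php
<?php
// Throwaway file standing in for a real library file (hypothetical content).
$file = sys_get_temp_dir() . '/function_exists_demo_' . uniqid() . '.php';
file_put_contents(
    $file,
    '<?php function demo_format_price($cents) { return sprintf("%.2f", $cents / 100); }'
);

// Guard a plain require with function_exists() instead of using require_once.
// The loop simulates two different scripts both asking for the same file.
$loads = 0;
for ($i = 0; $i < 2; $i++) {
    if (!function_exists('demo_format_price')) {
        require $file;
        $loads++;
    }
}

echo demo_format_price(1999), " (loaded $loads time(s))\n";
```

The check is a lookup in the function table, so it sidesteps whatever path resolution require_once does, at the cost of having to know a function name defined by each file.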


Yup, we're on 5.3. May have something to do with the relative vs. absolute paths (we use relative), but still...

Turns out we didn't need require_once anyway, all but a couple of includes in our entire codebase were in that one area.


As quickly as possible I turn relative paths to absolute paths (with __DIR__) to avoid any more overhead there. I also empty the include path; no need to have PHP searching for files all over the place.


Out of curiosity, what does your include_path look like? Is ./ the first element, the last, or somewhere in the middle?


. is the first entry


Every time I see an article like this I thank God I'm not using PHP. It's always, "Remember to always use do_something() instead of dosomething() or else something will suck for some reason."


This article is a serious micro-optimization and, in general, not something people should be that worried about. If you nit-pick enough, all platforms have these sorts of issues.


In a good language, include/require[_once] should all be reduced to the same one statement, e.g. Python's import. Don't you think it's a ridiculous notion that the distinctions between include and require (and the once variants) are meaningful enough that programmers should be able to express them?


PHP using __autoload and namespaces looks and feels exactly like Python using imports. You see, the PHP developers have been actively improving a lot of aspects of the language recently, and the latest release, 5.3, had a ton of really nice improvements and features, such as namespaces.

I'm not going to say PHP is the greatest language ever invented, and in fact I am a huge fan of Python. But I do not think PHP deserves the "In a good language..." treatment. It is a language that is getting better every day, with a strong and vibrant development community and massive user adoption. It is a good language.


Sorry, I have to disagree with you.

A language whose maintainers decided that, in the fifth major release, `goto` needed to be added is not getting better. Nor is one whose maintainers rationalize the choice of backslash as a third, distinct scope resolution operator -- instance-attribute, class-attribute, and now namespace-member.

The sheer number of fails with PHP warrants the 'good language' treatment. The fact that,

  08 === 0
alone is enough to make it deserving of this treatment.


amen. nobody chooses to use php on new projects but at least it's easy, fast, and pays the bills.


Really? I choose it every time I have a choice.

Especially with a Rails-like framework like CakePHP, I find that my programming time is lessened dramatically because of the framework, and my processing time is much lower than Rails because most of the servers I work on are optimized out the wazoo for PHP instead of Ruby.

But your mileage may vary.


> Don't you think it's a ridiculous notion that the distinctions between include and require (and the once variants) are meaningful enough that programmers should be able to express them?

Not at all. They mean different things, and they're used for different things. Import-if-it's-available is include, import-or-error-if-missing is require, and they're both quite handy. The once variants do what most programmers would expect, but the non-once ones do what most template users would expect. Should the "once" have been a flag? Yeah, why not? But that doesn't mean the functionality shouldn't even exist.


The `once` distinction is a product of bad design and it only exists when you have a language that is both the templating (view) language and the processing language. To those of us accustomed to good languages, it's like saying "the language allows me to call render_once() so that I don't re-render a template I've already rendered."

When the problem is framed thus, most people would recognize that the solution to that problem is a better understanding on the part of the programmer of the flow of control inside his/her application.

The error handling should be done with exceptions. As it is, there are many errors in PHP which are recoverable, but they don't throw exceptions and instead emit other kinds of non-catchable failures. In Python, you would simply `try` to import a module and catch a failure to do so with `except ImportError`, e.g. if that module wasn't installed.


> To those of us accustomed to good languages, it's like saying "the language allows me to call render_once() so that I don't re-render a template I've already rendered."

Right, and there's a case for that in languages that need a template library, as well. In some cases, you want to re-render a template for some other place in the output, and in other cases, you'd want to use a previously cached rendered template, and both of those cases are useful.

So, I agree that the distinction between the cases is the result of PHP being both a full programming language and a built-in templating language. I don't view this (in and of itself) as a bad thing, though I would quibble with the exact implementation, which is how it is for historical reasons.

> When the problem is framed thus, most people would recognize that the solution to that problem is a better understanding on the part of the programmer of the flow of control inside his/her application.

Since you've already agreed that having the option for rerendering is a feature of including templating as a core feature, it seems like this statement is equivalent to declaring that templating should never be a core feature of a programming language. It's less about the flow of control and more about whether the result is cached. Even programmers who have a good understanding of their programs' flow of control occasionally find memoization useful. :)


*_once doesn't memoize, however. The result of include_once called twice isn't that the same template is repeated, but rather that the first call outputs the template and the second call is silent.

include/require_once are not statements to the effect of "return same output as the last time I include/required this" but rather "if I haven't already include/required this, do it now."

It's a very difficult proposition to say that good programming requires the latter capability.


Uh, that's what import in Python does. It doesn't reimport; it ignores the import request if you've already done it, right? (I must confess I haven't used Python a lot in the last few years, but it used to be my primary language).

So, if you're willing to accept my assertion that Python is a good programming language, then you'll have to agree that that latter capability is the default in at least some good programming languages.

The former capability to which you refer, though, is quite often useful for including template fragments, so I wouldn't want to throw it out, either. If you said, "Hey, these only seem conceptually similar because of the name, and they're really different things", I'd be willing to go with that, I think. Or, if you were to say, "Hey, these are so similar that we shouldn't have a whole separate name for the behavior switch", I'd agree with that, as it's my position. However, it seems as though you're arguing that one of the two behaviors is never needed in a good web application domain language, and I disagree with that.


I think you misunderstood what I said.

First, it's important to understand that include/require_once is useful in PHP because of a particular pattern -- include/require the files containing classes and functions I need; these statements are placed at the top of every script that needs those definitions. It's an error in PHP to declare the same function twice (to redeclare). So if you include script A and B, and both depend on C, then you have to use require_once in A and B when they call in C. This way, anyone calling in both A and B won't trigger a redeclaration error.
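To make the A/B/C pattern concrete, here is a small sketch that writes three throwaway files to a temp directory and loads them. The file names and the `demo_*` function names are invented for the example, and it assumes PHP 5.3+ for `__DIR__`.

```php
<?php
// Build three throwaway files: a.php and b.php both depend on c.php.
$dir = sys_get_temp_dir() . '/require_once_demo_' . uniqid();
mkdir($dir);

file_put_contents(
    $dir . '/c.php',
    '<?php function demo_shared() { return "from c"; }'
);
file_put_contents(
    $dir . '/a.php',
    '<?php require_once __DIR__ . "/c.php"; function demo_a() { return demo_shared(); }'
);
file_put_contents(
    $dir . '/b.php',
    '<?php require_once __DIR__ . "/c.php"; function demo_b() { return demo_shared(); }'
);

// The caller pulls in both A and B. Because A and B use require_once,
// c.php is executed only once and demo_shared() is declared only once.
require_once $dir . '/a.php';
require_once $dir . '/b.php';

echo demo_a(), ' / ', demo_b(), "\n";
```

Swapping the two inner `require_once` calls for plain `require` would make the second load of c.php fail with a "Cannot redeclare" fatal error, which is the situation described above.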

This use case is not relevant to people using the statements to pull in templates, because you would deliberately place the require statement where you needed it. Someone using the same partial template in the header of a page and in the footer of a page would not care if the template had been invoked before -- "place this in the footer unless you already placed it somewhere else in the document (or even if you simply included it and threw it away, or emailed it to someone, or anything else at all)" would be a very poorly written template.

So what we have is the case that a set of processing scripts all include the same file at the top and this could potentially trigger redeclaration errors. So instead of addressing the fact that the interpreter cannot distinguish a common pattern (multiple inclusion) from something that really isn't even an error (redeclaration), we have four statements that serve very minor variations of the same function.

There are two binary choices here: require or include and once or not once. The distinction between include and require is totally unnecessary -- in what case would you want to optionally include another file, but not even receive notification or change your behavior depending on whether the file was able to be included? The _once distinction is only a guard against redeclaration, and it's beyond me why it matters that something was declared multiple times -- or why the programmer needs to count the number of inclusions.

Ideally, calling in a template would be different from calling in essential definitions, and would be treated differently.

BTW in Python, the import statement is idempotent -- importing multiple times has the same effect as importing once.


> The `once` distinction is a product of bad design and it only exists when you have a language that is both the templating (view) language and the processing language.

PHP takes its cues from C -- that's where the idea of include() and require() comes from. These functions pre-date the existence of OOP in PHP. There was a time when even the _once() functions didn't exist and you had to manage that yourself.

It's best to think of these functions as low-level building blocks you can use to build on your own abstraction layer. I never call those functions directly.

> The error handling should be done with exceptions.

PHP supports many styles of development. If you prefer errors, you can have those. If you prefer exceptions, you can convert all errors to exceptions with only a few lines of code (there is even a built in exception type for this purpose).
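As a sketch of the "few lines of code" mentioned here, assuming the built-in type meant is `ErrorException` and that closures are available (PHP 5.3+), something along these lines turns warnings and notices into exceptions. The missing file path is deliberately bogus.

```php
<?php
// Route PHP errors (warnings, notices, ...) through ErrorException.
set_error_handler(function ($severity, $message, $file, $line) {
    // Respect the current error_reporting level and the @ operator.
    if (!(error_reporting() & $severity)) {
        return false;
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$caught = null;
try {
    // Including a missing file normally only emits a warning.
    include '/no/such/dir/hypothetical_missing_file.php';
} catch (ErrorException $e) {
    $caught = $e->getMessage();
}

echo $caught === null ? "no exception\n" : "caught: $caught\n";
```

One caveat: fatal errors such as E_ERROR never reach a user-defined error handler, so this covers the recoverable kinds only.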


"Remember to always use do_something() instead of dosomething() or else something will suck for some reason."

I'd say that is the case for any language.


Really, every time I hear of PHP I honestly ask myself: "why on earth would anyone want to be using PHP in this day and age?"

It is like seeing people trying to cure cancer with leeches.


One way to improve cpu-bound performance is to switch to something like HipHop for PHP [1], which compiles PHP down to C++ and runs that. Granted, this incurs a non-trivial switching cost.

However, it's scary to me that someone is suggesting the potentially fatal-inducing use of "require" over "require_once" (if you call it twice on the same file, you end up re-defining whatever classes or methods were required and the script dies). It seems like optimizing for developer time and simplicity over cpu utilization is better here. Using HipHop would incur a one-time operational cost, but I suspect it would be worth the additional developer and speed gains.

[1]: https://github.com/facebook/hiphop-php/wiki/


By using __autoload and a class/file naming convention, you can get away with using just require, and not having to write require statements manually. Decreased time for developers, and near optimal load times.
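Here is a minimal sketch of that idea. It uses `spl_autoload_register()` rather than a global `__autoload()` function, since `__autoload()` was removed in PHP 8. The convention that `Greeter_English` lives in `Greeter/English.php` and the class itself are assumptions made up for the example, and the base directory is a temp folder so the snippet runs on its own.

```php
<?php
// Hypothetical layout: class Foo_Bar lives in <base>/Foo/Bar.php.
$baseDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($baseDir . '/Greeter', 0777, true);
file_put_contents(
    $baseDir . '/Greeter/English.php',
    '<?php class Greeter_English { public function hello() { return "hello"; } }'
);

// Map a class name to an absolute path and load it with a plain require.
// Each class file is asked for at most once, so require_once is not needed.
spl_autoload_register(function ($class) use ($baseDir) {
    $file = $baseDir . '/' . str_replace('_', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// No require statement here: the first use of the class triggers the loader.
$greeter = new Greeter_English();
echo $greeter->hello(), "\n";
```

Because the path is built from an absolute base directory, PHP does not have to walk the include_path to find the file.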


HPHP is realistically only going to get you like 5% performance bump.

Performance Talk by Rasmus Lerdorf http://vimeo.com/12416792 (skip to 14:30)


I'm trying to understand the metrics here. Doesn't the first test show that __autoload is actually much faster than require or require_once?

require_once: 1579, require: 1318, __autoload: 578

The second test shows a similar result:

require_once: 1689, require: 1382, __autoload: 658

Am I missing something here?


Yes, __autoload is much faster, because it only ends up loading ~5 of the files. Previously we had included 30 files by default, and even though __autoload is slower on a file-by-file basis, the savings on the files we didn't need to load at all made it very worthwhile.


There's a really nice discussion on Stack Overflow about this:

http://stackoverflow.com/questions/186338/why-is-require-onc...


The require vs. require_once was a byproduct of this research: I was really interested in __autoload vs. require, as that seems to have fewer numbers online for it.


Asked author to include APC enabled/disabled as a variable in comparison.

Edit: APC was enabled



