PHP Hacking aka trying to push PHP internals forwards

pbiggar · on June 10, 2011

I've read a lot of the PHP internals source code (I'm an author of phc), and this post leaves me a tiny bit optimistic, and a tiny bit disappointed.

I completely agree with the need to fork to make PHP a better language. The PHP internals community is the most awful OSS community I've ever been involved in, and getting shit done there just isn't going to work. So this is a good first step.

I also like the things he has taken away. PHP suffers from too many unnecessary options. I know that "unnecessary" is often subjective and in the eye of the beholder, but the OP seems to have a good eye for this. The things he removed are a blight on the codebase, so well done.

The optimizations are so-so. While I can see the benefits of the hardcoded constants, the other optimizations are hacks. And that has historically been the problem with PHP optimizations - they're all hacks. The engine needs rearchitecting to be fast, and all these hacks need to be removed, not more added.

Take for example that strlen is slow. The problem is that all function calls in PHP are slow. And if you look at the function calling code, it's very obvious why. The correct solution is to make _all_ function calls fast, presumably by using some kind of inline cache, or just refactoring a ton of that crap out.

(Actually, someone introduced a patch to PHP-internals about two years ago to cache function calls somehow. I don't remember the exact details, but it was similar to the concept of an inline cache, perhaps storing a function cache struct in some bytecode or something. As I recall, it was denigrated as not being a full PIC. Sigh).
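A rough userland analogy of the inline-cache idea, for illustration only: resolve the callee once per call site and reuse it on later calls. The CallSite class and its members are hypothetical names, the real fix would live in the engine's C code, and this is not the patch mentioned above.

```php
<?php
// Hypothetical model of one call site with a one-entry (monomorphic) cache.
final class CallSite
{
    private string $name;
    private ?Closure $cached = null;
    public int $lookups = 0;   // counts how often the slow lookup path ran

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function call(...$args)
    {
        if ($this->cached === null) {
            // Slow path: resolve the function by name, once.
            $this->lookups++;
            $this->cached = Closure::fromCallable($this->name);
        }
        // Fast path: reuse the already resolved callee.
        return ($this->cached)(...$args);
    }
}

$site = new CallSite('strlen');
$first = $site->call('abc');
$second = $site->call('hello');
```

After both calls the lookup counter is still 1, which is the whole point: only the first call pays for name resolution.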

Finally, I really don't like the added functions. No problem with the functions themselves, just that adding functions to core PHP is not useful to anyone but the author. What if they clash with a user's function names? Why couldn't they just be implemented in user space? Etc. This is very much going in the wrong direction.

In the future, I'd like to see an effort like this go wild and make backwards incompatible changes like making the needle/haystack parameters consistent, but that's optimistic. (If they were to do it, phpcompiler.org could well be used to provide a 2to3-like tool for doing the user-code porting automatically).

So overall, some good and some bad. If the OP focussed on making optimizations which improved the code base, removing more crap, and avoided adding "new" things, this would be really nice.

adaml_623 · on June 10, 2011

Good to see. It's about time PHP got forked even if experimentally.

It seems to me that you could fix a lot of the problems with PHP by breaking backwards compatibility.

jcoby · on June 10, 2011

Many of the problems and annoyances with PHP could be fixed while keeping backwards compatibility.

Just a short list I can think of now:

- Short array syntax. Completely optional but would clean up the language quite a bit. This has been proposed several times and shot down each time for no real good reason.

- Promote complex primitives into objects. With the proper interfaces it wouldn't break any old code. Once this is done, deprecate the myriad of str* and array* functions in favor of the object methods. This also cleans up the "what order do i pass in the args" problem that so many people cite.

- Named parameters. Gets rid of passing in default params or an array blob for functions/methods that take lots of optional params. It also opens up the way for DSLs.

- Replace PEAR with something better. The quality of PEAR modules is really low and PEAR always seems to be broken one way or another.

- Add a list and hash type. PHP's array is very slow for some cases, and sometimes you need to enforce the datatype you're working with.

- Use exceptions for errors. Could be a runtime flag to prevent breaking things. It's incredibly easy to write bad code because PHP makes it easy to ignore problems. Things like functions that connect to the db or trying to retrieve data from a stream don't throw exceptions on error.
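The "promote primitives into objects" item in the list above could look roughly like this in userland. Str and its method names are hypothetical, not an existing PHP API; the sketch only shows how methods remove the needle/haystack ordering question.

```php
<?php
// Hypothetical string wrapper; the class and its methods are illustrative.
final class Str
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function length(): int
    {
        return strlen($this->value);
    }

    // The haystack is always $this, so there is no argument order to remember.
    public function contains(string $needle): bool
    {
        return strpos($this->value, $needle) !== false;
    }

    public function split(string $delimiter): array
    {
        return explode($delimiter, $this->value);
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

$s = new Str('a,b,c');
$parts = $s->split(',');
$hasB = $s->contains('b');
```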

adaml_623 · on June 10, 2011

Well to be honest if you deprecate the str* and array* functions you are pretty much breaking backwards compatibility.

I'm not saying it wouldn't be a good thing but those functions are one of the defining features of PHP imho

josegonzalez · on June 10, 2011

Deprecation is not the same thing as removal.

abarrett · on June 10, 2011

Regarding your first point: it seems short syntax for arrays may well make it into PHP 5.4.

http://www.reddit.com/r/PHP/comments/hw8da/php_fork_based_on...

wvenable · on June 10, 2011

PHP already provides a mechanism to replace errors with exceptions (set_error_handler() and the ErrorException exception type). I do this on all my projects.
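For readers who have not seen it, a minimal sketch of that mechanism, assuming PHP 7 or later for the scalar type declarations. The handler follows the pattern documented for ErrorException; the trigger_error() call is only there to exercise it.

```php
<?php
// Convert PHP errors (warnings, notices, ...) into ErrorException.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    if (!(error_reporting() & $severity)) {
        // Masked by error_reporting (for example via @): let PHP's own handler run.
        return false;
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$caught = null;
try {
    // Raise a user-level warning; the handler turns it into an exception.
    trigger_error('something went wrong', E_USER_WARNING);
} catch (ErrorException $e) {
    $caught = $e->getMessage();
}
```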

jcoby · on June 10, 2011

I'm aware of that method. It's not an ideal approach IMO.

Specific errors should throw specific exceptions. That way I can handle the ones I know how to handle and let the others bubble up and either completely break the app to be logged and fixed later or get handled correctly somewhere else. With ErrorException, you have to look at the contents of the error message to see if it's an error you can handle or need to rethrow.
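A sketch of what that inspection tends to look like. readOrDefault() is a hypothetical helper, and the matched message text is an assumption that varies by platform and PHP version, which is exactly the fragility being described.

```php
<?php
// Same generic conversion as before: every error becomes an ErrorException.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

// Hypothetical helper: return the file contents, or $default if the file is missing.
function readOrDefault(string $path, string $default): string
{
    try {
        return file_get_contents($path);
    } catch (ErrorException $e) {
        // There is no FileNotFoundException to catch, so match on the message text.
        if (strpos($e->getMessage(), 'No such file or directory') !== false) {
            return $default;   // the one case we know how to handle
        }
        throw $e;              // everything else bubbles up
    }
}

$contents = readOrDefault('/nonexistent/definitely-missing.txt', 'fallback');
```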

wvenable · on June 10, 2011

If something in PHP triggers an error, there is very little to recover from. It's either a developer error or an unrecoverable problem (a file expected to exist was not found, network is down, etc). In any of my projects, I've never needed to recover from a PHP ErrorException.

VMG · on June 10, 2011

What would a sane version of PHP look like? What are the good parts of PHP besides easy setup?

qntm · on June 10, 2011

The good parts of PHP, just off the top of my head: Extremely forgiving syntax which makes it quick and easy to bash together something that just works, implicit type conversion, a vast array of built-in functions, readability, near-ubiquity, generally very strong documentation with a healthy focus on example code, and the way it interpolates with static HTML, which I'm sure there's a word for but I can't remember offhand.

code_duck · on June 11, 2011

What I see as the big strength is how easy it is to set up mod_php - possibly in no small part due to how most Linux distributions have solid defaults for mod_php5 and keep it up to date. The things you listed I'm not so sure about.

I don't think it's the syntax that's forgiving as much as the type coercion. JavaScript is actually more forgiving syntax wise, for example - PHP throws a fatal error when a semicolon is missing at the end of a line, JS interpreters fill it in. Unlike PHP, though, JS will throw a ReferenceError and die upon referencing any undefined value. PHP just coerces that to something falsey... they'll both die trying to call undefined functions or methods.

I don't see how PHP is more readable than Python or Ruby, either. There's really nothing about the language itself that makes it readable or better for beginners.

The docs are definitely good, I agree. I think they need to do something about the outdated and not-so-well curated comments on each entry though.

Near ubiquity is good, but it also makes a lot of good PHP resources get lost in horrible noise. WordPress, for instance - if you search to solve anything, you find scores of outdated tutorials offering solutions to problems which have been solved by the official branch for years. Then, you find more blogs blog-spamming the incorrect solution... so it's actually easier to search for and find Rails or Django help, in my experience. A large percentage of the PHP code out there is notably terrible, for whatever reasons.

The 'interpolation' would be called templating. You can do this in most languages (erb for Ruby, for example) but it's not always considered a good idea to give the template language logic capabilities. It happens that this is the default for how PHP and Apache work together, and I think the straightforward concept of files, directories and scripts is easier for beginners to understand and work with than framework style URL routing.

The large software eco-system centered around tools consumers use such as WordPress or joomla has helped PHP grow.

Set up is easy, but the majority of PHP users probably never set up their own Linux boxes with Apache anyhow. If they wanted to, though, it would be easier than, say, Django, since last I checked Ubuntu was still providing Django .96 packages.

The ease of setup comes from every shared host offering PHP and MySQL. If they all offered Python, Django and Postgres support, with cpanel configs and PhpPgAdmin or the like, people would magically find Django really easy to set up. That's just how it goes when you have lots of market share.

dools · on June 10, 2011

Date and time handling with date() and strtotime(), magic methods, simple and practical array handling, proximity to the HTTP so you can solve problems the way you want instead of the way someone tells you you should.

z92 · on June 12, 2011

One good point in PHP is the hello world code is just "hello, world". Not print("hello, world") nor <?php echo "hello, world"?>.

Simply type in hello world in a file, save it and run it as PHP script. It will print "hello world". And that was the start of the good things.

vertice · on June 10, 2011

or just moving on.

php itself does break backwards compatibility, and it doesn't really help.

hackernewz · on June 10, 2011

Boo, deleted short tags. Why do people hate short tags again? I can't remember because <? is not valid XML so there shouldn't be any problems with mixing php and xml... hmm... I wonder ....

mgkimsal · on June 10, 2011

<? is what is used by XML to indicate a processing instruction.

http://www.javacommerce.com/displaypage.jsp?name=pi.sql&...

<?name pidata?>

So when someone uses <?php in their XML document and tries to run it through PHP, will the collective PHP-internals community explode? React? Ignore it?

<%= has been the compatible way forward since 1996 or so, and the community has chosen to ignore it. We've successfully migrated away from HTTP_GET_VARS, register_globals, and other bad habits. But somehow <?php is lorded over others as 'the one true way', as if somehow typing more boilerplate to avoid conflict with 0.02% of the use cases where there's a problem is something to be proud of.

dexen · on June 10, 2011

Can we please give up this ``is-valid-XML'' argument already?

Quick counter-example:

  <?php echo 'foo ?>' ?>

Guess what:

- if interpreted as XML, the processing instruction ends at the first ?>

- if interpreted as PHP, the PHP code ends at the second ?>

...which makes the argument against short tags -- XML validity -- moot. Can't be fixed without breaking backward compatibility either. Let's drop the argument now; otherwise somebody well-meaning will try to apply it and will end up breaking backward-compatibility (and removing a neat feature) yet keeping XML-compatibility broken anyway. Just like Robert Eisele did.
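The two readings can be compared mechanically. This is a small sketch that assumes the tokenizer extension (token_get_all) is available; the variable names are illustrative.

```php
<?php
// The one-liner from above, held in a string so both readings can be measured.
$source = "<?php echo 'foo ?>' ?>";

// XML reading: a processing instruction ends right after the first "?>".
$xmlEnd = strpos($source, '?>') + 2;

// PHP reading: walk the tokens and record where the T_CLOSE_TAG token ends.
$phpEnd = 0;
$offset = 0;
foreach (token_get_all($source) as $token) {
    $text = is_array($token) ? $token[1] : $token;
    $offset += strlen($text);
    if (is_array($token) && $token[0] === T_CLOSE_TAG) {
        $phpEnd = $offset;
    }
}

printf("XML stops at byte %d, PHP stops at byte %d of %d\n", $xmlEnd, $phpEnd, strlen($source));
```

The XML offset comes out smaller than the PHP one because the first ?> sits inside a PHP string literal, so the tokenizer does not treat it as a closing tag.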

acabal · on June 10, 2011

Agreed... I've never used <? in a non-PHP context and it's way easier to type than <?php all the time. Why not just leave it in, or at least change it to something just as short but unique?

philolson · on June 10, 2011

<?= always exists as of PHP 5.4, despite the short tags setting.

romaniv · on June 10, 2011

Awesome. I remember arguing for it on the mailing lists and I'm extremely happy they did this.

With echo tag you can have a very simple and powerful templating system written in the language itself. (I usually just use two methods, Template::show and Template::get, which both are less than 20 lines long.)
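The actual class is not shown in the thread, so the following is a guess at what such a pair of methods might look like, using output buffering and include. The temporary file exists only to make the example self-contained.

```php
<?php
// Hypothetical reconstruction of a two-method template helper.
final class Template
{
    // Render a template file and return the result as a string.
    public static function get(string $file, array $vars = []): string
    {
        // Expose $vars as local variables; EXTR_SKIP protects $file and $vars.
        extract($vars, EXTR_SKIP);
        ob_start();
        try {
            include $file;
            return ob_get_contents();
        } finally {
            ob_end_clean();
        }
    }

    // Render a template file straight to the output.
    public static function show(string $file, array $vars = []): void
    {
        echo self::get($file, $vars);
    }
}

// Usage: write a one-line template to a temporary file and render it.
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, 'Hello, <?= htmlspecialchars($name) ?>!');
$html = Template::get($file, ['name' => 'World & Co']);
unlink($file);
```

Escaping with htmlspecialchars() stays the template author's job here, which is one of the usual arguments for a dedicated template engine.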

mgkimsal · on June 10, 2011

encoderer · on June 10, 2011

Personally, I'm a fan of exactly one way to do one thing.

z92 · on June 11, 2011

Short tags are why I don't need a template engine for PHP. And can write <?=$name?> instead of <?php echo $name;?>. This had been a unique feature that no other scripting language offers. This is a historic sign indicating that PHP was designed for the web.

chopsueyar · on June 10, 2011

It is not a unique escape code.

Every time I type those three little characters, it gives me great pleasure.

mgkimsal · on June 10, 2011

<%= is a unique escape code

It always bugged me that the PHP community didn't pick up on this and run with it. We've had ASP tags since the beginning (almost?) but I think people avoid it out of some anti-MS stance. Rails runs with <%= just fine.

hackernewz · on June 10, 2011

Can you give a demonstration of its non-uniqueness?

duskwuff · on June 10, 2011

<? is used in XML headers.

OTOH, you shouldn't ever be generating XML headers from PHP anyway (XHTML is a dead-end, and raw XML should be generated by an XML library), so this shouldn't be that big of an issue.

hackernewz · on June 10, 2011

But, less than question mark and a space are not a valid XML combination. XML processing instructions cannot begin with a space. Likewise, PHP cannot run commands into the short tags, like <?echo $foo; ?>. So, there really is no collision, it's just grandstanding.

mgkimsal · on June 10, 2011

I wonder if the XML PI spec says anything about <?= Can you have a PI start with = ? If not, the echo short tag should stay for certain.

zeen · on June 10, 2011

'<?=' is invalid XML. '<?' must be followed by a valid XML Name.

See http://www.w3.org/TR/2008/REC-xml-20081126/#sec-pi
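A quick way to see the rule in action, as a simplified sketch: the regex below is an ASCII-only approximation of the XML Name production (the real grammar also allows many non-ASCII characters, and reserves the target "xml"), and startsValidPi() is a hypothetical helper name.

```php
<?php
// Simplified check: "<?" must be followed directly by an XML Name (ASCII subset only).
function startsValidPi(string $s): bool
{
    return preg_match('/^<\?[A-Za-z_:][A-Za-z0-9_:.\-]*/', $s) === 1;
}

$longTag  = startsValidPi('<?php echo 1 ?>');  // "php" is a valid PI target
$echoTag  = startsValidPi('<?= $x ?>');        // "=" cannot start a Name
$shortTag = startsValidPi('<? echo 1 ?>');     // a space cannot start a Name
```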

josegonzalez · on June 10, 2011

I wonder if it wouldn't have been better to import the PHP source-tree first, then apply each patch one at a time/in a batch. That way applying these patches at a later date would have been easier.

mattyb · on June 10, 2011

That's what was done.

https://github.com/infusion/PHP/commit/790d551ac9ef8e204b44f...

sc68cal · on June 10, 2011

Why oh why didn't he fork php/php-src, make a branch for his changes, then commit as he went along? This just ends up as a big patchbomb. They're never going to merge his changes, and I don't blame them.

EDIT - Hopefully we can straighten this around: https://github.com/infusion/PHP/commit/790d551ac9ef8e204b44f...

jrockway · on June 10, 2011

In this case, it would be better to maintain it as several topic branches (so that each logical change can be implemented in as many patches as necessary), and then a good old octopus merge as "master" that you force-update every so often.

(Yes, everyone hates you for modifying published history, it's evil, it will give you bad breath, yeah yeah yeah. But you have to think of this use case as a "code wiki" rather than "list of changes leading to the current state". After the patches are accepted, then don't force update them. When they are still in flux, modify history as often as necessary to make your changes easily comprehensible.)

sc68cal · on June 10, 2011

His work is based off 5.3.x, there is probably a branch for it in the official SVN repo. As one commenter noted, the github mirror is stale so he'd need to fork it and then update it with the official svn to find the branch.

It may make sense to rebase it off of trunk of svn.

mattyb · on June 10, 2011

What do you mean by force update?

josegonzalez · on June 10, 2011

master would always contain the php-base+the octopus merge. So if the php-base was 5.3.6, it would be 5.3.6+patches. Tomorrow PHP 5.3.7 comes out, and then master would be 5.3.7+patches.

josegonzalez · on June 10, 2011

No it wasn't. As sc68cal says, it's a patchbomb. And if you think otherwise, let me know how I can revert his changes to remove short tags and the mysql* changes using a single git command (hint, there isn't).

I'm all for changes to an open source project - whether it acts like one or not - but every open source developer should, at some point, learn that gigantic patchsets with lots of unrelated changes are a big no-no.

sc68cal · on June 10, 2011

I'm just pissed that he didn't even bother to actually fork the project. On Github! WOW! All the previous commits before the fork? Gone. Poof. It's completely without any context. Even though there's a big "FORK" button!

mattyb · on June 10, 2011

Well what project would he fork? The (apparently) official PHP mirror is way out of date:

https://github.com/php/php-src

sc68cal · on June 10, 2011

Well, then just use git svn to update the master of his fork to match the official PHP svn. It'll just be a fast forward anyway.

Two birds with one stone.

xarg · on June 10, 2011

I work on it. I currently update the git repo with the svn sources. This is somewhat slow... May I ask what you want to do with the dissected patches? Try to build a "clean" version as pbiggar has said?

josegonzalez · on June 10, 2011

This doesn't concern me as much, so long as he has a note saying "compiled onto commit hash blah". Cherry-picking would be a pain, but doable at that point.

mattyb · on June 10, 2011

Well I wasn't sure what you meant by in a batch. I agree that it wasn't well done, but I wasn't commenting on that.

It's especially interesting considering he says this in the comments:

...I don't want to leave it as a stand alone project. I modified PHP as proof of concept in order to get these changes into one of the next releases.

grumpycanuck · on June 10, 2011

I'm a long-time PHP user (since 1998) but I never really peered inside the discussions of the internals of PHP. Now that I'm more connected to others in the PHP world, what you see inside the mailing list for PHP internals is what I would label as obstructionism and an attitude that seems to imply that if you cannot code the requested changes yourself, don't even bother asking.

Lead, follow, or get out of the way are the only three choices available to any language.

hackernewz · on June 10, 2011

Also, if you can code the change yourself it gets rejected as "not a bug" for a few times, then they think about it and say that it's too late for any reasonable release and that it will go into PHP 6 or 5.3 or something that you won't upgrade to because it breaks too much other stuff.

wvenable · on June 10, 2011

> attitude that seems to imply that if you cannot code the requested changes yourself, don't even bother asking.

I think that's fair actually. The internals list is not for wish-lists; if it was it would be flooded and no useful work could be done there. The problem is that there are coded patches for some of these features (like short array syntax) but they're still not being included.

philolson · on June 14, 2011

It's not productive and everyone loses with that attitude, because "... don't even bother asking" essentially blames others for not implementing ideas stuck in your head.

ldng · on June 10, 2011

I was hoping for more profound changes. I've never read PHP internals but I'm under the impression the opcode language is not 'jitable' because it's an informal mess while even python and lua are getting there ... It seems that Opcode caching is the most you can get out of the language as is.

Revisor · on June 10, 2011

And in a true PHP fashion, the new functions follow at least two naming conventions. Cf. str_random() vs strcut()

PHP has no vision, it draws no people with vision and suffers for it very much. I say it as someone working with it for historical reasons.

jrockway · on June 10, 2011

So, are strings in PHP cstrings, or are they a length/address pair?

roel_v · on June 10, 2011

Pascal-style, so size/address. Technically, strings don't need to be null-terminated, but many libraries expect they are so many are 'both' in a way.

pbiggar · on June 10, 2011

length/address pair.

jrockway · on June 10, 2011

I see. Why is strlen such a hit then?

pbiggar · on June 10, 2011

function call overhead. isset is a builtin.
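The trick presumably being alluded to replaces a length comparison with an isset() on a string offset, since isset is a language construct rather than a function call. A minimal sketch with illustrative variable names:

```php
<?php
$s = 'hello';
$n = 4;

// Function call: ask for the length, then compare.
$viaStrlen = strlen($s) > $n;

// Language construct: offset $n exists only if the string has more than $n bytes.
$viaIsset = isset($s[$n]);

// One past the end: both forms agree that the string is not that long.
$tooFar = isset($s[5]);
```

Both forms answer the same question. Whether the isset form is still worth it on a current engine is a separate matter and would need measuring.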

viraptor · on June 10, 2011

Might be not completely offtopic if I asked here: is there some reason array, resource and object were never folded into one type (object)? It seems like array and resource are kept separate just because it's done this way internally (with array and resource super classes). Why can't resources or array be native-code-backed objects like some modules in python?

It seems like many special cases were left over from old versions and the inertia prevents any change.

tcdent · on June 10, 2011

Minor nitpick given the scope of these additions, but his use of a boolean argument to enhance implode is not my favorite.

  implode(',', $array, true)

A new function (or even leaving it the way it was) is far more readable.

  implode_keys(',', $array)

  implode(',', array_keys($array))
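For comparison, the array_keys() form already works in stock PHP. The sample array below is made up for illustration:

```php
<?php
// Made-up sample data.
$array = ['a' => 1, 'b' => 2, 'c' => 3];

// Join the keys: array_keys() extracts them, implode() joins them.
$keys = implode(',', array_keys($array));

// Join the values: the default behaviour of implode().
$values = implode(',', $array);

echo $keys, "\n", $values, "\n";
```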

voidr · on June 11, 2011

This should be the mainline version, a lot of good stuff especially the new array syntax.

koski · on June 10, 2011

Does anyone know if there are any benchmarks about this?

aba_sababa · on June 10, 2011

Mmm, string looping!