I've read a lot of the PHP internals source code (I'm an author of phc), and this post leaves me a tiny bit optimistic, and a tiny bit disappointed.
I completely agree on the need to fork to make PHP a better language. The PHP internals community is the most awful OSS community I've ever been involved in, and getting shit done there just isn't going to work. So this is a good first step.
I also like the things he has taken away. PHP suffers from too many unnecessary options. I know that "unnecessary" is often subjective and in the eye of the beholder, but the OP seems to have a good eye for this. The things he removed are a blight on the codebase, so well done.
The optimizations are so-so. While I can see the benefits of the hardcoded constants, the other optimizations are hacks. And that has historically been the problem with PHP optimizations - they're all hacks. The engine needs rearchitecting to be fast, and all these hacks need to be removed, not more added.
Take for example that strlen is slow. The problem is that all function calls in PHP are slow. And if you looked at the function calling code, it's very obvious why. The correct solution is to make _all_ function calls fast, presumably by using some kind of inline cache, or just refactoring a ton of that crap out.
(Actually, someone introduced a patch to PHP-internals about two years ago to cache function calls somehow. I don't remember the exact details, but it was similar to the concept of an inline cache, perhaps storing a function cache struct in some bytecode or something. As I recall, it was denigrated as not being a full PIC. Sigh).
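For anyone unfamiliar with the idea, here is a toy userland sketch of a per-call-site cache. It is only an illustration of the concept: a real inline cache would live in C inside the VM, and CallSite and its members are hypothetical names, not anything from the PHP source or the patch mentioned above.

```php
<?php
// Toy illustration only: each call site resolves its target once,
// then reuses the cached result on every later call.
final class CallSite
{
    private string $name;
    /** @var callable|null resolved target, filled in on the first call */
    private $target = null;
    /** Number of times the slow lookup path ran. */
    public int $lookups = 0;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function call(...$args)
    {
        if ($this->target === null) {
            // Slow path: look the function up by name, once.
            $this->lookups++;
            if (!function_exists($this->name)) {
                throw new BadFunctionCallException("Undefined function {$this->name}");
            }
            $this->target = $this->name;
        }
        // Fast path: call the cached target directly.
        return ($this->target)(...$args);
    }
}

$site = new CallSite('strlen');
$total = 0;
foreach (['a', 'bb', 'ccc'] as $s) {
    $total += $site->call($s);
}
```

The lookup runs once for the three calls; the point is that the cost of resolution is paid per call site rather than per call.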
Finally, I really don't like the added functions. No problem with the functions themselves, just that adding functions to core PHP is not useful to anyone but the author. What if they clash with a user's function names? Why couldn't they just be implemented in user space? Etc. This is very much going in the wrong direction.
In the future, I'd like to see an effort like this go wild and make backwards incompatible changes like making the needle/haystack parameters consistent, but that's optimistic. (If they were to do it, phpcompiler.org could well be used to provide a 2to3-like tool for doing the user-code porting automatically).
So overall, some good and some bad. If the OP focussed on making optimizations which improved the code base, removing more crap, and avoided adding "new" things, this would be really nice.
Many of the problems and annoyances with PHP could be fixed while keeping backwards compatibility.
Just a short list I can think of now:
- Short array syntax. Completely optional but would clean up the language quite a bit. This has been proposed several times and shot down each time for no good reason.
- Promote complex primitives into objects. With the proper interfaces it wouldn't break any old code. Once this is done, deprecate the myriad of str* and array* functions in favor of the object methods. This also cleans up the "what order do I pass in the args" problem that so many people cite.
- Named parameters. Gets rid of passing in default params or an array blob for functions/methods that take lots of optional params. It also opens up the way for DSLs.
- Replace PEAR with something better. The quality of PEAR modules is really low and PEAR always seems to be broken one way or another.
- Add a list and hash type. PHP's array is very slow for some cases, and sometimes you need to enforce the datatype you're working with.
- Use exceptions for errors. Could be a runtime flag to prevent breaking things. It's incredibly easy to write bad code because PHP makes it easy to ignore problems. Things like functions that connect to the db or trying to retrieve data from a stream don't throw exceptions on error.
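To make the "primitives as objects" item concrete, here is a minimal userland sketch. Str is a hypothetical wrapper class, not something PHP ships; it only shows how methods on the value remove the needle/haystack ordering question.

```php
<?php
// Hypothetical wrapper: PHP has no built-in string object like this.
final class Str
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // The haystack is always $this, so only the needle is passed.
    public function contains(string $needle): bool
    {
        return strpos($this->value, $needle) !== false; // strpos(haystack, needle)
    }

    public function length(): int
    {
        return strlen($this->value);
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

$s = new Str('hello world');
$hasWorld = $s->contains('world');
$len = $s->length();
```

Compare with the procedural functions, where strpos() takes the haystack first and in_array() takes the needle first.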
PHP already provides a mechanism to replace errors with exceptions (set_error_handler() and the ErrorException exception type). I do this on all my projects.
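For reference, a minimal sketch of that approach. The handler body and the missing path are just illustrative; set_error_handler() and ErrorException are the standard PHP pieces.

```php
<?php
error_reporting(E_ALL);

// Convert every reported PHP error into an ErrorException.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    // Respect the current error_reporting level.
    if (!(error_reporting() & $severity)) {
        return false; // fall through to PHP's normal handling
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$caught = null;
try {
    // Reading a missing file raises E_WARNING, which the handler converts.
    file_get_contents('/path/that/does/not/exist');
} catch (ErrorException $e) {
    $caught = $e;
}
```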
I'm aware of that method. It's not an ideal approach IMO.
Specific errors should throw specific exceptions. That way I can handle the ones I know how to handle and let the others bubble up and either completely break the app to be logged and fixed later or get handled correctly somewhere else. With ErrorException, you have to look at the contents of the error message to see if it's an error you can handle or need to rethrow.
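A rough sketch of what that could look like if done in userland. The exception classes and the readFileOrThrow() and loadConfig() helpers are all hypothetical names made up for illustration; PHP's core does not define them.

```php
<?php
// Hypothetical exception hierarchy, not part of PHP.
class IoException extends RuntimeException {}
class FileNotFoundException extends IoException {}

// Hypothetical wrapper that reports failures as specific exception types.
function readFileOrThrow(string $path): string
{
    if (!file_exists($path)) {
        throw new FileNotFoundException("No such file: $path");
    }
    $data = @file_get_contents($path);
    if ($data === false) {
        throw new IoException("Could not read: $path");
    }
    return $data;
}

function loadConfig(string $path): string
{
    try {
        return readFileOrThrow($path);
    } catch (FileNotFoundException $e) {
        // The one case we know how to handle: fall back to defaults.
        return '{}';
    }
    // Any other IoException bubbles up to the caller.
}

$config = loadConfig('/path/that/does/not/exist.json');
```

The catch block selects by type, so there is no need to parse the error message to decide whether to rethrow.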
If something in PHP triggers an error, there is very little to recover from. It's either a developer error or an unrecoverable problem (a file expected to exist was not found, network is down, etc). In any of my projects, I've never needed to recover from PHP ErrorException.
The good parts of PHP, just off the top of my head: Extremely forgiving syntax which makes it quick and easy to bash together something that just works, implicit type conversion, a vast array of built-in functions, readability, near-ubiquity, generally very strong documentation with a healthy focus on example code, and the way it interpolates with static HTML, which I'm sure there's a word for but I can't remember offhand.
What I see as the big strength is how easy it is to set up mod_php - possibly in no small part due to how most Linux distributions have solid defaults for mod_php5 and keep it up to date. The things you listed I'm not so sure about.
I don't think it's the syntax that's forgiving so much as the type coercion. JavaScript is actually more forgiving syntax-wise. For example, PHP throws a fatal error when a semicolon is missing at the end of a line, while JS interpreters fill it in. Unlike PHP, though, JS will throw a ReferenceError and die upon referencing any undefined variable. PHP just coerces that to something falsey. They'll both die trying to call undefined functions or methods.
I don't see how PHP is more readable than Python or Ruby, either. There's really nothing about the language itself that makes it readable or better for beginners.
The docs are definitely good, I agree. I think they need to do something about the outdated and not-so-well curated comments on each entry though.
Near ubiquity is good, but it also makes a lot of good PHP resources get lost in horrible noise. WordPress, for instance - if you search to solve anything, you find scores of outdated tutorials offering solutions to problems which have been solved by the official branch for years. Then, you find more blogs blog-spamming the incorrect solution... so it's actually easier to search for and find Rails or Django help, in my experience. A large percentage of the PHP code out there is notably terrible, for whatever reasons.
The 'interpolation' would be called templating. You can do this in most languages (erb for Ruby, for example) but it's not always considered a good idea to give the template language logic capabilities. It happens that this is the default for how PHP and Apache work together, and I think the straightforward concepts of files, directories, and scripts are easier for beginners to understand and work with than framework-style URL routing.
The large software eco-system centered around tools consumers use such as WordPress or joomla has helped PHP grow.
Setup is easy, but the majority of PHP users probably never set up their own Linux boxes with Apache anyhow. If they wanted to, though, it would be easier than, say, Django, since last I checked Ubuntu was still providing Django .96 packages.
The ease of setup comes from every shared host offering PHP and MySQL. If they all offered Python, Django and Postgres support, with cpanel configs and PhpPgAdmin or the like, people would magically find Django really easy to set up. That's just how it goes when you have lots of market share.
Date and time handling with date() and strtotime(), magic methods, simple and practical array handling, proximity to HTTP so you can solve problems the way you want instead of the way someone tells you you should.
Boo, deleted short tags. Why do people hate short tags again? I can't remember because <? is not valid XML so there shouldn't be any problems with mixing php and xml... hmm... I wonder ....
So when someone uses <?php in their XML document and tries to run it through PHP, will the collective PHP-internals community explode? React? Ignore it?
<%= has been the compatible way forward since 1996 or so, and the community has chosen to ignore it. We've successfully migrated away from HTTP_GET_VARS, register_globals, and other bad habits. But somehow <?php is lorded over others as 'the one true way', as if somehow typing more boilerplate to avoid conflict with 0.02% of the use cases where there's a problem is something to be proud of.
Can we please give up this ``is-valid-XML'' argument already?
Quick counter-example:
<?php echo 'foo ?>' ?>
Guess what:
- if interpreted as XML, the processing instruction ends at the first ?>
- if interpreted as PHP, the PHP code ends at the second ?>
...which makes the argument against short tags -- XML validity -- moot. Can't be fixed without breaking backward compatibility either. Let's drop the argument now; otherwise somebody well-meaning will try to apply it and will end up breaking backward-compatibility (and removing a neat feature) yet keeping XML-compatibility broken anyway. Just like Robert Eisele did.
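The counter-example can be checked mechanically. This is a small sketch, assuming the tokenizer extension is available: strpos() stands in for the XML rule (a processing instruction ends at the first ?>), and token_get_all() shows where PHP itself sees the closing tag.

```php
<?php
$src = "<?php echo 'foo ?>' ?>";

// XML view: a processing instruction ends at the first "?>".
$xmlEnd = strpos($src, '?>');

// PHP view: walk the tokens and record where T_CLOSE_TAG starts.
$phpEnd = null;
$offset = 0;
foreach (token_get_all($src) as $token) {
    $text = is_array($token) ? $token[1] : $token;
    if (is_array($token) && $token[0] === T_CLOSE_TAG) {
        $phpEnd = $offset;
        break;
    }
    $offset += strlen($text);
}

// The two parsers disagree about where the instruction ends.
$disagree = $xmlEnd !== $phpEnd;
```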
Agreed... I've never used <? in a non-PHP context and it's way easier to type than <?php all the time. Why not just leave it in, or at least change it to something just as short but unique?
Awesome. I remember arguing for it on the mailing lists and I'm extremely happy they did this.
With the echo tag you can have a very simple and powerful templating system written in the language itself. (I usually just use two methods, Template::show and Template::get, both of which are less than 20 lines long.)
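The commenter's class isn't shown, so the following is only a guess at what such a pair of methods might look like: show() includes a template file with variables in scope, and get() captures the same output into a string. The temp file and the variable names are made up for the demo.

```php
<?php
// Hypothetical reconstruction; not the commenter's actual code.
final class Template
{
    // Render a template file straight to output.
    public static function show(string $file, array $vars = []): void
    {
        extract($vars, EXTR_SKIP); // EXTR_SKIP keeps $file and $vars from being overwritten
        include $file;
    }

    // Render a template file and return the result as a string.
    public static function get(string $file, array $vars = []): string
    {
        ob_start();
        try {
            self::show($file, $vars);
        } finally {
            $out = ob_get_clean();
        }
        return $out;
    }
}

// Demo: write a one-line template using the echo tag, then render it.
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, '<p>Hello, <?= htmlspecialchars($name) ?>!</p>');
$html = Template::get($file, ['name' => 'Ada & Bob']);
unlink($file);
```

Escaping is still the template author's job here, which is the usual trade-off of using the language itself as the template engine.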
Short tags are why I don't need a template engine for PHP. And can write <?=$name?> instead of <?php echo $name;?>. This had been a unique feature that no other scripting language offers. This is a historic sign indicating that PHP was designed for the web.
It always bugged me that the PHP community didn't pick up on this and run with it. We've had ASP tags since the beginning (almost?) but I think people avoid it out of some anti-MS stance. Rails runs with <%= just fine.
OTOH, you shouldn't ever be generating XML headers from PHP anyway (XHTML is a dead-end, and raw XML should be generated by an XML library), so this shouldn't be that big of an issue.
But a less-than sign, a question mark, and a space are not a valid XML combination. XML processing instructions cannot begin with a space. Likewise, PHP cannot run commands jammed directly against the short tag, like <?echo $foo; ?>. So, there really is no collision, it's just grandstanding.
I wonder if it wouldn't have been better to import the PHP source-tree first, then apply each patch one at a time/in a batch. That way applying these patches at a later date would have been easier.
Why oh why didn't he fork php/php-src, make a branch for his changes, then commit as he went along? This just ends up as a big patchbomb. They're never going to merge his changes, and I don't blame them.
In this case, it would be better to maintain it as several topic branches (so that each logical change can be implemented in as many patches as necessary), and then a good old octopus merge as "master" that you force-update every so often.
(Yes, everyone hates you for modifying published history, it's evil, it will give you bad breath, yeah yeah yeah. But you have to think of this use case as a "code wiki" rather than "list of changes leading to the current state". After the patches are accepted, then don't force update them. When they are still in flux, modify history as often as necessary to make your changes easily comprehensible.)
His work is based off 5.3.x, there is probably a branch for it in the official SVN repo. As one commenter noted, the github mirror is stale so he'd need to fork it and then update it with the official svn to find the branch.
It may make sense to rebase it off of trunk of svn.
master would always contain the php-base+the octopus merge. So if the php-base was 5.3.6, it would be 5.3.6+patches. Tomorrow PHP 5.3.7 comes out, and then master would be 5.3.7+patches.
No it wasn't. As sc68cal says, it's a patchbomb. And if you think otherwise, let me know how I can revert his changes to remove short tags and the mysql* changes using a single git command (hint: there isn't one).
I'm all for changes to an open source project - whether it acts like one or not - but every open source developer should, at some point, learn that gigantic patchsets with lots of unrelated changes are a big no-no.
I'm just pissed that he didn't even bother to actually fork the project. On Github! WOW! All the previous commits before the fork? Gone. Poof. It's completely without any context. Even though there's a big "FORK" button!
I work on it. I currently update the git repo with the svn sources. This is somewhat slow... May I ask what you want to do with the dissected patches? Try to build a "clean" version as pbiggar has said?
This doesn't concern me as much, so long as he has a note saying "compiled onto commit hash blah". Cherry-picking would be a pain, but doable at that point.
I'm a long-time PHP user (since 1998) but I never really peered inside the discussions of the internals of PHP. Now that I'm more connected to others in the PHP world, what you see inside the mailing list for PHP internals is what I would label as obstructionism and an attitude that seems to imply that if you cannot code the requested changes yourself, don't even bother asking.
Lead, follow, or get out of the way are the only three choices available to any language.
Also, if you can code the change yourself, it gets rejected as "not a bug" a few times, then they think about it and say that it's too late for any reasonable release and that it will go into PHP 6 or 5.3 or something that you won't upgrade to because it breaks too much other stuff.
> attitude that seems to imply that if you cannot code the requested changes yourself, don't even bother asking.
I think that's fair actually. The internals list is not for wish-lists; if it was it would be flooded and no useful work could be done there. The problem is that there are coded patches for some of these features (like short array syntax) but they're still not being included.
It's not productive and everyone loses with that attitude, because "... don't even bother asking" essentially blames others for not implementing ideas stuck in your head.
I was hoping for more profound changes. I've never read PHP internals but I'm under the impression the opcode language is not 'jitable' because it's an informal mess, while even Python and Lua are getting there ... It seems that opcode caching is the most you can get out of the language as is.
Might be not completely offtopic if I asked here: is there some reason array, resource and object were never folded into one type (object)? It seems like array and resource are kept separate just because it's done this way internally (with array and resource super classes). Why can't resources or array be native-code-backed objects like some modules in python?
It seems like many special cases were left over from old versions and the inertia prevents any change.