Hack and HHVM solve what is, IMO, the worst feature of the default PHP runtime environment[0]: the superglobals.
It wasn't mentioned in the post from Slack, but the default superglobals and the earlier register_globals design decisions are the worst and most impactful warts in PHP.
Because PHP was designed as a templating language, its default web server interface (CGI) auto-exposes all request variables in global scope, ex.
echo $_POST['user_id']
PHP has a horrible reputation for security for this reason: we all know that somehow, somewhere, in almost every project, someone is pulling a user-controlled variable from a superglobal without escaping or checking it properly (nothing warns you about it, and the code still works).
Worse, and I've seen this a lot, even with Laravel, CodeIgniter, Cake, Symfony/Silex etc.: you end up with well-structured projects that declare request classes, methods, variables and so on, but then somewhere down the road a developer takes a shortcut and pulls a $_GET or $_POST straight into a controller method (usually because they don't know how to change all the related classes, or can't be bothered to), going around the default execution stack.
I've seen this so often - because it's so easy to do it. The most common place is where a designer has built a frontend AJAX form. They now need to build a quick backend check, so they Google "php backend ajax username check" and they'll likely get a result like this one:
$username=$_POST['username'];
$query="SELECT * FROM username_list WHERE username='$username' ";
They copy that into a file called ajax_username_check.php, upload it to the server, and have now undone all the previous good work by opening up a blatant, easy-to-find SQLi vulnerability. Their database will be on Pastebin within a month.
You can spot this type of vulnerability from the frontend because the URLs used in the AJAX calls don't match the URL router patterns for the rest of the app (ex. GET /user/username_check_ajax.php vs /user/check_user).
In other languages you can't get at request values without going through a standard library that escapes them by default, so any solution you search for will use a safe method of obtaining the values by default.
Some good news: Hack doesn't expose superglobals in strict mode:
I'd strongly recommend that this is used in all PHP projects, since it strictly enforces variable access - even in cases where you're using a framework that is supposed to enforce it.
IMO PHP missed a big opportunity by not removing superglobals in version 7 and enforcing an explicit, safe request object, as other languages do. They likely avoided it because of the mess that register_globals and magic_quotes caused in earlier versions.
[0] I think it is important to distinguish PHP the language from PHP the runtime. PHP the language is now decent, having caught up on a lot of features (although I find it very verbose and harder to read), while PHP the runtime is undoubtedly still horrible, hence HHVM.
I'm not a huge fan of the superglobals - but the correct fix here is to use a parameterized query!
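For reference, here is a minimal sketch of that fix applied to the ajax_username_check.php snippet above. It assumes PDO with an in-memory SQLite database (pdo_sqlite enabled) standing in for the real one, and usernameExists() is a hypothetical helper name. The value is bound as a parameter and never spliced into the SQL text, so it reaches the database exactly as the user typed it:

```php
<?php
// Hypothetical helper: the username lookup rewritten with a prepared statement.
function usernameExists(PDO $db, string $username): bool
{
    // The placeholder keeps the value out of the SQL text entirely, so quotes
    // in $username cannot change the structure of the query.
    $stmt = $db->prepare('SELECT COUNT(*) FROM username_list WHERE username = :username');
    $stmt->execute([':username' => $username]);
    return (int) $stmt->fetchColumn() > 0;
}

// In-memory SQLite stands in for the real database (assumption).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE username_list (username TEXT)');
$db->exec("INSERT INTO username_list (username) VALUES ('alice')");

// In the real endpoint the value would come from the request, e.g.
// $username = $_POST['username'] ?? '';
// (the string type hint also rejects ?username[]=... arrays with a TypeError).
```

The classic `' OR '1'='1` payload is simply treated as a username that doesn't exist.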
I'm not sure which languages you're used to that will somehow magically 'escape' an input string so that it's safe to inject directly into your query in all circumstances.
I know I don't want strings from the frontend pre-quoted in any way. I want the string the way the user typed it in!
I'm sad that I had to scroll down that far to get to this response. Any other approach is still insecure, may produce strange characters for certain inputs, or both.
The fact that their "recipe" relies on one file being run in "non-strict" mode for it to work at all is very telling about how short-sighted this "vision" to remove superglobals is.
Even if you went the whole hog and removed $_GET, $_POST etc., you'd force users to use filter_input() to get variables. (Why doesn't the Hack "recipe" do this anyway?)
Now tell me how you access a structured POST body, e.g.:
foo[bar]=baz&foo[baz]=bar
Oh. Right, you can't. Because filter_input only returns scalars.
If you remove $_GET and $_POST people will just do the equivalent in the new construct:
$query = "SELECT * FROM username_list WHERE username='" . filter_input(INPUT_POST, 'username') . "'";
$query = "SELECT * FROM username_list WHERE username='" . $myFancyPostObject->getString('username') . "'";
The PHP developers who understand why using raw untrusted input is dangerous are already using the facilities provided to make the input safe for use, in some cases built around access to $_GET and $_POST.
The PHP developers who are already using raw untrusted input in dangerous ways will simply find new dangerous ways to use the data.
> I've seen this so often - because it's so easy to do it.
That's the very point. PHP was/is _easy_ to pick up. You get something working quickly. Then you meet problems. And hopefully, you learn along the way and fix that (and see, this whole thread is from people that learned how not to use $_GET/$_POST).
This "build/make it work/fail/fix" loop was an ingredient that made PHP so popular. The "happy code path" was not a pain to set up.
While I agree that directly pulling things out of superglobals is dangerous, I disagree that it should be removed, lest you end up with a python2/python3 situation.
You can't just run around breaking BC of the language every time something is less than ideal.
Yes, there are a lot of ways to easily create security holes. This is what code review is for. I'm also not going to advocate abandoning C/C++ because "it's easy to create security holes" i.e. overflows.
Agreed, at Amazon you are allowed to use any programming language except for PHP for this very reason, it is the least secure. PHP is officially banned as a language.
I don't think I've used a superglobal in 10 years, because of all the reasons you list. This is why you have a senior developer on a project doing code reviews.
Hack is a technical solution that brings lots of problems of its own. By all means use it, but not because of that feature.
Re: the SO answer, is it possible to report that code for being inherently unsafe? I think SO should take responsibility and edit or at the very least flag unsafe code.
Pro tip - first thing when you start a PHP project:
- move all superglobals to your private vars
- only allow access to these vars through your special functions, which REQUIRE the programmer to specify type and validation / sanitization (regex for strings, min/max for numbers,...)
Make it difficult for programmers to use unsanitized vars and you will have much more secure code.
Not sure why no frameworks (that I know of) do this, but fortunately it is easy to add this.
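A minimal sketch of that tip, assuming PHP 7.4+. The Input class and its getInt()/getString() methods are hypothetical names, not part of any framework. Each getter makes the caller state the expected type and constraints, and anything else (including an array where a scalar was expected) comes back as null:

```php
<?php
// Hypothetical wrapper: request data is captured once and can only be
// read through typed, validating getters.
final class Input
{
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Returns an int within [$min, $max], or null if the key is missing,
    // is not a string, or fails validation.
    public function getInt(string $key, int $min, int $max): ?int
    {
        $raw = $this->data[$key] ?? null;
        if (!is_string($raw)) {
            return null;
        }
        $value = filter_var($raw, FILTER_VALIDATE_INT, [
            'options' => ['min_range' => $min, 'max_range' => $max],
        ]);
        return $value === false ? null : $value;
    }

    // Returns the string only if the whole value matches $pattern.
    public function getString(string $key, string $pattern): ?string
    {
        $raw = $this->data[$key] ?? null;
        if (!is_string($raw) || preg_match($pattern, $raw) !== 1) {
            return null;
        }
        return $raw;
    }
}

// Typical use would be $post = new Input($_POST); with $_POST never touched again.
$post = new Input(['id' => '42', 'username' => 'alice_01', 'tags' => ['a']]);
```

The design choice is that there is no "get raw value" method at all, so the unsafe path is the one that takes extra effort.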
Python does support block scoping, so no, it doesn't work there. Also, if you need to use variables in outer scopes, it's OK to declare them there. There is literally, and I mean this, literally no possible justification for using inner-scoped variables in an outer scope that they weren't declared in.
I don't get why people keep harping on superglobals being inherently bad. The variables are there. You can use them or ignore them. A variable definition harms you in no way other than a tiny bit of memory usage, which is capped by the POST and GET size limits anyway. What? You think you're gonna get hacked because $_POST['ihaxyou'] is set to 'w00ts'?
No one does this anymore:
mysql_query("SELECT * FROM `table` WHERE `id`=".$_POST['ID']);
There's absolutely NOTHING wrong with having $_POST['whatever'] inside a controller as long as you're doing proper checks.
Are you expecting it to be an integer? Easy
if(!ctype_digit($_POST['ID'])) {
// throw exception here
}
Contrary to the hive mind you don't need some special encapsulation class to pull your post and get variables.
HHVM does not in any way solve this issue. You still have to write proper validation into your code or you'll get hacked. What's with people expecting frameworks to do everything for them these days?
> There's absolutely NOTHING wrong with having $_POST['whatever'] inside a controller as long as you're doing proper checks.
The last part is why this is a problem.
The truth is that programming is simply too difficult a task for human beings. Software is so complicated with so many moving parts that it is impossible for anyone to understand all the details of even the simplest piece of code. This is why we have operating systems, programming languages, and frameworks. Or more generally: this is why we have abstractions and tools: to make it possible for humans to actually write somewhat functional software.
This is also why you not only want languages, frameworks and tools that let you do the right thing; you want them to prevent you from doing the wrong thing. You want to reduce the number of things you have to think about to the absolute minimum, simply because no one is smart enough to fully understand everything that's going on. That is why you want things like strongly typed languages.
Sure, you can make something that works and is safe, if you make no mistakes. The point is that we want to take the human ability to make mistakes out of the equation as much as possible. Sometimes it is inevitable because of what a language, framework or tool is used for. In the case of PHP there are a lot of ways to shoot yourself in the foot that do not need to exist in a high-level language such as PHP, and that is the reason why PHP sucks and is a bad idea. It is way more fragile than it has a need to be.
$_POST isn't the problem. The scope of the variable isn't the problem. You've realized this, too, and that's why you're shifting the argument to one about typing instead of superglobals-are-bad (typing and scope are obviously independent features).
> You want to reduce the number of things you have to think about to the absolute minimum
The net effect of type systems seems to actually be to force you to think about a larger number of things very very carefully. The only thing that ends up being reduced is having to think about having to think about them.
No, it isn't. That's why I specifically said "The last part is the problem", referring to "as long as you're doing proper checks."
Specifically, my point is that you should not rely on people 'doing the proper checks' because people make mistakes, thus you want to reduce the amount of situations where people are given the opportunity to make those mistakes.
There are a lot of situations in PHP where you need to 'do the proper checks' for no reason other than bad language design, and that is what makes PHP a bad language.
> What's with people expecting frameworks to do everything for them these days?
I don't think that's the expectation. PHP's reputation seems to surround the fact that it tends to be (or was) the first language amateur coders dabbled with. That crowd is especially susceptible (at least before mysqli, etc.) to making mistakes that amount to serious security vulnerabilities. Historically, these seemingly benign things that can only go bad if you don't know better, do go bad [1]. It seems this led to a rampant conflation of whether PHP is a bad language with whether PHP made it easier to do things in an unsafe way. Frameworks that create safer environments for devs (especially newer ones) are certainly a good thing, and the good frameworks often get out of your way when you need them to.
>PHP's reputation seems to surround the fact that it tends to be (or was) the first language amateur coders dabbled with.
That's certainly part of it. JavaScript suffers the same hate today -- amateur and junior developers produce thousands of lines of crap per year, and people blame it on the language.
But PHP itself is just a mess. I'm an experienced developer (about 30 years at this point), and about 12 years ago I decided to do a volunteer project for a nonprofit in PHP. Finished the project and the nonprofit used it for at least 10 years -- they may still be using it for all I know.
Never. Again. I can't stand PHP and I will avoid its language and ecosystem like the plague. Even if they've fixed some of the problems in the core language and added types, the "standard" libraries were a random pile of mismatched garbage where the mysql_xxxx functions could have parameter signatures in different orders than the pg_xxxx versions. Maybe they've fixed that as well, but they'd have to break backward compatibility in pretty awkward ways to achieve that.
I don't even remember all the other things that tortured me, but it wasn't fun.
And it's not asynchronous. There's not even a good reason to use a synchronous language for web development today. Not to mention the ease of running NodeJS code in a debugger, or running tests in a browser and debugging it there...
I'm using TypeScript and Go for all my web related code moving forward. Something better comes along, and I'll consider it. But PHP was just a nightmare. (Elm on client? Maybe, under the right circumstances?)
> I'm an experienced developer (about 30 years at this point), and about 12 years ago I decided to do a volunteer project for a nonprofit in PHP
> mysql_xxxx functions
So, you used PHP 12 years ago and comment based on that.
You have a point though. I used this internet thing about 17 years ago, it was terrible. Only dial-up. SLOW! And don't talk to me about browsers. Netscape? Internet Explorer? Ugh. Forget it. I don't care what they might have changed, or replaced or completely removed, it's always terrible.
The language/runtime have problems, just like every other language/runtime that exists, but complaining about a version from 12 years ago and comparing functions that aren't even part of the language anymore seems a little odd to me.
>You have a point though. I used this internet thing about 17 years ago, it was terrible. Only dial-up. SLOW! And don't talk to me about browsers. Netscape? Internet Explorer? Ugh. Forget it. I don't care what they might have changed, or replaced or completely removed, it's always terrible.
We have only one Internet, but we have a lot of alternatives to PHP to choose from.
If an Internet provider once gave you horrible service, would you give them another try when your current provider is great in every way, just because the only good thing about the old one was that they connected you in a day instead of three?
I'll be honest, I don't really care what languages people use; it's not my place to win you back. Just saying there's a good chance PHP has changed at least a tiny bit over the past 12 years.
I started using PHP when it was in version 3 and still use it occasionally. It has changed, but not considerably. The function naming is still a mess (backwards compatibility), arrays are completely inappropriate (naming them "bags" would be better), and some decisions were so baffling (safe mode and magic quotes, superglobals, square brackets for arrays...) that I simply don't trust PHP to ever get better.
But there was a reason for experienced developers to use it: hosting. You could write a web app, FTP it somewhere and it would just work. No other platform aside from ASP offered that. Nowadays this doesn't matter much, but 10 years ago it was great.
>And it's not asynchronous. There's not even a good reason to use a synchronous language for web development today. Not to mention the ease of running NodeJS code in a debugger, or running tests in a browser and debugging it there...
This is still true today, and this fact alone makes it not worth trying out as a server language.
Go is asynchronous. [1] It's a huge point of the language, and why it's working its way to the top of the new TechEmpower benchmarks [2].
> "callback hell"
It can be ugly to look at, but it's fast.
And I thought we weren't using references to 10 years ago to judge a language? Promises minimize callback hell with a far better (more functional) interface, and async/await (usable today with transpilation) banish it entirely.
>If using threads means the language is asynchronous to you,
Goroutines are not threads. It's named after a "coroutine," which is an async function (like a generator) where you can pause execution of a task (to wait for IO, for instance) and then resume it at a later time. Coroutines give you implicit async/await style programming, without having to use extra keywords. The language Lua supports coroutines natively, for instance, including explicit yields to use them as generators, if you need. Goroutines give you coroutines with a bonus: Actual thread hopping (and multiple channels of communication possible, so you can effectively have more than one "yield" channel).
If you have 400,000 Goroutines running on 4 CPUs, your Goroutines can task switch between those 4 CPUs using 4 OS threads (or 8, or whatever number you determine is best for your app) as their IO events queue up. It is async, and each Goroutine has around 10k of RAM overhead, not counting data you're storing yourself (it varies by architecture, but that's a reasonable estimate; I've seen between 6k and 16k on different architectures).
Go is better async than JavaScript, because a Goroutine started on one thread can get processed on another, so a few thread-hogs won't block a task. A single Goroutine could cycle between all your worker threads, in fact, though I think it tends to stick with a single worker thread ("thread affinity"). It's probably the most asynchronous you can get in a language (only the Erlang VM/Elixir is in the same class of language, as far as I know).
Do you think PHP could handle 400,000 concurrent WebSocket connections on one server? Go can. How much RAM would you need to handle 400,000 OS threads? Maybe 64GB instead of ~8GB in Go? Wouldn't context switching alone be starving most of the threads and redlining the CPU at that point? In my experience you start seeing serious slowdown around 5,000 threads. I'd guess you'd need 50x as many servers to handle that many connections on PHP. Maybe 100x. You need to support a million users? Is it better to have 3 servers or 300?
> The things you complained about in another language were fixed/removed years ago but you say it's still just as bad.
PHP is architecturally single-thread-per-request. Full stop. Even PHP 7 and HHVM. That simply doesn't scale as well as async, as I described above. The complaints I made were why I bailed on it years ago. I'm sure PHP has gotten a lot better than it was when I used it, but given that it still misses the mark architecturally, I have zero motivation to give it a second chance.
But given that NodeJS has far surpassed it in community support (as well as by many performance measures -- HHVM is quite fast at raw processing speed now, though not as high throughput because of the lack of true async), why is PHP still relevant except to continue to maintain existing code bases? (I've worked with several of those as well -- Joomla and Drupal in particular -- and neither was something I'd consider worth using, having looked at their internals and terrible performance. Joomla was a complete disaster; Drupal only slightly better.)
>Every failure or less than great situation in your favoured languages has just been fixed in the last few years.
Your point? Even if they all were only better as of yesterday, they're still better. Or are you defending your choice of a few years ago?
And actually, Promises have been available using polyfills for years. The proposal was made at least six years ago; I can't find the exact date, but I found a reference added to the CommonJS wiki in 2010 [1]. The Q library dates back to 2010 as well. [2]
I concede that there may have been a period of time where PHP was better than JavaScript, by the criteria I'm using today, because PHP did improve a lot after I first encountered it, while JavaScript only more recently developed its compelling advantages. It looks like Node was released in 2009? In 2009 I was using OpenResty (Nginx+Lua) to do my (small amount of) server work, which can still be more performant than both Node and PHP. But NodeJS has shot past in terms of community support, which is critical, TypeScript is awesome, sharing code on client and server is awesome, isomorphic rendering is awesome, and performance is good enough (compared to Lua) that NodeJS just wins for me, big time. Except when I need the extra speed, and for that I use Go.
Go is a brand new language, relatively speaking. So of course they're still making major improvements. Performance has already surpassed HHVM, even for raw compute tasks. It's a better fundamental architecture, giving it an edge, and performance will likely improve still more, though they're already hitting diminishing returns: Some of the worst case performance they see now may get 2x-4x faster, but typical app performance is probably only 10-20% short of optimal. I mean, they're already beating C++ on the server. How much better can they go?
Use whatever language you want; if your background and/or current job is PHP, so be it. It's not as bad as it was 10 years ago, for certain. Just don't expect anyone to pick it up based on its merits. There are too many other, stronger options available today.
Sounds like you just had a very bad experience with PHP, but it could have happened with most languages, honestly. I have (only) 20 years of experience, but I have come across quite bad C codebases, poor Perl applications and terrible Java code, and I have thought 'never again' more than once, too.
In fairness, PHP was in a different place 12 years ago. The language has improved in that time. There are still lots of warts but the language and ecosystem have worked together to push it forward.
I'm going to be honest, I only use it because of company lock in. I hated it for many years but some of the stuff released and some of the stuff on the way in the internals is quite exciting.
The difference between PHP and JavaScript is that JavaScript has competent people driving it forward. PHP is still developed by college students with no real-world programming experience, in their spare time...
Downvotes? Don't think this is true? See the following. These are a couple of the most prominent people working on the language.
Not impressed by your comment at all. If you think that the age of Nikita Popov or Andrea Faulds is any indicator of their "competence" I suggest you actually take a look at their contributions.
My comment just said that they are just college students. And the links are to prove that. Those are just facts. I don't know why people are pissed.
Edit: It is not just their age. I have seen their contributions (RFCs and implementations) and had conversations with them. And my opinion is also based on that...
> Contrary to the hive mind you don't need some special encapsulation class to pull your post and get variables.
The hive mind is like that for a reason. Making it easy to do the right thing and hard to do the wrong thing has massive effects, and I'd argue their magnitude scales exponentially with the growth of an engineering team. Really, really talented engineers make mistakes all the time. To the extent that we can systematically limit those with little to no downside, we absolutely should, especially when it comes to security.
I think people are also confusing this with an old issue from PHP 4.x, where if you had $_POST['somevar'] it would automatically get an alias, $somevar, in the global userspace. This was turned off by default a long time ago and is the main real security issue when it comes to superglobals. $_POST and $_GET are just the normal way to access POST and GET vars. There's nothing inherently insecure about them.
Exactly, this is where the real problem was, and thankfully it was fixed. An attacker could inject any variable into a script just by adding it to the URL. Additionally, the PHP configuration could change the order in which variables from different sources were registered, so a script essentially didn't know where its information was coming from. The superglobals, on the other hand, are just a utility that makes things easier for the developer; they don't directly make code insecure.
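To make the difference concrete, here is a sketch that emulates register_globals with extract() (the real directive was removed in PHP 5.4, and userIsAuthenticatedAsAdmin() is a hypothetical stand-in for real auth). The hole was uninitialised variables that request data could silently fill in:

```php
<?php
// Hypothetical stand-in for a real authentication check.
function userIsAuthenticatedAsAdmin(): bool
{
    return false;
}

// Emulates register_globals: request keys become variables before the
// script's own logic runs.
function vulnerablePage(array $request): bool
{
    extract($request); // every request key becomes a local variable

    if (userIsAuthenticatedAsAdmin()) {
        $isAdmin = true;
    }
    // $isAdmin is never initialised, so ?isAdmin=1 in the URL decides it.
    return !empty($isAdmin);
}

// Same emulation, but the script assigns its own variable after the request
// data has been registered, so any injected value is overwritten.
function saferPage(array $request): bool
{
    extract($request);

    $isAdmin = userIsAuthenticatedAsAdmin();
    return $isAdmin;
}
```

Reading $_GET['isAdmin'] explicitly never has this problem, which is the point being made about the superglobals themselves.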
I don't think people are confusing those things at all. The comment you're replying to is quite literally saying that using $_POST['somevar'] is too easy.
> I don't think people are confusing those things at all. The comment you're replying to is quite literally saying that using $_POST['somevar'] is too easy.
Are you guys, then, saying that $_POST['id'], by itself, is less secure than a getPostVar('id') would be, by itself?
In all seriousness, shouldn't all the frameworks just have some validation built in, given that this is such a "global" WTF problem?
I would love to be able to say ini_set('sanitize_rest', true) and deal with errors that might result from that knowing at least the strings are safe. Or have functions like sanitize_string($str) and have the documentation encourage it everywhere. I mean, aren't we all just implementing those on our own anyway?
I know that obviously there are already functions for type checks etc, but the idea is to make it even easier, more obvious, and functions directly targeting the problem, even if they are mere aliases.
When a mistake is easy, the solution should be made easier.
PHP has already tried automatic sanitization with magic_quotes_gpc, and the results were far from secure.
It is not possible to have a single sanitize() function that renders a string safe for every possible context, and trying to provide one results in nothing but complacency and a false sense of security.
Escaping for SQL is different from escaping for HTML, and even if you escaped a string for both, some idiot is going to echo it inside an inline script or a CSS attribute. Sanitize for all of them, and you begin to seriously mangle those strings. Oh, and it might still be ineffective against directory traversal. I've seen plenty of PHP users who think they're clever because they wrote a function that applies every single escaping function in a row, not realizing that some of them undo one another's work. Selectively unescaping after the fact is even more fun.
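A small sketch of why one escaper can't cover every context, assuming a UTF-8 page and PHP 8: the same kind of input needs htmlspecialchars() for HTML, json_encode() with JSON_HEX_TAG inside an inline script, and a bound parameter (no escaping at all) for SQL. Stacking escapers just double-encodes:

```php
<?php
$input = "O'Reilly & <b>";

// HTML context: encode &, <, > and both quote characters.
$html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// → 'O&#039;Reilly &amp; &lt;b&gt;'

// Inline <script> context: JSON-encode the value and hex-escape < and >
// so it cannot close the script element early.
$js = json_encode('</script><script>alert(1)</script>', JSON_HEX_TAG);

// SQL context: no escaping at all; bind the raw value as a parameter,
// e.g. $stmt->execute([$input]).

// Stacking escapers mangles data: the second pass re-encodes the
// first pass's entities.
$double = htmlspecialchars(htmlspecialchars('&', ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
// → '&amp;amp;'
```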
The only thing I can think of that could render a string absolutely safe in every context is intval(), but then you don't have a string anymore, and I suspect that even that can be abused with unexpected zeroes and negative values.
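For illustration, a sketch of what intval() does with a few inputs (PHP 7 or later assumed). It always returns an int, but it converts rather than validates, so a range or format check is still needed afterwards:

```php
<?php
// intval() never fails: junk becomes 0, trailing junk is dropped,
// and signs are kept.
$samples = ['42', 'abc', '12abc', '-5', '0x1A', ' 7'];
$results = array_map('intval', $samples);
// → [42, 0, 12, -5, 0, 7]
```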
Right. I absolutely get this argument, except, the alternative to idiots misusing the tools is for those same idiots to start with no tools! And they're going to somehow implement and upload what they're building anyway...
What does this nonsense reply even mean? Nobody is talking about one-size-fits-all. People are talking about mitigating some stupid default behavior in a language. The suggestion "You're doing it wrong" just feeds into his point (that you can even get something this straightforward wrong in the first place indicates, maybe, you should put a fence there to warn people).
Programmers have so much stockholm syndrome it's unbelievable.
> I would love to be able to say ini_set('sanitize_rest', true) and deal with errors that might result from that knowing at least the strings are safe.
How is a magic ini setting to "make strings safe" not a one-size-fits-all?
> People are talking about mitigating some stupid default behavior in a language.
What stupid default behaviour? Giving you data as its received and tools to validate/sanitize it as required?
> if(!ctype_digit($_POST['ID'])) { // throw exception here }
ctype_digit is broken. Try passing integer values: ctype_digit(50) === true, but ctype_digit(100) === false, because integers between -128 and 255 are interpreted as ASCII codes (50 is '2', 100 is 'd'). And "0000001" passes as true, which in most scenarios people would rather reject. I can't remember what the even worse bug with ctype_digit is, but even if you cast the value to string [ex: ctype_digit((string)$var)], there is some value that still passes as true when it shouldn't. Do not use ctype_digit. is_numeric() is also unusable for validation [is_numeric("123e4") === true]. is_int() is a strict type check, so it can't be used on its own to validate request variables, which are always strings (...or arrays, more below).
The only correct ways to verify that a variable contains either a valid numeric string or integer is by comparing type, and then using a regex or a double string-then-int cast.
ex: unsigned database ids: if ((is_int($var) || is_string($var)) && preg_match('/^[1-9]\d*\z/', $var)) { // definitely an int > 0 }
ex: signed integer: if (is_int($var) || (is_string($var) && (string)(int)$var === $var)) { // valid int (including negative values) }
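The two checks above can be wrapped as functions. This is a sketch (the function names are hypothetical), with the ctype_digit()/is_numeric() cases alongside for comparison. Passing ints to ctype_* functions is deprecated as of PHP 8.1, so only strings are used here:

```php
<?php
// Unsigned database id: an int or digit string with no leading zero.
// Note: the regex accepts arbitrarily long digit strings, which may
// exceed PHP_INT_MAX.
function isPositiveIdLike($var): bool
{
    return (is_int($var) || is_string($var))
        && preg_match('/^[1-9]\d*\z/', (string) $var) === 1;
}

// Signed integer: an int, or a string that survives a string->int->string
// round trip unchanged (rejects "007", " 5", "1e3" and out-of-range values).
function isIntLike($var): bool
{
    return is_int($var) || (is_string($var) && (string) (int) $var === $var);
}

// Why the simpler built-ins fall short:
$ctypeAcceptsZeros   = ctype_digit('0000001'); // true
$isNumericAcceptsExp = is_numeric('123e4');    // true
```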
Frankly, developers who don't understand how request variables are handled in PHP have zero chance of properly validating input. Find any site/app written in php, even if built on any of the major frameworks. You can instantly break 30-50% of them by passing an array where a string is expected.
Find an app that takes "?query=hello+world". Instead pass in "?query[]=hello+world". Want an example? Log in to Facebook, then visit this search page[1]. Look at the query string and then what was searched for - and the contents of the search box. Bam, even Facebook gets it wrong! Same thing with Symfony's search[2]. Or Packagist (composer's package manager repository)[3]. More seriously at Yii[4], which exposes an internal error to users as they try to string-trim an array ("Error - trim() expects parameter 1 to be string, array given").
Most developers - including many seniors who have been exclusively coding in php for years - have no clue. You will either cause a 500 Internal Server Error, or your input array will result in an output string of "Array" if they typecast your array to the string they expected. Even the major frameworks, when you pull user-submitted values, simply passthrough the value submitted. Your app expects a string (or a string that contains a numeric value), and instead any user who knows the "[]" syntax can pass in an array.
Really reflect on this fact. Most applications start handling a submitted array value as if it's a string. The bugs this produces are astronomical in some cases.
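To see what a controller actually receives, parse_str() applies the same bracket rules PHP uses when building $_GET and $_POST. A sketch assuming PHP 8 (where string functions throw TypeError when given an array); stringParam() is a hypothetical helper:

```php
<?php
parse_str('query=hello+world', $normal);
parse_str('query[]=hello+world', $injected);
// $normal['query']   is the string 'hello world'
// $injected['query'] is the array ['hello world']

// Hypothetical defensive read: accept only a string, otherwise use a default.
function stringParam(array $params, string $key, string $default = ''): string
{
    $value = $params[$key] ?? $default;
    return is_string($value) ? $value : $default;
}

// Passing the array straight to a string function throws in PHP 8,
// which is where the 500 errors come from.
try {
    trim($injected['query']);
    $threw = false;
} catch (TypeError $e) {
    $threw = true;
}
```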
If you think your framework protects you, think again. The frameworks' request objects also do not have strict type checking. The same goes for their form and model validation classes; if you're using the built-in "integer" or "numeric" validators, you're probably doing things wrong.
It's a nightmare. You could try to blame PHP, but really it's the developers - including the developers of every major well-known framework I've ever touched - that have absolutely no clue.
Related tangent: comparing password and password confirmation fields. Many developers do if ($password == $passwordConfirm) {}. In PHP 5.x, "10" == "0xA" (so type "10" in password field and "0xA" in the confirmation field, and it passes validation). This changed in PHP 7 though. There are only two correct ways to verify that two strings are exact: $password === $passwordConfirm (triple equals), or strcmp($password, $passwordConfirm) === 0.
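A quick sketch of why == is still unsafe here even after the PHP 7 change, assuming PHP 7 or later: "0xA" is no longer numeric, but two numeric strings are still compared as numbers.

```php
<?php
$password = '10';
$confirm  = '1e1';

$loose  = ($password == $confirm);             // true: both parse as 10
$strict = ($password === $confirm);            // false: different strings
$cmp    = (strcmp($password, $confirm) === 0); // false

// The PHP 5.x case from above no longer matches.
$hex = ('10' == '0xA');                        // false in PHP 7+
```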
> The only correct ways to verify that a variable contains either a valid numeric string or integer is by comparing type, and then using a regex or a double string-then-int cast.
You know there is an entire extension dedicated to validating and sanitising inputs right?
All your type checking and regexes and double cast comparisons could be replaced with:
if (($value = filter_var($value, FILTER_VALIDATE_INT)) !== false) { doStuff(); }
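For reference, a sketch of how FILTER_VALIDATE_INT treats some of the inputs discussed in this thread, assuming PHP 7.4+ for the arrow function. Note that "0" validates to 0, which is why the !== false comparison above matters:

```php
<?php
$cases   = ['42', '-5', '0', '007', '123e4', 'abc'];
$results = array_map(
    fn (string $c) => filter_var($c, FILTER_VALIDATE_INT),
    $cases
);
// → [42, -5, 0, false, false, false]

// Range limits go in the 'options' array.
$id = filter_var('0', FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
// → false

// Without FILTER_REQUIRE_ARRAY / FILTER_FORCE_ARRAY, array input fails.
$fromArray = filter_var(['42'], FILTER_VALIDATE_INT);
// → false
```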
> You could try to blame PHP, but really it's the developers
> ctype_digit is broken. Try passing integer values.
Well, ctype_digit takes strings, not integers. So don't be surprised if you pass the wrong type to a function and it doesn't work as you expected.
Some of your criticism is valid, but you can't go around talking about how PHP isn't rigorous enough, and then complain that some functions don't work as you'd like when you give them the wrong argument type.
Your other arguments are more about bad developers, as you say yourself: anyone who actually cares about what he does knows you have to check equality with ===. The array argument problem is less well known, but actually almost unrelated to PHP: POST or GET is user data that can be of any type and should be checked. Only the last of your examples is actually a problem to me.
The problem is that the behavior is not consistent. PHP is some parts C, some parts Java and some parts Perl. That is the problem. It takes an encyclopedic knowledge of the documentation to know which part you are dealing with. And even that might not help you sometimes, because the documentation can be plain wrong in places...
>anyone who actually cares about what he does knows you have to check equality with ===
Can you write PHP code to store a string-to-string mapping in a PHP array and, further down, check whether a particular key exists in that array?
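The commenter doesn't spell out the catch, so this is only one plausible reading: PHP silently normalizes numeric-string keys to integers, and `isset()` reports `false` for a key whose value is `null`. A short demonstration (the keys and values are arbitrary):

```php
<?php
declare(strict_types=1);

$map = [];
$map['1'] = 'one';       // canonical decimal string: stored under the int key 1
$map['01'] = 'zero-one'; // leading zero: stays the string key "01"
$map['name'] = null;     // key exists, but its value is null

// The keys come back as [1, "01", "name"]: the first one is now an int.
var_dump(array_keys($map));

// isset() is false for a key whose value is null...
var_dump(isset($map['name']));            // bool(false)
// ...while array_key_exists() checks for the key itself.
var_dump(array_key_exists('name', $map)); // bool(true)

// A strict search over the keys misses '1', because the key is int(1) now.
var_dump(in_array('1', array_keys($map), true)); // bool(false)
```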
If you know C, you can identify pretty well what are the (thin) PHP wrappers around the C routines. Java influenced the OO design, so you know where to find it, and Perl is mostly, well, PCRE. It's not consistent, but it's not _that_ hard to navigate.
Well, it is not that easy. For example, take the function strlen(). You can see that it is a wrapper for the C function.
So can you expect that it will behave like the C function, accepting strings only? No! It now accepts both strings and integers. So you have part Perl there.
Now take another function, ctype_digit(). I don't know where the name comes from. You expect it to behave like strlen(), accepting both strings and numbers. But no!
If you pass it a number, it won't even bat an eye (throw an exception or error), but it will just return gibberish...
Hope my point, that these influences are mixed together in a haphazard fashion, is a bit more clear now...
It's more about automatic type conversion than the API, though; numerics magically get converted to strings and vice versa. This is convenient in some cases, especially for beginners who don't have to think about types, but it eventually bites you if you never realize what happens behind your back.
This is not so much a PHP thing as it is a different way of thinking about things. If you know something is supposed to be an integer, you can simply force it to an integer before you do anything with it:
$id = intval(@$_POST['id']);
if (!empty($id)) { ... }
Note: 0 shouldn't be a valid value for something called 'id', since it's likely a db id; if it is, use something other than an empty() check.
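One caveat with the `intval()` approach: it never fails, it always returns some integer. Per the PHP manual, an array converts to `0` when empty and `1` otherwise, so a tampered `id[]=anything` becomes id `1` and passes the `empty()` check. A stricter sketch (`positive_id` is a made-up name) that also encodes the "no zero ids" rule via `min_range`:

```php
<?php
declare(strict_types=1);

// intval() always returns some int: for arrays, 0 if empty and 1 otherwise.
var_dump(intval(['anything'])); // int(1) - a tampered id[]=anything becomes id 1
var_dump(intval([]));           // int(0)

/** Hypothetical helper: a well-formed integer >= 1, else null (so 0 is rejected too). */
function positive_id($value): ?int
{
    $id = filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

// Usage (sketch): $id = positive_id($_POST['id'] ?? null);
//                 if ($id !== null) { ... }
```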
That said, input is a problem. In a dynamically typed language, it's easy for beginners to expect HTTP request data in PHP to work the same way as any other variable. In reality, you will be coercing from a string (or an array of strings) to whatever type you are working with, or vice versa.
Input rules would be nice. For example, in this context we always want id to be an unsigned integer, email to always be... and so on.
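PHP's filter extension can already express rules like these declaratively through `filter_var_array()`. A sketch, assuming a made-up rule set with just the `id` and `email` fields from the comment; `validate_input()` is a hypothetical helper name:

```php
<?php
declare(strict_types=1);

// Hypothetical rule set: field name => filter definition.
const USER_INPUT_RULES = [
    // 'id' must be an integer >= 1 (an "unsigned", non-zero database id).
    'id' => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 1]],
    // 'email' must look like an email address.
    'email' => FILTER_VALIDATE_EMAIL,
];

/**
 * Returns the typed, validated fields, or null if any field is missing or invalid.
 * filter_var_array() yields false for invalid values (including an array where a
 * scalar is expected) and null for keys missing from the input.
 */
function validate_input(array $input): ?array
{
    $result = filter_var_array($input, USER_INPUT_RULES);
    if (!is_array($result)) {
        return null;
    }
    foreach ($result as $value) {
        if ($value === false || $value === null) {
            return null;
        }
    }
    return $result;
}

var_dump(validate_input(['id' => '42', 'email' => 'a@example.com']));
```

Extra keys in the input are simply dropped, and a missing key comes back as `null`, which the helper treats as invalid.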
In this case, dynamic typing either makes the two input types (strings and arrays of strings) look like plain old dynamic typing, or leads developers to believe input has a single, homogeneous type.
In any case, I'm going to take a magnifying glass to some of our code today. Thanks!
It is in fact documented; what I didn't explain is that ctype_digit treats integers between -128 and 255 as the ASCII codes of single characters (chr() equivalents). It's designed to juggle both strings and integers, which indeed works against PHP's usual method of type juggling. This is because ctype is a port of, or wrapper around, the C library, which behaves this way.
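Given that dual behaviour, a common defensive pattern is to check the type before calling `ctype_digit()`. The PHP manual documents that an int between -128 and 255 is read as the ASCII value of a single character (so `42` is tested as `'*'`), and PHP 8.1 deprecates passing non-strings at all. A sketch with a made-up helper name:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true only for a non-empty string of ASCII digits 0-9.
 * The is_string() guard keeps ints (which ctype_digit() reads as character
 * codes when they fall in -128..255) and arrays away from ctype_digit().
 */
function is_digit_string($value): bool
{
    return is_string($value) && ctype_digit($value);
}

var_dump(is_digit_string('42'));        // bool(true)
var_dump(is_digit_string('4a'));        // bool(false)
var_dump(is_digit_string(42));          // bool(false): not a string
var_dump(is_digit_string((string) 42)); // bool(true): cast first if ints are expected
```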
Saying that superglobals are the worst thing about PHP is like saying "the worst thing about x86 Assembly is the mnemonics". It misses the point entirely. The worst thing about PHP is that it is fundamentally not well designed and therefore makes developing high-quality software much harder than it needs to be. It also makes developing extremely low-quality software easy, which could be good or bad depending on your perspective.
By all means, but having the language nudge people in the right direction makes a world of difference.
PHP, much like JavaScript, is terrible for new developers for this very reason.
Learning a "good" language (for lack of a better term) is no more difficult than learning PHP/JS and is always worth the effort; if anything, learning "good" languages is usually much easier because they are usually internally consistent.
If all newbs picked up Java as language #1, would their apps be better? Or would the really bad devs writing copy-paste Stack Overflow code just be unable to understand it, so they would quit?
Like, is it safer because it keeps out knuckle-draggers, or safer because it is actually safer? Cuz I can write some horrible Java code that will rival anything you can do in PHP.
As I said, by all means you can write bad code in good languages. I'm not saying choosing a good language excludes all possible bad code, only that they provide some guidance on better practices.
So you mention Java. Java enforces OOP. Now OOP may not be the best paradigm always; however, it's a vast improvement on inline procedural PHP.
That isn't to say you can't write some horribly modelled Java code, but the fact that modelling tools are so explicit and forced on the user makes the user at least think about how to use them better.
Other people's opinions may differ from mine, but I maintain this is incredibly important in speeding up new programmers towards writing good code.
> by all means you can write bad code in good languages
I think the main criticism of the GP was the fact that you use the expression "good languages" without defining what makes a language "good".
> not be the best paradigm always
same as above, what makes a paradigm "best"?
> vast improvement on inline procedural PHP.
But why do you assume that the majority of PHP codebases are written in an "inline procedural" style? Do you have any evidence? Regarding the "procedural" part, the only large project that is not OOP-based is WordPress, and even there spaghetti code (which I assume is what you mean by "inline") is AFAIK frowned upon by the community.
> the fact that modelling tools are so explicit and forced on the user
You need to accept the fact that many people may not like the "opinionated" nature of some languages (in fact, that inflexibility you mentioned is something I dislike about Java); often, a language may or may not be the right tool for a specific job precisely because of those opinionated bits.
The statement I made is that more consistent and "opinionated" languages encourage better code. They don't enforce it, just encourage it.
It is my opinion that this is valuable.
I did define "good", internally consistent languages with strong guidelines for developers.
I made no statements about mature PHP codebases as they are irrelevant to my argument.
I do accept that people prefer less "opinionated" languages, and I too fall into this camp, but I am no longer a new developer; as such, this point is entirely irrelevant to what I was saying.
Nitpicking individual points whilst misconstruing what I said is neither useful nor appreciated.
> Nitpicking individual points whilst misconstruing what I said is neither useful nor appreciated.
It wasn't my intention, I'm sorry if my comment came off as nit-picky. I wasn't trying to misconstrue your comment, I genuinely did not get your argument (I think I now get it, thanks to your reply).
It's rare that I see good Java code, especially code written by junior developers.
I think OO is a hard concept to get right. I know it took me years to master, and one of my epiphanies about OO design is that it's not always appropriate. Yes I can tell you the best OO approach to a problem, but I can also often tell you a better approach that isn't OO.
I often see people say that Java has pretty much been designed as (or at least evolved into) a way to let large numbers of mediocre programmers develop acceptable-quality software.
This is the most pernicious and annoying technicality that advocates of low-quality languages invoke. PHP does not actively work against bad or just plain wrong code, and its construction actively encourages bad code. It's missing aspects that we know to be tremendously useful for writing high-quality correct code.
You can write low-quality software in e.g. rust, but you're going to work a lot harder at it. Rust (again, just as an example) also makes it easier to write high-quality software.
This is really the only metric by which you can judge the quality of a language, since in the end they're all (mostly) Turing complete.
So, I generally don't wade into this argument. I've been programming for 27 years, 12 of them professionally. In that time I've used a lot of languages for a lot of projects. Every language is capable of being used to shoot yourself in the foot TBH. The hate that PHP gets is, IMHO, mostly from the fact that it's a gateway language and as such often has a higher WTF-per-minute rate for the code you find than many other languages. Anyway, on to what made me post this.
> its construction actively encourages bad code
That's a statement that needs a reference to back it up.
We did the same thing with HHVM, and had VERY similar results; getting it to work was plain hard, and I had a lot of concerns about our ability to ever go back.
Before we ever launched with HHVM completely, PHP7 came out. With only a few weeks of work, we managed to make the switch. The gains were identical to what we saw on HHVM, only the experience of working with PHP7 was so much easier for everyone involved.
Having said all this, I think HHVM served a great purpose: It raised the bar and PHP is better because of that. All in all, a great outcome for the people of the Internet.
Hack's influence is all over PHP7, unsurprisingly. As someone still bound to PHP due to technical debt, I'm thrilled this happened. PHP still has warts, but changes in 7 are tantamount to ES5 :: ES6. The language feels more mature, real, sensical.
Of course, but they won't do that because of backwards compatibility and I get that.
It's one of the nice parts of rebranding. Hack could keep and throw out anything they wanted because it was intended for private FB use. At some point, PHP will have to start cutting off the stdlib PHP4.x warts. There's enough about PHP 7 that's good enough to be compelling to anyone working in an interpreted language on the web, but the (well earned) reputation keeps a lot of people away.
At ServerPilot, we decided early on not to support HHVM for similar reasons: we could see PHP 7 was going to offer the same performance benefits without the pain, breakage, and downtime of HHVM.
Early on, before PHP 7 was released, we had to explain this to many of our users who use ServerPilot to host WordPress, Magento, Laravel, and other PHP apps. They often thought there was no downside or risk with HHVM, it was as simple as dropping it in as a replacement. Nowadays, with the hype around HHVM dying down, we don't get requests for HHVM support much anymore.
For a huge company like Facebook, HHVM makes a lot of sense. And the existence of HHVM really sped up the PHP 7 development efforts and provided a great benchmark for how fast PHP 7 could be. So, the PHP community should be very grateful to Facebook for that even if HHVM isn't the future of PHP.
> It took less than a week to migrate our codebase (a 10 years old PHP monolith)...
> And it took 4 hours to migrate our custom extensions.
That seems like a very small amount of work; I'm impressed at how smooth a transition that must've been.
I'm also quite surprised that
> we can handle twice more traffic with same infrastructure.
Wow, I didn't think that PHP application code would be such a bottleneck. Maybe it's not that, but if the entire codebase is written in PHP, and you replace it all in one shot, you just get such an improvement. But I thought DBs, etc. would play a bigger role.
Hey, I'm the author of this blog post, I'll try to respond to your questions:
> Wow, I didn't think that PHP application code would be such a bottleneck. Maybe it's not that, but if the entire codebase is written in PHP, and you replace it all in one shot, you just get such an improvement. But I thought DBs, etc. would play a bigger role
Indeed, front-end servers are not the only bottleneck to handling more queries. Two months after the PHP 7 migration, we also migrated data on our MySQL databases to optimize memory utilization. We validated the code migration, checking that we hadn't introduced new regressions or errors, by redirecting a small percentage of the traffic through two servers configured with PHP 7 for a few days before full deployment. Full deployment on our production farm (more than 250 servers) was done in less than two hours, with the possibility of rollback.
>> It took less than a week to migrate our codebase (a 10 years old PHP monolith)...
>> And it took 4 hours to migrate our custom extensions.
>That seems like a very small amount of work; I'm impressed at how smooth a transition that must've been.
We used phan and phpcs to discover our incompatibilities. They don't find 100% of the problems, but they really reduced the time needed to find where there were backward incompatibilities. It's a first step before unit tests and small load tests in production. I wrote a small blog post on how to use these tools to migrate your applications: https://medium.com/@colomb.thomas/php7-how-to-migrate-your-a...
For a lot of array-heavy applications (where you store all kinds of data in giant multi-level PHP arrays), the memory usage alone counts for most of the speedup; instead of wading through tens or hundreds of MB of array structures, PHP 7 trimmed things down by a factor of 2 or more.
There are a lot of PHP apps/CMSes/etc that gained 30-50% speedups due to just that improvement. Other more optimized apps/scripts saw a much more modest gain.
IIRC, a PHP array entry had 127 bytes of overhead. In PHP 7, that went down to 42(?). Also, IIRC, for the JVM it's... 37? 40? PHP 7 got array overhead down a lot, and I do believe that's where a lot of the speed improvement came from (though certainly not all of it).
In the world of PHP 7, blocking I/O will still be a problem (at least, it was when I looked at the proposed feature set over a year and a half ago), but in PHP 5.x, the Zend engine is actually just incredibly inefficient, to the point where it is often the largest bottleneck on the request path.
Have you ever loaded a stock Magento server? Set it up, add maybe 10 products with basic images, and turn it loose.
Give it a reasonable box. 2 CPU cores, 2 GB of RAM.
You are capped at something like 3-5 requests per second, with an average load time of 5 seconds.
Just blows my mind. A simple Java web app on the same server is doing 500 requests per second. A Python app, with the horrible GIL and all that nastiness, is doing 200 requests per second. And Magento is rocking 3 requests per second??!?!?!?!?!
I wonder how much global energy consumption would go down if PHP was not a thing.
Comparing a "simple Java app" to something as complex (overly? needlessly in some cases? sure) as Magento isn't even apples and oranges. Compare it to Broadleaf or KonaKart, maybe. I've no doubt Java will probably still be faster, but it won't be 500 rps vs 3 rps.
At a guess, you left everything in development mode. I've seen similarly specced hardware easily handle around 100 requests per second just by configuring Magento and Opcache for production (as per documentation).
Without getting into a pissing war: if Brian says he's done something on the JVM, trust him.
Also, if you need someone to validate that their system does 500 qps, you really need to check your assumptions. I'm trying very hard to think of what kind of system I'd build that would do less than that (hint: each query would be big).
With all due respect, no. It's trivially easy to just claim that you've done something that happens to anecdotally prove the point you're making. I could say I've written a PHP app that gets an easy 1000 qps without flinching.
Without anyone dropping any factual proof my app is definitely better.
I used to run one of my sites (25K unique visitors a day) on PHP 5.3; when HHVM came out with a stable version, I switched to HHVM and had a similar experience. Now I am running it on PHP 7 and I have to say I am more than happy with the results. As uncool as PHP is for today's developers, it has served some really high-traffic sites and has stood the test of time.
P.S. Now I wish somebody would just implement a good async I/O system and the ability to serve HTTP right off the PHP engine (I know there is php -S ...; I am talking about a better async system).
This seems interesting; hopefully someone will pick this up and make even the mysql_* and other sync functions async too. This could be the final nail in the coffin.
We're currently building a pretty large production system in it. It's got a few warts, but it's damned nice, and its compatibility with ReactPHP (the event loop, not the front-end tool!) is super useful!
You may be looking for React PHP (http://reactphp.org). No relation to the JS library. It is an asynchronous event loop implementation, and there is a native HTTP stack built on top of it.
Can you explain the draw of Async I/O? A single request will not be faster, but you may get more concurrent requests going due to running some while some are waiting for I/O to finish? Is that correct?
When you start talking about multiple concurrent connections, and considering performance of an application as a whole, Async I/O at the application level offers basically no speed benefits over any kind of Sync I/O runtime that can run in parallel.
What an Async I/O model can do is allow the amount of resources consumed by parallel execution contexts to be reduced - therefore allowing you to service more concurrent requests in parallel. But not necessarily any faster, if I/O is your bottleneck in the first place.
It can also be used to give more predictable performance under loads with i.e. response times, if the responses are not dependent on I/O operations to complete.
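A toy model may make that trade-off concrete. It is a sketch under loud assumptions: there are no real sockets, just a virtual clock, and each hypothetical request is a generator that yields how many ticks of I/O it waits on (3 for a made-up DB query, 1 for a made-up cache read). One blocking worker sits through every wait in turn, while a single event loop overlaps them:

```php
<?php
declare(strict_types=1);

/** Hypothetical request: one DB query (3 ticks of I/O), then one cache read (1 tick). */
function fake_request(): Generator
{
    yield 3;
    yield 1;
}

/** One blocking worker: each request's I/O waits are served back to back. */
function run_blocking(array $requests): array
{
    $clock = 0;
    $finishedAt = [];
    foreach ($requests as $name => $request) {
        foreach ($request as $ticks) {
            $clock += $ticks; // the worker sits idle while this request waits
        }
        $finishedAt[$name] = $clock;
    }
    return $finishedAt;
}

/** One event loop: while one request waits on I/O, the others can progress. */
function run_event_loop(array $requests): array
{
    $clock = 0;
    $wakeAt = [];     // request name => virtual time its pending I/O completes
    $finishedAt = [];
    foreach ($requests as $name => $request) {
        // valid() starts the generator and runs it up to its first I/O wait.
        if ($request->valid()) {
            $wakeAt[$name] = $clock + $request->current();
        } else {
            $finishedAt[$name] = $clock;
        }
    }
    while ($wakeAt !== []) {
        // Advance the clock to the earliest I/O completion and resume that request.
        $name = array_search(min($wakeAt), $wakeAt, true);
        $clock = $wakeAt[$name];
        unset($wakeAt[$name]);
        $request = $requests[$name];
        $request->next();
        if ($request->valid()) {
            $wakeAt[$name] = $clock + $request->current();
        } else {
            $finishedAt[$name] = $clock;
        }
    }
    return $finishedAt;
}

$make = function (): array {
    return ['a' => fake_request(), 'b' => fake_request(), 'c' => fake_request()];
};
print_r(run_blocking($make()));   // a => 4, b => 8, c => 12
print_r(run_event_loop($make())); // a => 4, b => 4, c => 4
```

Each request still spends exactly 4 ticks on its own I/O in both runs, so no single request gets faster. A pool of three blocking workers would also finish everything at tick 4; what the event loop buys is doing that with one process, which is the resource point made above.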
> In other languages you can't get that without using a standard library that will escape the values by default.
Escape for what context?
Escaping for SQL is different from escaping for HTML, which in turn is different from escaping for JS.
How does your hypothetical Request object know how to escape any given variable? Does it ping every open database handle to figure out how they want their data escaped? Does it use some kind of static analysis to figure out in what format (HTML? XML? JSON? CSV?) the app is going to spit out the value later on?
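A small sketch of that point, with a made-up `escape_for()` helper: the caller has to name the output context, because the same value needs different encoding in each one. SQL is left out on purpose, since there the usual answer is a prepared statement rather than an escaping function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: escaping depends on where the value will be output,
 * so the context must be supplied by the caller.
 */
function escape_for(string $context, string $value): string
{
    switch ($context) {
        case 'html':
            // Element content or quoted attribute: encode & < > " and '.
            return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        case 'js':
            // Inside a <script> block: a JSON string literal, with < > & ' "
            // emitted as \u escapes so the value cannot close the tag.
            return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
        case 'url':
            // A single query-string or path component.
            return rawurlencode($value);
        default:
            throw new InvalidArgumentException("Unknown context: $context");
    }
}

$value = "O'Reilly <b>&</b>";
echo escape_for('html', $value), "\n"; // O&#039;Reilly &lt;b&gt;&amp;&lt;/b&gt;
echo escape_for('js', $value), "\n";
echo escape_for('url', $value), "\n";  // O%27Reilly%20%3Cb%3E%26%3C%2Fb%3E
```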
Or does it simply run a bunch of cargo-cult functions like
Had a similar experience deploying HHVM at a previous company; I wrote up a blog post of the issues we ran into and how we worked around them[0]. One thing the Dailymotion blog omits is Hack (hacklang), which has additional features like lambdas, async support (though PHP 7 will soon?), strict typing, collections, generics and more. That said, if you're just trying to squeeze more out of an existing codebase, then PHP 7 wins hands down.
I would assume by this point that PHP devs would be fairly confident and comfortable with their decision to continue with PHP and would be used to others bagging on it unnecessarily.
As a JS/Web developer you learn to ignore the hatred of the web that it seems to get from the HN crowd.
>As a JS/Web developer you learn to ignore the hatred of the web that it seems to get from the HN crowd.
As an occasional full-stack developer (not by choice), I can confidently say that the reason people hate on popular web tech is that it is uniformly terrible compared to non-web tech. I'm no fan of Java, for example, but I'll take it over PHP any day. JavaScript is so bad that I (and many other developers) will put a lot of effort into using any alternative, such as TypeScript, PureScript, Elm, etc.
This is the typical "hatred of the web" that I usually ignore. ES2015 brought a ton of huge language improvements that are still filtering out into usage, Babel means you can use them all now without waiting for browsers to implement them, Webpack gives you a ton of flexibility for packaging, ESLint lets you lint in a completely pluggable way, npm (and now Yarn, which fixes many of npm's problems at scale) lets you manage dependencies effectively, TypeScript or Flow let you incrementally add the benefits of static types, and JavaScript's first-class functions let it behave as a powerful functional programming language.
It's very possible to write--and deploy--very high quality Javascript today.
The length of that paragraph and the number of tools mentioned is exactly one of the problems of web development. It's like missing the forest for the trees. And even with all the huge language improvements, it's still nowhere near the capabilities and safety of non-web languages.
But I don't disagree that it's possible to write very high quality JavaScript code -- it's just a little bit painful.
heh. I find the fact that this conversation happened to be extremely interesting - it's almost as if people are missing the point :)
In the real world, changing web development to give it the "capabilities and safety of non-web languages" is extremely difficult to do on any sort of timeframe because it needs to be supported AND backwards compatible in all browsers. Realistically speaking, how do you 'fix' web development? How could you make it better?
The modern Javascript ecosystem is a realisation of this and it does the best it can do given the shitty situation it's in - using tooling and preprocessing to give it some features from other types of development, like static types!
It's certainly a little bit painful, but that's only because these things are brand new. These tools let you create applications most comparable to native apps, and could you imagine developing for iOS or Android without Xcode or Android Studio? The current trajectory is very, very good, and it's with a bunch of tools and ideas that came from the community.
"My language that I use all the time (definitely no Blub paradox here) isn't bad, look at all these random features it has!" Sorry, but that isn't a reasonable argument. Having used all the stuff you mentioned, and many other languages, JS is relatively not good.
> Typescript
Is essentially an entirely different language. But I agree, it's a vast improvement.
> It's very possible to write--and deploy--very high quality Javascript today.
But the language doesn't actively assist in precluding low-quality code, and most production JS that wasn't transpiled is low-quality.
Take the attacks with a grain of salt: although much of the criticism is true, it is not exclusive to PHP (e.g. the unexpected type coercions).
OTOH, most of the performance benchmarks done against PHP (ex. PHP vs Python) usually mean "which language is faster at crunching numbers". I/O operations like reading a file from disk or running a query against a database are an order of magnitude slower than number crunching, so any of the gains you can get by switching languages become effectively negligible, unless you really care about nanoseconds.
Remember to always pick the right tool for the job, you won't use PHP for number crunching the same way you won't pick C++ to build the minimum-viable-product website of a startup.
To be fair, "number crunching" starts to become really important when you need to, oh, work with the large set of data that you pull back from the database.
But in saying that, PHP7 is actually pretty fast. Probably faster than CPython for most of the algorithmic tasks you might run with data from a SQL database, for example.
Hard to separate yourself from your decisions and your toolset sometimes! I certainly may check out some alternatives at a hobby level and pursue them further if they appeal to me. But for now the money in my rural city is in PHP and, to a lesser extent, .NET. Don't think I've ever seen a job asking for Python, Node, etc. that didn't require an hour+ commute.
If you're (or anyone else) interested, my company is hiring senior PHP engineers. Modern tech stack: PHP7, MariaDB (MySQL), Redis, distributed workers, Debian, AWS, Solr, data mining/analysis. Competitive salaries, fully remote, vacation, retirement, etc. Contact me at meritt.hn@gmail.com
If you jump on a new language early enough and are proficient enough to be productive in it, you have a chance to actually land some jobs with it without having to take a pay cut. Letting people know and engaging in the community is a requirement though.
You'll meet other people passionate about it and ultimately network is everything ;)
Then again, I end up with PHP jobs anyway because I enjoy sharing my knowledge (coaching, improving ways of working, etc.), but at the same time I surround myself with people who I can learn from (be it business, architecture or a different language, all relevant knowledge is valuable).
>surround myself with people who I can learn from.
This is my biggest struggle. I am the only dev in what is primarily a graphics place. I've come a long way on my own since I started here, but I miss having a mentor or at least someone whose code I can look at and learn from and know I'm looking at GOOD code.
Hey, I'm the author of this blog post, I'll try to respond to your comment:
This is not an antisocial and greedy attitude; every society has this problem. Dailymotion invests a lot in teams and servers. But sometimes you have to think differently ("is there something to do before I buy new servers or rewrite our whole application?") to give co-workers more time to implement a new architecture.
This view is pretty common among people reading text in their first language that was written by someone using their second or third language. It comes from people who just aren't as good at writing and expressing themselves in that language. This by no means makes their thoughts less valuable. Sometimes you just have to open your mind to other people, even though the words come out in the wrong order.
English is not my first language either but I don't write "I'm sorry for my mistakes, english is not my first language" because that should be pretty obvious. And if it's not obvious, there is no need to say it, right?
I just think I'm fortunate that my first language became the de facto language for technology, and that anyone who speaks/writes it well enough to be comprehensible about technical matters in a second language deserves my respect!
My GF finds that I know multiple programming languages impressive while not realising that her ability to speak English, Hungarian and German fluently leaves me in awe.
It wasn't mentioned in the post from Slack, but default superglobals and the earlier register_globals design decisions are the worst and most impactful wart in PHP.
Because it was designed as a templating language, the default web server interface (CGI) will auto-expose all variables in global scope, e.g.
echo $_POST['user_id']
PHP has a horrible reputation with security for this reason - we all know that somehow, somewhere, in almost every project someone is pulling in a user-controlled variable from a superglobal and they aren't escaping or checking it properly (since you can't be warned about it but the feature will work).
Worse - and I've seen this a lot, even with Laravel, CodeIgniter, Cake, Symfony/Silex etc. - you end up with these well-structured projects that declare request classes, methods and variables etc., but then sometime down the road a developer takes a shortcut in a method and pulls in a $_GET or $_POST inside a controller (usually because they don't know how, or can't be bothered, to change all the related classes) - running around the default exec stack.
I've seen this so often - because it's so easy to do it. The most common place is where a designer has built a frontend AJAX form. They now need to build a quick backend check, so they Google "php backend ajax username check" and they'll likely get a result like this one:
http://stackoverflow.com/questions/29459183/check-username-a...
where the 4th and 5th lines of the solution are:
$username=$_POST['username'];
$query="SELECT * FROM username_list WHERE username='$username' ";
they copy that into a file called ajax_username_check.php and save it to the server - and they've now destroyed all that previous good work by opening up a very blatant and easy to find SQLi vulnerability. Their database will be on pastebin within a month.
You can spot this type of vulnerability from the frontend because the URLs used in the AJAX calls don't match the URL router patterns for the rest of the app (ex. GET /user/username_check_ajax.php vs /user/check_user).
In other languages you can't get that without using a standard library that will escape the values by default. Any solution you search for will always be a safe method to obtain the variable values by default.
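PHP does ship a safe default for the SQL half of this, even if search results rarely show it: PDO prepared statements, which send the value separately from the SQL text. A hedged rewrite of the username check, assuming a PDO connection and the `username_list` table from the Stack Overflow snippet; `username_taken()` is a made-up name:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true if $username already exists in username_list.
 * The value is bound as a parameter, so it never becomes part of the SQL text.
 */
function username_taken(PDO $db, string $username): bool
{
    $stmt = $db->prepare('SELECT 1 FROM username_list WHERE username = :username LIMIT 1');
    $stmt->execute([':username' => $username]);
    return $stmt->fetchColumn() !== false;
}

// Usage in the AJAX endpoint (sketch): reject non-string input before querying,
// since username[]=x would otherwise arrive as an array.
// $username = $_POST['username'] ?? null;
// if (!is_string($username)) { http_response_code(400); exit; }
// echo json_encode(['taken' => username_taken($db, $username)]);
```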
Some good news: Hack doesn't expose superglobals in strict mode:
http://cookbook.hacklang.org/recipes/get-and-post/
I'd strongly recommend that this is used in all PHP projects, since it strictly enforces variable access - even in cases where you're using a framework that is supposed to enforce it.
IMO PHP missed a big opportunity by not removing superglobals in version 7 and enforcing an explicit, safe request object, much like other languages do. They likely wanted to avoid it because of the mess around register_globals and magic_quotes in earlier versions.
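What such a request object might look like, as a minimal sketch: the class and method names are invented for illustration and don't correspond to any particular framework. Code asks for a typed value by name, and arrays smuggled in via `name[]=` come back as `null` instead of passing through:

```php
<?php
declare(strict_types=1);

/** Hypothetical minimal request object wrapping the raw GET/POST arrays. */
final class Request
{
    private array $query;
    private array $body;

    public function __construct(array $query, array $body)
    {
        $this->query = $query;
        $this->body = $body;
    }

    /** The only place that touches the superglobals. */
    public static function fromGlobals(): self
    {
        return new self($_GET, $_POST);
    }

    /** Query-string value as a string, or null if missing or not a string. */
    public function queryString(string $name): ?string
    {
        return self::stringOrNull($this->query[$name] ?? null);
    }

    /** POST value as a string, or null if missing or not a string. */
    public function postString(string $name): ?string
    {
        return self::stringOrNull($this->body[$name] ?? null);
    }

    /** POST value as an int, or null if missing or not a well-formed integer. */
    public function postInt(string $name): ?int
    {
        $value = $this->postString($name);
        if ($value === null) {
            return null;
        }
        $int = filter_var($value, FILTER_VALIDATE_INT);
        return $int === false ? null : $int;
    }

    private static function stringOrNull($value): ?string
    {
        return is_string($value) ? $value : null;
    }
}

// Usage (sketch): $userId = Request::fromGlobals()->postInt('user_id');
```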
[0] I think it is important to distinguish PHP the language from PHP the runtime. PHP the language is now decent, having caught up on a lot of features (although I find it very verbose and harder to read), while PHP the runtime is undoubtedly still a horrible runtime - hence HHVM.