This seems very likely not to be a programming style choice but some kind of optimization trick by a JS minifier/compiler. I don't think this has anything to do with the typical semicolon war.
I doubt any remnant of a programmer's semicolon preference would make it through a compiler that performs this sort of optimization. The code is likely going to be transformed into an intermediate form (after applying ASI rules) and then that will be transformed into an optimized output.
There was a story here a while back about how the Twitter/Bootstrap developer was arguing with Douglas Crockford about whether or not he should have to insert semicolons. The cause of the argument, IIRC, was that the minifier was changing the meaning of his semicolon-free code.
Right, but that was JSMin, which is more of a souped-up find and replace, whereas this comma-replacement strategy is what you'd expect from a much more sophisticated compiler which does static analysis and complex code transformations to squeeze out more than you can achieve by simply compacting white space and deleting comments. That level of modification would be crazy and fragile if you were modifying the original source code, so it's a fair bet that ASI is irrelevant by the time you're looking at the compiler output.
Purity. Read the issue tracker threads about it, he's militantly against semicolons, even when it's demonstrated to introduce problems with other software (like minifiers).
And Google has a long history (years!) of making Opera look as bad as possible. One of the classic examples: a number of their sites declined to work at all in Opera, yet worked without a problem if you forced Opera to report itself as "Firefox."
I bet Google intentionally wrote Closure Compiler to use commas, with the knowledge that Opera has a bug with too many commas, so as to intentionally cause Opera to mess up and look bad.
Let me guess, you never used Opera yourself? Otherwise, you wouldn't question the facts, you would have lived them: Google sites intentionally blocked Opera based only on the User agent.
The most recent intentional annoyance is less than two weeks old:
"(...) it's a mark of pure arrogance from a company that isn't afraid to act like Microsoft (1998) when it needs to muscle out a competitor.
The Google roadblock for Opera is crude. If you change the User-Agent string for Opera so that it identifies itself as Google Chrome, the Blogger editing and management screens work perfectly."
It's not a "roadblock for Opera", it's a nag bar that appears for every browser with low market share.
They don't say it doesn't work, they just say they don't support it, so that people won't assume that being broken on Opera means it's broken on every browser. And don't claim that "if they code to standard it'll work" because this very story proves that's not always true.
I do think Opera deserves being supported, and Google shouldn't be pushing Chrome like that. But calling it a "roadblock for Opera" is misleading.
"And you cannot make those nagging messages go away. Any visit to a page in the Blogger content-editing interface results in this nag screen, and although you can dismiss the message, it will keep coming back."
But none of that is relevant because it just isn't very plausible that Google would intentionally exploit a bug this way in their compiler specifically as a way of attacking Opera. They used a valid optimization technique that does save some space, and does conform to the standard. Did they likely test it extensively in Opera? No. But to act like this is an intentional offensive move is conspiracy theory level nonsense.
Please reread this thread: it's documented that they make intentional offensive moves. They do them openly; nothing is hidden. It's sad that such acts are ignored: as the article says, they are as bad as Microsoft in the worst days of its anticompetitive behaviour. Check who here tries to hide this by invoking "conspiracy theories."
I'm not denying that Google has had an antagonistic and underhanded strategy against Opera. What I'm denying is that they intentionally designed their compiler to exploit some obscure corner case in Opera so that very large JS files fail as some master plan to break Twitter on Opera.
It's not that I think Google is above using dirty tactics. It's that I think Google is above using dumb tactics.
I mean, are you seriously suggesting that Google assigned someone to comb through known Opera bugs, looking for one that they could exploit in their compiler while masquerading the exploit as a legitimate optimization? And then they assigned someone to implement that optimization? All in the hopes that someday a 4.5MB JS file would come along and break a popular website under Opera? And ignoring the facts that a) Opera might have fixed the bug by that time, b) said popular website might permanently switch to a different compiler when that happened, and c) it would take the Opera team a small fraction of the time Google had spent on this to put out a patch?
I haven't seen anybody before you in this thread describing the very approach you write about. The answer to your "are you suggesting" is: no, it's you who are suggesting.
My point is, in light of all the bullying tactics against Opera, the community should definitely have less tolerance for Google than it does. In five words: Monopolies bad, supporting Opera good.
Well, that seems like some pretty severe backpedaling, since by bringing up past offensive tactics by Google, it was eminently apparent that you were implying this was another example of an offensive tactic.
But regardless of who is good and who is bad (and trust me, I have plenty of bad feelings toward Google myself), this was fundamentally Opera's mistake, not Google's. There is nothing in the ECMAScript standard that says that there is an upper limit on how long a statement can be, and that's the end of the matter.
It is certainly another example of an anti-competitive tactic, and the most probable scenario is that they achieved it by ignoring Opera's existence on as many levels of their organization as possible; otherwise it just wouldn't happen -- if you knew your "clever trick" didn't work in one of the browsers, you wouldn't ship the "clever trick" as a universal solution. As such, it's not an accident, it's a consequence of their political decisions (something like "to all our teams: we don't like Opera, pretend it doesn't exist, force their users to switch," etc.). That's bad enough; no need for more complex "conspiracies."
I worked at a company a year ago that was all gung-ho about front-commas for no reason in particular, leading me to believe there's a certain amount of cargo-cult'ing going on here.
It shouldn't be necessary to write them in the first place.
Anyway, in this case I guess some automatic code reduction tool decided that four characters could be saved by changing it to commas. I doubt it was written that way to begin with.
> It shouldn't be necessary to write them in the first place.
I love Go's lack of semicolons, but I don't extend that to some foolhardy attempt to write JavaScript without semicolons. In JavaScript, support for a proper semicolon-less style is clearly lacking in practice if not in spec (if it weren't lacking, this wouldn't be an oft-recurring story, but it is).
The title of this post was changed from "Lack of semicolons causes Twitter to crash in Opera" to "Twitter crashes itself with commas", long after it hit the front page (and in fact got all the way to #2); this sadly drops the (as far as I'm concerned) critical note that this bug only affects Opera, and makes it sound like this is Twitter's fault, when this is really Opera's fault.
On the other side, this is why many of the comments, including the top comment, seem overly invested in the notion of semicolon insertion, as opposed to looking at the specific usage of replacing sequences of ExpressionStatements (even if separated by semicolons) with a single ExpressionStatement holding a very long compound Expression (separated by commas).
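To make that distinction concrete, here is a sketch of the transformation (the function and method names are hypothetical, chosen only to illustrate the shape of the rewrite):

```javascript
// A tiny stand-in object so both versions can actually run.
function makeWidget() {
  var log = [];
  return {
    log: log,
    init: function () { log.push("init"); },
    render: function () { log.push("render"); },
    bind: function () { log.push("bind"); }
  };
}

// Before: a sequence of separate ExpressionStatements, each ended by a semicolon.
function before(a) {
  a.init();
  a.render();
  a.bind();
}

// After: one ExpressionStatement holding a single compound Expression,
// with the parts joined by the comma operator -- the form the compiler emits.
function after(a) {
  a.init(), a.render(), a.bind();
}

var x = makeWidget(), y = makeWidget();
before(x);
after(y);
console.log(x.log.join(",") === y.log.join(",")); // true: identical behavior
```

The behavior is identical; only the parse tree differs, which is exactly why a parser limit on comma-expression size, rather than ASI, is the relevant issue.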
Headline makes it sound like there is a bug with Twitter. In reality they are doing something totally legal, if a little gross to look at IMO (C / C++ embedded coder by training).
The actual bug appears to be not so much a bug as a limitation in the Opera parser with regard to how many comma-delimited calls it can handle at a single time. Somewhere around comma 1019 things get messy.
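One way to probe for such a limit is to generate ever-longer comma expressions and ask the engine to parse them (a sketch; the 1019 figure is specific to the Opera build discussed here, and the function names are made up):

```javascript
// Build the source text "f(0),f(1),...,f(n-1)".
function buildCommaExpression(n) {
  var parts = [];
  for (var i = 0; i < n; i++) parts.push("f(" + i + ")");
  return parts.join(",");
}

// Returns true if the engine can parse an n-part comma expression.
// new Function only compiles the body here; nothing is executed.
function parsesOk(n) {
  try {
    new Function("f", "return " + buildCommaExpression(n) + ";");
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parsesOk(2000)); // true on current engines; the Opera build in question reportedly choked around 1019
```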
The headline used to mention Opera. Then an overzealous mod edited it to match the page title, which leaves out the context that this is an Opera-specific blog.
"give the impression that twitter is somehow at fault"... which it is. This is definitely partly Twitter's fault for craming all of their code as one humongous instruction (comma) rather than multiple lines (semi-colon.) It's also obviously partly Opera's, for not handling terrible code such as Twitter's as well as they should have.
"Terrible code" is this sort've nebulous subjective thing. Clearly, to you and I, this is terrible code, though probably created by a minifier / compiler.
Lots of code can be terrible (We've all written some in our time). However, if terrible code is valid it's the interpreter's (in this case) problem to figure it out. The ECMAScript spec is nightmarish but we get the spec we deserve :). Twitter has ZERO fault here. They wrote terrible valid code and the parser / interpreter should be spec conformant no matter the pain.
I would say that not accounting for a browser bug is definitely their fault. Remember that they have a responsibility to their users to make their site work by accounting for bugs. Designers spend a lot of time meticulously working on CSS to account for all of the various browser rendering quirks. Developers need to put the same care into their JavaScript.
Perhaps Twitter has chosen not to support Opera which is fine by me (even though I'm a user), but I think more than likely this was a slip-up in their QA process.
Right. "70's-style static limitation in Opera parser crashes Twitter" would be more like it. Seriously guys? You don't get to do dumb things just because you don't think syntax will be used....
If I have a machine with many gigabytes of memory and ~100MB of stack space available (the latter requires some tuning at thread-creation time with current runtimes, but the former is frankly routine)? Yes. Yes, I'd expect my software to handle it correctly.
The fact that you think otherwise is precisely the problem. If you don't want to write correct software (which, frankly, isn't any harder than writing broken parsers) you might want to reconsider your career choice.
OK, let's see. Do you think you should be able to create a billion files on a default ext3 file system without issues? Because you won't be able to, even if you have the space for them.
Unfortunately, sometimes you have to put in some sane limits, because of constraints elsewhere.
To call software that has sane limits to cover 99.9% of usage patterns "broken" is pretty silly.
If you think as a software developer, you never have to make compromises, or make a tough call on what 99.9% of users will do, you're being naive.
Fair enough. I was terribly impolite. Though to be fair, sticks (via public shaming, in this case I guess) often work as well as carrots. The "coming to the defense of clearly broken software" thing (which happens far, far too much) is a pet peeve. People who make arguments like that are generally the ones who write code like Opera's parser.
Public shaming also only works if the shamer knows they are right. I love giving a shaming as much as anyone else (maybe more?), but it's not a posture that's compatible with learning. It requires becoming entrenched.
For a very simple reason -- you are not implementing a new language, but writing a parser/evaluator for an already existing one. And you shouldn't change the language (as much as all of us have a beef with JavaScript).
So from the looks of it, the problem is in Opera's javascript implementation.
I'm not sure how much Google uses Closure Compiler internally, but I wouldn't have thought they would use code they knew to cause issues in some browsers. I guess that either means the Opera implementation is wrong and the code should work fine, or Google didn't test this as fully as they should have.
Uglify uses the comma trick a lot, and from my experience, Closure Compiler doesn't rely on it very often.
Regardless, the actual bug is in Opera, as the resulting JavaScript is legit, even if it's absolutely crazy to have 4MB of script be a single JavaScript statement.
That "explanation of why", when you take the trouble of reading it, is just an explanation of how. The article ends with:
Conclusion - Using the comma trick to do { }-less indentation is far from viable. However this may still be useful for debugging and overall it is fun to try new coding styles!
Reading the article and the comments made it clearer to me. Twitter is using some minification tool (maybe Google Closure) which attempts to reduce the number of characters used. Using commas, you can save a couple of characters by avoiding some braces.
We already knew that we had a limitation that other browsers don't have in the parser, and it was already considered important to fix. Breaking Twitter of course raises the profile of this bug.
Can someone please provide a valid, technical reason for not using semicolons other than "it doesn't say I hafta so I'm not gonna!" (that quoted bit should be read as if a 5 year old throwing a tantrum was saying it)
JavaScript minifiers use commas to avoid braces in control statements. The following two statements are functionally identical. Apparently the Closure Compiler goes overboard and just uses commas everywhere, even when the comma offers no length advantage over semicolons.
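The two statements referred to didn't survive in the comment, so here is a reconstruction of what that brace-avoiding comma trick typically looks like (the function names are hypothetical):

```javascript
var calls = [];
function setup() { calls.push("setup"); }
function start() { calls.push("start"); }
var ready = true;

// With braces: the if body is a block containing two statements.
if (ready) { setup(); start(); }

// Comma trick: the braces disappear because the body is now a single
// ExpressionStatement; the comma operator chains the two calls,
// saving two characters.
if (ready) setup(), start();

console.log(calls.length); // 4 -- each version ran both calls
```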
Ok, but any front-end engineer worth their salt should be used to the fact that the browser will never be a perfect implementation environment and be willing to adapt around known issues to produce functional code that works for your users. They aren't going to notice that 1% file size. They are going to notice your page not working.
Yes of course working in every browser should be the priority, but it is only obvious that this particular optimization would cause a problem in hindsight.
As for "they aren't going to notice that 1% file size", you could keep saying that about small optimization after small optimization until your compiler's output is 10% bigger than the competition's, at which point you will start losing significant user base.
A 1% reduction in output at Twitter's scale isn't as paltry as you might assume. Engineering decisions are made for both server-side and client-side reasons.
You're making a blanket statement about Twitter developing bad code because of a recent bug. It will be fixed.
I kind of wonder about that reasoning, though. I mean, no it's not trivial if you imagine having to foot the bill for it yourself, but proportional to the kind of money Twitter is dealing with, it's, well, still just 1% of a JS file.
But I don't think we need to think of it in terms of Twitter's scale at all. The point is that saving 1% for basically free, is saving 1% for basically free, and that's worthwhile no matter how big you are. On its own, it's a drop in the bucket, but the cumulative effect of many small low-opportunity-cost savings is a significant low-opportunity-cost saving.
I write Javascript for a living, and the code standard requires semicolons.
Anyway variables in javascript have function scope, so two variables with the same name in the same function always refer to the same value, even if declared twice.
As a result I like to write all the variable statements at the top of each function, all in one var statement.
The problem then becomes that you write var a, b, c, d; and then later want to add variable e. If you aren't careful you might accidentally write var a, b, c, d; e = ...; -- the program may still work, but e now has global scope. This results in nearly impossible-to-find bugs that may only happen if the function is recursive or it is called (perhaps indirectly) by some other function with a similarly named variable.
Considering that semicolon insertion mostly works, and the alternative can be a huge pain in the rear end, I can understand why people would want to skip the semicolons.
I think it's generally a good idea to avoid declaring multiple variables on the same line (unless you have a very good reason). The space you "save" isn't worth the maintainability penalty. This is a habit I have from C, where

int* a, b;

declares "a" as ( int* ) and "b" as ( int ). Instead, I just write

int* a;
int b;
The JS interpreter shares your preference of writing all var declarations at the top - in fact, at interpretation time they will be "hoisted" to the top of the scope. In other words:
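The illustration presumably looked something like this (a sketch with made-up names):

```javascript
var out = [];

// What you write:
function whatYouWrite() {
  out.push(typeof x); // "undefined", not a ReferenceError -- x already exists here
  var x = 5;
  out.push(typeof x); // "number"
}

// What the interpreter effectively sees: the declaration is hoisted
// to the top of the function; the assignment stays where it was.
function whatTheEngineSees() {
  var x;
  out.push(typeof x);
  x = 5;
  out.push(typeof x);
}

whatYouWrite();
whatTheEngineSees();
console.log(out.join(",")); // "undefined,number,undefined,number"
```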
But this is a simple error caught by js(h/l)int. And it's especially easy to spot if you just declare each var one after another:
var a,
b,
c,
d;
e;
Then your editor (into which you've integrated a linter... right?) will protest:
yourJsFile.js |5 warning| 'e' was used before it was defined.
And if you have a global by the same name you'll see:
yourJsFile.js |5 warning| Expected an assignment or function call and instead saw an expression.
Granted, if you assign it a default value then, yes, you're going to have a bad time. But if you follow the whitespace rules of jsLint it will catch the obvious error.
This headline is misleading, especially given how much people love to argue about semicolon insertion. This has nothing to do with semicolon insertion. It's just about an artificial constant limit in how large a certain type of parse node (comma expression) can get in Opera's JavaScript parser. (Of course, Twitter should have tested in Opera!)
> that "bundle/t1-more" file is a whopping 4 129 653 characters.
I'd have to imagine that 355kb is the compressed size, but that character count (assuming one byte characters, not UTF8) works out to be 3.938 megabytes.
That's a lot of code. Even more when you consider it's been minified.
Why do we have to imagine and then speculate from there? This is trivial to inspect.
It's 1.34mb on disk uncompressed but still minified. The copy Twitter is hosting right now is 1409121 characters minified but uncompressed. I have to assume the 4mb mentioned is output from some middle step (closure compiler output?) I'm not arguing the merits of that filesize, just that it's definitely not 4mb.
I ran into this problem last night and ended up looking at the script. File size is 4,129,653 bytes, on a single line. I had it saved to check out what the problem was, so here it is if you want to check it out yourself: https://dl.dropbox.com/u/24903613/hn/twitter-4MB-line.js
Interesting. The difference could be A/B testing or serving up different scripts to different browsers.
At a quick glance you can definitely see some differences between the different scripts we were served. For instance, in the one you were served there are some unminified portions that include the license notice for some of the code (easyXDM). In the script served to me there are no unminified portions, with everything being served on a single line.
According to Firebug: roughly 250 KB is transferred, half of that is lazy loaded (after DOMready, I suppose) and it looks like the biggest file (75KB) is from Google+.
All scripts uncompressed (not un-minimized) weigh about 740 KB.
The main page for the site I work on makes extensive use of jquery, jquery ui, backbone, a crapton of custom javascript, and more of the OpenLayers library than most places I've seen. It weighs in at 330kb and has a decent bit of fat to trim.
What on earth are the people at twitter doing that warrants that much code? Maybe their single page app -> dedicated page transition was more about disabling some things and rewiring urls with the plan of taking advantage of the dedicated setup later?
It's been a long time since I looked at the Twitter API, but I remember some of their API responses being hilariously inefficient. I don't remember the specifics, but something like retrieving a user's tweets the JSON would include a user "object" with every tweet message. The response would be something like this:
[{user: {name: 'jlarocco', userid: 1, /* 50 other attributes */}, message: 'my first tweet', id: 1, time: 123456},
{user: {name: 'jlarocco', userid: 1, /* 50 other attributes */}, message: 'my second tweet', id: 2, time: 123498},
/* similar thing 18 more times */]
If the rest of their stuff is like that, a 4MB Javascript file doesn't surprise me at all.
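For contrast, a hypothetical normalized shape that ships each user object once and has tweets reference it by id (the field names here are invented for illustration, not Twitter's actual API):

```javascript
// The duplicated shape described above, trimmed to a few attributes.
var denormalized = [
  { user: { name: "jlarocco", userid: 1 }, message: "my first tweet", id: 1 },
  { user: { name: "jlarocco", userid: 1 }, message: "my second tweet", id: 2 }
];

// Factor the repeated user objects out into a lookup table keyed by userid,
// leaving each tweet with just a user_id reference.
function normalize(tweets) {
  var users = {};
  var slim = tweets.map(function (t) {
    users[t.user.userid] = t.user;
    return { user_id: t.user.userid, message: t.message, id: t.id };
  });
  return { users: users, tweets: slim };
}

var result = normalize(denormalized);
console.log(Object.keys(result.users).length); // 1 -- the user object ships once
```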
Now that Twitter is 'stable,' the API is being overhauled and they are moving away from being a single page application.
You forget Twitter has been dealing with technical debt for years. The fact that Twitter still runs on Rails and serves as much traffic as it does is testament to what the engineers there have built.
I think they recently added some options to omit nested information in API responses.
OTOH, duplicated data _is_ easier to handle and produce, and on-the-wire efficiency probably wasn't a big concern in the original Twitter app.
It reminds me back in the day of the 'to html' converter in Microsoft Word. If you had a table in the original document, the converter would needlessly specify all the typeface information in each individual cell...
C, Pascal, and ALGOL (1958) use semicolons as a statement delimiter. Why was the semicolon chosen instead of, say, the period or newline? BCPL (like JS and Go) allows semicolons to be omitted if a statement ends unambiguously on one line.
Was the semicolon a QWERTY home row key before or after ALGOL?
Start with ASCII. Now take out the characters which did not reliably appear in every character set, take out characters that have meaning to us in mathematical expressions, have obvious utility as delimiters in other contexts, or which convey mood. You're left with ; and :. Of the two, ; is more visible and easier to type. So ; it is.
For what it's worth, there's precedent for it in formal, legal-style writing. UN resolutions, for example, are always written as a single sentence[1].
It makes sense when you think of an entire resolution (program) as a single goal (output/result), which involves many separate steps/clauses (statements) that need to be executed.
A period is a terminator; it says 'This thought ends here', while a semicolon is a delimiter (ie, the clauses it separates are independent grammatically, but not contextually).
I have seen the same thing in mayoral declarations or the like.
Whereas John Q. Doe is an upstanding citizen who has contributed to the city, and
Whereas aforementioned Mr. Doe is an awesome dude, and
Whereas some people who happened to contribute to my campaign are fans of Mr. Doe,
Therefore, I, Mayor Wile E. Coyote do hereby declare Octember 31st, 2012 as John Q. Doe Day in the city of Acme, CO.
In this context, the UN resolution is actually similar to the comma-style used here by Twitter. We don't want the statement/sentence to end, so we use semicolons/commas to make it continue, and end up breaking browsers/non-lawyer readers.
It's very common for javascript compressors to reduce statements to expressions if that's possible. Even uglify-js does that and that ones is not particularly amazing.
I remember reading about some Twitter dev who apparently hated semicolons and went above and beyond to avoid them. He was a Bootstrap dev, IIRC. I tried googling but couldn't find anything again. Maybe this is related?
I saw a couple of tweets today by Chrome users who were wondering where the 'new tweet' input form field went. So not sure if this problem is limited to Opera. (I am using Chrome as well and didn't have any problems but still.)
When you replace a thousand of them with commas, nothing has been gained!
When you start placing them only at the beginning of certain lines, subject to JS's parsing rules, you are thinking more, not less!