Can't quite tell if this is a joke, but here's a related "story about a bug" from Doug Crockford [0]:
I made a bug once, and I need to tell you about it. So, in 2001, I wrote a
reference library for JSON, in Java, and in it, I had this line
private int index
that created a variable called "index" which counted the number of characters in
the JSON text that we were parsing, and it was used to produce an error message.
Last year, I got a bug report from somebody. It turns out that they had a JSON
text which was several gigabytes in size, and they had a syntax error past two
gigabytes, and my JSON library did not properly report where the error was — it
was off by two gigabytes, which, that's kind of a big error, isn't it? And the
reason was, I used an int.
Now, I can justify my choice in doing that. At the time that I did it, two
gigabytes was a really big disk drive, and my use of JSON still is very small
messages. My JSON messages are rarely bigger than a couple of K. And — a
couple gigs, yeah that's about a thousand times bigger than I need, I should be
all right. No, turns out it wasn't enough.
You might think well, one bug in 12 years you're doing pretty good. And I'm
saying no, that's not good enough. I want my programs to be perfect. I don't
want anything to go wrong. And in this case it went wrong simply because *Java
gave me a choice that I didn't need, and I made the wrong choice*.
He did not need the choice, but others do. And he is wrong when he says it makes no difference whether you use one byte or eight of them. Yes, adding two of them takes the same amount of time, but eight-byte values also cost eight times more cache space and memory bandwidth to move around. That may not matter for a single number or ten of them, but it certainly does once you have an array with millions or billions of them.
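Roughly what I mean, as a minimal C++ sketch (the sizes and counts are illustrative, not a benchmark):
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    const std::size_t n = 10'000'000;                  // ten million counters
    std::vector<std::int8_t>  narrow(n, 1);            // ~10 MB
    std::vector<std::int64_t> wide(n, 1);              // ~80 MB
    std::cout << narrow.size() * sizeof(narrow[0]) << " vs "
              << wide.size() * sizeof(wide[0]) << " bytes\n";
    // Summing either vector performs the same number of additions, but the
    // int64_t version streams eight times more data through the caches.
    long long a = std::accumulate(narrow.begin(), narrow.end(), 0LL);
    long long b = std::accumulate(wide.begin(), wide.end(), 0LL);
    std::cout << a << " " << b << "\n";
}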
There are use cases where you need the choice, but most people do not need the choice.
Most programs written in the real world (enterprise-y Java apps) do not need fine-grained control over the GC, a choice of integer types, or many of the other things offered to them. Reducing choice will increase code and tool quality.
I think that we should make the uncommon choice reallllly hard to put into place. Make it a pain to configure the GC, give specific integer types really long names. Just stop people from premature optimization and leave these tools to people who know what they're doing.
I guess that most programming languages, with the possible exception of COBOL, were not created with boring stuff in mind - like enterprise-y apps or webdev, which make up most of programming today.
If you are unable to make a good decision between different number types, you'd better not write software, IMHO. How do you reason about the operations you apply to your data if you are ignorant of the possible values?
Being able to make good decisions does not guarantee you will never make a mistake. And he even explains the reasoning behind his choice, and it was a justifiable decision when he made it. But I stand by what I said - if you are unable to decide between 8-bit or 64-bit integers, or between a floating-point type and a decimal type, you should not be a professional software developer, because this is a very basic and fundamental skill.
In the web industry, which is the biggest part of the software industry by people employed, most developers use dynamically typed languages and very happily never make this "fundamental" choice. Should they all resign?
Not only the web industry: in Smalltalk every integer can have an unlimited number of bits. On 32-bit systems, 31-bit small integers are used internally at first, and on overflow the runtime automatically switches to variable-size integers... There is no need to choose a number of bits... Just use integers as they are in nature...
Dynamically typed does not mean that you don't have to think about number types - 3/2 and 3/2.0 may yield different results in dynamically typed languages.
And how many web developers don't know about or are unable to decide between different number types? JavaScript type coercion is such a mess, how could you get away without thinking about types, even though number types are usually not an issue?
Thankfully Python fixed that discrepancy about 3/2
Anyhow, I don't agree with you: I think that getting a bug years down the road because of a too-small numeric type is something that the programming language itself should prevent... not because "developers ought not to have to know about it", but because mistakes happen.
Anyhow, even with a dynamically typed programming language like Python or Javascript you can care about the size of your numbers.
Just import the array module (in Python) or use the Int32Array/Int8Array/etc. types (in Javascript)
In what way isn't it? Just look at this beauty [1] and its consequences. But probably nobody uses the equality operator anymore because it is such a mess. Less-than is even more messed up [2].
Well, your first line doesn't make sense: the first two comparisons use type coercion but the third doesn't, so it's like asking how `0 == false` can hold while `0 !== false` does too. (`" " != ""` behaves exactly like `" " !== ""`.)
The second line at least uses type coercion, but you are still making the wrong assumptions. true could be coerced to many strings - 't', '1', 'true', 'yes', 'on' - but they chose '1' (you may not like it, but I think it's a good choice). Infinity, on the other hand, doesn't leave many choices when coercing it to a string: I can think of '∞' (which is difficult to type), 'Infinity', and maybe 'Inf.', so I think they made a good choice there as well.
I'm not saying type coercion in js has no problems, but you said that it's a mess and I just think you chose the wrong examples.
That is not specifically type coercion, but the behavior of the equality (==) operator in JavaScript. Type coercion in JavaScript can be very useful; for example, !!('foo') is coercive and easily understood (and has an unsurprising result). Thankfully the == operator is completely optional and has a more easily understood === (identity) counterpart, making your point less about a language and more about a specific operator within that language.
Other operators in JavaScript may behave more intuitively than ==, but I don't think you can really make a good case for JavaScript's type coercion being 'unsurprising'.
I suppose what I meant was: you point out that the coercive behavior of some operators is counterintuitive, ergo type coercion in JS is a mess. I don't think that follows, because some operators behave in a coercive fashion that is more convenient than in many other languages.
None - because === does not coerce types. Type coercion, not double-equal weirdness as such, is the gripe of the GGP. Non-transitive, inconsistent equality is nothing more than a symptom of JS's rules for implicit conversions.
Language-war disclaimer: I love javascript and everything, it's a very expressive language; but a good wodge of the tooling around it nowadays is to help people avoid things like implicit type-coercion 'surprises'.
Ruby, Python and PHP all support several types of numbers. But even if they did not, this would not preclude people developing in those languages from being able to make such decisions.
I write software, and the number of times I even have to explicitly manipulate numbers in a given week is very close to zero. Even iteration is done through iterators instead of indexes, so I write a plus sign pretty sparingly.
Data manipulation beyond "pull out of database" or "submit user input to database" is a lot rarer in enterprise software like this than in scientific computing. I'm not saying it's bad to be aware of it, but software is more than numbers.
I develop enterprise software, too, and I definitely think it matters there, too. You better make sure your database columns have the correct number type or you will get in trouble if your inventory numbers or monetary values start showing rounding errors.
I just wanted to address your point that enterprise software does not usually involve dealing with numbers.
When it comes to integers you are right - signed and 32 bits is a viable choice in north of 90% of all cases. And when I wrote that you should be able to make a good decision about the number type to use, I was already thinking of all the number types; however, I did not express this well. But then I really don't see a lot of difference between being able to choose between integer, floating-point and decimal types on the one hand and various integer types on the other.
Yeah, you're right, data types matter. Like sibling said, this was more in response to signed/unsigned or different bit sizes. We could get rid of minutiae while still allowing for broad choice when it actually matters.
I do think that the difference between Integer/Fractional is important, but honestly if you're dealing with money you should be using some Money datatype that's smart about this instead of raw numbers.
Even if you abstract away the indexes, when working with datasets that large you have to worry about whether the standard implementation makes the same class of error.
For example, using Java's binarySearch on arrays of length over 2^30 was broken until 2006[0][1].
»But in todays CPUs there is no advantage using the short thing. You can add 64 bits or 8 bits, takes the same amount of time. And you look up what is the cash value of having saved seven bytes on a number. When you add that up it is zero. So there is no benefit.« [1]
As long as you are concerned with adding a bit of eye candy and interactivity to a web page, this may be true enough to get away with the JavaScript way of making every number a double-precision floating-point number, but there are other domains where this will not fly. And even in the world of JavaScript, asm.js is trying hard to overcome this limitation.
He is comparing Java and JavaScript and implies that it is a bad choice of Java to offer several options. And given the broad range of applications Java is used for I don't think this is a justifiable opinion.
Interesting, this is similar to the discussion going on for "int" in Rust (or the exact opposite, depending on how you view it). [1, 2]
On the one hand, the implicit `int` is being phased out in favour of explicit integer sizes. Your variables cannot just be `int` anymore; you have to sit down, think and choose: u8? u16? u32? i32? u64? i64? This avoids all the pain of programs behaving differently or crashing when compiled on different architectures.
On the other hand, a new "native integer for sizes that do not matter, minimum 32 bits" is being brewed, for example for pointer offsets or collection sizes. The idea is that you will not be able to have a collection with more than 2^32 elements on a 32-bit architecture nor more than 2^64 on a 64-bit architecture.
After this discussion, my hope is to see the introduction of a fast-ish dynamic bigint (that starts native and grows up to 256 or 512 bits) which can be used in all the cases where you do not care about the exact size, yet you want to be future-proof (this `private int index` fits that case, IMO).
It would affect the standard library of Rust, not Rust itself, I think. Second, the word size matching the architecture has only a slight performance impact, i.e. operations on a 64-bit word on a 32-bit architecture take more cycles/instructions than on a 64-bit architecture.
It's only important for things like pointers that they match the size of the addressing space, and not even that is a very hard constraint, just a very convenient one.
16 bit architectures are weird enough that a lot of tools don't support them. And 8 bit architectures are basically only used for things like Arduino nowadays.
However, using a 32 bit number to store values that will never be larger than 16 bits isn't that bad, it's just very slow.
I'm guessing (since this is Doug Crockford talking about JSON) that this was in reference to how JavaScript does things differently, in that it just stores everything as floats, which are quite capable of representing integers within the 32-bit range anyway.
However, an overflow to floating point isn't necessarily an improvement because, while a float will hold bigger numbers, it does so with limited precision and sometimes that lack of precision will cause bugs too. Probably more often, in fact.
In the example given it wouldn't be so bad, but you'd only get an approximate indication of where the error occurred rather than a specific line/character. So of course, whatever is reporting the error would now need to understand and handle the much more complex scenario of "fuzzy" location information instead of a simple unique index to a specific character. Depending on what it then needs to do with that information, the complexity could spiral from there.
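For concreteness: a double can represent every integer exactly only up to 2^53, after which consecutive integers start to collapse. A minimal sketch of the effect:
#include <cstdio>

int main() {
    double index = 9007199254740992.0;                 // 2^53
    // Both lines print the same value: 2^53 + 1 rounds back to 2^53,
    // so an index stored in a double is only approximate past this point,
    // which is exactly the "fuzzy location" problem described above.
    std::printf("%.0f\n%.0f\n", index, index + 1.0);
}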
If you want to just have things work no matter what, you have no choice but to use bignums. I was wondering about this recently, so did some benchmarks in Clojure. The performance was horrible, so frankly this is still not a viable alternative. Maybe in 10 years time, if every CPU has a bignum coprocessor by then.
Also, there are times, particularly in low-level graphics programming or cryptography, where you actually want integer modulo arithmetic, or to be able to do bitwise booleans predictably. In those cases, JavaScript-style loose typing can be a huge pain.
BTW, I've been a big advocate of JavaScript for about as long as Doug Crockford, so my point isn't that JavaScript-style type handling is bad: just that it's very far from a silver bullet.
While negative numbers very often provoke exceptions like IndexOutOfBoundsException, with unsigned integers an error could go uncaught for a much longer time. I'm all for signed integers, unless storage requirements are so tight that you really need that one bit.
This is a decision a compiler can never make in a reliable way because it entirely depends on the actual input and is not known until runtime. You may get away with dynamic recompilation when you realize that the input is not what you assumed when you compiled the code but I really doubt that this is a smart and efficient way to go about it.
And asm.js is no evidence for the 'no' side - the information is in the original source code, it does matter, and asm.js works around the JavaScript limitation to make this information available to the JavaScript compiler.
If you are working with files bigger than 2GB, hoping they're smaller than 4GB is NOT a good habit.
And I certainly don't believe there are programmers making only 1 mistake for 12 years. I believe he's just making a joke, or using the example as a means to an end.
Make the default numeric type effectively unbounded, and allow those who need it to choose more compact types when needed. This is what many languages do; it is possible both to generate efficient code when needed and to make correct code more likely.
I'm constantly amused how Java is supposedly stupid for protecting programmers from things they shouldn't do and yet also stupid for not protecting programmers from things they shouldn't do.
IMHO Java makes some choices about safety. If you don't agree with those choices, use a different tool. It doesn't make Java wrong for having a different opinion. Likewise I wouldn't berate C for being too low level or Ruby for favouring readability over performance.
So for that kind of software you use Python. Doesn't everyone know that? Java gives you that choice because it's for the kind of software where you need that choice.
The interesting meta-point, though, is that an audience of 20 million viewers is a big hit [1], so a billion views is 20M people watching it 50 times, or 200M people watching it 5 times. And 2 billion views is double that.
Put in perspective, that is probably in excess of the number of times the most favored "I Love Lucy" show has been seen. Or, put another way, you've got a music video with the same eyeball impact as the highest-rated television show ever.
That says to me that either advertising on Youtube is a bargain or advertising on TV is way over priced :-)
Or advertising on TV seriously under-represents the total number of impressions over time through alternate consumption streams. Right now, supposedly "unpopular" shows are cancelled, and then immediately get a successful Kickstarter from what turns out to be millions of fans who happened to be watching only through Netflix, or iTunes, or DVD box sets.
(Of course, none of these streams show the same ads the original broadcast does—but if you're a clever ad agency, you're already doing product-placement instead of interstitials most of the time anyway.)
> Right now, supposedly "unpopular" shows are cancelled, and then immediately get a successful Kickstarter from what turns out to be millions of fans who happened to be watching only through Netflix, or iTunes, or DVD box sets.
Can you name any examples of this?
The closest thing I can think of is Veronica Mars which was Kickstarted many years later and raised ~5 million dollars from 91,000 backers to make a single movie.
I think perhaps the "alternate consumption streams" viewers are not as lucrative as you think.
The Firefly series got enough support (in the form of written letters - this was pre-Kickstarter) to be made into a movie after a comically botched distribution through normal channels. (The first season's episodes were aired out of order in random time slots on Fox. It never had a consistent weekly time. This was the only season, natch.)
Family Guy had a similar fate, not because of a botched launch but because its audience existed yet did not consume television through mainstream sources. It was canceled after 2.5 seasons and then went on to become the best-selling animated DVD series. Fox brought it back the next year.
>The closest thing I can think of is Veronica Mars which was Kickstarted many years later and raised ~5 million dollars from 91,000 backers to make a single movie.
> (Of course, none of these streams show the same ads the original broadcast does—but if you're a clever ad agency, you're already doing product-placement instead of interstitials most of the time anyway.)
I think you just backdoored into the most interesting ad campaign ever:
1) Find a show with a director / writer / production team known for producing content that "stands the test of time" (e.g. likely to have a high total_views_over_time:broadcast_views ratio)
2) Include product placement for a non-existent product by a currently-existing company with strong brand recognition
3) Test response to non-existent product by initial viewers
4) Start viral campaign around non-existent product (this likely favors "Hunh?" shows a la Lost or Fringe)
5) Trigger view bump in show (win award, produce new episodes in partnership with Netflix, produce new movie, etc.)
6) Launch real-product multiple years after initial product placement
I think you're missing a unit. You should be measuring eyeball-minutes. An episode of The Walking Dead might be 20M x 45min = 900 megaeyeball-minutes. Gangnam Style is 2B x 3min = 6000 megaeyeball-minutes. Disregarding target demographics for the moment, that says the advertising spend for a first run episode of Walking Dead should be about equal to 15% of the lifetime spend for Gangnam Style.
You're equating two things of different lengths, which require different attention spans. They also differ in how the audience views the content, which gives the advertiser a different experience with the viewer.
For example, with I Love Lucy, the audience member likely sat and watched the entire commercial. With a YouTube video, the audience member can skip the ad or move on to other content.
TV = 22 minutes of content.
YouTube Video = 3 minutes of content.
Plus, the metrics that constitute views between the two media formats are completely different.
I'd probably do the same if I had similar watching habits :). Right now I mostly use YouTube for either a particular search result or just to play some music that's not on Spotify, and having to listen through a minute of advertising to watch a three minutes long video is a bit anger-inducing.
And god knows how many times the top music videos on YT are played at parties & other semi-public events! Heck, as the parent of young children, I've probably watched things like Gangnam Style >20 times just within my house.
Indeed! There's a cartoon rabbit for small kids here in the Netherlands called "Nijntje", and there are a few "official Nijntje songs" on YouTube. Our 1 year old daughter's favourite is this: https://www.youtube.com/watch?v=20J8DUJMgA4&app=desktop "Nijntje dansles" - it has 12 million views, and there are only 20 million people in the Netherlands, total!
This song has been played many times by a relatively small section of the Dutch population :)
That assumes YouTube eyeball count is of equal value to TV eyeball count though, right? Which doesn't seem like something we can assume- YouTube's targeting doesn't seem great, and there are plenty of other things to do on a computer while you wait through the ad.
Maybe YouTube ads aren't the best, but the ads on the Internet can have much better targeting and performance tracking than TV ads.
>there are plenty of other things to do on a computer while you wait through the ad.
Yea, for example you can buy the advertised product with a few clicks. If you are quick enough you can even finish the purchase before the video ad does (not the most realistic scenario, but it's possible). Or with a quick search you can learn more about the product to check how honest the ad is. TV ads cannot compete with this efficiency. The only thing TV ads do better is reach a bigger and less tech-interested audience.
"You should not use the unsigned integer types such as uint32_t, unless there is a valid reason such as representing a bit pattern rather than a number, or you need defined overflow modulo 2^N. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this." [0]
Which is a completely birdbrained policy, given that signed integer under- and overflow is completely undefined. If you want to catch implicit signed-to-unsigned conversions, enable that warning on your compiler... what they're advocating is just dangerous.
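The warning I have in mind is -Wsign-conversion on gcc/clang; a small sketch of the kind of conversion it flags:
#include <vector>

std::vector<int> make_buffer(int n) {
    // If n is negative, this implicit int -> std::size_t conversion wraps
    // around to a huge value; -Wsign-conversion reports it at compile time.
    return std::vector<int>(n);
}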
In a strict typing environment, the other major issue is that int is cross-platform and forward compatible whereas uint32_t, uint64_t, uint8_t, uint16_t, etc. will all always be unsigned within a specified bound, so whenever we have 128-bit or 256-bit registers, we'll have to go back and update all this code that effectively "optimizes" 1 bit of information (nevermind the fact that int is usually more optimized than uint these days).
Furthermore, casting uintx_t to int and back again while using shared libraries is a huge pain in the ass and can waste a lot of programmer time that would be better spent elsewhere, especially when working with ints and uints together (casting errors, usually in the form of a misplaced parenthesis, are pretty small and can take a very long time to find).
> int is cross-platform and forward compatible whereas uint32_t, uint64_t, uint8_t, uint16_t, etc. will all always be unsigned within a specified bound, so whenever we have 128-bit or 256-bit registers, we'll have to go back and update all this code
uintN_t (and intN_t) are MORE portable and cross-platform than int, in the sense that you get much better guarantees about their size and layout.
Furthermore, int is NOT the size of the register (x64 commonly has an int of 32 bits) so any updating you'd have to do to uintN_t, you'd have to do to int as well. Regardless, I can't imagine why you'd need to do any updating in the first place - it's perfectly valid to stick a uint32_t in a 64 bit register.
> nevermind the fact that int is usually more optimized than uint these days
Where are ints more optimized than uint? Not in the processor, not in the compiler (modulo undefined behavior on overflow) and not in libraries.
> so whenever we have 128-bit or 256-bit registers, we'll have to go back and update all this code that effectively "optimizes" 1 bit of information
This is why we have uint_least8_t and friends. In fact, int is really just another int_least16_t.
> Furthermore, casting uintx_t to int and back again while using shared libraries is a huge pain in the ass and can waste a lot of programmer time that would be better spent elsewhere
Could you give an example? It sounds like you're just talking about performing the casts, which shouldn't take much effort at all, given how indiscriminately C already converts between integral values.
You don't have to update code. If 64 bits was enough on a 64-bit CPU, it'll be enough on a 128-bit CPU. The one exception is when dealing with quantities that actually depend on the bit width of the CPU, like dealing with array sizes. The language already has good types for this, like size_t, and using int won't save you. (Quite the contrary, int will sink you, because int is almost always 32 bits even on 64-bit systems.)
I had my first nasty production bug (back in the early 2000s) when I assumed an Integer was 32bit in VBScript.
2 billion survey results was never going to happen. 32,767 would have been fine as well, except that, to compound the issue, ops pointed the production site at the test database.
Are your choices between variable-width "int" and fixed-width "uint_x"? After all, in C you can just declare something "unsigned" and it's the width of int.
However, I think this is a problem. The expected value ranges of your variables don't change just because your memory bus got wider - maybe you can use more than 4GB memory in a process now, but it's a mistake to plan for single array indexes being more than 32bit.
If you do try to be more flexible, I'm sure this would introduce more bugs than the forward-compatibility it'd add. Especially if 'int' is smaller than on the platform you tested on. That's why languages like Swift, Java, C# always have 32-bit int on every platform.
> casting errors, usually in the form of a misplaced parenthesis, are pretty small and can take a very long time to find
Agreed, but writing casts also adds unwarranted explicitness. What if someone made a typo and put the wrong type in the cast? How do you tell what's right? What if you change the type of the lvalue or the casted value? Now you have to think about each related cast you added.
What's the alternative? Well, the compiler should just know what you mean…
Int is not cross-platform and forward-compatible. It's implementation-defined, so it's up to the compiler. Practically speaking, every modern compiler defines int as 4 bytes, and can be expected to never change that (because of the vast swaths of bad code out there that is written with the assumption that an int is 4 bytes). So it's not forward-compatible. And while on most platforms you can expect the compiler to have picked 4 bytes, it's certainly possible for compilers to pick other sizes for int (I would assume compilers for embedded architectures might do that), which means it's not cross-platform either.
The size of int is implementation-dependent, but its minimal range isn't. If I'm representing integer quantities between -32767 and +32767 with int, then it will work reliably across all platforms and compilers that are C99 compliant. I believe that's what the GP is referring to.
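And if code assumes more than that guaranteed range, it can at least say so explicitly; a sketch:
#include <climits>

// The standard only guarantees INT_MAX >= 32767; make any stronger
// assumption explicit instead of silently relying on the platform.
static_assert(INT_MAX >= 2147483647, "this code assumes int is at least 32 bits");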
"Completely undefined" is a good thing, because it's a strict line between good and bad (good and evil?). So, now that you know all integer overflows are bad, you can:
* dynamically test your program with ubsan to be sure they really don't happen, and then
* let your compiler optimize with the knowledge that integers won't overflow.
This last one eliminates maybe half the possible execution paths it can see, and loop structure optimizations practically don't work without it.
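The textbook illustration of what that assumption buys is roughly this (with gcc or clang at -O2, the first function is typically folded to a constant):
bool always_true(int x) {
    // Signed overflow is undefined, so the compiler may fold this to
    // "return true" even though x == INT_MAX would wrap at runtime.
    return x + 1 > x;
}

bool not_always_true(unsigned x) {
    // Unsigned overflow wraps by definition, so this really must be
    // evaluated: it is false when x == UINT_MAX.
    return x + 1 > x;
}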
On the other hand, unsigned overflows? Some of those are bad, but some are fine, right? How will an analyzer know which is which?
Some notable libraries like C++ STL want you to write loops with unsigned math (size_t iterations), but those people invented C++, so why would you trust them with anything else?
Ubsan won't catch signed integer overflow unless you happen to hit the overflow case during your tests. Relying on dynamic analysis to catch errors you should have avoided statically is shoddy.
It's certainly less complete, but it's a little harder to decide what you want to prove statically.
If a function must overflow, the optimizer (hopefully) replaces the entire thing with an abort under ubsan, so you could look for that. But that's probably not sensitive enough.
And if the function is just 'x + 1', that may overflow, but it's not important.
To be fair, even though unsigned integer overflow is very well defined, it's most certainly NOT what you want when used as an index or counter of anything.
+1. Every time I see for(int i=0;...;i++) I wonder why we have developed this habit of defaulting all ints to signed and considering uint taboo (most coding guidelines ask not to use them unless "you know what you are doing"). Most of the time we use integers for counting, so uint would have been the more natural choice. I did this in one of my libraries that I was writing from scratch and was happy for a while, but then I got into trouble because there is a lot of code out there with interfaces expecting signed ints even where they should be using uint. So ultimately the legacy forced me back to defaulting to signed int.
> I wonder why we have developed this habit of defaulting all ints to signed and considering uint taboo (most coding guidelines ask not to use them unless "you know what you are doing").
I'm pretty sure that it's just because "int" is one word and "unsigned int" is two, plus more than twice the characters. I suspect if "int" defaulted to "unsigned int" and you'd have to specify signed ints explicitly, the taboo would be reversed.
Never underestimate the power of trivial inconveniences.
Forget about for statements for a second and let's write both a counting up and a counting down loop using while statements.
// count up
std::size_t i = 0;
while (i != 10)
{
std::cout << i << "\n";
++i;
}
// count down
std::size_t i = 10;
while (i != 0)
{
--i;
std::cout << i << "\n";
}
After initialization a for statement repeats "test; body; advance", this is ideal for counting up loops, but what we need for counting down loops is "test; advance; body". Since C/C++ do not provide the latter as a primitive you have to use a while loop as shown above. Using a signed integer to shoehorn a counting down loop into a for statement at the cost of 1/2 your range is a hack IMO. Note that when working with iterators you have to resort to a while statement as iterating past begin is UB.
> Still, the trick makes it look suspect and that's an argument against using it.
This is true. The code is confusing to people not used to it. A workaround could be to hide this code inside a macro, so people not interested in digging into the code would take the macro's word:
#define REVERSE_LOOP( x, i ) for( size_t i = x.size(); i-- > 0; )
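Usage would look roughly like this (assuming the macro above and a std::vector):
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    REVERSE_LOOP(v, i)
        std::cout << v[i] << "\n";   // prints 3, then 2, then 1
}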
But unfortunately, that doesn't help with the fear that people have of unsigned types.
In this case, it makes absolutely no difference at all. It could be argued that writing unsigned int would make the code slightly harder to read. That said, I like to use stdint.h, and uint32_t would, I think, not have any drawbacks.
> there is lot of code out there with interfaces expecting signed ints even though they should using uint
That's not a good reason not to use unsigned integers; it's a zero-overhead cast from unsigned to signed (at the risk of overflowing into the negative).
Yeah, some numbers can never be negative, but their difference can, and that's when it usually comes back to bite me in the ass. I almost never use unsigned ints now.
I disagree, signed integer arithmetic in C and C++ is just toxic. Sure, if you need to compute the difference between two integers, which have both been pre-checked to lie between say -100 and +100, then fine, use signed ints... but for arbitrary input you need to do more work.
There's example code in the CERT secure coding guidelines (look under 'Subtraction'):
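Roughly in that spirit (a sketch, not the guideline's exact code), a pre-checked signed subtraction looks like:
#include <climits>
#include <stdexcept>

int checked_sub(int a, int b) {
    // Reject inputs whose difference cannot be represented in an int
    // before performing the (otherwise undefined) signed subtraction.
    if ((b > 0 && a < INT_MIN + b) ||
        (b < 0 && a > INT_MAX + b)) {
        throw std::overflow_error("a - b does not fit in an int");
    }
    return a - b;
}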
All arithmetic in C and C++ is toxic. That's the reality of using bounded-precision types. Honestly, I wish they'd had the foresight not to use the traditional infix operators for built-in types; they practically beg programmers to implicitly treat built-in types like the mathematical types they very vaguely resemble.
Really, working directly in fixed-precision arithmetic is absurd. In order to be able to rely on its correctness with any degree of certainty, you need to very carefully track each operation and its bounds, at which point you may as well have just used arbitrary-precision types, explicitly encoded your constraints, and had the compiler optimize things down to scalar types when possible, warning when not.
The funny thing is that fixed-precision arithmetic is used literally everywhere and it just works. I'd say it's good enough for most practical purposes.
It is not used literally everywhere. It does not always "just work," as the original post demonstrates. It often happens to be good enough for most practical purposes, yes, but arbitrary-precision arithmetic is better for most practical purposes.
Fixed-precision arithmetic has one main advantage over arbitrary-precision arithmetic: it is more time- and space-efficient. This advantage only applies if the fixed-precision arithmetic is actually correct and meets some concrete time or space constraint that arbitrary-precision arithmetic fails to meet. It generally takes time and effort to demonstrate that these conditions hold; because one can rely on the correctness of arbitrary-precision arithmetic without doing so, arbitrary-precision arithmetic should generally be the default choice.
This assumes that you care about making relatively strong guarantees about the correctness of your programs. If for some reason you don't, then sure, use ints and whatnot for everything. If you do, though, I suspect you'll find that it's easier to track down a performance bottleneck caused by using bignums than an obscure bug triggered by GCC applying an inappropriate optimization based on overflow analysis.
PostgreSQL doesn't give you an unsigned int option but if they did I wouldn't use it.
Having a negative pkey space is actually useful. In LSMB we reserve all negative id's for test cases, which are guaranteed to roll back. This has a number of advantages including the ability to run a full test run on a production system without any possibility of leaving traces in the db.
Most DBs don't support unsigned int [0] as a type (though it's perfectly sensible to have a constraint that enforces >0).
[0] though several do support UUIDs, which are essentially unsigned 128-bit ints, and which (with a well-selected generation mechanism) are better as server-assigned surrogate keys than sequential integers, signed or unsigned, anyway.
That seems like bad advice to me. A possible infinite loop is given as the justification, in the case of wrongly implemented reverse iteration (counting down an unsigned loop variable). Well, I claim that an infinite loop is a much more noticeable bug than undefined overflow behaviour, negative view counts, etc. Unsigned ints make impossible the kinds of bugs that, with signed ints, would (hopefully, famous last words) only trigger assertions, if those are even enabled...
One problem with this is that the sizes of STL containers are returned unsigned, and with high warning levels, compilers will warn about comparing a signed int with one of these sizes.
Surely only temporarily, though. I mean, this is an exponential process - adding one bit only doubles the space, and it will not take another nine years before some video passes 4 billion.
By that line of reasoning, 15.75 years from now there will be a viewcount greater than 8 billion.
... and now I realize you may be correct, and that it's probably inevitable that a viewcount will not only exceed the total number of people alive, but will double or even quadruple it. Our total population is actually about 100 billion, but only ~7% of us are still alive.
The shadows of the dead will be forever enshrined as YouTube view counts. Our shadows.
YouTube is slightly under 10 years old. It's interesting how willing we are to project it forward into the far future. Not that I think that's wrong or anything - it's just that it's gone from nothing to essential in a fraction of a human lifetime.
"We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten." - Bill Gates
Why not just use some sort of unlimited BigNum implementation? Yeah, for small numbers it's still ~2x the size of just storing an int, depending on implementation (or it can be: "int unless MAX_VALUE, in which case bignum is stored somewhere else") and it might be slower to operate on... but, on the other hand, you are already storing and processing a full video for every such counter!
Edit: Now I realize that would mean Google couldn't have made this joke. But I am still not sure this was foreseen by Youtube devs from day one.
Yea, according to a reddit post from a Googler, this was more of a staged easter egg than a real bug. Google coding style actually prohibits the use of unsigned integers in C++ code.
Interesting, until very recently I was writing a whole bunch of C++ at Google but I can't recall any such restriction. Most internal data structures at Google are expressed as protocol buffers and surely these support unsigned integers. In fact if someone suggested that a view counter should be signed that would invite a snark (kind of Google's specialty) like 'can it ever be negative'?
[One problem with unsigned integers in protocol buffers is that some supported languages like Python have no concept of signed/unsigned integers. Which is why for example Thrift does not support that distinction. This has nothing to do with C++ though.]
> One problem with unsigned integers in protocol buffers is that some supported languages like Python have no concept of signed/unsigned integers.
You can still implement range checks to enforce that the numbers are in the correct domain. It's not that much of a problem.
This is more of a problem in a statically typed language like Java, because it means there is no native data type to map protobuf's unsigned types to. In Python this doesn't matter as much because numbers will automatically promote themselves to bigints.
I suggest someone write a browser add-in that replays this hundreds of times when machines are idle, to mount a massive distributed view-count attack and force YouTube to upgrade to a 64-bit unsigned int now!
When youtube launched in April, 2005, the initial source code was based on another completely unrelated website that I had worked on before, written in PHP and running on Apache and MySQL. It’s always fascinating how implementations of complex systems evolve.
The interesting question to me is why this particular video is so wildly popular. I don't generally go in for music videos, but I find this one fascinating and have watched it a dozen times. I read an article that tried to explain to non-Koreans like me the meaning of it all, and apparently there are several layers of parody and social satire. I think I love it for its combination of attitude, surrealism, bizarre humor, and self-mockery, plus the music that seems to fit magically.
The explanations of Korean parody/satire are largely irrelevant to its success given its popularity elsewhere, surely? I think it's the bizarre visuals that had it spread (why I tweeted it when it first emerged), then catchiness plus a repeatable dance move. It's the Macarena of its time in that regard.
Being Korean might've given it crossover appeal into much of Asia? Just a guess.
I love it because, in a world of fake pop musicians, this guy comes off as such a genuine goof. I can't help but like the guy, I'm very happy for him for this level of success on YouTube. And the song is super catchy. He's one of the very few pop musicians I appreciate (though so far, this is probably the only song of his I care for). The political satire makes it all the more compelling. I love the horse riding on top of a sky scraper.
In the words of my niece, it's fun. She likes the silly man, and while the sexual connotations of some of the things he does might make parents wince, they fly right over her head. She still has that innocence of youth. So while we can enjoy the irreverent humor and the sexual innuendo, she enjoys the silliness at her level. (Plus she can do his horse-stepping dance.)
Aren't most Linux servers already 64 bit? And we aren't even close to 2020.
I'm sure some software will need to be re-written between now and 2038, but I don't think it will be quite as bad as Y2K just because that was only a 15 year gap (Sometimes less), whereas this is over 24 years.
I just think a lot of software will be naturally replaced between now and then. And while there will be a slight mad scramble to fix stuff at the last minute, I don't think it is Y2K-2.
People who think that 64-bit servers are immune are part of the problem. Even if you've got a 64-bit server, you've still got file formats with 32 bit timestamps embedded. For that reason, time_t remains a 32 bit integer, which means that functions like UNIX_TIME on MySQL will stop working. And then there is the mess of embedded software that most decidedly is NOT 64-bit and will be in machines that are still in operation.
Just as many as all of the 8-bit systems in use today. There is no need, in the vast majority of cases, for wide data busses in embedded applications. 16-bit is going to die out, though, like the 4-bit and bitslice processors.
I set the clock to one minute before time_t overflow on an iMac once. Recovering from that and just getting the machine to boot afterwards was no joke.
I'm not an XNU hacker, but it looks like they haven't. Their time_t typedef seems to be a __darwin_time_t [1], which in turn looks to be a 32-bit signed long [2].
As far as I know, the only major operating system that has dealt with Y2038 is OpenBSD [3].
I did tech support when the 99->00 switch happened, got paid 3x overtime. I got one call, and it was actually legitimate, but was a third party piece of software so after that we left and went to a party :)
I doubt this will be a real problem in 2038. Then again, the prevalence of computing devices is much greater now and will continue to grow by 2038, but so will technical aptitude, so hopefully they'll cancel out and this will still not be a problem.
Same, but 4x overtime here :) I was just on the PC team though so I left at 7pm after finishing the last few BIOS updates; the AS400 and HP-UX teams got the pleasure of staying past midnight.
This is a minor problem. In the 1980s, the number of tradable things with ticker symbols in US markets passed 32767, and some new issues had to be delayed until it was fixed.
Nifty example. Billionaires, trillion-dollar budgets, billion-view celebrities, fast CPUs, and large memories: all reasons I am done with 32-bit architectures (old article of mine, but only on large memories http://www.win-vector.com/blog/2012/09/i-am-done-with-32-bit... ).
32-bit architectures have nothing to do with the size of different data types that have existed forever. We had 64-bit longs in 8-bit cpus.
Also, there are perfectly valid applications that require numbers of 8, 16, 32 or 64 bits (or variable encodings with arbitrary precision). Petabytes, embedded microcontrollers, etc.
Sorry, I was unclear. "32-bit architecture" can mean a lot of different things (bus sizes, address word sizes, and so on). Mostly I am done with small pointers (having to use segments to address all of your memory, or not being able to memory-map a disk, sucks) and small counters (only being able to index a collection with a signed 32-bit integer sucks).
I saw this a few days ago, at first I thought it was an easter egg on youtube's part - saying "so many views we overflow!"
But it's real?! It seems incredibly absurd that it could actually overflow, how are signed values useful for a count of views? How are you going to have negative views?
Say you're comparing the number of views between two different videos as (video_A.views - video_B.views). How do you represent that the second video has more views than the first?
And what if you want to say how many more? You'd have to first check which one is greater, and then subtract in the right direction. The code is simpler if you just use signed integers.
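Concretely, a tiny sketch with hypothetical view counts:
#include <cstdint>
#include <iostream>

int main() {
    std::int64_t a_views = 100, b_views = 250;
    std::cout << a_views - b_views << "\n";   // -150: the sign says B has more
    std::uint32_t ua = 100, ub = 250;
    std::cout << ua - ub << "\n";             // wraps around to 4294967146
}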
To say "b has n more views than a", you need two operations whether you use signed or unsigned ints. Signed let's you say "these two videos have n different views" but, to be honest, it seems unlikely that you be doing either in so many places in your codebase, or so frequently during execution, to make the extra operation that significant.
Yes, in this specific example you still need the same number of operations. My point was, when you just use signed integers, it's easier to conceptualize and harder to screw up. It's more flexible for unforeseen future use cases. Also, this is a very simplified example case; in most scenarios those unforeseen possibilities will be more significant.
Given that switching to unsigned saves you exactly one bit, it's just not usually worth it. How often do you need exactly 32 bits of unsigned space, when 31 isn't enough, and you can't use 63? (I'm talking about standard-length integers here, not extreme situations where you're trying to make maximum use of 8 bits of storage or similar.)
Java is mostly confined to Android and the frontend (think GWT) which do not constitute a large proportion of the Google internal code as far as I can recall.
It's just a 2 billion limit crossed, 32 bits can count up to 4 billion. Afterwards, they certainly don't have to change to 64-bits, just add a few bits more.
If you do it at home, for your 10 videos, there isn't. At YouTube scale, they can certainly benefit from using a different number of bits in storage, in transit, and in the CPU. Only in the CPU, and only if you actually want to use the value in calculations, is 64 bits the best step after 32. See also the discussion here: https://news.ycombinator.com/item?id=8691291
uint_32 strikes again! And one day we'll stop using it in favor of int_64, and all unique identifiers will be string, and all will be well.
I remember when Twitter had rolled over their tweet ID's because they were using an int type that was too short. Should have gone with variable length strings to avoid that problem.
Using strings avoids one problem but introduces a bunch of others (e.g. a string is harder to verify, therefore less secure, and therefore needs to be handled with kid gloves). Checking that every character is between 0-9 and dropping all other characters is easy, cheap, and effective. Then just check that it is between uint64.Min and uint64.Max, and you're done.
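A rough C++ sketch of that kind of check (rejecting, rather than dropping, stray characters), using std::from_chars:
#include <charconv>
#include <cstdint>
#include <string_view>

bool parse_id(std::string_view s, std::uint64_t& out) {
    if (s.empty()) return false;
    auto [end, ec] = std::from_chars(s.data(), s.data() + s.size(), out);
    // ec covers both non-digit input and values that overflow uint64_t;
    // requiring end to reach the end of the string rejects trailing junk.
    return ec == std::errc() && end == s.data() + s.size();
}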
uint32 gives them twice as much capacity (which isn't enough at this stage), they'll likely want to go with a uint64.
That's 16 bytes instead of 8 bytes for a uint64 that still grants you 18,446,744,073,709,551,615 variations. Seems overkill, particularly if you actually generate GUIDs "correctly" in which case you've allocated 16 bytes but will only ever use a sub-set of them[0].
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
You can actually convert the _id to/from a timestamp, which lets you do cool things like never keep a timestamp field (you can convert a datetime to an ObjectId and use that for comparison).
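A sketch of that conversion, assuming the documented layout above (timestamp in the first 4 bytes, most significant byte first):
#include <array>
#include <cstdint>
#include <ctime>

std::time_t objectid_seconds(const std::array<std::uint8_t, 12>& oid) {
    // Reassemble the leading 4 bytes into the seconds-since-epoch value.
    std::uint32_t secs = (std::uint32_t{oid[0]} << 24) |
                         (std::uint32_t{oid[1]} << 16) |
                         (std::uint32_t{oid[2]} << 8)  |
                          std::uint32_t{oid[3]};
    return static_cast<std::time_t>(secs);
}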
If the 4 bytes are unsigned, then did they just intentionally introduce a year 2038 problem?
12 bytes is enough to just store a random value, with low chance of collisions until you've got around 100 trillion items. It confuses me why anyone would want to waste bytes of an ID on low entropy values like machine and process ID.
Or any combination of high res time plus random works nicely.
OTOH, Mongo's not exactly been a bastion of engineering excellence.
Every time I check the most viewed videos on YouTube I get depressed and lose all faith in humanity.
Landing on a comet gets you 250K views, announcing the discovery of the Higgs gets you less than 100K, while the latest twerking video or PewDiePie gets at least 2M...
Not sure why the pessimistic poster is being downvoted. Even factoring in repeat viewers, it is sad how little society values scientific achievements. That said, I was up in the wee hours of the morning watching the LHC start up, and I've probably put more than 20 views on that song myself. Have ye hope :D
One is a little news report about a recent scientific finding (which you don't even need to watch, since there are tons of other, better sources for it); the other is a viral entertainment video, which is specifically made to get as many viewers as possible. They are not the same type of thing, so comparing them doesn't mean much.
Scientific achievements don't necessarily make for entertaining video content. I'm sure most people would rather read about most scientific achievements (which these days are largely invisible or theoretical) than read a literary translation of Gangnam Style, too.
EDIT: is there a reference for formatting comments? I've never been able to find one.