
"I am frustrated by the industry as a whole"

Unfortunately, as a developer, I have to agree. My job is to build a fast, reliable, stable product, but at the same time the tools I use are questioned by people who have no knowledge of them but have heard of the latest trend.

But sometimes it's also very easy to please people. Big data: just insert 10M records into a database and suddenly everyone is happy because they now have big data :|



These discussions are perfect examples of why building good social skills can be more important than learning the next greatest programming language 5.0.


I love you for saying this because it needs to be said.

Thank you.


> But sometimes it's also very easy to please people. Big data: just insert 10M records into a database and suddenly everyone is happy because they now have big data :|

Since when is 10M records considered big data?

My go-to gauge for big data is that it can't fit in memory on a single machine. And since that means multiple TB[1] these days, most people don't really have big data.

[1]: Heck, you can even rent ~2TB for $14/hour! https://aws.amazon.com/ec2/instance-types/x1/


I get your point, but 10M records can be big data depending on what you're doing with it. Not big on disk, but extremely unwieldy depending on how it's structured and how you need to query/manipulate it. I led internal product engineering at a large multinational for a long time, and we accrued a great deal of technical debt from having to handle the stupidest of edge cases: queries against just a few million (or even a few thousand) records took multiple seconds -- in the worst cases, we had to schedule job execution because they took minutes -- because of ludicrous joins spanning hundreds of tables and the imposition of convoluted business logic.

Almost all of that comes down to poor architecture, and most companies don't hire particularly good developers or DBAs (and most web developers aren't actually very good at manipulating data, relational or not), but that's the state of the union. That's "enterprise IT". That's why consultancies make billions fighting fires and fixing things that shouldn't have been problems in the first place.


I think that is why he had the :| face at the end.


Oh haha. I thought that was a typo!


> big data is that it can't fit in memory on a single machine

A Lucene index can be much larger than your available RAM. It can be 100x that, and the data will still be queryable. Lucene reads into memory only the data it needs in order to produce a sane result. Lucene is pretty close to being the industry standard for information retrieval.
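
For illustration, here's a minimal read-path sketch in Java (the index path and field names are hypothetical). FSDirectory typically memory-maps the index files, so Lucene pages in only the segments and postings a query actually touches, which is how the index can dwarf physical RAM:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class BiggerThanRam {
        public static void main(String[] args) throws Exception {
            // Open an on-disk index that may be far larger than memory.
            try (FSDirectory dir = FSDirectory.open(Paths.get("/data/lucene-index"));
                 DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("body", new StandardAnalyzer()).parse("big data");
                TopDocs hits = searcher.search(query, 10);  // only the top 10 hits are scored and materialized
                for (ScoreDoc hit : hits.scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("title"));
                }
            }
        }
    }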

My definition is instead "when your data is not queryable using standard measures".


I literally heard that "Big Data is something that is too large to fit into an Excel spreadsheet". The speaker was serious.

I unsubscribed from that (non-tech) podcast.


I would have said it's when a single file is bigger than the maximum size of a disk, so say 4TB. That's why we used MapReduce back in the 80s at British Telecom for billing systems: the combined logs would have been too big to fit on a single disk.


I'd say if it can't fit in RAM but can still fit on a single SSD, it doesn't count as big data either.


Not sure how accurate that is, since you can buy 60TB SSDs these days.


ergo, not big data.


Yeah that's everybody's gauge if they actually work with it, which was the point.


On a positive note, it sounds like proposals to use newer technology are welcome. I keep seeing the opposite: "No, this is too different, it could break stuff."


IME that comes with: sure, the new tool looks cool, but is it battle-tested? How many tools end up being relied on heavily while they're still in beta? And does it solve any of our current problems, or is it just neat?

As a grumpy SA, I see way too many people push for new tools because they "seem cool" instead of asking "do they solve a problem we have?"


Personally, I prefer to wait until a technology is battle-tested before adopting it. New technologies are for side projects, imo. If I had to categorize myself, I'd say early-to-late majority on this graph (https://en.wikipedia.org/wiki/Diffusion_of_innovations#/medi...)

Things we consider industry standard, though: why should you need to fight for those? One example I can think of is dependency injection. Ideally you can test your software better and release more reliable builds. Believe it or not, I still come across companies that aren't aware of these concepts. Introducing it wouldn't break anything, because you can continue instantiating services the old-fashioned way.
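
A minimal sketch of what that looks like with plain constructor injection, no framework required (all names here are hypothetical):

    interface PaymentGateway {
        void charge(String customerId, long cents);
    }

    class StripeGateway implements PaymentGateway {
        public void charge(String customerId, long cents) {
            // call the real payment API here
        }
    }

    class FakeGateway implements PaymentGateway {
        public void charge(String customerId, long cents) {
            System.out.println("recorded charge of " + cents + " for " + customerId);
        }
    }

    class CheckoutService {
        private final PaymentGateway gateway;

        // The dependency is passed in instead of new'd up inside,
        // so tests can swap in FakeGateway without touching callers.
        CheckoutService(PaymentGateway gateway) {
            this.gateway = gateway;
        }

        void checkout(String customerId, long cents) {
            gateway.charge(customerId, cents);
        }
    }

    public class DiSketch {
        public static void main(String[] args) {
            new CheckoutService(new FakeGateway()).checkout("cust-42", 1999);  // test wiring
            // production wiring: new CheckoutService(new StripeGateway())
        }
    }

Existing code that news up its services keeps working; you just gain a seam for testing wherever you adopt the pattern.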

With newish stuff that's still changing, if it won't impact production (i.e., tooling) I'm up for adopting it earlier than usual.

One example I can think of is JavaScript bundling and packaging. This wouldn't impact production, but it would have a pretty big impact on feature integration between team members and on rate of completion. In MVC you have to hand-type the path of every one of your JS files and stick them into bundles. Not bad, not great either. Instead, you could take your flavor of package management and have it bundle and minify your JS files for you automatically.

I've been around government contracting, and when you see problems come up a lot that we have industry-standard solutions to, it's hard not to feel frustrated. I get where you're coming from though, just sharing my experience :)


It took me years to realize that the reason programmers do this is that the tools that "seem cool" make their lives easier at the expense of everything else. This is where the popular traits of "laziness and hubris" become a liability instead of an asset.

More programmers need to embrace the suck.


> tools that "seem cool" make their lives easier at the expense of everything else.

I'd argue the opposite. Instead of spending time reflecting on how cool and useful their code is, or hardening it up, devs spend too much time reinventing the wheel. All this work to learn the next new fad is killing productivity.


Easier might not be the right word; "tools that allow them to be lazier" might be more accurate. They glue together pieces somebody else wrote, try to get it all working with as little effort as possible, and are surprised when it doesn't work well.

> devs spend too much time reinventing the wheel

I'd argue the opposite. They spend too much time not reinventing the wheel. They strap factory made bicycle wheels onto a car and are surprised when the wheels break. They could benefit from spending more time trying to make a better wheel.


Or learn about better wheels designed by smart people back in the 60s and 70s, when no one had the capability to just keep sticking wheels onto cars to see what works - so they had to rely on thinking and solid engineering practices instead.


Precisely why I've started buying technical books from ages past. I'm working my way through Algorithms + Data Structures = Programs by Niklaus Wirth, Constructing the User Interface with Statecharts by Ian Horrocks, and Practical UML Statecharts in C/C++: Event-Driven Programming for Embedded Systems. The last one has been especially enlightening.

Do you have any suggestions for which 'better wheels' people should be looking at?


SICP is a classic I can highly recommend. It made me aware of just how much the "new, smart" approaches to organizing code that people like to attribute to their favourite programming model (like "OOP is best because classes and inheritance mean modularity") are actually rehashings of obvious, general ideas that were well known in the past.

I generally like reading anything Lisp-related, as that family of languages is still pretty much a direct inheritance from the golden ages.

The stuff done by Alan Kay, et al. over at PARC is also quite insightful.


If it ain't broken, why fix it?


In my case, usually something is broken, or breaking in production frequently enough to warrant some changes. Plus, there are other reasons to make a change even though nothing is broken.

Sometimes it can make you more productive. Or, even though your site is still responding to current customer demands in a timely fashion, you know that the mobile experience could be significantly improved now that browsing via cell phone is on the rise.

Another thing to consider is employability both from a company and individual perspective. If you can keep up with moderately current (not the latest and greatest) trends, you'll attract people who want to grow in their careers. I wouldn't want to work on C# 2.0 using Visual Source Safe. It's hard to convince a company that you can learn git on the job.

In general I like to move without introducing breaking changes. I'm not a cowboy coder; it's really exhausting working with one. I do think there's merit in recognizing when it's time to change, though.


As long as the database isn't relational, I guess.


10 million rows in a relational database doesn't need to be bad nor is it big data.

Rows are a bad measure of "big" when it comes to data. Bytes are a better measurement, and more specifically bytes per field and how many fields the records have, as this gives a better indication of how the data will be written and potentially searched.

10 million rows of 5 integer values is a pittance for any relational database worth using in production. 10 million rows of 250 text columns would be horrendous for a relational database.
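
A quick back-of-envelope comparison makes the point (assuming 4-byte integers and, purely as a hypothetical average, ~100 bytes per text column, ignoring row overhead and indexes):

    public class RowSizeEstimate {
        public static void main(String[] args) {
            long rows = 10_000_000L;
            long narrow = rows * 5 * 4;      // 5 INT columns: ~200 MB total
            long wide = rows * 250 * 100;    // 250 text columns at ~100 B each: ~250 GB total
            System.out.printf("narrow: %,d bytes, wide: %,d bytes%n", narrow, wide);
        }
    }

Three orders of magnitude apart, with the row count identical.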


Someone once suggested to me that 'big data' begins when it doesn't fit in RAM in a single rack any more.


Yup, that's essentially what looking at byte size means. However, just because it doesn't fit in memory doesn't necessarily make it big data if it's just poorly engineered.

Many times this happens because of wasted or bloated indexes that aren't useful, or because data types were picked incorrectly.

For example, I once worked on a database where the original developer used DECIMAL(23, 0) as a primary key. This was on MySQL, and it ended up taking 11 bytes per row versus a BIGINT, which would have been just 8. In one table, maybe not so bad, but when you start putting those primary keys into foreign key relationships... we ended up with a 1-billion-row table in MySQL that had 4 of these columns in it. That might make it "big data" by that definition, but it's also just bad design.
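
Putting rough numbers on that (a sketch using the figures above; it ignores the secondary indexes that also copy the key, so the real waste is larger):

    public class KeyOverhead {
        public static void main(String[] args) {
            long rows = 1_000_000_000L;              // the 1-billion-row table
            int keyColumns = 4;                      // DECIMAL(23,0) key columns per row
            int decimalBytes = 11, bigintBytes = 8;  // MySQL storage sizes
            long wasted = rows * keyColumns * (decimalBytes - bigintBytes);
            System.out.printf("~%,d bytes (~12 GB) of avoidable key overhead%n", wasted);
        }
    }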

Another example in that same database was using TEXT fields in MySQL to store JSON. Since TEXT fields in MySQL are stored as separate files, every table that had one (and several of ours had multiple) ran into large IO and disk-access issues.

"big" data is probably a bad term to use these days because of easy it is to accidentally create a large volume of data but not need a big data solution outside of the fact that it's not the business that needs it, it's the poorly implemented system that does.

But the real reason we talk about fitting in memory comes from the core of the issue: IO. Even a dataset that fits in memory can end up being slow, say a single-threaded Postgres reader scanning a 500 GB index. AWS offers up to 60 GB/s of memory bandwidth, and we'd need all of it for this index: even at that rate it would take more than 8 seconds just to warm up the index in the first place.


>Since text fields in mysql are stored as separate files

Bwuh? Over in MS SQL you just go for an NVARCHAR and forget about it. What is the right way to store this data (if you really do need to store the JSON rather than just serializing it again when you get it out of the DB)?


VARCHAR is different from a TEXT field in MySQL: http://dev.mysql.com/doc/refman/5.7/en/blob.html

It stores text fields as blobs.

I suppose now the right way would be the JSON data type. It didn't exist when I was working with those servers, though (or they were on a much older version of MySQL): https://dev.mysql.com/doc/refman/5.7/en/json.html


That's soon going to be on the order of 100 terabytes, so there will be only a handful of companies doing big data ;-)


I'm only aware of servers up to 12TB. Care to elaborate?


He/she said a whole rack of servers. I took 30 servers at 2TB each and rounded up to 100TB. With 12TB per server it would already be over that.


10M rows in a relational database is a very low number (depending on the size of the rows, of course).


I know, it was a sarcastic follow-up to the "now they have big data" part of the original comment.

"SQL doesn't scale". It needs to be in Mongo or whatever NoSQl database is in right now. I have heard all sorts of nonsense regarding "big data" in the last few years.


Ahaha, I didn't read the sarcasm that time, sorry for replying with TMI.



