Get me out of data hell

baazaa · 2024-11-02T23:51:44 1730591504

People always say this guy has just had bad luck with his employers but I live in Melbourne and work in data and reckon the whole industry is a scam.

Like why didn't anyone catch the issue with the logs? Because it doesn't matter, every data team is a cost-centre that unscrupulous managers use to launch their careers by saying they're big on AI. So nothing works, no-one cares it doesn't work, most of the data engineers are incapable of coding fizzbuzz but it doesn't matter.

People always wonder why banks etc. use old mainframes. There's like a 0% success rate for new data projects. And that 0% includes projects which had launch parties etc. but no-one ever used the data or noticed how broken it was. I don't think a lot of orgs which use data as core-infra could modernize, the industry is just so broken at this point I don't think we can do what we did 30 years ago.

ludicity · 2024-11-03T06:26:20 1730615180

Author here. I now know some places in Melbourne that have a good success rate on projects. Some of them are so small as to be invisible and rarely hire. One uses two specific independent recruiters or internal referrals. As far as I know, they are extremely profitable because the competition is a joke.

For many organizations, the success rate is indeed 0%. A Group of Eight university (our top 8 universities nationwide), for example, sent me a job description a few months ago where they misspelled the word engineer, and left change tracking on in the Word document. This allowed me to walk through the profiles of the people running their data projects, and it was super obvious that many of the people involved aren't going to do a good job. They could have saved millions by having a random HN person eyeball the CVs of their chosen leadership team.

svilen_dobrev · 2024-11-03T20:09:35 1730664575

hey. Congratulations on your decision.

i think it all goes deeper in overall culture/attitude there.

i was in Melbourne in 2012.. with idea to relocate wholesale, 2nd time. Worked 2 months at some "startup", that fired me when i finished the task given.. Seems it was cheaper to hire "permanent" then fire, rather than take someone on 2 months contract. So that's one red light on the dashboard.. There were other redlights from overall "society", feeling something-is-wrong but i did ignore them for quite a while - people have become evil, etc..

Then i started going around places and mailing my cv here or there (with 22y of experience making software, by that time),.. ibm, ernst&young, you-name-it.. to no avail, and more red-lights flashed on me.. And one day visited some kind of meetup, organised/held in some wellknown company. Seemingly it was kind-of "hiring" event or so, we grouped in teams of 3-4-5 people, with half from company, and other half outsiders.. and went solving some problems of theirs. Or that was the "label". Any solution that any of outsiders suggested, was shot down, with somewhat vague reasons, that at the end started to sound like "if we solve this there'll be no job tomorrow". And Smile :) Lots of smiles. Empty ones.

That was one of the Last red lights on my dashboard. Whether it was a financial balloon pressing everyone so they only smiled and did nothing, in order to pay the mortgage, or something else, i don't know. Next day i watched Sacrifice/1986/Offret by Tarkovsky, and.. bought a ticket out. Discontinued my oz dream. For good.

quoting meself, from 2007-8: "with time, places change people. Other way happens noticeably only while coming in - or switching on."

have fun

denimnerd42 · 2024-11-03T01:42:49 1730598169

I led a team on a large data project at an enormous bank, hundreds of devs on the project across 3 continents. My team took care of the integration and automation of the SDLC process. We moved from several generations of ETL applications (9 applications) netezza/teradata/mainframes/hive map reduce all to spark. The project was a huge cost savings and great success. Massive risk reduction by getting these systems all under 1 roof. We found a lot of issues with the original data. We automated the lineage generation, data quality, data integrity, etc. We developed a framework that made everything batteries included. Transformations were done as a linear set of SQL steps or a DAG of SQL steps if required. You could do more complicated things in reusable plugins if needed. We had a rock solid old school scheduler application also. We had thousands of these jobs. We had an automated data comparison tool that cataloged old data and ran the original code vs the new code on the same set of data. I don't think it's impossible to pull off but it was a hard project for sure. Grew my career a ton.
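
For readers who have not seen the "DAG of SQL steps" pattern, here is a minimal sketch of the idea. It is not the framework described above: the step names, tables, and the `run_job` helper are all made up for illustration, and SQLite stands in for Spark so the example runs anywhere with Python 3.9+.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical job definition: step name -> (SQL, names of steps it depends on).
STEPS = {
    "clean_accounts": (
        "CREATE TABLE clean_accounts AS "
        "SELECT id, UPPER(TRIM(name)) AS name FROM raw_accounts",
        [],
    ),
    "clean_txns": (
        "CREATE TABLE clean_txns AS "
        "SELECT account_id, amount FROM raw_txns WHERE amount IS NOT NULL",
        [],
    ),
    "balances": (
        "CREATE TABLE balances AS "
        "SELECT a.name, SUM(t.amount) AS balance "
        "FROM clean_accounts a JOIN clean_txns t ON t.account_id = a.id "
        "GROUP BY a.name",
        ["clean_accounts", "clean_txns"],
    ),
}


def run_job(conn, steps):
    """Run each SQL step only after every step it depends on has run."""
    graph = {name: deps for name, (_sql, deps) in steps.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        conn.execute(steps[name][0])
    return order


def demo():
    """Load a few made-up rows, run the job, and return (order, balances)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_accounts (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE raw_txns (account_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_accounts VALUES (?, ?)", [(1, " alice "), (2, "bob")]
    )
    conn.executemany(
        "INSERT INTO raw_txns VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, None), (2, 7.5)],
    )
    order = run_job(conn, STEPS)
    rows = conn.execute(
        "SELECT name, balance FROM balances ORDER BY name"
    ).fetchall()
    return order, rows
```

A linear job is just the special case where each step depends on the previous one. Keeping the dependencies as data is also what makes automated lineage plausible, since the graph already says which table feeds which.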

neeleshs · 2024-11-03T17:30:00 1730655000

This story is more an exception than norm.

I know startups that hired data engineers, deployed warehouses, DBT, a BI tool and churned out hundreds of reports, and in one case their DBT project had hundreds of files. No one in that company knew why any of it was used.

All said and done the business users wanted three reports.

More often than not, data teams are more self-serving than anything else.

d0gsg0w00f · 2024-11-03T10:17:34 1730629054

I think the difference is that technical and business leadership at a bank understand that data is lifeblood. Bad data will get you on the front page of WSJ and a phone call from a regulator in Luxembourg.

For a lot of smaller Internet companies, data is just a fluffer. The real business is in image and which VC bbq you get invited to.

tmiku · 2024-11-05T01:16:33 1730769393

Can you define fluffer as you use it here, and maybe mention where you picked it up? I haven't heard it used much outside of a specific and very notorious Sankey diagram.

pokerface_86 · 2024-11-08T23:17:33 1731107853

girls in porn who blow the guys so they stay hard while they film.

i think

RHSman2 · 2024-11-03T06:00:28 1730613628

What was the main reason for your success?

jonathanlydall · 2024-11-03T07:03:54 1730617434

Not the person you’re replying to, but I would expect that a near universal answer to this across all kinds of projects (not just software) is effective collaboration and communication between stakeholders and teams.

Despite no shortage of technical talent on large projects they can still often fail, and it’s because building a technically impressive thing doesn’t matter if it doesn’t do what business needs.

So it’s about making sure you’re building the “right” thing that delivers on business’s actual needs, and the only way to find out what those are is through constant and ongoing good communication between technical and business people.

ozim · 2024-11-03T07:41:00 1730619660

Downside is lots of work business is doing is running around with wheelbarrows and they actively sabotage it when someone wants to build a conveyor belt.

moregrist · 2024-11-03T13:16:26 1730639786

The flip side of this is that the stakeholder has to actually care enough to invest in collaboration and have enough bandwidth to be able to follow through.

The kind of communication that lets cross-functional projects be effective is time consuming, and competent people tend to be overworked, no matter what part of the business they’re in.

RHSman2 · 2024-11-03T18:38:38 1730659118

I was fishing for that answer. Glad to hear this is the universal answer (not well implemented)

jorvi · 2024-11-03T16:09:53 1730650193

Specifically for the financial sector and especially banks and government tax departments, they’re on a clock.

As time moves on, there are fewer COBOL engineers. Hell, sometimes their systems have been written in a bespoke language. There is less and less understanding of why something is set up the way it is due to lack of documentation. Updates / changes to the code sometimes have to wait for 2-3 years because the system isn’t flexible enough (literally, not as in “this change will take 2-3 dev years”). Even code that old contains bugs, but due to the age of the code they’re inscrutable.

However, whichever new system gets tooled up has to be 99.999% flawless, or it could cause serious damage to the bank and even its regional market.

When there is that kind of pressure, dev teams are no longer considered a cost sink, money flows, and the world is possible.

data_marsupial · 2024-11-04T11:17:29 1730719049

A large project where the end goal is replicating (and possibly correcting) existing data outputs is much more likely to succeed than one that is integrating new data sources or building new data models. For the latter type of project, it's very common to find that the team is disconnected from the business users and original motivation for the project, with poorly defined success criteria.

denimnerd42 · 2024-11-04T16:11:30 1730736690

There was a clear, large cost savings and risk improvements with the project. The project was actually easy on the requirements front. They put all new non critical features on hold for 2 years and there was no question on the requirements: The new system's data must match the old system's data except for any bug fixes or agreed upon changes.
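
The requirement "new must match old except for agreed changes" is easy to state as a check. A rough sketch, assuming each system's output can be reduced to a mapping from a record key to a value. The `compare_outputs` name and the shape of `agreed_changes` are invented for this example and are not the tool described above.

```python
def compare_outputs(old_rows, new_rows, agreed_changes=None):
    """Return {key: (expected, actual)} for every unexplained difference.

    old_rows / new_rows: dict mapping a record key to its output value.
    agreed_changes: dict mapping a key to the value the new system is
        expected to produce instead (bug fixes, signed-off changes).
    A value of None in the result means the row is absent on that side.
    """
    agreed_changes = agreed_changes or {}
    mismatches = {}
    all_keys = old_rows.keys() | new_rows.keys() | agreed_changes.keys()
    for key in sorted(all_keys):
        # The agreed value wins; otherwise the old system is the reference.
        expected = agreed_changes.get(key, old_rows.get(key))
        actual = new_rows.get(key)
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches


# Made-up data: acct-2 changed by agreement, acct-3 changed without sign-off.
old = {"acct-1": 100, "acct-2": 250, "acct-3": 75}
new = {"acct-1": 100, "acct-2": 260, "acct-3": 80}
agreed = {"acct-2": 260}
unexplained = compare_outputs(old, new, agreed)
```

Here `unexplained` contains only `acct-3`. An agreed change that the new system failed to apply is also reported, because the agreed value replaces the old one as the reference.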

tiew9Vii · 2024-11-03T01:10:05 1730596205

Reluctantly worked on AU data projects for maybe the past decade. I don't classify myself as a data engineer, in fact I hate data engineering or data related work which is glorified ETL and SQL most of the time. They are the worst, broken projects I've done, not software engineering in the software engineering sense. I don't think I've worked on a good one yet despite the potential to be really interesting projects. I prefer general software projects doing a bit of everything as a generalist, data pays the bills though in AU.

Not seen/heard of this person before but reading this specific blog post it all sounds very familiar, it's depressing.

The "CTO" getting on stage taking a bunch of credit and everything being a mess or incomplete or lies is very familiar. Maybe not CTO but higher management. It's all smoke, mirrors, optics, self promoting, it works as these people end up making their way up the ladder when the lonely dev trying to do better work is just a dispensable cog in the wheel.

6510 · 2024-11-03T07:21:30 1730618490

> Not seen/heard of this person before but reading this specific blog post it all sounds very familiar, it's depressing.

Someone once told me he, as a form of therapy, rewrote the company he worked at in a few weekends. He never mentioned it to his coworkers, it was strictly a therapeutic effort. They apparently spent years "fixing" things without making any progress.

intelVISA · 2024-11-03T11:29:23 1730633363

Most apps are trivial for a decent dev to reproduce, I'd wager the root problem is rarely the codebase: the org is rotting. Years of 'fixes' with no progress is like blaming the water for sinking a ship.

Success attracts deadweight who (un)intentionally sandbag efforts to reverse this downward trend for their own self-preservation. I don't blame them, doubt there's a fix when the system requires most people work bullshit jobs instead of collecting UBI.

whstl · 2024-11-03T19:36:57 1730662617

Bingo. The #1 thing I learned in consulting is that you can't build good software if the processes and structures are wrong in the first place. Ditto with off-the-shelf software.

Something that takes a week in company 1 can take a year in company 2 purely because of organizational issues.

Rotting organizations will produce rotting software.

Seattle3503 · 2024-11-04T06:38:20 1730702300

Are there good books or resources that describe a good organization and how to build one?

392 · 2024-11-04T12:07:17 1730722037

Deming, CommonCog

CalRobert · 2024-11-03T10:58:01 1730631481

Ten years ago data engineering was another discipline in software engineering, like backend or frontend. Somewhere along the line the term was co-opted by “I can maybe barely string together some untested airflow pipelines” and it means something much different now.

WorldMaker · 2024-11-04T17:30:57 1730741457

It is said that every major, still-living COBOL program contains a bespoke (sometimes poorly optimized) database engine with no standard query language, where the only query tool was more program code. Perhaps the longevity of mainframes suggests that some wizardry/safety was lost in standardizing databases: standardization gave people the impression that data itself was standard, and gave them too many tools to footgun data into foot pain, a safety we had when databases were defined entirely as COBOL internals?

(Not that we haven't gained a lot from modern database tools, just something to think about that maybe the data siloes were good sometimes, too.)

bob1029 · 2024-11-03T01:49:51 1730598591

> I don't think a lot of orgs which use data as core-infra could modernize

I argue this is a happy conclusion, not a problem to be solved.

What would "modern" bring to a bank except even more pain & suffering? Database technology invented in the 80-90s is more than sufficient for tracking information at the scale that 99% of financial institutions operate at today.

Virtually every core conversion project I've ever heard of has been a failure or is currently a burning wreck on its way to the bottom.

The only new bank projects that touch data and seem to succeed are LOB apps with highly curated experiences that are tightly integrated with the actual front/back office business. Having buy-in from staff regarding your UX is way more important than spinning out a 20 page AWS architecture diagram. The CTO can only take you so far through the vendor approval process at a bank. Retail operations (i.e. the people who are responsible for the brick & mortar branches) typically have substantially more pull in these organizations.

makeitdouble · 2024-11-03T02:28:46 1730600926

> What would "modern" bring to a bank except even more pain & suffering?

In the most simple term, a future.

Except if your bank is literally too big to fail, at some point you have to either move on from 80s technology or at least bring in an adaptation layer, because your profit centers have also moved on or you're facing harder competition.

A typical example is banks getting merged: there will be a fight to see which system stays and which one disappears. If you froze your technology 4 decades ago it won't be your stack winning. [0]

Another is the evolution of legal frameworks: EU countries passed laws requiring interoperable APIs to perform standard banking operations. Being a customer of a decent bank or a fossilized one made a huge difference and the market grew a lot more competitive. People would start hedging their bets when legacy banks looked too far behind.

[0] The most interesting and recent example of this is Mizuho bank just miserably failing at that task to the point the gov. intervened and anyone not married to them probably moved out.

https://www.mizuhogroup.com/news/2021/06/20210615_2release_e...

bob1029 · 2024-11-03T11:52:21 1730634741

> A typical example is banks getting merged: there will be a fight to see which system stays and which one disappears. If you froze your technology 4 decades ago it won't be your stack winning. [0]

In my experience (small/mid-size US banks), the institution with more assets or branches usually wins. It rarely has anything to do with technology. If a 6 region, 200 branch monster comes in and wants to buy some 4 branch relic in the West Texas desert, it doesn't matter if the smaller institution has achieved AGI and an intergalactic core platform. They're almost inevitably gonna be merging their records into some old boring IBM system.

shakna · 2024-11-03T15:30:47 1730647847

The landscape is a little different over in Australia. Most of the Big Four are closing as many branches as they can. Branches are no longer a mover or shaker, because most Australians never touch cash anymore. [0] Most transactions are digital.

Almost as many people pay with card as with phone.

Faster record systems, faster transfers, actually do win people over here.

[0] https://www.rba.gov.au/publications/bulletin/2023/jun/cash-u...

finnh · 2024-11-03T17:45:36 1730655936

I welcome the day when the US stops devoting enormous amounts of useful real estate to bank branches. They are a sad simulacrum of actual street life, taking up tons of space to advertise a bank and contributing to high rents that preclude less-profitable small businesses. One step up from billboards.

makeitdouble · 2024-11-04T06:24:55 1730701495

I think it depends on why they're merging. If the goal is just to increase size, as you point out doing it at the lowest cost will be the only POV.

If they're doing it for more strategic purposes, the calculation becomes more complex and there will be more "reverse" acquisitions where the entity closer to the target is prioritized.

iamacyborg · 2024-11-03T08:59:47 1730624387

> If you froze your technology 4 decades ago it won't be your stack winning.

I’m unclear whether this is bad for the business or just bad for folks hoping to keep their jobs.

pletnes · 2024-11-03T12:12:46 1730635966

I tossed out my credit card because the UX was bad. At this point most of the CC services are utilities or commodities. Just get another at a bank with better apps and website.

xelamonster · 2024-11-03T15:23:02 1730647382

Yup, I'm moving to a new primary checking account currently because I'm sick of my local credit union that is apparently so incompetent they can't handle sending email alerts correctly. Also, any bank or credit card that won't support Plaid seems not even worth considering at this point.

mixmastamyk · 2024-11-03T21:15:17 1730668517

Had to look up what plaid was. Think I’d prefer Fednow support and/or Aus/NZ style modern banking, that’s future proof. I see no reason for a third-party to be involved.

xelamonster · 2024-11-04T21:02:02 1730754122

It sucks, but it's the only service anyone ever uses. Doesn't really matter what I prefer when every financial service i want to use offers Plaid or nothing.

lmz · 2024-11-03T09:58:46 1730627926

Another system migration example (TSB, 2018) from the UK: https://www.tsb.co.uk/news-releases/slaughter-and-may.html

lmm · 2024-11-03T09:01:17 1730624477

Mizuho is doing great, they're probably the least awful of the big Japanese banks. Everywhere is like this, and "old" technology doesn't seem to make the places that use it appreciably worse.

makeitdouble · 2024-11-04T06:19:54 1730701194

Mizuho Finance as a group is doing fine, partly because their main business is not consumers and companies can't just leave their main bank on a whim.

And also partly because it's the third biggest bank of Japan and it wouldn't be allowed to not be doing fine ("too big to fail" doesn't even start to describe the impact of a group this size starting to go down)

Do they do "great" ? Arguably no. They shut down a number of consumer facing locations, had a hard time recruiting, and compared to Mitsubishi and Mitsui the gap has kept widening. In their small/middle customer businesses they're starting to face the rise of GMO and Rakuten where the two other are too far ahead to even need to care about it.

lmm · 2024-11-04T06:57:14 1730703434

> Do they do "great" ? Arguably no. They shut down a number of consumer facing locations, had a hard time recruiting, and compared to Mitsubishi and Mitsui the gap has kept widening. In their small/middle customer businesses they're starting to face the rise of GMO and Rakuten where the two other are too far ahead to even need to care about it.

The last year I can find figures for has Mitsubishi's assets -5.46%, SM -4.14%, and Mizuho -2.03% (so yes a decline in absolute terms, but that's the best performance in the top 10, and sounds like closing the gap with Mitsubishi and SM rather than widening it). I can't find a branch count but Mizuho has vastly more ATMs (around 4x as many as Mitsubishi) and that number is increasing. They've continued to make consumer-facing improvements recently like their wallet app allowing electronic money payment directly from your bank account, and English support in their main app. Of course like all of the big banks they're facing competition from the rise of the net banks, but as far as I can see they're doing as well as any of the big four, perhaps better.

8n4vidtmkvmk · 2024-11-03T05:29:21 1730611761

Yes, please, fix the UX. That was my biggest gripe, working as an FSR at the bank.

One particular thing was we had to convert transit #s into branch numbers regularly. We did this by looking at a sheet of paper of course. Eventually I got fed up and wrote a web app so you could just punch in the numbers and have it instantly convert. I checked and people are still using it 10 years after I quit, which means nothing has changed and they're still using the same god awful software.

They did move some data at some point. I know this because they screwed that up too and partially merged my mom's and my bank accounts, which is a pretty bad error. Would be worse if it was some rando. Speaking of... That's exactly what AT&T did.

chaxor · 2024-11-03T13:55:08 1730642108

>What would "modern" bring to a bank except even more pain & suffering?

It probably depends on what "modern" means here. If updating from tons of COBOL to {Julia, Python, Rust, or some other well known language} with an update to an SQLite backend (or perhaps postgres is acceptable for very specific scenarios), that is likely a good choice due to being able to fix old cruft and add maintainability for the future. If it's a switch to some nosql database backend with everything switched to some cypher-based lang or anything that touches javascript in any way, it's probably a mistake.

ericjmorey · 2024-11-03T15:56:06 1730649366

Why SQLite? Why Julia? These seem like poor options for banks.

neverartful · 2024-11-03T17:07:00 1730653620

In the case of SQLite, I'd say incredibly poor (to the extent that the person who made the decision should be fired).

__turbobrew__ · 2024-11-03T22:21:44 1730672504

I would like a bank which supports U2F.

le-mark · 2024-11-03T00:50:30 1730595030

This jibes with my experience at a financial services company. I once sat next to the “big data team” and the company 5 year plan was all about delivering analytics and AI to customers using their data that the company housed.

The team consisted of one guy (who had a business degree) and a lot of empty cubes they were trying to fill. A year later the company had been acquired and the big data initiative had evaporated.

RangerScience · 2024-11-03T01:02:00 1730595720

I have felt exactly this on regular full stack teams many, many times, so it’s also not just limited to data teams.

IMO a major factor here is that software engineering is both opaque and esoteric - at least with physical engineering, there’s something people can look at and think they understand.

baazaa · 2024-11-03T01:28:06 1730597286

My theory is that data is worse again because at least if you're making a website you're expected to end up with a website. The process is opaque and esoteric, but the end-product is somewhat tangible.

A lot of data projects are moving and transforming data no-one cares about. They can fail completely silently, a manager can lie and say 'we've successfully built the data platform which is going to enable AI analytics' and it'll be like a misconfigured S3 or something. No-one's checking the end-product or even understands what it's meant to be.

RangerScience · 2024-11-03T01:42:48 1730598168

Excellent point… and one I should know. I spent about 6mo as a data eng (only one at the startup) and long after, found out no one ever had a clue what I was talking about in standup. (To be fair, I was self-teaching, and no-one else knew anything so)

lelandbatey · 2024-11-03T01:13:13 1730596393

I have seen data work well, but it only worked well in a situation where we had management focusing on two very tangible things that even the CEO could verify (since the CEO did know the product). Those tangible things were:

1. Accurate, auditable billing down to per-chargeable entity/event

2. Dashboards for each customer that reflect THE SAME numbers as we generate in #1, so customers could see relevant info quicker than just waiting for the bill from #1

The only reason those things were valued and made a focus though was because a HUGE customer threatened to completely drop our company because that customer did an audit and noticed that we had overcharged them 3% because we were actually billing them on estimated numbers. That led to our CEO being personally yelled at by a much larger CEO, and our CEO (to his credit) didn't blame us (we'd raised the alarms that the bills were estimates and not auditable) but did say "this can never happen again, I trust you, do whatever needs to happen to make sure this never happens again."

And once we had #1 solid and tight, we were able to leverage that solid auditable data to generate solid dashboard numbers that always squared with what showed on the bill.
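
A small sketch of why building #2 on top of #1 makes the numbers square, assuming billing is stored as per-event line items. The event tuples, customer names, and function names are invented for illustration. `Decimal` is used so that sums of money are exact.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical billable events: (customer, chargeable entity, amount).
EVENTS = [
    ("acme", "api-call", Decimal("0.10")),
    ("acme", "api-call", Decimal("0.10")),
    ("acme", "storage-gb", Decimal("2.50")),
    ("globex", "api-call", Decimal("0.10")),
]


def line_items(events):
    """Aggregate actual events per (customer, entity): the auditable record."""
    totals = defaultdict(Decimal)  # Decimal() is an exact zero
    for customer, entity, amount in events:
        totals[(customer, entity)] += amount
    return dict(totals)


def invoice_total(items, customer):
    """The amount that goes on the customer's bill."""
    return sum(
        (amt for (cust, _entity), amt in items.items() if cust == customer),
        Decimal("0"),
    )


def dashboard(items, customer):
    """Per-entity view for the customer, read from the same line items."""
    return {
        entity: amt for (cust, entity), amt in items.items() if cust == customer
    }


items = line_items(EVENTS)
acme_bill = invoice_total(items, "acme")
acme_dash = dashboard(items, "acme")
```

Because the dashboard and the invoice are both derived from `items`, the dashboard's per-entity numbers always add up to the bill. A separately computed estimate offers no such guarantee.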

RHSman2 · 2024-11-03T06:04:41 1730613881

A CEO worth working for

nyarlathotep_ · 2024-11-03T06:37:43 1730615863

>> I live in Melbourne and work in data and reckon the whole industry is a scam.

You needn't live in Australia to reach that conclusion.

atoav · 2024-11-03T05:43:04 1730612584

The problem is that most orgs seem to do the wrong thing, because the incentives of the higher ups don't align with what is good for the org.

E.g. if you are a bank, ideally you'd like all your processes automated and streamlined, with extremely transparent data flows etc., and you want as many of the bank's employees as possible to be proficient in these systems and constantly working on improving them within a controlled environment.

In practice this is not the kind of thing that allows single managers to come across like heroes, so it doesn't happen that way and you get island solutions with duct-taped connections between.

ludicity · 2024-11-03T06:33:11 1730615591

I also get to spend a lot of time with executives thanks to the blog's success, and part of it isn't just incentives, it's pure confusion. People have no idea what they're buying.

I also get invitations to "sponsor events" now, since people see "director" on LinkedIn and think I have way more money than I do. Their business model seems to be flattering executives by inviting them to events where they can network with other rich people, then ask me for "sponsorship" money so that I can go into the room and brainwash them with my marketing material. I might even try it at some point to see if that's an accurate read.

stackskipton · 2024-11-03T15:47:40 1730648860

>I also get to spend a lot of time with executives thanks to the blog's success, and part of it isn't just incentives, it's pure confusion. People have no idea what they're buying.

So, 43 years later and Putt's law is alive and well.

cratermoon · 2024-11-03T16:11:56 1730650316

Putt's Law: "Technology is dominated by two types of people, those who understand what they do not manage and those who manage what they do not understand." From the book Putt's Law and the Successful Technocrat, published in 1981. An updated edition, subtitled How to Win in the Information Age, was published by Wiley-IEEE Press in 2006

<https://en.wikipedia.org/wiki/Putt%27s_Law_and_the_Successfu...>

disgruntledphd2 · 2024-11-03T07:42:33 1730619753

If you want to make your consultancy a success, you probably should attend some of these and see if they help you get business.

ludicity · 2024-11-03T20:58:52 1730667532

We're bootstrapped so we unfortunately can't fling money at sponsorships. Or rather, we can, but it would cut into our runway quite a bit, and we have more promising avenues to pursue. If we acquire bad clients, the type that would let themselves be brainwashed and who status-seek by attending these events, that's just going to be as dumb as a regular job but without the luxuries afforded to employees.

disgruntledphd2 · 2024-11-04T09:23:03 1730712183

I totally get you, but it's worth noting that big, incompetent companies are where most of the money is. Anyways, best of luck with it!

rocqua · 2024-11-03T08:11:10 1730621470

As Matt Levine likes to say, high-level finance is mostly about seating charts.

In other words, it's about status, not money.

neumann · 2024-11-03T23:44:21 1730677461

100% agree with most of this. Most large organisations in Australia are just clueless about their needs and ride hype cycles with execs acting on FOMO. And they bought the lie that you need all these separate distinct engineers who are professionals in a niche. They think they are building a high end kitchen by having specialists and think they are hiring sous-chefs etc, but most of the time they are hiring line cooks, and in fact most of the time they should just be hiring short order cooks.

Most teams and businesses I know who are doing great things with data are mostly smaller companies and tech built up start-ups and hire one of two types of people. Generalists who can fill the gaps and are interested in everything, and staff software engineers who are amazing developers and asked to put together data pipelines with consideration of the whole infrastructure.

When I was looking for work, all the good companies were not hiring data engineers or machine learning engineers. They were hiring Senior Software Engineers with a remit to build data infrastructure, or build machine learning models. And that immediately removes 90% of the noise applicants.

fphhotchips · 2024-11-05T04:11:05 1730779865

Melbourne is easily the worst city in the country for this. Most of the tech sector is in the very large enterprise space led by the banks, and as a result it's who you know and whether you went to Melbourne Grammar or Geelong Grammar that will determine which company you work for once you reach a certain level. Sydney is better just because there's more smaller stuff going on, and because CBA is better than NAB and ANZ combined on tech. (I hate Sydney otherwise and am based out of Melbourne)

Some places in Melbourne get real work done, even in the data sector. They're hard to find, but they exist.

photonthug · 2024-11-03T12:14:35 1730636075

> it doesn't matter, every data team is a cost-centre that unscrupulous managers use to launch their careers by saying they're big on AI. So nothing works, no-one cares it doesn't work

Yes. Lots of times the most important asset for these companies is actually contractual obligations in terms of exclusive access to data or customers. It doesn’t matter if the product works, you’ll have to buy the company to build a different one that does. But the (broken or nonsensical) product pushes up the value of mergers and acquisitions. If leadership completely makes shit up then they might go to jail, so, they burn X million on “work” and cloud spend as part of an elaborate argument that it should sell for 10X.

> the industry is just so broken at this point I don't think we can do what we did 30 years ago.

Well no, it’s never been easier to do high quality engineering, but mbas are in charge. They don’t think like philosophers or scientists and don’t traffic in common sense.

For anyone questioning their life / career choices because of this, it’s not about you. An individual working in an environment like this can still be a craftsman of integrity if they focus on small problems and solve them well, but you need to be able to get satisfaction from that, not from some overall mission (which again, is probably fake). If you’re most motivated to work directly on architecture, unification, etc, and want to change lots of things then you will probably be miserable.

But if you’re feeling shitty about the whole thing, it might help to realize that the actual nuts and bolts of adtech/martech data pipelines are much the same as the ones for cancer research or particle physics or climate science, so one can at least try and get transferable skills if circumstances are currently holding you hostage. Data isn’t a bullshit job. Leadership and management that just want to play games is the problem.

akdor1154 · 2024-11-03T11:35:01 1730633701

Agree.. I can tell you at least one Melbourne-based Flybuys retailer calculates your points with an unholy daily-scheduled stored procedure in Snowflake SQL, because.. big business dysfunction reasons led to the data team being assigned to do it, and the data team didn't actually have any software engineer roles in it.

At least it has tests.

jpmoral · 2024-11-03T12:37:58 1730637478

Do you mean the points earned in-store that are then sent to Flybuys to add to your total? Or, god forbid, do they do their own total?

akdor1154 · 2024-11-03T19:39:48 1730662788

The former, in-store and online (which is beginning to touch on those business dysfunction reasons).

DeathArrow · 2024-11-03T09:33:39 1730626419

>Because it doesn't matter, every data team is a cost-centre that unscrupulous managers use to launch their careers by saying they're big on AI.

If you have lots of data flowing and have full teams "working" 24/7 on it, does it really matter if that data is junk and is not processed in a meaningful way? You can still ask AI to generate some nice looking charts with big numbers to show to investors. Investors like nice charts and big numbers. Or so some business people think.

But in all reality the investors will ask questions like: how will this solve problems for customers, how do you intend to sell this to customers, how much does it cost, how will this generate me money. Unless those investors plan an early exit by finding other, more gullible investors than them, kind of like knowingly investing early in a Ponzi scheme.

chubs · 2024-11-03T07:20:38 1730618438

Do you think perhaps the problem is rooted in people being dishonest, and honest people are driven mad by it all and self-select out? The dead-sea effect?

ludicity · 2024-11-03T07:33:07 1730619187

It's so many things. Dishonesty, lack of technical competence, political pressure, hype, organization structure, and incentives.

If I had to summarize though, it's that the median performance in any field will be at much lower levels than outsiders expect, and some fields with hazier results have this level set very, very, very low, especially when they're hyped up. But also that the market is actually at least a little bit efficient, but over long time scales. I think there's a 50%+ chance that the role of Chief Data Officer begins to die off, but also that it'll be replaced by something silly.

iamthepieman · 2024-11-02T21:27:26 1730582846

I do not use this term to refer to myself. I respect those who do and respect the meaning behind it but am just old enough that it feels alien to me 99% of the time.

But I am SO triggered by this piece. I had that intrusive feeling you sometimes get when driving where you think, "I could just close my eyes and see what happens", "Or that cliff is so close and the guardrail doesn't really extend far enough"

Only for my career. Like I should just not show up on Monday. I should get in the car and drive far away and change my name and work at a nice retail joint in a mid-sized town.

I'm going to need to sit and stare into the distance for an hour or 3.

strken · 2024-11-03T08:29:14 1730622554

It's an almost exact copy of my last few months, right down to the 10am start.

Except that all our other senior engineers got laid off and there's nobody to pair with, I don't give two fucks about bullying because at this point the entire company knows I'll quit on the spot if they try, and our problems are mostly that the remaining team cannot understand the terrifying eldritch decision making process that led to fun little patterns like "wrap every API call in a try/catch and then ignore the errors".

I am seriously considering doing a TAFE course and becoming an electrician.

Buttons840 · 2024-11-03T11:45:40 1730634340

They took inspiration from the error steamroller: https://github.com/ajalt/fuckitpy

CSSer · 2024-11-03T23:18:49 1730675929

Perhaps if one coupled this with a multimodal LLM we could achieve a singularity

GreenWatermelon · 2024-11-03T23:25:43 1730676343

> wrap every API call in a try/catch and then ignore the errors

This describes the project I'm working on at $job with eerie accuracy :sob:

Dare I ask, was that project written in Java?

strken · 2024-11-04T00:09:56 1730678996

Unfortunately my personal hell is written in Node.js.

I am so very sorry, both that you are suffering through this too and that the infection has spread.

392 · 2024-11-04T12:13:59 1730722439

Will you feel better looking at a wiring job that seems guaranteed to burn down the building if you don't fix it? And will Management agree?

strken · 2024-11-04T23:04:25 1730761465

I wish the abominations software engineers create were as regulated and fixable as a bad wiring job. I would feel absolutely chuffed to work in an industry with licensed inspectors and standards bodies.

I am currently dealing with a system involving four separate serverless functions that call each other. There's no reason whatsoever why any of them need to be network calls. The fourth function just calls the first function again. One is in a different region for no discernible reason.

hmaxdml · 2024-11-05T00:21:20 1730766080

Would be nice if there was a serverless library that lets you do in-process orchestration so you don't need abominations like Step Functions ;)

e.g., https://github.com/dbos-inc/durable-swarm

Electricniko · 2024-11-03T00:22:11 1730593331

> There has been a point in my life where I ended every day in the dark, staring at a wall for an hour or two straight, trying to figure out why everything felt awful.

From his post about burnout and mental health. Also worth a read.

https://ludic.mataroa.blog/blog/on-burnout-mental-health-and...

SanjayMehta · 2024-11-03T01:46:09 1730598369

If on Monday morning you’re wishing it was Friday evening, it’s time to quit.

sph · 2024-11-03T07:29:16 1730618956

The frightening thing about serious work-related burnout is that three years after quitting, on Monday mornings you still wish it was Friday evening.

Any day now I'll be ready for the grind again. Any day now.

ludicity · 2024-11-03T20:55:58 1730667358

It took me about six months off to start feeling normal, and I think I got out much earlier than most people do. And if you read that post, I still clearly let it get pretty bad before I left.

chubs · 2024-11-03T07:23:46 1730618626

Many of us have kids to feed! The economy is not bursting with jobs anymore since rates rose post-COVID.

karel-3d · 2024-11-03T13:39:28 1730641168

No need to quit immediately, just apply for jobs on the side.

Aeolun · 2024-11-03T16:43:19 1730652199

And do the same thing elsewhere for less money and with less social capital?

tpxl · 2024-11-03T20:13:35 1730664815

I went from an important cog with low pay but high responsibility to a much higher paying job with no responsibility. You can too with a bit of luck.

Aeolun · 2024-11-03T22:36:25 1730673385

No, I mean, I’m there, but it’s anything but fulfilling xD it’s just that anything else would feel like a downgrade.

feoren · 2024-11-03T08:18:12 1730621892

What about if on Friday morning you're wishing it was Monday? Like, two Mondays ago? So you weren't quite as late on everything?

whstl · 2024-11-03T19:47:58 1730663278

As someone who managed to stay productive during a burnout despite constant bullying by a yelling CTO: it doesn't really help if you deliver on time.

andrepd · 2024-11-03T12:52:29 1730638349

Yep, in theory yes, but shame that the bills won't pay themselves

_proofs · 2024-11-03T18:26:39 1730658399

i don't buy that any situation is so hopeless, you're powerless to improve it. at least in the context of this field and its line(s) of work.

sounds a lot more like learned hopelessness making it harder to respond to stress with radical change because of (normal and human) fears of the unknown.

at some point though responsibility for the circumstances, the feelings, the stress -- the good, bad, and ugly or easy, hard, and nearly impossible -- has to be taken.

there's only one life to live. we owe it to ourselves and others to do more than -- to try not to -- just "roll over and play dead", so to speak.

humans have survived a lot and have adapted to just as much if not more.

if i ever allowed myself to even stay at any of my former jobs coming up in my life when i was paycheck to paycheck because of not making rent or just being flatout broke and homeless, i would have not progressed my career, or life, in any meaningful way, and just fed the negative feedback loop influencing what feels like a miserable existence (even privileged as it were).

can't hold myself hostage. and also, i can't hold those around me hostage as consequence of my non-action, either.

consf · 2024-11-04T16:53:38 1730739218

Finding work that makes you look forward to more than just the weekend can be transformational yet really hard

bryancoxwell · 2024-11-03T03:02:49 1730602969

> I had that intrusive feeling you sometimes get when driving where you think, "I could just close my eyes and see what happens", "Or that cliff is so close and the guardrail doesn't really extend far enough"

L’appel du vide

albert_e · 2024-11-03T09:01:53 1730624513

Does the mention of such concepts or acknowledging it is real ... put some listeners (if they work in certain professions) under an obligation to refer the person to a mental health assessment?

Example: a blog post like this one, with the author's real name, that acknowledges it front and center: https://ebb-and-flow.blog/2023/07/23/another-scan-lappel-du-...

josephg · 2024-11-03T04:56:06 1730609766

Seriously, quit then. It’s not worth it. You get one life. How many hours on this earth do you want to spend suicidally depressed? If you have a really high pain tolerance, maybe you can do that for years. How lucky would that be?

There’s a Polish restaurant near where I live that makes amazing food. The owner is always out and about, chatting with customers and making sly jokes. Turns out he used to be an Oracle SQL consultant of some sort, and he turned it all in to run his restaurant. You can tell he’s thriving. I think he’s got the right idea.

andrepd · 2024-11-03T12:56:22 1730638582

Survivorship bias. In an ideal world yes but in reality there's bills to be paid and tech (generally) pays really well.

Not saying you shouldn't quit, just that it's not so simple.

josephg · 2024-11-03T14:41:42 1730644902

I hear you. But also, ... if you're literally feeling suicidal because of work, in a sense it really is that simple. You aren't doing anyone any favours - not your coworkers, your family or yourself - by living like that.

hackable_sand · 2024-11-03T16:07:44 1730650064

This is the answer.

Money is money, but money comes and goes.

The work you put into finding a healthy source of income is worth every minute.

hinkley · 2024-11-03T16:39:16 1730651956

There’s literally a 90% chance your restaurant won’t survive its first year.

josephg · 2024-11-03T22:15:45 1730672145

Then do something else! Literally billions of people are employed every day doing things that aren't software engineering. Pick anything.

isoprophlex · 2024-11-03T06:34:09 1730615649

Change something for the better. You are the one who cares the most, you are the one best suited to take control over your life.

You deserve to feel good. Life is too short to be a cog in a broken machine.

reverius42 · 2024-10-31T23:29:35 1730417375

> I've even degraded team morale because I've convinced some of the engineers that things should be better, but not management, so now some of the engineers are upset.

Oof, that hits a little close to home.

hinkley · 2024-11-03T17:50:35 1730656235

I have this illusion in my head that I stayed so long at my last company that almost all of my favorite people left, but one of my coworkers had my number.

After a person I liaised with on another team left, I asked his superior if there was someone else I should build bridges with. We started talking about one of the team members and he said, “I don’t want you to talk to him. We like him, and if you talk to him he’ll leave.”

This was on Slack so I don’t know if this was a jest or he was serious/mad. But it’s entirely true. I’ve convinced at least half a dozen people that we should expect better from a team environment and ourselves, and that this org (not the whole company, just this division) is a cult of stupidity.

I was trying to recruit collaborators to fix the bullshit but apparently they decided it would be much easier to just start over.

refulgentis · 2024-11-03T19:01:12 1730660472

There's some finer points to parse in your comment that I'm not 100% on (ex. if "this org" is your division or the partner division), so I'm out on a ledge a little bit here, might not relate to what you meant.

I was lucky enough to get ~6 years running my own tech company after 6 years as a waiter. Then I sold it, yadda yadda, went to Google 6 months later, got ~7 years in there.

It really, really, really, disturbed me how approximately every situation, in every division, with any people, ended up boiling down to "how do we muddle through one more day without challenging anyone's preconceptions", 95% of the time it was tribal antisocial stuff, and no one would speak up about it.

Direct example, for posterity.

I don't wanna speak too directly to it, so let's imagine Google Division A (hereafter, dApps).

New division lead (ex-dApps) joins dBytes with apparent bias against partnering division (dConsumer). Despite the project being previously framed as top priority, new lead consistently undermines dConsumer in meetings and shows little interest in understanding their work. Team adopts leader's negative attitude, becoming obstructive and uncooperative. I ended up carrying a critical launch, virtually alone, for 6 months. At performance review time, my boss questions why I didn't get more team involvement - despite the hostile environment that prevented exactly that - and speaks glowingly about how we need to support peer going for promotion based on their excellent job on part Y...which they didn't do. They spent 2 days on it then said it was impossible. And they were definitively the most cooperative because at least they tried, and wouldn't actively be aggressive in meetings with the outgroup.

Everything, always, came down to: A) don't cause conflict at all, at home, or you will be buried B) we'll bend over backwards to accommodate conflict you invent, as long as we can clearly define them as an out-group with 0 ability to affect us day to day.

hinkley · 2024-11-03T19:52:23 1730663543

At prior jobs we had an escape hatch for this: go to a fancy coffee shop with the dissenters and have all of our bitchfests out of earshot of the muggles.

But it’s trickier to coax people still on the fence to come out for multiple coffees.

ludicity · 2024-11-03T06:48:01 1730616481

The good news is that, since the others are also looking for work elsewhere, there will be more engineers out in gen pop that actually think tests are useful, hah.

tokinonagare · 2024-11-03T14:02:20 1730642540

> At two of the four businesses I've worked at, the most highly-performing engineers have resorted to something that I think of as Pain Zone navigation. It's the practice of never working unless pair programming [...] The fear and dread comes from a culture where people feel bad that they can't work quickly enough in the terrible codebase

Exactly why I burned-out at work, worked at most 2 hours per day on a good day and finally was ejected from the project after a PM that graduated last year from school noticed and went after my head. Author is a wizard for describing the situation this well.

I've been free from the tyranny of Jira and project managers for 3 days, and I've worked more on my personal projects than I did in a week at my former workplace.

ctippett · 2024-10-31T22:42:24 1730414544

> of course, we're serverless, because how can you hurt yourself without a cutting-edge?

A beautiful epigram.

the_af · 2024-11-02T23:13:34 1730589214

On one hand, yes. Beautiful.

But on the other, the same sentence could be written about software deployed to traditional servers. "Because of course, how can you hurt yourself without the joys of badly configured servers?".

skydhash · 2024-11-03T00:07:27 1730592447

You can hurt yourself with a badly held butter knife, and you can hurt yourself juggling katanas. Which situation would get people saying you're crazy?

zdragnar · 2024-11-03T00:56:27 1730595387

Well, if you narrow down the metaphor to just knives, a dull knife is more dangerous to a chef than a sharp knife, because you need to apply more pressure and you get less control over the cutting action.

downut · 2024-11-03T14:50:12 1730645412

The people juggling aren't chefs.

the_af · 2024-11-05T13:11:39 1730812299

Dull knives are dangerous to most people, not just chefs. Most beginner's cooking books/lessons will tell you to keep your knives very sharp, because dull knives are dangerous (for the reasons explained by grandparent comment).

This affects amateurs just as much (if not more) as experienced chefs.

zdragnar · 2024-11-03T20:17:47 1730665067

Clearly, you haven't seen a good chef at work.

hinkley · 2024-11-03T16:42:12 1730652132

Yes, but which one will get people saying you’re crazy?

the_af · 2024-11-05T13:14:19 1730812459

Which of the two, "going serverless" or "managing your own servers" would you say is unequivocally like juggling katanas?

I don't think the analogy is very good, since juggling katanas is always a crazy idea, while choosing whether to go serverless or not is always a respectable discussion.

hackable_sand · 2024-11-03T16:08:56 1730650136

It's a pun.

the_af · 2024-11-05T13:06:19 1730811979

I understood the pun about the "cutting-edge" cutting you, I just went deeper than the joke to note many hurt themselves by not going serverless when they should have, and that server maintenance/configuration often becomes a mismanaged nightmare.

hackable_sand · 2024-11-11T16:52:08 1731343928

My bad

layoric · 2024-11-02T23:44:27 1730591067

Came here to praise the same sentence, well done author, gave me a good laugh!

storafrid · 2024-11-02T22:24:30 1730586270

Yes, but I'm surprised that they attribute "cutting-edge" to Lambda. It's about as old as Docker.

hinkley · 2024-11-03T16:45:01 1730652301

For a bank, Lambda is brand new.

You haven’t worked in conservative industries I take it? Late adopters, every one.

OP is still trying to replace Cobol. I know an insurance company that started that process 15 years ago. Fifteen years.

storafrid · 2024-11-04T13:43:41 1730727821

Do you mean figuratively that OP is replacing Cobol? Because I don't see that in the article. It mentions other technology that I would not associate with a super-conservative stack - like Databricks, JSON, Postgres and Google Analytics. So I'm a bit confused by your comment. And by all the downvotes, honestly.

I just pointed out that personally I would not consider Lambda - which has been a stable and popular technology for 10 years - to be cutting-edge. It's not old but also not cutting-edge imo. I would reserve that term to newer technology. Apparently a controversial view on HN, which is interesting.

To respond to your question, I did work for a bank in 2017 with moving certain burst-type processing to a set of Lambdas.

zdragnar · 2024-11-03T00:59:27 1730595567

I was part of a company that went all in on using lambdas for the majority of their web facing APIs. That was 7 years ago.

The cutting edge bit is a nice quip, though I agree not exactly accurate anymore.

stackskipton · 2024-11-03T15:58:35 1730649515

I worked for a company that went all in on Lambda as well. The knots they had to twist themselves into so that everything ran nice and smooth in the Lambda environment were mindboggling. We had certain actions, like orders, that would pass through 8 Lambdas before completion, either because of execution time or because the big code base would result in a 7-second start-up time (Node), so it would get broken down. If any of them failed, and it felt like they failed a ton due to Amazon backend stuff, it was a nightmare to resolve.

All of it could probably have been handled by a larger Node application in a Docker container somewhere, but AUTO SCALING, FAILOVER, SERVERLESS!

Once I started as SRE for a new team, we built a larger monolith using Node and Docker on EC2. We would get massive compliments for our uptime and reliability, but some architects were extremely unhappy when I revealed in a division presentation that it was just Docker + m4.xlarge running Ubuntu 18.04. When I left, more and more Lambdas were being broken down into Docker running on EC2. They are probably on some container managed solution now.

sph · 2024-11-03T07:15:48 1730618148

It sounds like you like to deal with much sharper edges than I am comfortable with.

Or maybe I am too old for this shit. Still haven't found a use for "serverless" knives.

Aeolun · 2024-11-03T16:47:51 1730652471

It’s what everyone runs to when server based stuff would save them so much pain.

bartread · 2024-11-02T20:31:36 1730579496

I’m going to read the rest of this. I’m enjoying it. But, simultaneously, part II has me so triggered - it bears striking resemblance to repeated situations I’ve encountered where the meaning and content of columns in a relational database were overloaded in varying degrees of heaviness (which is a practice I absolutely detest) - that I need to take a short break.

hermitcrab · 2024-11-03T14:00:45 1730642445

I manage a database for a small local charity. I have set it up so that only I can add, delete or change the column structure. If someone wants a change, they have to email me and convince me (they are fine about this BTW). I'm sure the database would be an utter disaster zone by now if everyone was allowed to change it.

klysm · 2024-11-03T14:57:57 1730645877

I think database schemas deserve to be protected with one’s life as the holy ground of the system. If the schema is fucked, everything else will be fucked too.

chachacharge · 2024-11-04T18:59:25 1730746765

Schemas require domain knowledge. When domain knowledge is unclear or lacks ownership, it can lead to a range of issues that impact both data integrity and system functionality. Things that screw this up in the financial world include: working in different countries, acquiring new branches, new hires, and leavers. And people who think they can insist the database schema be protected somehow. A manager told me to add the last reason, it wasn't my idea and makes little sense.

hermitcrab · 2024-11-03T15:32:19 1730647939

With a database you can lock down the schema. In reality though, many data systems are composed mainly of people emailing in Excel spreadsheets. Good luck enforcing any sort of schema there.

My day job is writing a desktop/file-based ETL system. I have just added in a schema version feature to cover these sorts of issues. It was one of the most requested features, because most people aren't able to control the schemas of the data they receive.

klysm · 2024-11-04T04:05:11 1730693111

Yeah, Excel-based data ingest is a pretty brutal problem to solve.

hermitcrab · 2024-11-04T13:03:11 1730725391

We can automatically handle some schema drift if columns are renamed or reordered, or columns added or deleted. But if they are both renamed AND re-ordered, you are out of luck!
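A minimal sketch of why the combined case is the hard one, assuming a matcher that pairs columns by name first and falls back to position only when the name-matched columns have not moved. The function `match_columns` and its rules are hypothetical illustrations, not the commenter's actual product.

```python
def match_columns(expected, received):
    """Map each expected column name to a received column name, or None.

    Hypothetical rules:
      1. An exact name match always wins (this handles pure reordering).
      2. Leftover columns are paired by position only if both lists have the
         same length and every name-matched column kept its position
         (this handles pure renaming).
      3. Otherwise leftovers stay None: renamed AND reordered is ambiguous.
    """
    mapping = {name: (name if name in received else None) for name in expected}
    missing = [name for name, match in mapping.items() if match is None]

    same_length = len(expected) == len(received)
    order_kept = same_length and all(
        received[i] == name
        for i, name in enumerate(expected)
        if mapping[name] is not None
    )

    if order_kept:
        for name in missing:
            mapping[name] = received[expected.index(name)]
    return mapping


expected = ["id", "name", "amount"]
reordered = match_columns(expected, ["amount", "id", "name"])
renamed = match_columns(expected, ["id", "customer", "amount"])
both = match_columns(expected, ["customer", "amount", "id"])
```

With these rules, `reordered` and `renamed` resolve every column, while `both` leaves `"name"` unmapped, because neither the name nor the position can be trusted once both have changed.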

bartread · 2024-11-05T23:59:18 1730851158

If you detect a level of drift that you can't handle, this is the perfect opportunity to delegate that bit of work to an LLM, if it's a problem that you deal with regularly enough to feel the cost of it to your business.

The latest generation of LLMs are pretty, actually very, good at this kind of situation, where somebody has renamed something - but kept some semblance of the meaning - and also moved it so a basic, or even a fuzzy, comparison might not be able to make a good match.

But a model like GPT-4o-mini will eat a problem like this for breakfast, and it's now incredibly cheap to use it for this kind of thing as well.

XorNot · 2024-11-03T00:15:06 1730592906

People do this with system hostnames a lot.

And it's almost impossible to get them to stop: the hostname should either be a random UUID or a random name from a pronounceable list depending on scale (or a syllabic UUID thing).

Because every other factor has one answer: you look up the other data you need in your CMDB. If that's too hard, you fix that so it's easy (DNS TXT records can be surprisingly useful here).
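A minimal sketch of the "random name from a pronounceable list" idea, assuming alternating consonant/vowel syllables are pronounceable enough. The alphabets, the function name `random_hostname`, and the default length are all hypothetical choices; the CMDB or DNS TXT lookup that would hold the real metadata is not shown.

```python
import secrets

# Hypothetical alphabets, chosen to avoid look-alike or awkward letters.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def random_hostname(syllables=4):
    """Return a random consonant-vowel name that encodes nothing about the host."""
    return "".join(
        secrets.choice(CONSONANTS) + secrets.choice(VOWELS)
        for _ in range(syllables)
    )


name = random_hostname()
```

Because the name carries no meaning, role, location, and owner have to live in the inventory system, which is the point the comment is making.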

zombiwoof · 2024-11-02T19:55:50 1730577350

Data “engineering” is where all the cool kids go with no clue and create insane architectures to justify their incompetence

icedchai · 2024-11-03T01:19:29 1730596769

I've seen some "data engineering" scripts that were complete messes and beyond crazy. Some examples: Massively over engineered "pipelines" that process a few hundred rows a day, but somehow manage to take forever to run. Developers that didn't know SQL beyond "select * from table", so they do all their summarization in Python. Or, worse, I've seen a Python script calling a shell script calling R calling something else, several more layers deep, when the same result could've all been done in SQL with a few temporary tables.

Oh, then I'm asked to "give this a code review before so-and-so does a deployment tomorrow." Uh, it's a little late to address any of the fundamentals, but there are hard coded paths everywhere...
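For contrast with the "summarize in Python" scripts described above, here is a minimal sketch of letting the database do the aggregation. It uses Python's standard-library `sqlite3` with an in-memory database purely for illustration; the `orders` table and its columns are hypothetical.

```python
import sqlite3


def make_orders_db():
    """Build a tiny in-memory table standing in for a real source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (day TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-11-01", 10), ("2024-11-01", 5), ("2024-11-02", 7)],
    )
    return conn


def daily_totals(conn):
    """Return (day, order_count, total_amount) rows, aggregated by the database."""
    return conn.execute(
        """
        SELECT day, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM orders
        GROUP BY day
        ORDER BY day
        """
    ).fetchall()


totals = daily_totals(make_orders_db())
```

Only the summarized rows cross the connection, so the Python side has nothing left to loop over.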

bob1029 · 2024-11-03T02:32:49 1730601169

I recently got a bit of a shocked reaction when I proposed to directly load daily files into temporary SQL tables and then use merge commands within the database to load the final tables. My use of code is essentially a shim between an SFTP client and SQL Server in this scenario. Maybe ~200 lines to connect, locate the files, run the bulk load operation, and then invoke the merge commands. Most of the fun bits are in the actual merge scripts.

Once your data is safely inside the database (temporary load tables or otherwise), there really isn't a good excuse for pulling it out and playing a bunch of circus tricks on it. Moving and transforming data within the RDBMS is infinitely more reliable than doing it with external tooling. Your ETL code should be entirely about getting the data safely into the RDBMS. It shouldn't even be responsible for testing new/deleted/modified records. You really want to use SQL for that.

You'll also be able to recruit more help if everything is neatly contained within the SQL tooling. In my scenario, business analysts can look at the merge commands and quickly iterate on the data pipeline if certain customers have weird quirks. They cannot do the same with some elaborate set of codebases, microservices, etc.

One specific thing that really sold me on this path was seeing how CTEs and views can make the T part of ETL 10000000x easier than even the fanciest code helpers like LINQ.
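A minimal sketch of the load-then-merge shape described in this comment, under loud assumptions: it uses Python's standard-library `sqlite3` in place of SQL Server, and SQLite's `INSERT ... ON CONFLICT DO UPDATE` (available from SQLite 3.24) in place of a T-SQL `MERGE`. The table names and columns are hypothetical, and the SFTP step is left out.

```python
import sqlite3


def make_db():
    """Create the final table and an untyped-ish staging table (both hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute("CREATE TABLE staging_customers (id INTEGER, name TEXT, city TEXT)")
    return conn


def load_daily_file(conn, rows):
    """Dump raw rows into staging, then let SQL do the transform and the merge."""
    # Step 1: the application code only moves bytes into the database.
    conn.execute("DELETE FROM staging_customers")
    conn.executemany(
        "INSERT INTO staging_customers (id, name, city) VALUES (?, ?, ?)", rows
    )
    # Step 2: cleanup and insert-or-update happen inside the database.
    # SQLite needs "WHERE true" here so that ON CONFLICT is not parsed
    # as a join constraint on the SELECT.
    conn.execute(
        """
        INSERT INTO customers (id, name, city)
        SELECT id, trim(name), trim(city) FROM staging_customers WHERE true
        ON CONFLICT(id) DO UPDATE SET name = excluded.name, city = excluded.city
        """
    )
    conn.commit()


conn = make_db()
load_daily_file(conn, [(1, " Ann ", "Melbourne"), (2, "Bob", "Sydney")])
load_daily_file(conn, [(2, "Bob", "Perth"), (3, "Cy", "Hobart")])
final_rows = conn.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()
```

The Python side never compares old and new records; deciding what is new or changed is left to the SQL statement, which is the part an analyst could read and adjust.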

hinkley · 2024-11-03T16:53:12 1730652792

Except everyone wants microservices each with its own database.

392 · 2024-11-05T01:02:00 1730768520

If they can't explain why, they don't get the fun internet interview stuff

icedchai · 2024-11-05T15:56:23 1730822183

It's so much simpler to have a shared "integration" database. Until you have actual, real need for separate data sources, don't.

snidane · 2024-11-03T13:02:40 1730638960

The architecture is sound - typically called ELT these days. Dump contents of upstream straight into a database and apply stateless and deterministic operations to achieve the final result tables.

SQL Server is where this breaks though. You'll get yelled at by DBAs for bad db practices like storing wide text fields without casting them to varchar(32) or varchar(12), primary keys on strings or no indexes at all, and most importantly taking the majority of storage on the db host for these raw dumps. SQL Server and any traditional database scales by adding machines, so you end up paying compute costs for your storage.

If you use a shared disk system with decoupled compute scaling from storage, then your system is the way to go. Ideally these days dump your files into a file storage like s3 and slap a table abstraction over it with some catalog and now you have 100x less storage costs and about 5-10x increased compute power with things like duckdb. Happy data engineering!

WorldMaker · 2024-11-04T17:51:10 1730742670

It amazes me how many DBAs think the limit on a varchar column impacts the disk space. The "on disk" sizes for `varchar(12)` and `varchar(32)` and `varchar(MAX)` are roughly the same and depend on the data itself more than the schema. That's what the "var" in "varchar" means: variable storage size. The limits like (32) were added for compatibility with `char` and for type-based "common sense" validation. Sure, it helps prevent footguns like accidental DDoS of ingesting too much data too quickly, but there are other ways to do that basic top-level validation of "is this too much data to insert?".

Five varchar(12) columns is more storage overhead than one varchar(60). There's a lot of great use cases for varchar(MAX) and everyone I ever had tell me that varchar(MAX) wasn't allowed didn't understand the internals of DB storage that they thought they did and somehow still believe in their internal model of the DB that varchar is just spicy char and fixed column size allocation.

icedchai · 2024-11-04T23:24:03 1730762643

With Postgres, we mostly just use `text` everywhere, unless there is an actual reason to have a size limit.

In other news, I haven't seen a dedicated "DBA" at a company in over a decade.

WorldMaker · 2024-11-05T14:33:38 1730817218

> With Postgres, we mostly just use `text` everywhere, unless there is an actual reason to have a size limit.

Yeah, there's still the very rare need to performance engineer out a fixed char field "to the left" of the table to speed up common table scans, but also so many of the reasons you might table scan strings have moved into proper full text search indexes or now all the rage is in vector embeddings.

> In other news, I haven't seen a dedicated "DBA" at a company in over a decade.

Yeah, anecdotally from LinkedIn and other sources it does seem like all the dedicated DBAs that have stayed that way have stuck to very specific niches and/or Oracle Products (including MySQL and derivatives these days; the "Oracle Effect" is strong). Especially in Amazon RDS and Azure SQL Server/Cosmos DB today, Postgres and Microsoft's SQL Server mostly run themselves and day-to-day administration is minor/trivial.

sevensor · 2024-11-04T01:17:06 1730683026

My experience with delta was that the catalog, being stored in s3 itself, was unacceptably slow, and for our data volume, Airflow was prohibitively expensive. We spent a lot of engineering time working around both problems. Which is funny because the consultants who advised us to do this told us it was the best possible solution; tailor made for our application, foolproof in every way. After that we proceeded to pay for their “data” “science” “services,” which went about as well as my scare quotes would suggest.

jamesblonde · 2024-11-03T15:21:54 1730647314

You're basically describing the Lakehouse Tables architecture. Store your data as tabular data in Iceberg/Hudi/Delta on S3. Save a bucket on storage. Query with whatever engine you like (Snowflake, Redshift, BQ, DuckDB, etc).

aoeusnth1 · 2024-11-03T15:59:23 1730649563

Yes, this is the vast majority of my data work at Google as well. Spanner + Files on disk (Placer) + distributed query engine (F1) which can read anything and everything (even google sheets) and join it all.

It’s amazingly productive and incredibly cheap to operate.

tgv · 2024-11-03T10:26:00 1730629560

Some of my colleagues use Microsoft PowerBI, and indeed, they upload a few hundred rows of data (and a few hundred columns, which get unpivoted in PowerBI to, say, 40k rows). When they upload it, the PowerBI instance overloads, and people get timeouts and such. That can last up to 20 minutes. I stay away from that as far as I can.

Pxtl · 2024-11-04T15:23:56 1730733836

This is what bothers me with MS SQL related tools - they all seem horrendously brittle. Everything seems prone to deadlocks, has weird edge-cases, and incomplete coverage of the API of the next tool they're talking to so you keep having to break open the abstraction and manually tinker in the next level.

parpfish · 2024-11-02T21:21:40 1730582500

data engineer gives data science a run for their money when it comes to ambiguous job expectations.

I’ve seen it mean anything from “distributed computing expert” to “knows SQL”

sgarland · 2024-11-02T23:52:09 1730591529

> knows SQL

If by that you mean, “knows the commands to create, fill, and select from a table,” then yes. If you mean, “knows how to create a performant schema and queries that will serve them well into the future,” then no, absolutely not.
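To make the gap between the two meanings concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns and the index name are hypothetical, and the exact plan wording differs between SQLite versions; the point is only that the same query goes from a full scan to an index search once the schema supports it.

```python
import sqlite3

# Hypothetical table and column names, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def query_plan(connection, sql):
    """Return SQLite's plan for a query as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = query_plan(conn, lookup)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup)  # the new index can now be used

print(plan_before)
print(plan_after)
```

Reading the plan before and after a schema change is a cheap habit, and most engines have an equivalent of `EXPLAIN`.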

OTOH, IME data folk are much more cheerful and willing to change things than devs when I point out the innumerable ways their DB choices are choking them. Devs more often fall on the side of “we don’t have time for that on our roadmap; can’t you just fix it?”

zelphirkalt · 2024-11-04T15:21:11 1730733671

It is often not the devs telling you about the roadmap, but the management. Devs are more willing to fix things, if they are broken, but are not given the time.

marcosdumay · 2024-11-02T21:48:15 1730584095

> to “knows SQL”

There are still ways to go. I've seen it mean "spends the days filling excel spreadsheets".

WorldMaker · 2024-11-04T17:54:21 1730742861

I've worked with people where "knows SQL" meant "knows the Access query builder UI, sort of, and demands the ability to query Production databases with the awful SQL auto-generated from the Access UI".

sevensor · 2024-11-04T01:19:24 1730683164

> “knows SQL”

Ah, clearly you’ve worked with a better class of data scientists than I.

the_af · 2024-11-02T23:08:21 1730588901

You can replace "data engineering" with "software engineering" in that sentence and it will still hold true...

fifilura · 2024-11-03T08:00:33 1730620833

Or chef or carpenter or automobile builder or priest or...

hobs · 2024-11-02T20:21:33 1730578893

As a data engineer I have seen absolute bullshit pass for production, but it doesn't seem that different from all the other bullshit I have seen people deploy in my life.

It is one of the few types of jobs I have worked where someone credulously offering to add five more layers to fix a latency issue is normal operating procedure, though.

halfcat · 2024-11-02T22:24:38 1730586278

What’s the solution to wrangling these data projects?

The author’s experience is not far off from my own.

1. Any solution in place can only be understood by the person who created it

2. “No, we can’t change that because then we’d have to validate everything from scratch again”

And therefore, as the author says:

> “we'll continue with the work instead of fixing the critical production error”

I’m honestly not sure how to address it either. With traditional software dev we’d write tests, incorporate those into CI/CD, and start to course correct. We can use sample data to validate the code does what we think it does and that we didn’t break it.

But in these data projects, it’s not only the code that’s changing, but the data is also a moving target. You can write a test with sample data, but tomorrow your data might change because someone in sales added a custom field to the CRM, or IT upgraded the accounting software and all of the unique IDs changed, or someone upgraded their Excel version, or whatever.

And your code that works on the sample data needs to handle all of this, which obviously it can’t. You can try to validate the data somehow, check the schema, check if the number of rows hasn’t doubled or halved, and so forth, and then stop it from importing until you look into it, but also you can’t stop inbound data because an exec has a meeting in a few hours and expects their report to be updated.
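For what it's worth, the checks described above (schema, and row count neither doubled nor halved) can be sketched in a few lines. This is a rough sketch and not a product: `check_batch`, the column names and the 2x / 0.5x thresholds are all assumptions for illustration. It only reports problems, so the caller still decides whether to block the import or load it with a warning.

```python
def check_batch(rows, expected_columns, previous_row_count):
    """Return a list of problems found in an incoming batch; empty means OK."""
    problems = []
    expected = set(expected_columns)

    # Schema check: every record must carry exactly the expected columns.
    for i, row in enumerate(rows):
        actual = set(row)
        if actual != expected:
            missing = sorted(expected - actual)
            extra = sorted(actual - expected)
            problems.append(f"row {i}: missing={missing} extra={extra}")
            break  # one schema report is enough to flag the batch

    # Volume check: flag a batch that has doubled or halved since last time.
    if previous_row_count > 0:
        ratio = len(rows) / previous_row_count
        if ratio >= 2 or ratio <= 0.5:
            problems.append(f"row count changed by a factor of {ratio:.2f}")

    return problems


good = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
drifted = [{"id": 1, "amount": 10.0, "custom_field": "x"}]

print(check_batch(good, ["id", "amount"], previous_row_count=2))
print(check_batch(drifted, ["id", "amount"], previous_row_count=2))
```

The second batch trips both checks: someone added a custom field, and the row count halved.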

I heard something about “data contracts” that’s supposed to address this, but it sounds like the next in a long line of buzz words intended to get management to buy another data product.

Has anyone worked in this kind of project that went well?

ludicity · 2024-11-03T06:15:45 1730614545

Author here, and also executive director at Hermit Tech now where we do things like this. Your approach has the core of how I'd go about it. The contract stuff is legit, though you don't need to buy a product for it.

The thing that is hard at big organizations isn't that executives need the data for meetings. The issue right now is that many organizations are already 3-4 years into building their analytics platforms, and Chief Data Officers worldwide are trying to prevent their role from disappearing. They're already very much "Mom, we have CTO at home" in many companies, as evidenced by the fact they're usually reporting to the CTO or CFO.

So at this stage, they've already told the business that the platform is "ready", and they are onboarding data sources. With no way to measure data quality, the only thing visible at the organization level is number of data sources onboarded. The fastest way to onboard data sources is to have good CI/CD and a solid developer environment, but this would probably result in slowdown for 1 - 2 months even if you had executive backing to bulldoze all objections from the IT department.

That's the sort of thing I can commit my team to as a business owner, but most executives don't have the nerve to slow delivery down and aren't losing money out of their pocket due to the inefficiency - I get to talk with a lot of them due to the blog's success these days, and many of them really are just employees with more status, with the same incentives. And to make it worse, the loss of nerve is actually understandable, because the type of team that would build something this bad will also waste those two months then still deliver slowly! But most people aren't thinking in terms this complex, and yes, I know it isn't that complex.

I'm expecting to pick up some work in this area at larger orgs in a few years when these leaders rotate out and new leaders rotate in and go "what the hell IS this?", but for now we're mostly aimed at helping smaller places do it right from day one.

ewuhic · 2024-11-03T17:03:33 1730653413

Do you have a post describing the "right [way] from day one"? Spill your secret sauce.

ludicity · 2024-11-03T20:19:20 1730665160

The boring answer is that it's context dependent, but fundamentally "do data engineering the same way you do high-performance software engineering". Have tests that run fast, where fast means "a few seconds when you start" and "refactor as you go so the tests keep taking a tolerable amount of time". I think Kent Beck suggested 10 minutes in Extreme Programming Explained.
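As a rough illustration of "tests that run fast": keep each transformation a pure function over in-memory records, so the test needs no warehouse, cluster or network. `unpivot` and the sample record are hypothetical, and this is a sketch of the general idea rather than anyone's actual setup.

```python
def unpivot(row, id_column):
    """Turn one wide record into a list of (id, column, value) records."""
    return [
        {"id": row[id_column], "column": name, "value": value}
        for name, value in row.items()
        if name != id_column
    ]


def test_one_wide_row_becomes_one_record_per_column():
    wide = {"customer": "a1", "jan": 10, "feb": 12}
    assert unpivot(wide, "customer") == [
        {"id": "a1", "column": "jan", "value": 10},
        {"id": "a1", "column": "feb", "value": 12},
    ]


# A test runner such as pytest would discover the function by name;
# calling it directly works too.
test_one_wide_row_becomes_one_record_per_column()
```

A test like this finishes in milliseconds, which is what keeps the whole suite inside a tolerable time budget as it grows.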

We're gradually forming our own, complex opinions on this. In the consulting context, this is essentially our product. A fascinating realization moving from software to marketing is that a sales pitch or marketing strategy can be built in a way that isn't entirely dissimilar to code, and that it has second-order effects. They aren't the same because... they aren't the same, but there's an artistry combined with principles to doing it "right".

And then as a consultant there's additional complexity, as each team is different. Some are high-performing and need a bit of an external jolt. Others need the help the most, but are in politicized environments so they're almost inaccessible until a new executive comes in who can admit there are problems (or indeed, even see that there are problems).

Joe Reis has some great stuff in Fundamentals of Data Engineering, which includes advice on early objectives when rolling out a new practice.

Disclaimer: Joe has hosted me on his podcast, and we are in the mutual-marketing whirlpool together. But I've been recommending his book long before I met him.

ewuhic · 2024-11-03T20:52:02 1730667122

Amazing answer, thank you, straight on point with tangible outcome.

Wish I had a LinkedIn and were an Executive, so I could connect with you.

3np · 2024-11-02T23:25:31 1730589931

"We should add a single-source-of truth validation microservice that we put in front of all insertions, putting failing messages on a separate queue"

And on it goes.

Liquix · 2024-11-02T23:41:12 1730590872

https://xkcd.com/927/

holden_nelson · 2024-11-03T00:18:06 1730593086

I went down the rabbit hole of this blog after reading this post. This person's blog is amazing. I particularly appreciated this piece: https://ludic.mataroa.blog/blog/quitting-my-job-for-the-way-...

sph · 2024-11-03T07:35:42 1730619342

"and still you will lay upon your death bed thinking that you may fend off the Reaper if you could but Estimate how long dying will take!"

What would I give to be able to compose sentences like this one.

sevensor · 2024-11-04T01:20:34 1730683234

My man has a podcast as well, which is worth a listen.

ludicity · 2024-11-04T01:32:09 1730683929

xoxo thanks! Although editing audio is one of my daily humility reminders. It's touuuugh.

tofflos · 2024-11-03T08:35:40 1730622940

> Like why didn't anyone catch the issue with the logs?

I see questions like these a lot and every time I feel that people immensely underestimate the effort required for curating data. In my experience data can only ever be as good as what it's being used for and in this story the logs haven't been used for this purpose before so they're not going to be any good.

It's some sort of data variation on the second law of thermodynamics - entropy is winning. Going in with the expectation that things should be better will only lead to frustration.

epgui · 2024-11-03T11:25:39 1730633139

This is not a data curation issue though, it’s a basic o11y issue.

andrewflnr · 2024-11-03T17:12:50 1730653970

> o11y

Seems to be "observability", for anyone else seeing that for the first time.

downrightmike · 2024-11-03T19:51:37 1730663497

It pops up every 8 months or so for me

jauntywundrkind · 2024-11-02T21:52:02 1730584322

The observability world still regards itself as a system for monitoring, but reading (and sometimes seeing) how these systems just go so bad continues to drive a conviction that perhaps their strategies and tools should become bigger. That they should converge with business pipelines.

We shouldn't just have wide events/big spans emitted... We should have those spans drive the pipeline. Rather than observability being a passive monitoring system, if we write code that reacts to events we are capturing, then we shuffle towards event sourcing.
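A toy sketch of that idea, under heavy assumptions: `Event`, `EventBus` and the event names are invented here and are not any real tracing or OpenTelemetry API. Every emitted event is recorded (the observability side), and the same emission triggers the next pipeline step (the event-sourcing side).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    attributes: dict = field(default_factory=dict)


class EventBus:
    """Records every event and dispatches it to subscribed handlers."""

    def __init__(self):
        self.log = []  # append-only record, usable for monitoring
        self._handlers = defaultdict(list)

    def subscribe(self, name, handler):
        self._handlers[name].append(handler)

    def emit(self, event):
        self.log.append(event)
        for handler in list(self._handlers[event.name]):
            handler(event)


bus = EventBus()
loaded_files = []


def load_file(event):
    """Pipeline step driven by an event, which emits its own event when done."""
    path = event.attributes["path"]
    loaded_files.append(path)
    bus.emit(Event("file.loaded", {"path": path, "rows": event.attributes["rows"]}))


bus.subscribe("file.received", load_file)
bus.emit(Event("file.received", {"path": "sales.csv", "rows": 120}))

print([e.name for e in bus.log])
```

Because the log and the control flow are the same stream, a step cannot run without leaving a trace. The stream also becomes something the business depends on.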

Given how badly coupled together with shoestring glue & good wishes so many systems are, how opaque these pain zones are, it feels like the centralization upon existing industry standard protocols to capture events (which imo include traces) is a clear win.

(Obvious downside, these systems become mission critical, business process & monitoring both.)

halfcat · 2024-11-02T22:35:24 1730586924

What’s this look like in practice? Is this something like business process modeling and workflow engines, or something else?

edejong · 2024-11-03T13:27:44 1730640464

Totally agree. Observability is just another dataset and should be modeled, managed and governed as other datasets. Data quality controls should be equal or of higher standard than regular data sets.

Monitoring, dashboarding and alerting should leverage other BI-class tooling.

brianhorakh · 2024-11-02T19:45:00 1730576700

Wonderful prose. I am in Melbourne also. I possibly used to work at the same place but I'm not sure.

I resigned due to the night terrors caused by the cyber security issues I saw everywhere. The more I explored and understood the more sleep I lost.

ludicity · 2024-11-03T06:49:53 1730616593

Hah, yes, I'm not in cybersecurity but am very close to a few people that are. The incompetence is not evenly distributed and not as bad as it is in data, but some companies are in terrible states, and the stakes are much higher.

hinkley · 2024-11-03T18:28:58 1730658538

I worked with an Aussie who was in the US on H1B because the aerospace industry was even worse than the status quo in Australia. Last I heard he went back. I sincerely hope he switched industry verticals.

pards · 2024-11-03T12:43:28 1730637808

> The word enterprise means that we do this in a way that makes people say "Dear God, why would anyone ever design it that way?"

Thank you for this phrase; I'll quote it at every opportunity.

hinkley · 2024-11-03T18:34:15 1730658855

I’m working on an SDLC app that will end up with inspirational sayings as interstitials once I run out of bigger features/desperately want to procrastinate.

I’ll stick heavy hitters from Goldratt, Fowler, Feynman et al in there, but there’s going to be a “dark humor” and “snarky” category and this will definitely go into one of them, along with some Ambrose Bierce.

salt-thrower · 2024-10-31T23:06:21 1730415981

Beautifully written and fun to read. Blog posts like this give me a boost of mental strength to keep going during my worst episodes of burnout.

hinkley · 2024-11-03T16:26:51 1730651211

> The word enterprise means that we do this in a way that makes people say "Dear God, why would anyone ever design it that way?"

I feel this comment in my bones.

schnatterer · 2024-11-04T17:54:55 1730742895

My search for a generic definition of the term enterprise finally comes to an end :D

pxc · 2024-11-02T22:43:37 1730587417

This blog post rescheduled all my appointments, tucked me in, sang me a lullaby, then woke me up with coffee and breakfast late the next morning. I am healed.

For real, a fun and refreshing read (if also a little haunting).

Muromec · 2024-11-02T23:48:17 1730591297

Great piece of writing from someone who truly cares about craft and suffers from the feeling that this craft is not what they are paid for.

Add: for people who share the feeling -- you can work in a place where velocity isn't all, managers are not assholes and you can dedicate yourself to craft.

hinkley · 2024-11-03T17:36:00 1730655360

This is the “I saved my company half a million dollars in about five minutes” person who got in trouble for saving them half a million dollars.

ludicity · 2024-11-03T19:47:45 1730663265

You are the only person in months that has called me anything other than "the AI rant guy" and I just wanted to say I appreciate you. <3

mildzebrataste · 2024-11-04T21:35:32 1730756132

This snippet is so relatable, so great:

"…what they actually needed to do was fire most of the staff in every team, leaving behind the two people who actually had good domain knowledge, then allow them to collaborate with good engineering teams to build sensible processes and systems.

Instead, they hired a bunch of Big Firm Consultants. You can see where this is going already."

I’m on month ten of my tech sabbatical and it’s been great. I’m no closer to wanting to return to the industry (former FAANG data jockey), not in a position where I can never go back but am in that middle-state where I want to use tech to contribute to the betterment of society, the community, nature and sustainability. I have a couple more months to try and figure out how.

sampo · 2024-11-03T21:07:32 1730668052

The "AI rant": https://news.ycombinator.com/item?id=40725329

hinkley · 2024-11-03T19:54:49 1730663689

I am an AI agnostic so even if I remembered that was you too, I’d only say it to our tribe.

I came into the industry exactly as a previous AI Trough of Disillusionment was in full swing so I’ve been vaccinated.

jitl · 2024-11-02T20:19:50 1730578790

I wonder what company they’re describing here. It sounds like so many self-inflicted problems that you could undo or set right in a couple of weeks if you had the time and latitude to make changes across the system instead of being confined to a small area of team ownership.

bartread · 2024-11-02T20:35:34 1730579734

I worked somewhere that had a lot of this sort of thing going on once. You cannot overestimate how hard it is to get anything done: politics and organisational dysfunction, not to mention that you probably don’t have access to half of what you need to in order to fix any given problem and are even more unlikely to be able to get it, mean there are just huge scads of problems that, on the face of them, look relatively straightforward to solve but which, in practice, are organisationally impossible to solve.

deergomoo · 2024-11-03T10:40:17 1730630417

> you probably don’t have access to half of what you need to in order to fix any given problem

I’ve found this to be one of my largest day-to-day problems even in a relatively functional organisation. Particularly when it involves something I can’t run on my own machine, like an AWS service.

In a previous role I often found myself constructing elaborate hypotheses about what was going on inside systems I couldn’t see into. I’d then need to try to verify it with someone on another team, in another timezone, who had the access but not necessarily the development background. Which usually meant getting on a screen share and asking them to click various things I wasn’t allowed to. If I was wrong, back to the drawing board and start again.

whstl · 2024-11-02T21:21:22 1730582482

Yep. I don't work with data engineering, but from hearing war stories from them, this could perfectly describe the last four companies I worked at. :/

> you probably don’t have access to half of what you need to in order to fix any given problem and are even more unlikely to be able to get it

I once sat down with a data engineer to try to fix a specific problem they had and that was 100% accurate. They were left to die by Ops and CISO.

Terr_ · 2024-11-03T05:11:14 1730610674

A framing/question I like to use is: "Look for a root problem that ought to be fixed through a change in policy, politics, or incentives, and the wasteful use of time/money is how the company tries to avoid or defer facing it."

For example, Operations might demand that Engineering develops an increasingly-byzantine approvals process, to stop Sales from over-promising impossible or unprofitable projects.

3eb7988a1663 · 2024-11-03T01:38:27 1730597907

I inherited a pipeline like this. It is as if everything is a global variable. You cannot "just fix" one thing in isolation, because some spooky action at a distance of which you were unaware relies upon this insane behavior. Each and every hack is the expected input somewhere else in the chain. You have to carefully inspect everything downstream of any kind of minor adjustment because your cleanup is quite likely to break something else.

Immensely frustrating and draining where you can have accomplished ~nothing in a full day of work to fix what should have been a five minute change.
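If the dependencies between steps can be written down at all, "inspect everything downstream" becomes a mechanical graph walk. A minimal sketch, assuming a hand-maintained map from each step to the steps that consume its output (the step names are made up). The hard part in a pipeline like this is that the map usually doesn't exist, because the dependencies are implicit.

```python
from collections import deque


def downstream_of(step, consumers):
    """Return every step reachable from `step`, in breadth-first order."""
    seen = []
    queue = deque(consumers.get(step, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(consumers.get(current, []))
    return seen


# Hypothetical pipeline: each key feeds the steps listed as its value.
consumers = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_revenue", "customer_stats"],
    "daily_revenue": ["exec_dashboard"],
    "customer_stats": ["exec_dashboard"],
}

print(downstream_of("raw_orders", consumers))
```

Even a rough, manually curated map like this gives a checklist of what to re-verify before touching a step.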

ludicity · 2024-11-03T06:04:00 1730613840

I'm the author. You are exactly correct. Everything was so heavily interwoven that it was impossible to tell what would happen downstream without making an edit and then tracking the changes through dozens of steps through the architecture diagram.

baq · 2024-11-03T07:33:15 1730619195

‘The purpose of the system is what it does.’