I don’t know how to count that low (lesswrong.com)
309 points by CapitalistCartr on Oct 25, 2021 | 261 comments



I am certainly guilty of this problem. I've pretty much hated MS Word for about 10 years now, more or less since I started being an engineer for a living. I hated its equation editor, I hated how the documents ended up looking, I hated the damn spacing between the letters; every time I used it I felt like my documents ended up looking like ransom notes.

I discovered LaTeX + Pandoc, and it created documents that I thought looked much nicer, and I could use trusty Vim to work with it, so it felt perfect; I subsequently made fun of people who insisted on using Word. After about two years of this, my dad eventually snapped back with "Tom, we're all really sorry we didn't become software engineers, I guess we'll all quit our jobs and go back to college," which (rightly) made me feel like an asshole.

Obviously this stuff was easy for me; I live and breathe in text editors and plain text, but I had to realize that not everyone is trying to create publisher-worthy documents. They want a document that looks good enough to share with some coworkers, or to print out for a school report, and for that MS Word works fine for most use cases. They don't really want to sit there and learn the minutiae of how to compile with `pdflatex` and dissect its weird error messages. They don't feel the need to finagle with begin and end tags, and that's totally fine; people should use the right tool for the job.

After that I would still recommend LaTeX (particularly if the document required equations), but I became substantially less pushy about it.


A Nobel Prize winner in physics, whose office was across the hall from mine, asked me ~10 years ago if I knew how to write ħ in Microsoft Word.

Unfortunately, I did not, and I was unable to help him. I was glad he asked me though. I probably should have tried what I just did now: google it and copy-paste.


On my desktop computer that runs Linux, and where I mostly type code and write in English, I use US English as the input and a Dvorak-based layout in hardware that I made for my programmable keyboard.

Since I very rarely type in my native tongue on that computer, and I annoyingly frequently have had to reinstall the OS from scratch, I no longer bother with customizing xmodmap or similar things on that computer.

So the few times when I have to type æ, ø or å on that computer, I just Google aelig, oslash or aring respectively and I go to the Wikipedia article about the letter or some other page from the results that has the letter on it and copy it from there.

Works well enough given how rarely I type any of those letters on that computer.


With the Compose Key, which is a feature of all linux distributions:

compose + a + e = æ

compose + o + / = ø

compose + a + o = å

https://en.wikipedia.org/wiki/Compose_key

Something I find absolutely criminal is that this hasn't taken off more broadly, and that most users memorize Unicode code points or Alt codes for characters like ©, rather than just typing a simple, eminently memorable sequence like compose + o + c.


Vim uses something similar to this which it calls digraphs (see :help digraph). http://vimdoc.sourceforge.net/htmldoc/digraph.html

Ctrl+k a e = æ


TIL. Thanks. I love how I keep learning new things about vim after years of daily usage.


You can also add your own digraphs. I've used this to make entering emojis easier, e.g.

    execute 'digraph :) 128578'
so that Ctrl-K :) in insert mode inserts 🙂. (`execute 'digraph :\| 128528'` => 😐, `execute 'digraph /\ 128591'` => 🙏, etc.)

This was before I knew that the Julia plugin allows emoji entry too (\:<Tab>), but I still find this more convenient for the common ones because the names of emoji are all over the place.


Yes, I was so sad when I "upgraded" from Atari to Windows and I could no longer do this. That it's still not standard is a mystery.

It wasn't standard on Atari either but that it even was a thing 30 odd years ago shows yet again that we're not always as innovative as we think.


You might find http://wincompose.info/ helpful: it adds this functionality to Windows.


Thank you! I wonder if there is some patent stopping this from being included in Windows by default...


Microsoft Word does have compose, sort of(?). control+apostrophe followed by e results in é, for instance. I have no idea why it isn’t OS wide. It works some other places, but not all.


On Windows, almost everywhere:

Holding ALT while typing the extended ASCII code for a character on the numpad (not the row of numbers above the letters) produces that character.

For é: hold Alt while typing 130 on the numpad.

TIL, apparently this can be extended to unicode too.

https://superuser.com/questions/1024948/windows-10-alt-code-...
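If I remember right, the hex-code variant hinges on a registry value (EnableHexNumpad); this is from memory, so treat it as a sketch:

    Windows Registry Editor Version 5.00

    [HKEY_CURRENT_USER\Control Panel\Input Method]
    "EnableHexNumpad"="1"

After signing out and back in: hold Alt, press + on the numpad, type the hex code point (e.g. 00e9 for é), then release Alt.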


Yes, but learning numbers isn't really comparable to being able to just guess the compose keys: o ¨ makes ö, a o makes å. If you want the copyright sign, you type o + c.


True. Different compromises on capability vs usability.


Relevant excerpt for those wondering what "every Linux distro" means more specifically (xorg):

> On Xorg the default Compose Key is ⇧ Shift+AltGr [...] The X keyboard driver does not allow the key used for Compose to also function as a modifier.

Which might explain why my altgr doesn't seem to work when I try it since it's mapped to mod in i3.


Wayland uses XKB, as does everything else under the sun. XKB is the layer that implements the compose key. (Actually, you need to configure it via XKB, because other input methods like ibus don't bother reading ~/.XCompose, which is what lets you set your own custom compose sequences.)
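For reference, a minimal ~/.XCompose sketch; the sequences below are just my own examples (the © one is already in the default table, shown here for the syntax):

    # ~/.XCompose
    include "%L"                           # keep the standard compose table for the current locale
    <Multi_key> <h> <minus> : "ħ"  U0127   # compose, h, - gives h-bar
    <Multi_key> <o> <c>     : "©"  copyright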


A feature of only some keyboard layouts though? Ordinary US layout doesn't seem to have a compose key mapped by default.


It takes two seconds to map using GNOME Tweaks or something else (usually xmodmap). I map it to right Alt, which is usually AltGr and not used anyway, and it works better than AltGr.
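(Outside of GNOME, one way to do the same thing, assuming your session honours options set with setxkbmap:)

    setxkbmap -option compose:ralt    # make right Alt the compose key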


Works for Anglophones. For the rest of us, who have some keys on the right (near the Enter key) occupied by Å, Ä, Ö or similar, we need Alt-Gr to get curly and square brackets, backslash, etc.


I was thinking they meant "in Emacs" or "on a Macintosh" or something.


It's also worth giving your platform's character map a try if you haven't already. All the major desktop environments have one -- KDE, Gnome, Windows, macOS. Sometimes searching online is still faster, sometimes the character map app is faster.


> Since I very rarely type in my native tongue on that computer, and I annoyingly frequently have had to reinstall the OS from scratch, I no longer bother with customizing xmodmap or similar things on that computer.

A backup of your home dir (or at least all the dotfiles in it), something like etckeeper tracking /etc (or a Git repo for $HOME too), and a list of the packages that were installed might go a long way towards making setup after a reinstall fairly trivial, and you might no longer feel like some customizations aren't worth it.
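A minimal sketch of what I mean, assuming a Debian/Ubuntu-style system; the backup paths are placeholders and the package manager commands will differ on other distros:

    dpkg --get-selections > ~/backup/package-list.txt        # record installed packages
    tar czf ~/backup/dotfiles.tar.gz -C "$HOME" .bashrc .vimrc .XCompose .Xmodmap
    sudo etckeeper init && sudo etckeeper commit "baseline"   # keep /etc under version control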


Another useful feature is that some part of my environment (I've used it in Gnome 3 on both Fedora and Arch, but it doesn't work in RHEL; I'm honestly not sure what part of my stack is providing the feature) lets me press Ctrl+Shift+U and enter an arbitrary Unicode code point. I've memorized the hexadecimal representation for a few common symbols and can enter them into basically any text field.


> I'm honestly not sure what part of my stack is providing the feature

It's Gtk. Gtk applications on Windows and macOS also use the same keybind to enter characters by Unicode code point. (It's kind of a pain if you use a lot of cross-platform applications, because you end up having to think a bit about what GUI toolkit the application is using before you can type a special character. “Does this one do the native thing or something else?”)


You could use US Intl. keyboard layout and use RAlt+z/l/w for æ/ø/å. If you have a QMK Firmware keyboard you could even reprogram it to use Alt + '/;/[ to send these keys and you'll have æ/ø/å in the usual position.


I'm almost in the same boat, though I use emacs' language-specific input methods for longer texts. The postfix input methods especially work great with Dvorak.


> and I annoyingly frequently have had to reinstall the OS from scratch

Is there a story behind that?


One of the reasons is that the computer in question has a problem where suddenly everything completely freezes, and I have to forcibly reboot the computer with the hardware reset button. And sometimes after rebooting I find that files on the file system have become corrupted.

Sometimes the machine will run for days without freezing up. Sometimes it can freeze up after just a couple of hours.

Sometimes I am browsing the web when it happens. Sometimes I am writing code and compiling stuff when it happens.

I think the longest uptime I have had on it after the problem started without it freezing has been maybe 8 days. It’s been like this for a long while. A couple of years at least.

And when it freezes it’s not just the UI. It even stops responding to ping.

A few months back someone suggested that it may be due to a faulty PSU. But even after changing the PSU to a new one, it still happens.

And it’s been happening both with different Linux distros and with FreeBSD.

My hardware is:

- MSI B350 Tomahawk motherboard

- AMD Ryzen 7 1700 CPU

- MSI GeForce GTX 1060 6GB graphics card

- Crucial DDR4 2133MHz 32GB ECC (2x16GB) CT2K16G4WFD8213 RAM

I bought those components in March of 2018, according to the order history in my account on the store where I bought it.

I don’t remember if I ever ran a memory test, but I will do so soon anyway, because I suspect that faulty RAM may be the reason. It just takes a long time to run a memory test on 32 GB of RAM, and so whenever I think about needing to run it I don’t have time to do so at that moment. But as I am typing this I realize that I’ve got to go ahead and actually find time to run it soon.

Another couple of ideas I’ve had are that maybe the graphics card or the Nvidia drivers are to blame. Or that I may have damaged some component with static electricity at some point. Or maybe it's a thermal issue.

I most recently did a fresh install earlier today, and at the same time I switched out the SSD that was in it for another one. As far as I can remember though, I think I was having this kind of problem even before I was using the SSD that I switched out today. But I am not sure about that. So it could be that the SSD was the problem. But it may be a few days until it freezes again if it’s not the reason. Or it could happen in some hours.

I’ve also tried to look for clues in /var/log/syslog but haven’t really known what to look for specifically. And since even pinging the machine stops working when it freezes I am not sure the system would even be able to write to any logs at the moment that it freezes, since it seems that at that point pretty much everything has stopped working.

Currently I am running tail -F /var/log/syslog in a terminal in the hopes of either seeing something relevant frozen on screen, or in the surrounding output when I reboot the machine since I will then know which line of output I should begin looking in the vicinity of.

But yeah, a memory test is the next thing I plan to do after that.


I *had* to get myself out of lurking mode to reply specifically to you; this issue seems widespread for first-gen Ryzen. I see your chipset is also close enough to mine (X370), and I felt a strong "déjà vu" reading your freezing symptoms.

I reused my now-old X370 Ryzen build to run TrueNAS Scale (based on Debian), and it has hard lockups like yours.

The things in my personal notes on the subject seem to stabilize things a bit, but not completely; it's a mixture of BIOS settings tweaks and kernel boot parameters that seems to help partially. Things I tried/applied with varying degrees of success:

- Disabling Cool&Quiet

- Disabling C-States

- Gear Down Mode: Disabled

- Power Down Mode: Disabled

- VDDSCR_SOC: Offset by +0.00625v (seemed to stabilize things on Windows)

- Someone in the kernel bug report mentioned the need to power off (as opposed to just reset) so all the BIOS settings are applied correctly (didn't try it myself yet)

See these links for more info:

- https://bugzilla.kernel.org/show_bug.cgi?id=196683 (a very long bug report thread; people commented with lots of things they tried to stabilize their builds, along with kernel parameter ideas)

- https://gist.github.com/diracs-delta/876d74d030f80dc899fc58a...

- https://web.archive.org/web/20201020144021/https://www.truen... (linked from Archive.org as TrueNAS WAS specifically mentioning Ryzen stability in the first paragraphs of this page)

Good luck, and if you ever find out how to get rid of those freezes completely, let me know :)

(edit: formatting)


Thank you for emerging from lurk mode for me :) I will try those things.


I found some of the settings but not all of them; here are the ones I found and changed just now:

- Global C-state Control: Auto -> Disabled

- AMD Cool’n’Quiet: Auto -> Disabled

- CPU NB/SoC Voltage: Auto -> Offset Mode; CPU NB/SoC Offset Mode Mark: +; CPU NB/SoC Offset Voltage: Auto

(The offset value can only be auto with my machine, not a custom value it seems.)

Only ones I couldn’t find were Gear Down Mode and Power Down Mode.

Clicked save and exit in the UEFI.

Then I powered down the machine and even flipped the on-off button of the PSU to off and let it stay off for ~20 seconds for good measure. Then turned the PSU back on and then powered the machine back on.

Currently reading the bug report thread and will try some of those things as well.


Now I've read a bit of those links and also read a bit of the following other links:

- https://utcc.utoronto.ca/~cks/space/blog/linux/KernelRcuNocb...

- https://access.redhat.com/documentation/en-us/red_hat_enterp...

- https://help.ubuntu.com/community/Grub2/Setup

And I've changed the following line in my /etc/default/grub from:

    GRUB_CMDLINE_LINUX=""
to

    GRUB_CMDLINE_LINUX="rcu_nocbs=0-15 processor.max_cstate=5"
since my CPU has 16 threads. And I've saved it and have run

    sudo update-grub
Now I'm about to reboot the computer and then hopefully it will be more stable from now on :)

Thanks again for the help bilange.


Having now turned the computer back on I've also confirmed that these flags are indeed now being passed to the kernel when it is booted, as seen in the output of

    cat /proc/cmdline
which shows the following:

    BOOT_IMAGE=/boot/vmlinuz-5.11.0-38-generic root=UUID=4dcba509-efff-4ccc-a099-f919240c767c ro rcu_nocbs=0-15 processor.max_cstate=5 quiet splash vt.handoff=7
And that's the "rcu_nocbs=0-15 processor.max_cstate=5" we added to the GRUB2 config showing up right in there.


> It just takes a long time to run a memory test on 32 GB of RAM

Maybe run memtest while you sleep? :)

> Or maybe it’s a thermal issue.

I'm no expert on thermal issues by any means, but I did have a laptop that overheated frequently in the past. It was very easy to identify that it was a heat problem, because the fans would start screaming and then it would reliably overheat and power itself off anytime I tried to do anything CPU-intensive with it. Sticking it in a freezer "solved" the problem.

Anyway, given that your freezes are completely random, it doesn't sound like a thermal issue?


> Maybe run memtest while you sleep? :)

Yup, did that tonight. The test is finished now and it found zero errors. The test took 7 hours and 56 minutes to run. I used a bootable USB stick with MemTest86 v9.3 Free to test it.


Some ideas just in case...

- Even if the RAM modules are not faulty, ECC is not officially supported by your CPU and motherboard, right? So this could be a problem... They might also not be correctly configured, so check the UEFI settings.

- Regarding the GPU, you could try lowering your PCIe mode to a slower one, especially if you are using a riser cable.

- I have had overly bent SATA cables cause complete freezes every few hours on a brand new PC. I luckily got the hint quickly because the HDD LED was staying lit.

- What about the quality of the power going into your PSU?

- Have you tried rebooting more gracefully with the SysRq commands?
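(For reference, my rough notes on that; whether the magic SysRq key is enabled by default depends on the distro, so treat this as a sketch:)

    cat /proc/sys/kernel/sysrq        # 1, or a bitmask allowing sync/remount/reboot, means it's usable
    sudo sysctl kernel.sysrq=1        # enable it for the current boot
    # on a hung box: Alt+SysRq, then r e i s u b, one key at a time with a pause between each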


> Even if the RAM modules are not faulty, ECC is not officially supported by your CPU and motherboard, right?

Correct.

> So this could be a problem... They might also not be correctly configured, so check the UEFI settings.

Most of the settings in UEFI that I can find about RAM are related to overclocking, and I am not using those features.

But it may well be as you say that using ECC RAM with a non-ECC motherboard could cause problems.

> I have had overly bent SATA cables cause complete freezes every few hours on a brand new PC. I luckily got the hint quickly because the HDD LED was staying lit.

My SATA cables are cheap ones I bought off of eBay. I may try and replace them.

> What about the quality of the power going into your PSU?

The freezing has happened in all of the different places I have lived during this time, so I don’t think so. Unless the power cord could be doing something, but the power cord looks fine on the outside at least.

> Have you tried rebooting more gracefully with the SysRq commands?

Going to try that in the future.

Thank you.


> > Even if the RAM modules are not faulty, ECC is not officially supported by your CPU and motherboard, right?

> Correct.

I should also add to this for the record, that my mistake here was that when I bought the system, I checked to see that Ryzen 7 1700 supports ECC RAM, but that I was unaware that the motherboard also needed to support it in order for ECC to work.

Someone in a sibling comment pointed me to some links and gave me some info about some UEFI settings to try along with some kernel boot arguments, and I've applied these. Hopefully it will be enough to take care of the issue, or at least to improve the stability a bit. But if the system continues to be unstable then I will probably try to save up a bit of money to replace the RAM that I have with non-ECC RAM, or alternatively save up for a new motherboard which supports both ECC RAM and my current CPU, and which can also host a newer generation of Ryzen CPU that I can buy further into the future.

Also, I went to an electronics store today after reading your comment and bought a new SATA cable that I now use instead of the one that I had.


A Turing Award winner needed help during a computational complexity conference (technically it was Karp's birthday, a day before the conference started). It was Donald Knuth, and I figured out that changing PDF viewers fixed the problem of auto-forwarding (instead of Acrobat I used Preview.app). He also wrote TeX.

Video here https://youtu.be/pP2D4DEQBTU?t=146


You typed U+0127, but he probably meant U+210F. Anyway, the answer is to either go the Insert > Symbol route or to type 210F in Word and then press Alt+X, which works since Office XP (released in 2001). Maybe this wasn’t as googlable 10 years ago. An arguably better way nowadays is to install WinCompose.


You are totally right.

We both knew it was \hbar in LaTeX but could not figure out how to get there on Windows. He was collaborating with somebody on a paper and was not used to having to do so with Windows/Word (he had a RHEL 5 desktop).

He was also 83.


You always have the trusty character map application available on Windows!

Great for finding characters you don't even know the name of.


Yet everybody seems to have the time and inclination to manage list numbering in Word, which doesn't work at all, doesn't allow direct manipulation, and has all the relevant information hidden. It's just that people don't mind losing months of productivity over a couple of years if it means saving a couple of full days of learning. (A few times this is even the correct choice...)

Anyway, I'll keep pushing for LibreOffice. Word is a complete bullshit of a tool. If people's work involves numbering every odd paragraph into a list, they are probably better off with LaTeX, but they won't learn it anyway. Yet, Writer simply works.


> Yet, Writer simply works.

Except that today I literally ran into the problem of Writer crashing whenever an image on the page was resized. There was a table spanning multiple pages and my dad was attempting to add pictures to it, along with small text comments below each picture. And, without fail, that particular document crashed whenever you'd use the drag tool to resize a picture so that, instead of overflowing onto a new page, it'd fit within the current one.

Now, I didn't have the time to debug further (Windows 10, Writer 7), but one cannot say that Writer is bulletproof either, even if it's perhaps one of the very few office suites out there that are worth most people's time and present a usable alternative to the proprietary .docx and other file formats.

Honestly, at this point I'm convinced that Writer is okay for when you absolutely need a more advanced word processor, but for every other sort of note, shorter pieces of writing etc., Markdown is more than enough. Now I mostly just reach for Visual Studio Code, Notepad++ or even just nano when I want to take notes, write some fiction or anything of the sort.

Plus, I can currently combine all of the notes from my driver's ed by simply concatenating the .md files, and searching becomes infinitely easier compared to binary file formats and such.

Edit: Also, if we're talking about annoying cases of LibreOffice breaking, here's an example of references (bibliography) failing to work when I had to hand in my Master's thesis: https://blog.kronis.dev/everything%20is%20broken/libreoffice... (the tone expresses my frustration at the time)


> but for every other sort of note, shorter pieces of writing etc., Markdown is more than enough

Oh yeah, not adding formatting that one doesn't need is a great way to avoid a lot of pain.

Markdown is more than enough for nearly all the writing people do. (But it's not enough for all the writing that most people do.)


> Word is a complete bullshit of a tool.

Maybe true, but Libre Office is not really any better, just free and open source. It's also nowhere near as familiar and well supported, and that's worth a lot.


More people are probably hit by the list-managing mess, but the one that got me recently was a blue line of some sort. Some margin or page break or something. My attorney asked if I could help him remove a blue line from the middle of the page of a contract he was typing up that we were working on. For the life of me, I was unable to remove the blue line in <10 min of effort. I had to say sorry, talk to your tech guy.


My solution for these problems (kind of like turning it off and on again): create a new Word document, and copy-paste the stuff before and the stuff after into the new document. Sometimes this works when normal editing doesn't.


Our solution to anything like this has always been to copy to Notepad and back again. Older people know exactly what is meant, but people 25 and younger usually don't understand why or how. Seems to be a generational thing. Lately even Notepad preserves UTF-8, so the method is starting to break down :)


In Notepad, you can save the file in a simpler character set (save as).


The Show Styles, Style Inspector and Reveal Formatting panes all help with stuff like this. The last one should give a pretty clear hint to where the line is defined.


For short, quick-turnaround documents that get exchanged and redlined by multiple non-technical types (e.g., contracts): Microsoft Word, because it's the global standard and the network effect is in full force.

For books and similar long docs: Definitely LaTeX (preferably with Emacs org-mode), because Word has a way of f[oul]ing up automatic cross-references even in short documents; pandoc can then turn the LaTeX into whatever other format you want.


> For short, quick-turnaround documents that get exchanged and redlined by multiple non-technical types (e.g., contracts): Microsoft Word, because it's the global standard and the network effect is in full force.

I agree for most people, though I feel like I'm pretty quick with Markdown + Pandoc -> LaTeX.

Definitely agree with the citations stuff though; Pandoc's BibTeX support is generally good, and I've had substantially fewer headaches with it than trying to get the reference stuff in MS Word to work.
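(For reference, the whole pipeline is one command on newer Pandoc; older versions used the pandoc-citeproc filter instead of --citeproc, and the file names here are placeholders:)

    pandoc report.md --citeproc --bibliography refs.bib -o report.pdf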


> For books and similar long docs: ...

I've worked in the book publishing industry and I've typeset books using both QuarkXPress (now everybody has moved on to InDesign) and LaTeX. Using Word to typeset a book is a major PITA. Nobody in professional book publishing is using Word to typeset books (well, OK, maybe the 0.1% of masochists out there).

Word is convenient to quickly edit crappy looking documents: it's a document editor. It's not a typesetting program.


> For books and similar long docs

Weirdly enough I know someone with a book published by a big programming book company, and they accept manuscripts in Word only.


This is likely unrelated to typesetting: it’s for the track changes feature. The final files that go to the printer are probably not produced by word.


Oh yeah that's for sure. And since they have a house style for everything it'd be a bit silly to faff about making your \lstset look just so when the printer would need to re-set (reset?) all of it.


I think even for casual users LyX (a LaTeX editor) is better than Word for documents with simple layouts or many equations.

Word is really complicated to understand and often makes formatting more difficult than it needs to be. So this doesn't seem like an elitist thing.


Sure, but even if you're technically right, for Word there's a million tutorials and YouTube videos if you get stuck, not to mention that basically every office seems to have "that MS Office person" who seemingly knows every single trick imaginable in Word. Even if Word is more complicated, it's relatively cheap and easy to unblock yourself. I've not used LyX, but I suspect that it doesn't have nearly the same level of domain knowledge in the corporate world, artificially making it harder to learn.

Is that fair? No, not at all, but that just seems to be how it is.


> I think even for casual users LyX (a LaTeX editor) is better than Word for documents with simple layouts or many equations.

That really depends on what you mean by "casual users". It's a lot of overhead for someone who isn't familiar with LaTeX already. I've tried showing this to students, and it's a much heavier lift than MS equation editor for limited tasks. Of course for professional typesetting or many complex equations LyX will win, but truly "casual" use it isn't close IMO.


Well, in college I only used LyX, I never knew LaTeX, and I never found Word's equation editor easier to learn.


When I first got into LaTeX I tried LyX. Horrible, clunky experience. I never recommend it to anyone; instead I recommend just using VSCodium and the associated plugins, it's a far smoother experience.


The word users may think they're more productive. And they might be right. I strategically avoid that question because I don't want it to drive my choice of tools.


> I've pretty much hated MS Word

Do you mean the app, the web or the Teams version? Or is it the way each allows different things, blocks certain actions or displays differently?

It is incredible that this mess is the industry standard.


The thing is, the vast majority of written communications are just fine in Word. And Word is a lot easier to integrate into other workflows than is LaTeX (and this is from someone who's finishing a book on using LaTeX right now). There are plenty of times that I wish for the kind of control that I have in LaTeX when using other platforms, but for just plain writing, I have to confess that Word is far more ergonomic and I haven't opened a LaTeX document to write fiction in for more than twenty years.


How do you feel about Markdown? It sometimes has issues, but generally speaking I feel like Markdown converted to LaTeX looks pretty good, and I think that Markdown is usually fairly pleasant to work with.


Markdown is designed to be simple and minimalist. It works well for the specific set of things it is designed to do, but once you need anything else - images, headers/footers, color, page numbers, and so on - it collapses entirely.


> Markdown is designed to be simple and minimalist.

Fair enough.

> It works well for the specific set of things it is designed to do, but once you need anything else - images,

It's never been particularly hard for me to add images in Markdown when rendering with Pandoc. It's fairly easy to add images and adjust their size and whatnot; not 100% sure what you're referring to.

> headers/footers,

Yeah, I'll give you that one: you gotta muck with the preamble YAML in pandoc to get headers, and that's some irritating LaTeX (see the sketch at the end of this comment).

> Color

Also fair, not sure it's possible to have color in Markdown->PDF with Pandoc.

> page numbers,

Page numbers are output by default in Pandoc with Markdown. You can also add an optional Table of Contents if needed. Not sure where that rumor started.
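(Since headers came up above, the preamble-YAML workaround I mean looks roughly like this, assuming PDF output via LaTeX; the title and header text are made up:)

    ---
    title: "Example document"
    header-includes: |
      \usepackage{fancyhdr}
      \pagestyle{fancy}
      \fancyhead[L]{Draft, not for distribution}
    ---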


Often you want page numbers so that you can refer to them later in the text (“See Figure 1 on page n.”). I think at one point I knew how to do that in markdown, but IIRC 1) it was only an option in pandoc, 2) it only worked by printing to PDF via LaTeX, and 3) I may have needed to implement it in LaTeX anyway.

“Pandoc lets you achieve X by embedding latex in your markdown file” is a lot different than “markdown supports X”
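(For what it's worth, the embedded-LaTeX route looks something like this; the label and sentences are made up, and it only means anything when the output goes through LaTeX to PDF:)

    The full proof is in the appendix.\label{sec:proof}

    ...

    (See the appendix, starting on page \pageref{sec:proof}.)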


Page references are an interesting case because they’re very print-dependent. As I've been working on finl, I've thought about the fact that page references don't work well for documents which might be presented electronically.¹ A reference like “on page 32” doesn’t translate to, say, HTML or ePub. At one point, while I was reading the Rust book, I was excited to see how a page reference was translated into the online version. I quickly went there and found they had rewritten that sentence to eliminate the page ref.

The other challenge with page references is that they’re at least in theory, a potential source for a loop condition in rendering. We can have a reference to “on page xcix” which will cause the referred to element to move to page c, but then when the output is changed to “on page c” the element moves back to page xcix.²

⸻⸻⸻

1. PDF in a way doesn't count as electronic presentation in that it’s inherently an electronic representation of a print product, things like PDF reflowing notwithstanding.

2. I suspect, but can’t prove, that in LaTeX this could even happen if a page reference becomes longer, thanks to how LaTeX’s line-breaking algorithm works on a paragraph as a whole. When I get to that part of finl, I guess I’ll know for sure whether this is possible, and if so, will work out how to avoid it.


It’s nice for small-context stuff, but it depends on the editor context as well. Hitting cmd-I to turn italics on and off is no more difficult than typing * at the beginning and the end. And then there’s the fact that the verbatim equivalent doesn't provide a way to be able to easily have ` as an in-line verbatim. Word is nice because it’s designed for writing which most text editors are not.


I was referring to the app, whatever was recent in 2011 or so. 2007 I think? I haven't used it in a long time.

I always wanted my documents (and particularly equations) to look as nice as the formatting in my textbooks, and it felt like no matter how much finagling I did (which was not an insignificant amount of time being spent), it never looked quite as good.

Also, I never fully knew what copy-paste was actually going to do. Was it going to copy the source formatting? Was it going to adopt the style of where I was pasting? Was it going to do neither and just break everything? I was never sure, so I always ended up pasting into Notepad and then re-copying it to guarantee all the formatting was stripped out.

And sometimes there would be just a vanilla "I pressed a random keystroke by accident, all the formatting broke, and undo doesn't appear to do anything" issues. This might just be gross incompetence on my end (it wouldn't be the first time) but nearly everyone I've described this issue to has experienced similar problems.

Also, I can't remember the details of it, and it's possibly fixed now, but I had issues with the equation editor simply not having all the symbols I needed.

In fairness, I have similar complaints about most of the other WYSIWYG word processors, so I don't mean to pick on MS Word specifically (most of these complaints apply to Pages and LibreOffice as well).


> I never fully knew what copy-paste was actually going to do.

On a Mac, Option+Shift+Command+V is paste without style.

I can’t understand why this wasn’t made the default behaviour of plain Command+V.

Edit: > I was referring to the app, whatever was recent in 2011 or so. 2007 I think? I haven't used it in a long time.

I was raging at my world. Things have got much, much worse in the MS Office world. Going back to it after a long absence had me baffled. It’s all online now, except when it isn’t, or when it sort of isn’t and it does not work well.


> On a Mac, Option+Shift+Command+V is paste without style.

On Windows, it used to be Control-Shift-V. Unfortunately, it does not work in Office 2016 (and likely other recent versions). While the new approach is more flexible, it would have been nice if they kept the old behaviour bound to the old key combination.

I agree with recent versions of Office being a baffling mess. The only reason why I keep Office 2016 around is for the rare case that I need to send an editable document to someone else and I have no intent to upgrade since figuring out that mess is more trouble than it's worth.


I was on Windows at the time. I'm sure there's an equivalent in Windows but I never learned it, presumably because I was too lazy.


In Word, if you paste something, it will show a little tooltip menu beside the pasted content that allows you to change if it keeps source formatting, adopts destination formatting, or has no formatting.


>but I had to realize that not everyone is trying to create publisher-worthy documents

Not to mention most LaTeX documents look like crap - they might get the "standard LaTeX paper-like formatting", but as formatting goes it's nothing special to write home about, and the fonts don't look particularly good either.

What I find especially funny is that there was a foreign commercial book from LaTeX experts (and contributors), about LaTeX, typeset in LaTeX (of course), whose typesetting was all over the place -- including text running off the page inadvertently.


Like you, I've never liked it. TeX has always struck me as more focused on the needs of a narrow set of TeX users, not the readers. And Bob Cringely tells a story about Knuth and the invention of TeX:

"""

It was a major advance, and in the introduction he proudly claimed that the printing once again looked just as good as the hot type of volume 1.

Except it didn’t.

Reading his introduction to volume 3, I had the feeling that Knuth was wearing the emperor’s new clothes. Squinting closely at the type in volume 3, I saw the letters had that telltale look of a low-resolution laser printer—not the beautiful, smooth curves of real type or even of a photo typesetter. There were “jaggies”— little bumps that make all the difference between good type and bad. Yet here was Knuth, writing the same letters that I was reading, and claiming that they were beautiful.

“Donnie,” I wanted to say. “What are you talking about? Can’t you see the jaggies?”

But he couldn’t. Donald Knuth’s gray matter, far more powerful than mine, was making him look beyond the actual letters and words to the mathematical concepts that underlay them. Had a good enough laser printer been available, the printing would have been beautiful, so that’s what Knuth saw and I didn’t. This effect of mind over what matters is both a strength and a weakness for those, like Knuth, who would break radical new ground with computers.

Unfortunately for printers, most of the rest of the world sees like me. The tyranny of the normal distribution is that we run the world as though it was populated entirely by Bob Cringelys, completely ignoring the Don Knuths among us.

"""

From Accidental Empires. Which you can get as a book, but he uploaded a scan to his blog here: https://www.cringely.com/2013/02/10/accidental-empires-part-...


There are a few things wrong with Cringely's story:

- According to this story, TAOCP Volume 1 was typeset using hot-metal printing, Volume 2 was typeset using "photo-offset printing to save money for the publisher" (which Knuth didn't like), and Volume 3 was typeset using TeX. But in reality, the first editions of all three volumes of TAOCP (Vol 1 1968, Vol 2 1969, Vol 3 1973), and the second edition of Volume 1 (1973), were all typeset using hot-metal printing. It was the galley proofs of the second edition of Volume 2 (in preparation around 1977) which used photo-typesetting, but this was never published. Ultimately the second edition of Volume 2 when published in 1981 used digital typesetting (TeX + Metafont).

- He claims that in "volume 3" (let's charitably assume he meant the 1981 second edition of Volume 2, as this is the only thing that makes sense: the next editions of these volumes were from 1997–98, after "Accidental Empires" was published), he "saw the letters had that telltale look of a low-resolution laser printer" and "Had a good enough laser printer been available, the printing would have been beautiful" — but this makes no sense, as these published books were (obviously!) not printed using a low-resolution laser printer, but were printed using a good digital typesetter at 5000+ dpi.

For more history, see https://tex.stackexchange.com/questions/367058/wheres-an-exa...

So either the story is completely made up, or (accounting for lapses in memory etc), Cringely may have meant one of the following:

1. The fonts used in 1981 (Vol 2 2nd edition) were not the final version and were further refined (see Knuth's "burned with disappointment" linked above), leading to the current Computer Modern (BTW: try not to use the thin/spindly on-screen versions: https://tex.stackexchange.com/a/361722) after further consultation with leading type designers,

2. Cringely may have been talking about something like the 1979 book "TeX and METAFONT: New Directions in Typesetting", which is basically a very early application of the prototype TeX78 that was still being developed at the time, and was indeed (I think) typeset on the Xerox Graphic Printer (XGP), and you can indeed easily see the low-resolution artefacts. (They're impossible to miss: if such a conversation as above indeed happened, and about this book or something from that early period, likely what Knuth said was something like "yes of course the type looks bad in this book, but it's been designed for a real (higher-resolution) digital typesetter and it will look fine there", not that he couldn't see it.)


Thanks! I appreciate the further details. For what it's worth, I think of Cringely more as a tech gossip columnist turned memoirist, so I expect it's lapses in memory and being a bit slapdash, plus the difficulty of fact-checking anything in the pre-web era.


I guess there’s no accounting for taste, but I quite like the default LaTeX font, better than most of the fonts I tried in Office.


Amen to what tombert is saying there. LaTeX documents just look "nice" right off the bat and they do formulas perfectly.

I would personally say that it totally depends on what you're trying to do. Most of what corporate types need is done fine with MS Word and like others have said, there's always that "expert" that can get you out of tight spaces. It works well enough for the general public so to speak.

You want something scientific to look the part and not have any of the formatting hassles we all know from Word? Go use LaTeX! I wrote both my Bachelor's and Master's theses in LaTeX. Yes, pure TeX is way outdated, and even with LaTeX you have to look at what the "current" packages to use for various things are (this was 15+ years ago and I have totally never had a reason to use LaTeX again, so I'm not current).

Fun anecdote (at least to me): I did my Master's thesis at a company, so sort of outside of the "scientific environment" where LaTeX would've totally been accepted. The company wanted my thesis as a Word document. I refused. I said I'd give them a Word version as an export afterwards, but without any changes to the formatting, no niceties, nothing. Just a plain export, that's it. I was not gonna go through the formatting and editing hell of Word for a thesis, ever! Unlikely hero: my direct 'boss' at the company, who was overseeing the thesis work, had written his own thesis ten or more years before that in LaTeX, and he defended me against his boss. The next day he came to me and proudly told me he had grabbed his own Master's thesis from backup, downloaded a current LaTeX and ran it through. Looked as beautiful as ever!


Why use big Word when small Notepad do trick?


That Word doesn't work (well) in Linux is a big reason why I still have Windows on my home computers.


Did you notice they released a new layout engine for Word sometime in the last ten years?

Did you turn on kerning?


> "Tom, we're all really sorry we didn't become software engineers, I guess we'll all quit our jobs and go back to college," which (rightly) made me feel like an asshole.

The problem is that MS Word kind of requires you to be a software engineer to be able to use it and produce a document which does not look like a ransom letter (and even then, it doesn't look great). Word, just like Excel, is extremely complex and complicated software.

Sure, they can be used in a "simple" manner, but god forbid regular users select something by mistake and then click on the wrong button (among the myriad of buttons around).

> I discovered LaTeX + Pandoc, and it created documents that I thought looked much nicer

Just about any document editor or typesetting program produces documents that are better looking than Word's. It's not just something you thought; it's simply true.


> I didn’t know the limit because it had been many years since I’d had a problem that could be solved with a DB small enough to be held in its entirety in memory. And they were right to fail me for that: the fact that I was good at solving strictly more difficult problems didn’t matter because I didn’t know how to solve the easier ones they actually had.

No way is it "right" to fail an interviewee because they wondered out loud whether a dataset can fit in memory. A candidate should be able to estimate the answer, but beyond that it's trivia that can be researched as needed -- indeed, even asking the question is a positive signal, not a negative one (it's not just people with "google scale" backgrounds who reach for over-complicated solutions).


Well, I think "can this be held in memory" can be a stupid question. What I thought was more of a problem is:

> And they were right to fail me for that: the fact that I was good at solving strictly more difficult problems didn’t matter because I didn’t know how to solve the easier ones they actually had.

Scale problems are a type of problem. They aren't strictly more difficult than problems on a smaller scale. Being able to scale to thousands of users/CPUs/shards is a skill. Thinking your niche skills are solving the "difficult" problems and other skills are easier is not a good attitude.


> Thinking your niche skills are solving the "difficult" problems and other skills are easier is not a good attitude.

This is one of the biggest obstacles I've encountered when hiring people out of big companies and into smaller startups.

At a very big, very established tech company it's easy to focus on your narrow problem and leverage all of the company's infrastructure to get things done. Moreover, you should be leveraging the company's infrastructure and not reinventing the wheel.

Some (though not all) of these people struggle to adapt when they move to another company where all of that infrastructure isn't available. Maybe they never had a chance to learn how to do it. Or maybe they've just been convinced that their previous company's infrastructure was the only correct way to do something. Either way, they get stuck in a position where they don't know how to solve problems without their previous company's infrastructure.

Best case, they admit the gap in their knowledge and work on filling it as fast as possible.

Worst case, they spend years trying to recreate their previous company's big-company infrastructure before they can get started on this new small-company problem that they've been given.


> they don't know how to solve problems without their previous company's infrastructure.

this is basically an indication of the lack of problem solving, and instead, they've just been "tool using".

It's the same sort of phenomenon as someone copying a snippet off stackoverflow, and tweaking it, without understanding _exactly_ how it works (and all of the underlying theory/understanding required).

Real problem-solvers don't need this, but only reach for a tool from their toolbag after they've completely understood the problem, and have a solution in mind.


> Well, I think "can this be held in memory" can be a stupid question.

absolutely not. questions like that give you a baseline for idealized performance. not that you should ever expect to reach it, but it does give a good sanity check for whether or not a technology choice is a sane one.

i've seen cassandra clusters used for datasets that would fit in a sqlite (with indexes!) in a ramdisk on a cheap laptop.

so yes, scale problems go both ways. and to your point, the "difficult" problems aren't harder necessarily, often just those that you (and your team) don't know how to solve.
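(back-of-the-envelope, with made-up numbers, just to show what that sanity check looks like:)

    50 million rows x 200 bytes/row ~= 10 GB
    -> fits in RAM on a single commodity box; no cluster required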


> scale problems go both ways. and to your point, the "difficult" problems aren't harder necessarily

I like the old adage: anyone can build a bridge; only an engineer can build a bridge that can barely stand up.


I trust that is meant in the Colin Chapman sense? The racecar that fails the second after it crosses the finish line is the perfect racecar. Anything more reliable than that is excess weight, performance, cost, etc.

I like that as well, idealistically. However, we work at average places with average coworkers, and are ourselves average. I'm happy to work with folks that can "only" overbuild the bridge. There's far too many people that are on the wrong side of bridge endurance.


My point was that there are plenty of examples of systems/requirements where the dimensions of the problem are just comically out of proportion for that question: where not only can you fit it in memory, you could fit it in memory on a Commodore 64, and whether something fits in memory is not how you should be thinking about the problem.


This, 100%. I worked at a company where we hired a self-described "AI expert" (I think he was a friend of the founder). He didn't last long, because when push came to shove and there wasn't any hard AI work to be done, the guy didn't know how to write HTML and couldn't be bothered to learn. And the product was a web app.

You know what else is a hard problem? Delivering a quality product on time and in good working order. Not everything has to be at a million user scale for it to be considered a difficult task.


> Well, I think "can this be held in memory" can be a stupid question.

Why's that? If I'm sure the answer is always yes, I can quickly write something to solve a user problem that is way faster and less expensive to run.


agreed. in fact, in my time at google i've focused on the other end of the spectrum, solving "small" (in terms of resources required) problems well so they can be used reliably by the people building larger-scale systems.


It seems likely that that was shorthand for "I failed the interview because I was unsuitable for the role, due to the effect described in this essay, as pithily illustrated by a particular exchange which helped us both realize that this was not a good fit."


They probably just got turned off by how senior the candidate was and assumed they would be impossible to wrangle. It's a bad judgement call, because usually you gain many years of experience at various companies precisely because of that: your ability to state your view yet still compromise with leadership. Yet in the interview they judge it as 'this person won't listen if I need them to'. They will; that's how they made it that far. In fact, they are speaking to you this way because they've been trusted to speak this way in the past, and you are interviewing them because they've been trusted to do the job in the past.


No one knows why they actually failed her in the interview. I have run many interviews, and it is very rarely the case that you fail a candidate for one technical slip-up.


I want to agree with you, but having been part of the pair-interviewing process at one of the FAANG companies, it did seem like candidates would be declined for the most arbitrary technical screw-ups. One time a manager (at least nominally) voted "No" on a candidate specifically because, despite figuring out the solution to the problem, they had a compilation error in their Java code and ran out of time to fix it. I remember saying to the manager that that seemed like a pretty silly reason to decline someone, since compilation errors are typically pretty easy to fix, but I was relatively new to the job so I didn't want to rock the boat too much. He claimed that they only want "the best", and I guess that implied that the best don't have compilation errors.


If it wasn't a compilation error it would have been something else to cover up an unconscious bias or a 'gut feeling'. Basically seeking out the evidence, no matter how weak it is, to fit the 'crime'.

I wouldn't believe anyone who said they've never been there. For all the times you're told to trust your gut, there are plenty more times where your gut is just plain wrong and you have to go against it.


For sure. It seems akin to throwing out half the resumes at random because you only want to hire lucky people.

Personally, when I'm interviewing I help people get past dumb mistakes. We all make them, and some of the best developers I have worked with are the type to get nervous in interviews, so they become more likely to make them and have a hard time finding them. The "best" interviewee is not always the best developer.


Exactly. We don't know anything the author didn't tell us. For all we know, the author adamantly gave the wrong time complexity for quicksort. Or possibly they were (intentionally or unintentionally) rude and condescending. Or there was a candidate with a way better fit than our plucky author. Or, or, or....

Thankfully the author left the name of the company off. I'd hate for half of a secondhand account like this to cause candidates to avoid interviewing at a company!


And I wonder how often a candidate actually follows up to ask why they failed and whether many interviewers will give honest/complete answers.


I'd never give a detailed answer, because I don't want future candidates to cheat. The last thing I want is for someone to share my interview, and then I can no longer filter out bad candidates. It takes a super-long time to write a good interview.

But, more importantly, very few candidates fail on a single question. They usually fail on patterns: Poor comprehension of the task at hand, too many snap judgements, or lack of professionalism.

(Another reason, though, is that I've rejected a lot of people who I believe shouldn't be software engineers. It's not my place to tell them they should pick another career, nor do I want to make it easier for them to bluff their way to becoming someone else's headache.)


Opinions are my own.

If you rely on a "not-leaked" interview to filter out bad candidates, it's likely that you may not be conducting an effective interview.

A good interview is about figuring out what the candidate is good at, not about "gotcha, I proved that you're a bad one to filter out". I don't have a set of questions I ask all candidates. Instead, I base the questions on their experience and projects. It's common that I find out "the candidate is not as good as they claim at this subject", and that's fine. The interviewer can switch topics and see if the candidate is good at another subject.

Everyone is good at something. The job of the interviewer is to find that out, and to determine whether it fits the business needs. The job of the hiring committee is to decide whether it's worth paying for that compared to other candidates. The only "filtering candidates" job is the recruiter, not the interviewer.


Indeed... the whole "trick question" school of interviewing is basically an admission that this industry has absolutely no idea how to reliably rate job candidates (I'm not claiming that I have a magic solution myself, note...).

Doctors don't, in general, get asked trick/gotcha questions when they apply for jobs. Neither do lawyers. Neither do other kinds of engineers. Neither do musicians, for that matter.

From long-ago undergrad I remember that there is a data structure called a "red-black tree". Now, I've never had that as an interview question, but I could easily imagine it being one.

Would I be able to answer a cold question on red-black trees? Nope. As I said, that was a long time ago, and I've never had to use one to the best of my knowledge. Would that be a good reason to filter me out? Nope. If I needed to use a red-black tree (or, in general, had a problem where some unusual type of tree might be in order), I'd pull my copies of Knuth, CLR* and Sedgewick off the shelf, look at strengths and weaknesses of the different types of tree, and make a decision.

* mine is old enough that it's just CLR, not CLRS :-)


Yeah, my interview questions are more jumping-off points to have a conversation. At some point I'd most like the candidate to actively start teaching me something that I didn't know, generally geeking out about something and displaying some skill at getting to the lower levels of problems; it's the "off script" discussions which are best. And then just avoid any personality red flags. You can't really cram for that.


I've seen a lot of this. FAANG practically release a study guide, while tons of smaller companies think their precious little process will fall apart if the candidates even know the sort of interview they conduct up front.

Will there be leetcode-type questions? Roughly which level? Laptop, or whiteboard? Should I expect to be quizzed on networking? Javascript "gotchas" for crap I reflexively avoid so may have forgotten about? UX principles? Will the questions be general, or specific? Will it be a mix? How much, if any, of the day will be whiteboard stuff, and which part of the day will it be in?

The answers to these, and more, you should provide up front, without being asked, well ahead of time. Your signal will go up, not down, if your candidates have half a clue what to expect, out of the space of all things that might be asked in a software developer interview. Keeping this stuff a mystery and ambushing your candidate at 9AM sharp with data structure whiteboard questions, after they've flown for two hours to get to you, got up early, were on a plane then in a taxi for a while, et c., when they didn't know you were gonna do something like that, is just giving you tons of false negatives. Knock that shit off, you're wasting money and giving people bad days for no reason.

That, or just stop expecting people to "naturally" be able to pull this stuff out of their head and perform for you like a monkey, while in a stressful and semi-adversarial situation that does not resemble collaboration and teamwork in a real job (there's the excuse that "well you have to be able to perform under pressure" but interview pressure has nothing to do with, say, "prod broke and the DB went bye-bye" pressure, so that's bullshit, too). And then communicate that fact—that you're not going to torture them—too, so they don't have to stress out over it.


> The only "filtering candidates" job is the recruiter, not the interviewer

Then what's the point of the interview?

Considering the number of incompetent candidates I've had to filter out, that just won't work.

IMO: We need a licensing board to do that for us. It works really well in other industries, where they don't need to test competence in their job interviews.


Very rarely. And they shouldn't trust any answers that they receive.

It's difficult to give an honest answer to this question even if you want to be honest.

There are often so many little reasons that go into a rejection -- different reasons for different interviewers all feeding into an activation function that leads to a rejection or an acceptance (default state depends on team culture and how well you filter out candidates at the top of the funnel).

For most candidates, it's very hard to distill this into actionable criticism.

Edit: For anyone fundraising, this is also why you shouldn't listen too carefully to the reasons that investors give you when they reject your business as an investment.


I agree that people shouldn't trust the answers they receive.

All of us frequently give inaccurate explanations for our actions or attitudes because people often act on feelings, not on logic. Even when we want to be honest, we often lack the self-awareness to know what we're feeling and why.

On top of that, there are many who don't want to be honest, and will give an answer, any answer, just to make you stop asking and go away.


It's fairly common for companies to instruct interviewers not to tell candidates why they failed if they reach out for fear of being sued over the response. It really sucks for candidates, but given that it essentially costs companies nothing to have this policy, I'm not surprised they do; no company is going to make a choice that has even a small risk of a huge negative downside if there are no direct obvious upsides to counter it.


Even if a candidate gets an answer I'm doubtful the response will be much use. We like to think of ourselves as rational but I'm willing to bet most hiring decisions are anything but.

They might well tell you your skills are lacking when the reality is you remind the recruiter of the guy who ran off with his wife.


It's rare to get an employer to give any honest reason for a rejection. As others have pointed out, it may be for fear of getting sued, but that is only part of the general reason:

Further interaction with the job candidate almost certainly has a negative ROI (return on investment) for the employer.

If you tell the job candidate the honest reasons for rejection then at best you've used (wasted?) some time to tell the candidate this. At worst, the candidate sues.

But there are many other possibilities: The job candidate gets angry at you for telling them a hard-to-swallow truth and causes a scene. The candidate tries to argue or negotiate with you. The candidate accepts the truth and tries to use more of your time to discuss your original criticism and seek further advice.

None of these scenarios are a net positive for the employer.


At my last startup I was advised not to give honest feedback, since no matter your intentions it is possible it could be construed to be discriminatory :(


About ten years ago, I applied for a SWE role at Facebook. During the interview, the only question I didn't feel confident about was when they asked me how GROUP BY works in SQL and to write a group-by query. I said I didn't know and usually looked it up, but they didn't want me to look it up. They rejected me, and the recruiter literally told me it was due to a lack of familiarity with SQL (something I'd been using with PHP for 6+ years at that point).

Edit: I interviewed a year later, wasn't asked about SQL at all, absolutely fumbled my way through some networking questions, and was still hired, so it's clearly very dependent on who's interviewing you and what they ask.


> And I retrained on smaller numbers and got that job at that start-up.

Folks are probably much more likely to tell you why you didn't make it the first time you applied...


“No way it's "right" to fail an interviewee because [she] wondered outloud if a dataset can fit in memory.”

That made no sense to me, either. My reaction was that she was fortunate not to work there (but further down she mentions that she wound up there anyway).


I took it as poetic license. Most interviewees don't know why they failed; maybe an interviewer's offhand comment caused her to fill in blanks, or she extrapolated from the general point: she wasn't the right fit due to the scale of the problems.


I also came in to comment on this excerpt: failing someone over something so trivial is literally insane.


My guess is that the interviewer was using the question to select for people who choose appropriate solutions.

Someone who suggests a complex highly scalable replicated database architecture for a 5MB database is very likely an unsuitable candidate.

Also the interviewer could be looking for whether someone asks the right questions to be able to suggest an appropriate solution, rather than someone who doesn’t ask, jumps to conclusions, and makes poor assumptions.

Once you have worked with one architectinaut that builds baroque solutions, you make sure you don’t employ that kind of person. Anyone can design something complex, it takes talent to simplify everything properly.


I was mostly in agreement. I suppose it depends on the scale of the data.

Fitting 1000 short rows in memory is something I'd expect anyone to know, off the top of their head, will just work. It really depends on the specifics of the question and the seniority of the role...
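
For what it's worth, the back-of-envelope arithmetic here is tiny. A rough sketch in Python, where the 1 KB-per-row figure is just an assumed, generous size for a "short" row:

    rows = 1_000
    bytes_per_row = 1_024            # assumed: a generously padded "short" row
    total_bytes = rows * bytes_per_row
    print(total_bytes / 2**20)       # ~0.98 MiB -- trivially fits in memory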


It is clear by the mention of "several orders of magnitude" that we are talking about a database which fits in a few MB of disk space on a machine with a few GB of RAM.

If the interviewee has never worked with a database which would fit in the memory of the machine, and if the company only deals with such databases, then it is clear that it was not a good fit. Failing the interview does not mean that the interviewee is bad, just that they don't know how to count that low, and thus were not the person for the job.


> If the interviewee has never worked with a database which would fit in the memory of the machine, and if the company only deals with such databases, then it is clear that it was not a good fit.

Why? That's not clear at all to me. Most of the job is the same, and the parts that are different are themselves mostly removing steps that aren't needed at a smaller scale.


Even then, it's good to stop and consider if those 1000 rows truly represent the full data set you need to run this on.


I think you're taking this statement far too literally. It's part of the storytelling.


TBH author sounds like a bit of a broccoli from their time at Google. The public sees this same attitude "in the wild" though, with Kubernetes, which is a thoroughly inappropriate solution in many cases (but is absolutely right in some). But once you've bitten that bug, if there's a computer problem, step 1 is going to always be "first, set up a kubernetes cluster because I've forgotten how to computer without that".

If, during an interview, someone couldn't move past "well, I'd just use kubernetes" or give other possible solutions then I could see myself questioning how appropriate they are for a role and consider other candidates.


What is a broccoli?


As another commenter said, it's a reference to an internal parody video about how hard it is to do simple things at Google. It was made with some video generation software with stock animated characters and synthesized voices. I found Broccoli Man playing another role here: https://www.youtube.com/watch?v=8d0TVpCIyLY

In the internally famous Google video, Broccoli Man was the jaded production-savvy SWE/SRE who told the earnest SWE (Panda Woman, I think?) what hoops she needs to jump through to launch her service. Among the most famous lines: her "I just want to serve 5TB of data" to his "I forgot how to count that low".

The video had a huge impact because it was funny and had more than a little bit of truth to it. Lots of Google infrastructure was oriented toward huge problems and required the same heavyweight processes for things that aren't so huge. It inspired a bunch of changes to simplify deploying small (by Google standards) services. Many small things are still difficult at Google, but a bit less so now.

I've only said and heard "I forgot how to count that low" as a reference to this video, making fun of the unexpected difficulty of doing something small rather than actually making fun of someone for wanting to do something small.


I can confirm that this is a reference to an entertaining internal video with Fox Girl and Broccoli Man. Fox Girl just wanted to serve 5TB of data and Broccoli Man was explaining the many, many steps of configuration, services, and monitoring that were required to do this at Google. I agree with scottlamb; I only heard "I don't know how to count that low" in reference to the video, not as a "sign of superiority" that the post describes.

The video was also described briefly here: https://rachelbythebay.com/w/2012/04/06/5tb/


I wish I had taken a picture of that shirt before it had been run through the wash a few times. It used to look a whole lot better!

But yeah, that video was epic, and it was spot-on, too. A company that has no idea how to deal with people is going to suck at making tools _for_ people.


Funny how memes can sometimes shift culture that much. I love it.


I miss those internal Google culture things.


It’s a reference to an internal Google comedy video mocking these tendencies (one of the characters is an anthropomorphic broccoli)



I am also interested in this expression. Google searching turns up nothing, except maybe a reference to marijuana.

So maybe 'broccoli' is their way of saying craziness? Like being on Google drugs. But obviously that's not a perfect fit. Maybe it's a misspelling of something else?


No one in Google uses this as an expression. It's close to something people actually do reference, but it's not described like this.


I think you are basically wrong about k8s. It's true that sometimes Netlify or Heroku is easier, but if those are too pricey or not flexible enough, k8s is pretty much the best mainstream app ↔ metal interface.

That doesn't mean it's the best — at home I use something different — but for your average business environment it's the safest choice.


Given the choices between app->k8s->vm->metal, app->vm->metal, app->k8s->metal, or app->metal, a lot of businesses really should be choosing the ones without k8s in the chain. Sometimes a single VM, with a hot spare ready for manual failover, is plenty.


Disagree. I've been on both sides of those choices, and the containerized deployment infrastructure is much nicer to work with, as a software developer. And I think it is less annoying for ops folks too, but I'm less sure about that.


If ops is handling ops and you're using automated CI/CD, then all you need to do as a software developer is "git push," and it doesn't matter whether it's containers, VMs, or bare metal running the code, nor does it matter what's orchestrating the containers, VMs, or raw servers.

I've also done everything from bare servers to VMs to containers, in varying capacities. For small teams with fewer than millions of users, the fewer moving parts the better. One monolithic app scaled mostly vertically, one database scaled mostly vertically, and maybe one caching layer scaled horizontally, will be far more productive for development.


Our GCP machines that are behind a Load Balancer are all app->vm->metal. Works fine. People keep bringing up kubernetes but it adds nothing.


It's a standard, like Ansible. If you already have a great configuration management / deployment tool, then you likely wouldn't gain much.


I dunno, I've done it both ways. I spent about a decade in small companies using all sorts of different kinds of ops solutions, from custom bash scripts on bare metal to capistrano on on-prem VMs to chef on those VMs to chef on AWS to heroku and some things in between all that. I did not at all see all the devops changes of the last five years as unnecessary complexity; I felt all the pain that solutions like those from hashicorp and docker and kubernetes are trying to soothe.

And then I went and got a job at a big company with a mature containerized deployment infrastructure, and it's just better. It's not like it's easy, but it totally solves the "blank sheet of paper" problem. You don't have to invent some deployment scheme to prototype or launch a new service; you just go figure out how to write the config code and where to put the build artifacts, run a couple commands, and voila.

Yes, it requires up front investment to get to that point, but if you see yourself as the kind of company that is going to be iterating on new ideas, entering new markets, creating new products, solving scaling problems, just anything that ends up with a lot of cycles of launching new services, then I really think the investment is worth it. Maybe that's still a niche case, maybe not many businesses really fit this, but honestly those that don't aren't really the kinds of businesses I want to work for.


"when that's explicitly the wrong answer for that interview question"

So to clarify: You're asking some sort of an interview question that is to some degree subjective, and then penalizing people when they have an "incorrect" opinion?

A better approach would be to discuss why they have that opinion and why they would go that route over another one.


You're not wrong :)

I discuss my opinion a bit more in depth upthread https://news.ycombinator.com/item?id=28990879 but we don't actually know what happened, so it's impossible to say.

The candidate that is unable to explain any details about how linux containers actually work, other than to say "I used kubernetes at my last job" is getting picked after the person that actually understands how they work (given that we're hiring for a low-level linux role in this hypothetical :) ).

Or to frame it in a different way, how would you sort an array of numbers? The answer "I'd call array.sort()" isn't wrong per se, but there's an answer the interviewer is looking for and that's not it. If the interviewee furthermore becomes rude and aggressive about it and doesn't even know what time complexity is, that's a pass. (Not remotely suggesting that that's what happened here, but sometimes the biggest interview isn't official - if you're a raging asshole to the person at the reception desk, don't expect to get a callback.)
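
To make the contrast concrete, here's a minimal Python sketch of the two kinds of answer; the hand-rolled sort is only there to illustrate the time-complexity discussion an interviewer might be fishing for, not to suggest anyone should ship it:

    def insertion_sort(nums):
        # O(n^2) worst case: fine for tiny inputs, painful for large ones.
        out = list(nums)
        for i in range(1, len(out)):
            key = out[i]
            j = i - 1
            while j >= 0 and out[j] > key:
                out[j + 1] = out[j]
                j -= 1
            out[j + 1] = key
        return out

    data = [5, 3, 8, 1]
    print(sorted(data))           # built-in Timsort, O(n log n)
    print(insertion_sort(data))   # same result, different cost model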


Does the original generator for those "Fox girl and broccoli man" videos still exist?


Regarding the phrase "I don't know how to count that low", here's the video where it originated. Frankly I'm astonished this hasn't been shared before, and I hope I don't get in trouble for posting it (it's like 11 years old, surely nobody cares at this point, right?)

https://www.youtube.com/watch?v=3t6L-FlfeaI


I always heard "I don't know how to count that low" as more of a tongue-in-cheek joke about Google's scale, not necessarily a sign of smug superiority, or that work at smaller scales "didn't matter".

Like if someone was wondering "is it wasteful to store this data as a 1 GiB file, maybe we should compress it/structure it differently", someone might say "I forgot how to count that low", meaning, "that's a small enough amount of space that it's not worth it to worry about optimizing it".


> I always heard "I don't know how to count that low" as more of a tongue-in-cheek joke about Google's scale, not necessarily a sign of smug superiority, or that work at smaller scales "didn't matter".

This is exactly correct. I dunno where the author's coming from, it sounds like they might be one of those people who frequently misinterprets others' intent.


1 GiB? That's nothing. I just finished reformatting 280 GiB of Google Takeout data from .tgz format into a 130 GiB SquashFS filesystem (both for space reasons and to allow efficient random access). It would have been nice to be able to do that on the server without downloading 150 GB of extra data first, plus another 30+ GiB because one of the downloads failed partway through and couldn't be resumed due to the download URLs having very short expiration times relative to the file sizes. Apparently 180 GiB of data transfer and 150 GiB of storage doesn't register as worthy of optimization for Google, but it's quite a bit for your typical home user.


I'd imagine that the inconvenience of handling takeout data is beneficial for Google's business interests, so there isn't a big incentive to improve the situation.


How many other services make it relatively painless to download all your data in one place? Or at all, for that matter? For me, the existence of Google Takeout is definitely a selling point when comparing their services to the competition. There is room for improvement, and perhaps some conflicting interests are at work behind the scenes, but I wouldn't say they go out of their way to make the process inconvenient.


I'm not saying that they go out of their way to make it inconvenient; I'm saying that making it more convenient is probably not going to give them an ROI of any sort.


I was at Google, and I didn't see anyone using it with a superiority complex.


Oooof, SAS (in the author's dad anecdote). If they get hooked young, it's expensive hell to get them to quit. I'm not sure how that colors my view of the article's main point by analogy, but it's tempting to reach for: yes, if you get addicted to expensive (and/or proprietary) complexity, you sometimes lose sight of how to build simple solutions to simple problems. I'm not sure that's an argument in favor of the intellectual superiority of those who build the complexity...


I'm a huge SAS fan, but after college it fell out of my toolkit for the same reason as the author: license fees. It's a shame, because I found SAS far and away to be the best tool for data cleaning and querying. Plus, it's got great readability...especially over R, though that may be a contentious opinion.

I'm honestly surprised and disappointed to see how little SAS has adapted to competition. I'm sure they'll milk their existing customers for another generation, but it'll kill their business on a long enough timeline.


I had the misfortune of needing to install SAS on my computer (fiancee has a license through grad school) and the whole install process is built around a complicated licensing scheme that ties any of a hundred optional components together into a massive assembly depending on exactly how much money they were able to squeeze out of you.

There was a problem with our license file which meant we got through 5-10 screens of installer and then hit "The SID file you selected is not intended for the order you are deploying because the Tech Support Site Number values do not match. Select a SID file with a matching tech support site number."

And then two weeks later we were finally able to get a correct license file from the IT department. I can't say how much of that was IT's fault or if it really took two weeks to get the fixed license from SAS, but either way I'd be happier to never touch it again.


Yeah, you've summed up the pains of SAS. The query language itself is quite lovely.


The "which components do you want to install" page feels like the end result of 45 years of "one customer wants this feature, let's add it to the product and maybe somebody else will buy it too!"

I'm sure they're all useful to someone (or they were at some point), but boy am I glad to not be in charge of managing that license.


In case anyone wanders into this old thread, guess what I'm doing tonight!

If you guessed "unfucking a SAS license but the renewal utility won't work because it can't get a lock on some file" you win the satisfaction of knowing you were right.


Likewise I absolutely love Mathematica, and have a personal license I make good use of. I would never, ever use it in a business or consulting context though as the license fees are just way too high.


I've never used it, so I can't speak on its complexity, but are there any comparable open source alternatives? Like an OpenOffice-style project?


R studio is the typical alternative, along with GNU Octave. They're not super comparable; SAS is an Oracle-type product as others have pointed out in this thread


If there is, I don't know of it. A little googling suggests Gnu Dap, but I haven't tried it. SAS is a query language, so it has a lot in common with SQL, though it's much more powerful.


> SAS is a query language, so it has a lot in common with SQL, though it's much more powerful.

Well, as I recall it, it's more of a database management / software development package with an integrated data manipulation language (the original one, among the at least three "languages" it contains) to make new datasets based on existing ones. Yes, as an imperative Turing-complete language that's much more powerful than SQL, but it carries with it a huge overhead in the form of the whole rest of the system. Not to mention the lock-in.


It reminded me of my days in college where professors encouraged us to use Matlab. A variety of reasons made me not like it but one of them for sure was the realization that the moment I left college Matlab would no longer be "free" to me.

Yes, I know Octave exists, no it can't do everything Matlab can do and a lot of the programs I saw used needed Matlab specific things.


> the fact that I was good at solving strictly more difficult problems didn’t matter because I didn’t know how to solve the easier ones they actually had

I refuse to believe Google employees solve "scale" problems every day. Be realistic, you configure systems and make calls to libraries which perform the scaling for you. Yes, the systems and the libraries can be impressive but your glue code is not. That's why you fail the "easy" questions.


> if you didn’t need 100 database shards scattered around the globe, were you even doing real work?

Is this actually a prevalent mentality at Google?


Here is a concrete example where this mentality really hurt Google in the cloud space: App Engine. App Engine was really Google's first foray (or one of the first) into the cloud space. When it was first released, the only persistent store available was the Datastore. The Datastore was/is a great piece of technology, but it was painfully clear that this tool was optimized for Google's use cases and came from the mindset that things need to be infinitely scalable/shardable, e.g. eventual consistency and the fact that you couldn't have indexes with inequalities in different directions, so doing a query that is trivial in a DB (e.g. "dateCol > someMinVal AND dateCol < someMaxVal") was annoyingly difficult.

So the early days of App Engine were really cool, but it took some time before Google actually made a plain ol' database (their Cloud SQL tech) available. As someone looking in, and as someone who has talked to a lot of ex-Google employees, it was clearly a cultural shift for them to need to focus on standardized enterprise tools that so much of the industry is familiar with, rather than all the tons of custom shit Google has due to their scale.


This makes a lot of sense and totally jibes with my brief experience of working on a single App Engine app years ago. It felt like an easy platform to spin up a simple web app at first blush, but was actually such a frustrating experience when trying to do a lot of actual simple web app things.


The Datastore could do simple things well, and hard things well, and both exactly with the same level of effort.


How would you do a range query "well", then? With filtering on your own server definitely not being "well".


The same was true of early Firebase. (Which Google acquired.) Sometimes a "general" backend/datastore just sucks independent of scale.


There are a series of jokes in Google about things like this. Things like just wanting to serve a file, or run some small webapp, etc, etc. Google's tooling is geared towards building large, reliable apps. That makes wanting to build some tiny thing seem like a big bother (seemingly lots of overhead).

I remember my first hackathon at Google, probably in 2015. My group built a service and deployed it with some automated deployment tools. In "production", it was running 9 instances, spread over 3 continents. A little overkill for a hackathon project.

But in 7 years I have never heard it said seriously.


A running gag at Google is that the infrastructure makes hard problems easier and easy problems harder. There's a new-hire parable centered around the phrase "I just want to serve five terabytes" that demonstrates all the layers of systems and services you have to plug into to get a simple static-data server running reliably, scalably, and with monitoring and support.

The parable hints that "The Google way" to solve that problem is to find the service someone already built and use it to vend your data; don't re-invent a wheel. But (a) that means part of your job is now keeping abreast of all of Google's wheels (and there are hundreds of them) and (b) if your work didn't require the creation of a new wheel, was it impactful enough to justify a promotion later?


I'm an outsider, but this book[0] talks about how they think about these questions. The mentality makes sense to me in the sense that they are also trying to scale development processes. Does it make sense for every team to start small with a single VPS running Python scripts, then migrate up to a 3-tier architecture, then all the way up to k8s when the feature is a success and needs global scale and availability? Or could the company make k8s easy enough that it's trivial to just host v1 there, and back it with a massive multi-tenant db that also works fine for low-volume prototypes?

0: https://abseil.io/resources/swe_at_google.2.pdf


Why does everyone cite k8s as the tool for the job when there have been massive problems actually getting k8s to work at scale without constant tweaking and babysitting? In reality this is almost always either traditional load balanced clusters, ala-carte systems like Google App Engine / Elastic Beanstalk or (more so these days) serverless compute. At Google, they use Borg internally which is an early precursor to k8s that is more akin to traditional load balancing.


Isn't the promise of k8s a kind of dev-friendly serverless-compute engine, but your process(es)/architecture may be running for moments, days, weeks, or years, and spread across environments dynamically? I would expect aspects of such a thing to require tweaking and babysitting, as the demands of the application(s) evolve.


Right but the second I have to dictate the number of nodes to scale any further, that contract has been broken imo.


No, like the phrase "I don't remember how to count that low", I've pretty much only heard it used ironically to poke fun at how infrastructure can complicate simple use cases.

Want to host a 1PB dataset, globally available with enough failover to handle a moderately large nuclear war? Easy! Want to throw up a 200KB web page with best-effort reliability? Surprisingly complicated.


It's a bit tongue-in-cheek, but there's some truth in it at least for anything infrastructure related. If your thing is a success, it almost certainly needs to scale to Very Large numbers, and be reliable/distributed/performant/etc. So if you're proposing something that you expect to be successful but only needs a couple of VMs, it would be an outlier.

(I work there)


Of course not. Don't be silly. The post even tells you with its title: "I don’t Know How To Count That Low."

:)


I've heard someone from Google talk about how they only think about the "B's and T's" of their projects. B's and T's, as in, Billions of users and Trillions of dollars.


Personally I've never heard anybody utter something similar.


A video by someone named Taliver, from David Seidman's answer to "What’s an inside joke among Google employees?": https://www.quora.com/What-s-an-inside-joke-among-Google-emp...

Pretty old by now, don't know when you started at Google but probably not seen much anymore.


I was only at Google for a couple of years, but I remember “I don't know how to count that low” as a recurring punchline on the internal meme page. I never actually heard someone utter it seriously, but I saw it frequently enough in jokes that I assumed this article was about Google.


Sometimes a pervasive truth is tacitly accepted and never uttered? Conjecture, of course.


No, and that informs my estimate of the accuracy of the other judgments in the story.


As a data scientist, I like to use the most accessible tool that can get the job done without pain. Can I do it in Google Sheets? Awesome! I'll do that. Can I do that in SQL? That's great too! Will I have to use pandas on a cloud instance? OK, I'll do that. Do I absolutely have to use some complex pipeline? Then I will do that.

There are benefits to using tools that many people can understand.


Same approach I take in a research group in computational biology. Our datasets are often around the ~1-10 million rows range so those get processed in R or the command line. But often the summary analysis fits into a single page in excel.

I personally make my notes in markdown, but when I share them with others I convert to word docs or html because it’s not fair to make others deal with my raw text when it is so simple to format it for them.


Right tool for the right job.

It's some of the simplest wisdom that's the hardest to follow :)


This attitude is why none of the web-native hyperscalers have been able to even come close to competing in the HPC space: you cannot optimize a large-scale problem if you cannot optimize a small-scale problem. The first, last, and only approach from Google, Amazon, and even Azure is horizontal scaling, which is ridiculous because they absolutely employ people who can go deep on anything they might need. Unfortunately, while throwing more hardware at the problem will eventually solve the problem, it gets horrifically expensive in the meantime, and never really makes optimal use of the resources.

One day one of these providers will figure this out, and then I'll either need to go work for them or retire... but today, I'm not worried.


> "I didn’t know how to count that low, but now that it matters I am learning"

This is the sign of a good engineer.


I just wish Google would release the Broccoli Man video where he says 'I forgot how to count that low'.


It doesn't make sense unless you know the internal systems it's referencing, and Google probably does not want to release the details of those internal systems.


I was over in Android, where "borgmon" and "readability" were mythical concepts, and it was still hilarious.


This makes me think of how big tech job postings always talk about scale and impact, in a way that makes it sound like they know that most of the day-to-day work there is boring and pointless as shit, and scale is the only thing they can think of to try to convince you that you want to do it.


> when it was several orders of magnitude lower than when I should even have begun worrying about that

Really confused by this phrase. What does it mean?


This is nerd-speak for "my guess was way, way off": the database under consideration would have to not just double in size, not just be 10x larger, but be 100x, maybe even 1000x or 10,000x bigger before it stopped fitting in memory. And her time at Google had her used to dealing pretty much exclusively with datasets hundreds of thousands of times larger than the one this job was asking her to think about.

"Orders of magnitude" are a concept found wherever you have to deal with absurdly large or absurdly tiny numbers; if you've ever seen someone write a number like "2 x 10^9" then you're seeing someone use a little of this - they're saying "about two billion".

Wikipedia's page on the subject of orders of magnitude might be a decent introduction to the concept: https://en.wikipedia.org/wiki/Order_of_magnitude
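
As a quick worked example in Python (the sizes here are assumptions picked to match the scenario in the post, not figures the author gives):

    from math import log10

    db_size_bytes = 3 * 10**6          # assumed: a few-megabyte database
    ram_bytes = 8 * 10**9              # assumed: a machine with 8 GB of RAM

    headroom = log10(ram_bytes / db_size_bytes)
    print(round(headroom, 1))          # ~3.4 orders of magnitude of headroom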


> not just double in size, not just be 10x larger

"An order of magnitude" usually refers to 10x. I interpreted it as being 10x off with his guess.


"several orders of magnitude" would imply at least 100x or 1000x, depending on how small you can interpret "several" to mean.


The dictionary packaged with OSX defines it as "more than two but not many", and the copy of an old edition of Webster's that I installed also defines it as "Consisting of a number more than two, but not very many". So we're probably looking at at least 1000x.


I had to read it a few times too. I think they were trying to say that the database was so small that obviously you wouldn't need to think about distributed systems.

I'd be surprised if that was the reason they failed the interview.


It could be. If you ask the candidate to fizzbuzz and they think the solution is a key-value database of number to fizzbuzz pairs, and then they wonder, "Hey, could that fit into memory?" That kind of answer could raise multiple flags.
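
For contrast, the entire in-memory, no-database answer fits in a few lines of Python:

    # FizzBuzz with no key-value store in sight.
    for n in range(1, 101):
        word = ("Fizz" * (n % 3 == 0)) + ("Buzz" * (n % 5 == 0))
        print(word or n)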


That's a common way I see well-educated candidates fail our interviews. What happens is that they only have N minutes to do the whole interview... If they burn twenty minutes of that time gaming out whether the DB needs redundancy, multi-tier storage, and synchronization, they don't have time to actually show me the algorithm that will generate the data to go in the database, and that's what I'm hiring them for. "Is the database the right shape" is rarely a problem at our scale.


Have you considered giving them a hint, that you primarily are looking for the algorithm that will generate the data?


I have, often. People get nervous in an interview and don't always hear what the interviewer is saying, I think.


Some candidates also can get into this “word salad” mode, where they just talk and talk, and don’t listen to what you are saying. They have their script and talking points and just go on autopilot, filling every silence with words and words. You can give them obvious hints and cues: “Just give me the simplest answer and we will go from there!” …and they launch right back into 5 more minutes straight of talking.

As an interviewer, I have to manage our time appropriately so we can get into all the questions, and will sometimes even have to firmly say “STOP TALKING. Unfortunately we need to move on to the next question.”


That much is true. I get so nervous in technical interviews, and it has definitely lost me some jobs I am confident I had the technical skills for.


Their database was likely measured in megabytes not gigabytes or terabytes.

I have a similar situation... it's awesome, I can afford to back the entire production DB up in its entirety once per hour for months on end.


I read it as "the database could increase in size by several orders of magnitude and still fit in memory"


Basically, you are presented with a small-scale (possibly toy) problem: the database consists of 50k entries and fits in 3 MB of disk space.

Then the interviewee starts wondering out loud whether he could load it in full on a machine with 8 GB of RAM.
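
At that size the "design" is barely worth the name. A sketch in Python with sqlite3, where the file, table, and column names are all hypothetical:

    import sqlite3

    conn = sqlite3.connect("tiny.db")              # hypothetical 3 MB database
    rows = conn.execute("SELECT id, name FROM items").fetchall()
    lookup = dict(rows)                            # ~50k rows held comfortably in RAM

    print(len(lookup), "rows loaded into memory")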


Yeah. Or they veer off and discuss setting up Redshift and design a whole ETL pipeline, data warehousing solution, and Hadoop cluster when a local file and SQLite would be much more appropriate.


This is a perfect catchphrase for a problem I've warned other startup entrepreneurs about: over-crediting a resume that has a very impressive name-brand company on it. It may or may not actually signify much, depending on what they actually did, what they know, and how they think.

I learned it during the interview process with a candidate for CFO referred by a board member who was impressed by where they used to work. Fortunately, the candidate and I both realized he'd probably be fairly bad at working on the financial challenges ahead of our 35-person-going-to-100-people startup.


In startup parlance, by the time you need to shard your DB you have already solved the hardest business problems.

Without being too condescending to our 'inner engineer' - doing things at scale is relatively pedantic compared to building something that people want to pay for in the first place.

Yes, some of these problems are hard, and we need to respect that - but they are ultimately, mostly, problems to be solved - like difficult questions on an exam. You'll find the path through and that's that. It's more akin to operational complexity, just a part of doing business smartly.


It seems to me that there is a bias for complexity among software engineers. When faced with a problem, they seem to be deliberately choosing the most complex possible solution for it.

When another programmer runs a problem and proposed solution by me, the most common answer that he gets is: "the solution to that problem can be a lot simpler". Then he thinks about it and comes up with a significantly simpler, better solution on the spot. But I notice a touch of irritation, a hint that he doesn't really want to do it the simple way. It baffles me.


I like messing with "the small stuff."

I love "getting my hands dirty," and crafting a direct user experience.

I really don't give a damn whether or not anyone thinks that it's "beneath" them.


> the fact that I was good at solving strictly more difficult problems didn’t matter because I didn’t know how to solve the easier ones they actually had

The funny thing is: "small" systems probably do not present problems strictly less difficult than the ones Google has.

Different problems, requiring care about other abstractions and other layers, yes.

It's even astonishing that the author does not notice this when she writes "I didn’t know how to solve the easier ones" -- if she didn't know how to solve them, how were they strictly less difficult?


This is a very systems-engineering way of talking about over-engineering, and possibly a better way of talking about over-engineering in a world where it's just as easy to spin up MySQL as it is to spin up DynamoDB on AWS.

Quite often the largest material difference for engineers these days is cost of operation, and we're not often trained to add that to our calculus when designing systems.


> ...Excel could have gotten better, except AFAIK they never fixed the problem with reading genes as dates so they get no benefit of a doubt from me.

Not that this is only Excel's fault: Before entering your gene names, select the column they're going in* (by clicking the column header), right-click, select "Format cells" from the pop-up menu, select "Text" from the list on the left of the formatting dialogue, click "OK", and you're done. Now it won't turn text that looks like dates into the numeric representation of a date value.

Not exactly rocket surgery.

___

*: Or, if you're just using it for data entry and/or all your data is textual, select the whole sheet by clicking on the grey square in the top left, at the intersection of row numbers and column-name letters.
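
If the sheet is being generated programmatically rather than typed in by hand, the same trick can be applied up front. A sketch with openpyxl (assuming that library is acceptable for your workflow), where "@" is Excel's Text number format:

    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active

    genes = ["MARCH1", "SEPT2", "DEC1"]        # names Excel likes to turn into dates
    for row, gene in enumerate(genes, start=1):
        cell = ws.cell(row=row, column=1, value=gene)
        cell.number_format = "@"               # "@" marks the cell as Text

    wb.save("genes.xlsx")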


Lighter weight UI version available here: https://www.greaterwrong.com/posts/koGbEwgbfst2wCbzG/i-don-t...


What this is grasping at is that everyone works at a different level of abstraction, and they are all important.

We need those that oversee the huge complex tasks (that are simplified by abstractions) just as we need those in the trenches.


Does it seem a little strange that a startup would not hire someone who had solved a harder problem (big concurrent sharded DB system) for not knowing exactly how to solve an easier problem (keep the database in memory)?

OP seems to imply that it is some kind of arrogance to believe that. Seems like, if you could figure out how to crack the first kind of problem, you could probably figure out how to crack the second. Maybe it's dismissive to say this, but it seems like this was a problem with the hiring org, not the author.


Is a big concurrent sharded DB necessarily a "harder problem" when it's provided to you by another team at the company to use?

The skills you need just seem different to me.

In the large case, the more important skills are going to be the higher level reasoning of what is a good shard size, how do we avoid hotspotting, what is an appropriate way to partition for the set of queries we want to run. None of those really help you solve the smaller problem. You're also going to likely be relying on infrastructure-provided solutions like automatic backups.

In the small case, you're going to need to worry about things like those backups. Then there's the skill mentioned in the interview -- it seems like this person had never had to actually think through how much space N rows of a particular schema were going to take up in memory, or how much order of magnitude QPS you can get out of (e.g.) a MySQL server. You're also going to be optimizing to different peculiarities of what is fast to do on this different database technology, which sometimes does not translate directly. In a smaller company (more so in older days, but still in some places) you may also need to deal with the operational challenges of buying and installing a bigger box if it's called for.


I have inherited overengineered unwieldy architectures from programmers who had solved much harder technical problems than making our product successful.

Having to live with this kind of complexity slows you down in ways that you can't even really quantify (because it's difficult to run the required experiments while you're living the situation).

A list of issues it causes in web development (for example):

1. Makes it difficult to experiment with new features or different features in a product because every change is now 10 times as involved as it needs to be.

2. Makes it difficult to hire programmers with less experience because you are working with 3 or 4 complicated technologies (caches, job queues, etc.) when a single database server would do.

3. Makes it difficult to debug problems because you have to identify bugs across $(n \choose 2)$ microservice boundaries (where $n$ is the number of microservices in your architecture), each boundary involves a different set of components to enable communication (REST APIs, GraphQL APIs, Redis, RabbitMQ, Elasticsearch, etc.) each of which perhaps made sense locally but which certainly do not make sense as part of the whole system.
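
To put rough numbers on that pairwise blow-up (this is just the binomial coefficient, nothing specific to any one architecture):

    from math import comb

    for n in (3, 6, 10, 20):
        print(n, "services ->", comb(n, 2), "pairwise boundaries")
    # 3 -> 3, 6 -> 15, 10 -> 45, 20 -> 190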

Apologies if I triggered anyone with this comment.


More than overengineering, misplaced engineering.

We have a lot of stuff that "scales" and would not solve a real problem if it didn't "scale", but there is also an equally good solution not in the domain of the person who was tasked with writing it.

So for example, we were distributing a "large" (~100e6) multi-dimensional lookup table to ~30 servers to do some transformation job. The engineer who did it didn't like to think about data formats, so they used JSON. They didn't like to think about cache friendliness, so it was just maps of maps of maps. They didn't look at the actual dataset contents - which was some mostly opaque-to-us customer catalog SKUs - so didn't notice the massive redundancy in it. Implementing their algorithm as a straight line process, yeah, it did really take several hours to run and it was too long and probably required more memory than we wanted to commit.

But the solution wasn't "farm it out to a Hadoop cluster", it was "translate the dataset into some DFA-like things and then run some optimization passes on them" which yeah, was slow to "compile" - like 2min - but once compiled could chew through the rest of the job in another 10min, with sub-1G memory.

Very big / indirect "everything looks like a nail" situation, combined with lots of people who want to rush for the Big Data / ML / whatever stuff and not fundamentals. And the reasons don't seem to be uniform either. I've seen it from careerists and architecture astronauts, but also bootcamp grads who just weren't properly educated, at startups often early non-technical or technically-adjacent employees trying to help out but don't realize the mess they're making, and sometimes people who just don't want to think very hard.


In the case of your JSON lookup table, as long as the damage was to the tune of < $50 and you don't need to run those jobs frequently, I wouldn't spend too much time on it.

I would ask for documentation and put a few notes about what to do if you started needing to run those jobs on a frequent basis - because it can be difficult to predict whether or not that one-off script that someone hacked together is going to become a foundational pillar of your product.


I'm talking about a daily service that's a core business process, not a one-off BI analysis. Way, way more than $50.

It's cute - and another example of not counting right, I guess - how many people think everyone is on AWS or some other elastic compute system. If we want to run a job, we have to find computers for it.


Definitely if it's a daily service, it's totally worth spending the time on automation.

I did make the assumption about AWS/equivalent, didn't realize how much of an unconscious assumption it had become. Sorry about that.


Well, finding computers internally is usually cheaper.


I've worked at places where the AWS bill was obfuscated/otherwise opaque, so an implementing engineer would have no idea if a job cost $50 or $5,000. (Without a lot of research that's outside the scope of their job.)

What tools are you using to display costs to internal developers?


My business is very small and there are only 5 of us writing code. I just screen share the AWS bill periodically and we make sure that things aren't insane.

We will be switching over to our own hardware in data centers towards the middle of next year. At that point, the whole team will need to be more aware of our constraints in terms of access to hardware.


It's easy to fall into the trap of thinking you need to make things complicated because it's the "correct" way to do something.

I once rewrote an internal tool that used AJAX requests to fetch data asynchronously for "performance"; the asynchronous fetches generally took 1-2 seconds because it had to use a super-slow API. The whole thing was a really weird mess of javascript, perl and Python and each component seemed to be tacked on to work around some issue (usually performance) with the other parts of the system.

My version ended up being a single file of terrible go code (it was my first go project) and a few templates. Everything is rendered server-side and directly does read-only queries to the source database, bypassing the horrendous API.

Of course, not using the API is "wrong" and liable to break at any time, but to this day the database schema has never changed in a way that would have required me to rewrite the few bits of SQL I used, and the entire thing is fast enough that all user actions are effectively instant, even though the code is fully synchronous and does several database queries. Turns out databases are fast when you query them directly.

Writing that tool made me realize that I prefer writing "badly" engineered tools optimized to be disposable, so that I can scrap and rewrite them really quickly when I need to. Instead of thinking "what do I do if I need to extend this later?" I try to ask myself "How easily can I delete this code?"

I can better focus my effort on writing code that's reasonably performant and does not fail basic security when I don't really need to worry about whether the architecture is "good" or not, because as soon as I realize that it isn't, the bad part gets deleted and rewritten.

Of course, in larger systems you don't often have the luxury of doing full rewrites, but even then the principle is applicable to individual functions, modules, or whatever subcomponents your system has.


> caches, job queues, etc.

This is over engineered? Yikes.


Complex (and usually somehow broken) caches where optimization would suffice is probably the #1 overengineering outcome I see. And at all levels - from caching HTTP proxies all the way down to functions that take five arguments to avoid re-summing a dozen numbers.


Absolutely, if used in a situation that doesn't call for caches or job queues or horizontally scalable NoSQL databases or whatever.


There is a legitimate fear of employees overengineering.

If a candidate spins up an AWS service for something a simple Python script could do, I too would not hire.


Same here. As a consultant I often find large “enterprise-style” systems, and I get paid to make them a tenth of the size and with much less indirection. Simple is better.


I have the somewhat weird inverse problem. I've inherited so many services and such that deal with data in the 100MB-10GB range (much of which could even be streamed!) and all have been written to do ridiculous things to distribute the work. They even manage to sometimes make it not fit at first because they're loading UTF-8 into UTF-32 or quadruple indirecting uninterned strings or whatnot. (You ain't seen nothing until you've seen a Java UUID class implemented as `List<Byte>` - with length 32!) Most of the company looked at me like I was crazy when I said we'd be fine loading our next-gen 2TB datasets directly into RAM with some trivial sharding.

I have also worked with people who could cram an unbelievable amount of content into a GBA ROM but struggled to get a 10 node webserver cluster to do 10k/sec.

So yeah, I don't think the knowledge automatically works both ways from either side. Some people know how scaling works, but even more only know how to operate at specific scales.


> Most of the company looked at me like I was crazy when I said we'd be fine loading our next-gen 2TB datasets directly into RAM with some trivial sharding.

FWIW, you can get 2TB of ram on a single node these days. No need to shard.


> Does it seem a little strange that a startup would not hire someone who had solved a harder problem (big concurrent sharded DB system) for not knowing exactly how to solve an easier problem (keep the database in memory)?

Actually, it seems they knew how to keep the database in memory; they lacked an intuition for the scale at which that solution becomes unviable, and openly questioned its applicability when it should (in the retrospective opinion of the applicant and presumably the contemporary opinion of the interviewer) have been intuitively obvious.

Rejecting for that reason seems like a dumb, though typical, symptom of systems that overemphasize easily acquired contextual familiarity, leading them to prefer inflexible hyperspecialists in narrow problem domains and solution approaches.


No it's not strange at all -- I once had a colleague who went on to become a brilliant engineer at Google, but was horrible at the small startup we worked together at, because his solutions were over-engineered, fanciful, totally ignored the day-to-day messy problems we had, and did NOT help further our mission - to make the customer happy.


> did NOT help further our mission - to make the customer happy

I can't help but think he'll fit right in at Google.


Not at all. I'd be leery of somebody who jumped to a big, complicated, expensive solution instead of something simple for a simple problem. Been there, done that, it sucks, and if you're not in a position where you can slap it down early, you end up having to support that poor decision after the person who instigated it has departed.


Scaling down and scaling up are two very different problems. It isn't a given that scaling up is a harder problem than scaling down, it rather depends how far down and how far up.

If they were looking for a "subject matter expert", rather than someone who could become a SME then failing the interview is entirely expected. The techniques for scaling down are unrelated at best and at worst totally the opposite of the techniques used for scaling up.


It's a mindset as there's a hundred decisions you make without even thinking deeply about it. Startups need to build just enough to survive but no more since budget (money, time, etc.) is limited. So if you make decisions that are too scalable then you're likely spending "budget" inefficiently.


While someone with experience in solving a larger-scale problem could certainly be expected to solve one at a smaller scale, the result might be too costly and complex for a company that is not Google to build and maintain.

To use a very rough analogy, if you have a family and need to haul the kids to school and soccer practice, you'd probably want a minivan or small SUV. Sure, you could buy a school bus or minibus and hire someone with a CDL to drive it. That would certainly work, but would it be the right choice?


I think "easier" and "harder" are not used appropriately here. Technologies to be used at different scales of size do not always match to the scales of difficulty. Especially when you are the person to use the technologies, not inventing the technologies.

Maybe there is a sequence of learning, for example you always learn to manage small databases and then go on to the technics to manage larger databases. But again, the sequence of learning does not always match to the scale of difficulty.


A big concurrent sharded DB system is not a harder problem, it is a different problem. Knowing how to solve problems at one scale doesn't transfer very well to other scales.


Without knowing much about the author's time at Google, I'd suspect that they weren't actually responsible for solving the harder problem - they just used a service which did that for them (like spanner, or whatever) and maybe set some configuration values.


Those are not harder or easier problems, those are different problems.


On another similar note, I moved from writing flight software for aerospace to tooling for a media corporation. Others around me made similar moves. Many of us had difficulty realizing that rigor around code quality and infrastructure is a spectrum, a knob you can turn, rather than an absolute standard, and would hold up PRs that didn't demonstrate testing and validation of things like CSS changes to an internal tool. It caused, and causes, a lot of unnecessary heartache.


It's like software engineers struggling to write code on a piece of paper, or without an IDE?


How come folks don’t make an open source SAS interpreter? Or maybe a translator to Python?


There is -- or at least used to be -- something called, IIRC, "Data Step". As I understood it, that was an implementation of the base SAS data-set manipulation language. Can't recall if it was OSS or just Freeware, nor if it had any of the statistical PROCs. GIYF, as we used to say (before we knew better).


AKA Don't sweat the small stuff.


Is this mindset the reason why Google makes Android and Chrome worse and doesn't care about the users?


From the title, I was expecting an article on counter-offering in salary negotiations



