Computer Latency: 1977-2017 (danluu.com)
244 points by 2pEXgD0fZ5cF on Nov 20, 2022 | 101 comments



When I saw this page a few years back I had an idea for a project. I want to create the lowest-latency typing terminal I possibly can, using an FPGA and an LED array. My initial results suggest that I can drive a 64x32 pixel LED array at 4.88kHz, for a roughly 0.2ms latency.

For the next step I want to make it capable of injecting artificial latency, and then do A/B testing to determine (1) the smallest amount of latency I can reliably perceive, and (2) the smallest amount of latency that actually bothers me.
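
(A rough sketch of how the A/B part could work, assuming a simple 1-up/1-down staircase; set_added_latency_us is a made-up stub for whatever would program the FPGA-side delay injection.)

  # Hypothetical blinded staircase test for the smallest perceivable added latency.
  import random

  def set_added_latency_us(us):
      # Stub: on real hardware this would program the injected delay.
      print(f"[hw] injected latency set to {us} us")

  def run_staircase(start_us=10000, reversals=6):
      latency, last_correct, flips = start_us, None, 0
      while flips < reversals:
          delayed_first = random.choice([True, False])  # blind: which interval gets the delay
          for name, delayed in (("A", delayed_first), ("B", not delayed_first)):
              set_added_latency_us(latency if delayed else 0)
              input(f"Type for a bit in interval {name}, then press Enter...")
          correct = (input("Which felt slower (A/B)? ").strip().upper() == "A") == delayed_first
          if last_correct is not None and correct != last_correct:
              flips += 1                                 # a reversal: we're near the threshold
          last_correct = correct
          latency = max(1, latency // 2 if correct else latency * 2)
          print(f"next added latency: {latency} us")
      return latency

  print(f"Threshold estimate: ~{run_staircase()} us of added latency")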

This idea was also inspired by this work from Microsoft Research, where they do a similar experiment with touch screens: https://www.youtube.com/watch?v=vOvQCPLkPt4


I remember when I got the iPhone X I was so used to higher latency that it felt like the iPhone was typing before I was. It was a very strange sensation until I got used to how quick it was.


If you ever end up doing this project, I find that sometimes it's hard to quantify if something is better when going in the normal-to-better direction, but it's always much easier to tell when something is worse when going in the normal-to-worse direction. So spend a few days or weeks getting totally acclimated to fast response times and then test if you can notice the difference with slow response times.


I like the idea, but note that (1) and (2) can depend on what you’re used to. The fact that one doesn’t notice a handicap doesn’t mean that there isn’t room for improvement, given some conditioning.


I believe there are specific methods to negate these effects (something like going in a specific order with the values)


It would probably be interesting to randomize the latency for certain intervals, with some kind of feedback mechanism, to make it a blind study.


Sounds like a fun project to do, I wonder if you could even implement it in full discrete logic and skip the FPGA


Is there an appreciable practical lower-bound in latency to that? I’ve never understood how-and-why electronic signals can propagate down a wire so gosh-darn quickly: the speed of sound is what I’d have intuitively expected, not 50-99% the speed of light ( https://en.wikipedia.org/wiki/Speed_of_electricity )


Essentially, energy is carried throughout a circuit by the electric field created by a voltage differential, not by electrons pushing other electrons like how sound is transmitted through a medium. So signals propagate on the same order of magnitude as the speed of light. I think this is a relatively intuitive explanation: https://www.youtube.com/watch?v=C7tQJ42nGno
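
Some rough numbers to put that in perspective (my own back-of-the-envelope calc, assuming a velocity factor of about 0.7, which is typical for cables): propagation over desk-scale distances takes nanoseconds, so the wire itself is never the bottleneck.

  # Propagation delay over a cable at an assumed velocity factor of ~0.7
  C = 299_792_458          # m/s, speed of light in vacuum
  VELOCITY_FACTOR = 0.7    # assumption; real cables are roughly 0.5-0.9

  for length_m in (0.3, 2, 30):
      delay_ns = length_m / (C * VELOCITY_FACTOR) * 1e9
      print(f"{length_m:>4} m of cable ≈ {delay_ns:6.2f} ns")
  # Even 30 m is ~0.14 us, thousands of times below the millisecond-scale
  # latencies in the article.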


An anecdote that will probably sway no one: was in a family friendly barcade and noticed-- inexplicably--a gaggle of kids, all 8-14, gathered around the Pong. Sauntering up so I could overhear their conversation, it was all excited variants of "It's just a square! But it's real!","You're touching it!", or "The knobs really move it."

If you wonder why we no longer have "twitch" games, this is why. Old school games had a tactile aesthetic lost in the blur of modern lag.


We still have twitch games. Celeste was released only a couple years ago


Celeste is not really the same sort of thing, and compared to e.g. IWBTG fangames it feels a bit like playing a platformer while covered in glue. IWBTG fangames themselves probably feel similar to people who are used to playing Super Smash Brothers Melee for the Nintendo GameCube because those people play on CRTs which eliminates a couple frames of latency compared to LCDs.


I want to play a platformer while covered in glue


Classic example of two steps forward, one step backwards. Though there are schools which are exceptions to the norm.


FWIW, a quick ballpark test shows <30 ms minimum keyboard latency on my M1 Max MacBook, which has a 120 Hz display.

  Sublime Text: 17–29 ms
  iTerm (zsh4humans): 25–54 ms
  Safari address bar: 17–38 ms
  TextEdit: 25–46 ms
Method: Record 240-fps slo-mo video. Press keyboard key. Count frames from key depress to first update on screen, inclusive. Repeat 3x for each app.
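
(For reference, at 240 fps each captured frame is about 4.2 ms, so those ranges are only a handful of frames; a quick conversion:)

  # Frame counts -> milliseconds at the 240 fps capture rate used above
  FPS = 240
  frame_ms = 1000 / FPS               # ~4.17 ms per captured frame
  for frames in range(4, 12):
      print(f"{frames:2d} frames ≈ {frames * frame_ms:5.1f} ms")
  # e.g. Sublime's 17-29 ms is roughly 4-7 frames, which also shows the
  # ±1 frame quantization error inherent in the method.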


How do you determine at what point the key switch is activated? Or is the travel time from start to fully depressed negligible compared to measured latency?


You wire an LED to a button (like a mouse left click) and with a 1000Hz camera you can count how many frames it takes for the screen to update after the LED lights up. Repeat many times to account for being in varied stages of the refresh cycle.

Well, that's how it was done 10 years ago.


There is a good app to help with this, "is it snappy".


With that method I would just double check that 240-fps slo-mo video is synced to realtime. It may be applying an unnoticeable slow-motion effect (e.g. 90% speed playback) that would throw off the results

So e.g. put a clock in the video


I wonder if a compositor, and possibly an entire compositing system designed around adaptive sync could perform substantially better than current compositors.

Currently, there is a whole pile of steps to update a UI. The input system processes an event, some decision is made as to when to rerender the application, then another decision is made as to when to composite the screen, and hopefully this all finishes before a frame is scanned out, but not too far before, because that would add latency. It’s heuristics all the way down.

With adaptive sync, there is still a heuristic decision as to whether to process an input event immediately or to wait to aggregate more events into the same frame. But once that is done, an application can update its state, redraw itself, and trigger an immediate compositor update. The compositor will render as quickly as possible, but it doesn’t need to worry about missing scanout — scanout can begin as soon as the compositor finishes.

(There are surely some constraints on the intervals between frames sent to the display, but this seems quite manageable while still scanning out a frame immediately after compositing it nearly 100% of the time.)
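
(If it helps, this is the loop I have in mind, as a sketch only; every function name here is made up, standing in for the real input, render, and display APIs.)

  import time

  def wait_for_input_event():      # stub for the real input source (evdev/libinput)
      time.sleep(0.001); return "keypress"

  def app_render(event):           # stub: app updates state and redraws
      return f"frame for {event}"

  def composite(app_frame):        # stub: compositor assembles the final frame
      return f"composited {app_frame}"

  def scanout(frame):              # with adaptive sync, a refresh can start
      print(frame)                 # as soon as we hand the frame over

  AGGREGATION_WINDOW_S = 0.001     # the one remaining heuristic: briefly batch
                                   # events that arrive nearly together
  for _ in range(3):
      event = wait_for_input_event()
      time.sleep(AGGREGATION_WINDOW_S)
      scanout(composite(app_render(event)))   # no vblank deadline to chase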


For fullscreen apps one can do something even better: skip compositing or buffering entirely. Instead cooperate with the GPU and raster directly into the output buffer ahead of the pixels sent to the display. In wayland that's called direct scanout.

But yeah, for non-fullscreen it helps. See https://github.com/swaywm/sway/pull/5063


Uhm... aren't you basically describing wayland?

This Xorg dude did exactly the tuning you want on wayland https://artemis.sh/2022/09/18/wayland-from-an-x-apologist.ht...


You mean the max_render_time? That’s exactly the kind of kludge I’m suggesting that adaptive sync can eliminate.


Adaptive sync can only delay drawing, never make it happen sooner. This means it can only harm average latency of response to unpredictable events, such as human interaction. (Individual events may have lower latency purely by luck, because latency depends on the position of the raster scan relative to the part of the screen that needs to be updated, and adaptive sync will perturb this, but this effect is just as likely to make things worse.) The lowest average latency is always achieved by running the monitor at maximum speed all the time and responding to events immediately.

Adaptive sync is beneficial for graphically intensive games where you can't always render fast enough, but IMO this should never be true for a GUI on modern hardware.
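
For a sense of scale (my own rough numbers, ignoring render time): at a fixed refresh rate, an event arriving at a random moment waits on average half a refresh period before the next scanout begins.

  for hz in (60, 120, 144, 240):
      period_ms = 1000 / hz
      print(f"{hz:3d} Hz: period {period_ms:5.2f} ms, "
            f"average wait {period_ms / 2:5.2f} ms, worst case {period_ms:5.2f} ms")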


> Adaptive sync can only delay drawing, never make it happen sooner.

That’s a matter of perspective. If your goal is to crank out frames at exactly 60 Hz (or 120 Hz or whatever), then, sure, you can’t send frames early and you want to avoid being late. But this seems like a somewhat dubiously necessary goal in a continuously rendered game and a completely useless goal in a desktop UI. So instead the goal can be to be slightly late for every single frame, and then if you’re less late than intended, fine.

Alternatively, one could start compositing at the target time. If it takes 0.5ms, then the frame is 0.5ms late. If it goes over and takes 1ms, then the frame is 1ms late.


With text content, most frames are exactly the same. So what adaptive sync can do is delay a refresh until just after the content has been updated. At a minimum, it can delay a refresh when an update is currently being drawn, which would lower the max latency.


The time taken to update text should be negligible on any modern hardware.


The point of the article is that it is not negligible.


Global Ping Data - https://wondernetwork.com/pings

We've got servers in 200+ cities around the world, and ask them to ping each other every hour. Currently it takes our servers in Tokyo and London about 226ms to ping each other.

We've got some downloadable datasets here if you want to play with them: https://wonderproxy.com/blog/a-day-in-the-life-of-the-intern...
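
As a rough sanity check on that Tokyo-London number (my own estimate; the ~9,600 km great-circle distance is an approximation, and light in glass fiber travels at roughly two thirds of c):

  FIBER_SPEED_KM_S = 300_000 * 2 / 3   # ~200,000 km/s in glass fiber
  DISTANCE_KM = 9_600                  # approx. great-circle Tokyo-London
  min_rtt_ms = 2 * DISTANCE_KM / FIBER_SPEED_KM_S * 1000
  print(f"Theoretical minimum RTT: ~{min_rtt_ms:.0f} ms (measured: ~226 ms)")
  # The gap is cable routes that aren't great circles, plus routing and
  # queuing delays along the path.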


The fundamental physical limit to latency caused by the speed of light is gleefully ignored by many web "application" architects. Apps that feel super snappy when hosted in the same region run like molasses from places like Australia. Unless the back-end is deployed in every major region, a significant fraction of your userbase will always think of your app as sluggish, irrespective of how much optimisation work goes into it.

Some random example:

Azure Application Insights can be deployed to any Azure region, making it feel noticeably snappier than most cloud hosted competitors such as New Relic or logz.io.

ESRI ArcGIS has a cloud version that is "quick and easy" to use compared to the hosted version... and is terribly slow for anyone outside of the US.

Our timesheet app is hosted in the US and is barely useable. Our managers complain that engineers "don't like timesheets". Look... we don't mind timesheets, but having to... wait... seconds... for.... each... click... is just torture, especially at 4:55pm on a Friday afternoon.


As a developer in Australia, we are painfully aware of this haha. Plenty of web services that aren’t deployed here feel painfully slow due to the latency costs — regardless of having decent bandwidth today.

Because our product is global, our backends are replicated worldwide too. Otherwise we’d be forcing the pain we go through daily on our users too


Gamers in Australia as well (when not using local game servers). South Africa idem (idem).


Your point is completely valid but physically there still is some room to improve. Hollow core fibers for instance allow light to move one third faster.

With a 40 Mm circumference and light traveling at 300 Mm/s in vacuum, the physical limit is a one-way latency below 70 ms between opposite points on the globe.


> Hollow core fibers for instance allow light to move one third faster.

Could you expand on this? The speed of light is constant, no?


In vacuum. In glass it’s significantly lower. Air core fibre brings it back up to about 99% or thereabouts…


Even if you fix where the backend is and use something like Edge workers around the world, you still run into the issue of where the database is hosted, making all the work useless. Any useful endpoint is going to change some state, like the timesheet app.


I've used your ping data before; it was useful for knowing where to place my servers, and how nice of you to publish a dump as well! If I can wish for more data: min-median-max client latencies for all those servers would be swell, but I can see that you might not want to publish the results of that; maybe on a per-month basis? Just a couple of thousand packets every hour should be enough: tcpdump -w stats.pcap -c5000 "tcp[tcpflags] & (tcp-syn|tcp-ack) != 0"


I'm curious if you have a theory or explanation as to why some pings appear to be asymmetric. For example for the following cities, it seems West->East is often faster than East->West:

                Chicago         London         New York
  Chicago       —               105.73ms       21.273ms
  London        108.227ms       —              72.925ms
  New York      21.598ms        73.282ms       —


Seems like those numbers are likely all within a margin of error. If you hover over the times in the table, it also gives you min and max values, which are often +/- 2ms or so.


*Adds to mental list of cool resources that exist on the internet*

Looking at the map with the blue dots, a cool rainy-day project would be to show the pings and pongs flying back and forth :3


I recently had some free time and used it to finish fixing up an Amiga 3000 (recapping the motherboard, repairing some battery damage on the motherboard). I installed AmigaDOS 3.2.1 and started doing things with it like running a web browser and visiting modern web sites.

The usability is worlds better than what we have now, even comparing a 1990 computer with a 25 MHz m68030 and 16 megs of memory with a four core, eight thread Core i7 with 16 gigs of memory. Interestingly, the 1990 computer can have a datatype added which allows for webp processing, whereas the Mac laptop running the latest Safari available for it can't do webp.

We've lost something, and even when we're aware of it, that doesn't mean we can get it back.


Previous discussions:

https://news.ycombinator.com/item?id=25290118 (December 3, 2020 — 454 points, 259 comments)

https://news.ycombinator.com/item?id=16001407 (December 24, 2017 — 588 points, 161 comments)


This could use a (2017) at the end of the title (no, it's not obvious; that'd be based on assumption).


Going through the list of what happens on iOS:

> UIKit introduced 1-2 ms event processing overhead, CPU-bound

I wonder if this is correct, and what's happening there if so - a modern CPU (even a mobile one) can do a lot in 1-2 ms. That's 6 to 12% of the per-frame budget of a game running at 60 fps, which is pretty mind-boggling for just processing an event.


I guess you can waste any amount of time with "a few" layers of strictly unnecessary indirection.

Speaking of games: just the other day I had the realization that we should look into the software design around games if we want proper architectures for GUI applications.

What we do today instead are "layers of madness". At least that's what I would call it.


Games have the privilege of controlling everything from the input device to the GPU pipeline. Nothing on the desktop is going to be that vertically integrated easily.


> Nothing on the desktop is going to be that vertically integrated easily

Why? Are there any technical reasons?

I think this is a pure framework / system-API question.


The only things I can think of are that, for windowed apps, you have to wait for the OS to hand you mouse events, since the mouse may be on another window, and you have to render to a window instead of directly to the framebuffer.


Which brings us back to "system APIs".


Has anyone else used an IBM mainframe with a hardware 327x terminal?

They process all normal keystrokes locally, and only send back to the host when Enter and function keys are pressed. This means very low latency for typing and most keystrokes. But much longer latency when you press enter, or page up/down as the mainframe then processes all the on-screen changes and sends back the refreshed screen (yes, you are looking at a page at a time, there is no scrolling).

Of course, these days people use emulators instead of hardware terminals so you get the standard GUI delays and the worst of both worlds.


Using emacs on an SGI Iris in 1988 was … sublime.

Every computer system since then has been a head-shaking disappointment, latency-wise.


Something I recently observed is that cutting edge, current generation gaming-marketed x86-64 motherboards for single socket CPUs, both Intel and AMD, still come with a single PS/2 mouse port on the rear I/O plate.

I read something about this being intended for use with high end wired gaming mice, where the end to end latency between mouse and cursor movement is theoretically lower if the signal doesn't go through the USB bus on the motherboard, but rather through whatever legacy PS/2 interface is talking to the equivalent-of-northbridge chipset.


Some still have two, for a keyboard and mouse:

https://static.tweaktown.com/content/1/0/10071_10_asus-rog-m...

Latency is lower because it's interrupt-based and much simpler than the polled USB stack. IMHO if you're going to always have a keyboard and mouse connected to the computer, it makes perfect sense to keep them on the dedicated simpler interface instead of the general USB; especially when the dedicated interface will be more reliable. The industry may be partly moving away from the "legacy-free USB everything" trend that started in the 2000s, finally.

AFAIK all SuperIOs support a pair of PS/2 ports, so from a BoM perspective it's not an extra cost to the manufacturer, but they still market it as a premium feature.


I'd like to see older MS-DOS and Windows on there for comparison; I remember dualbooting 98se and XP for a while in the early 2000s and the former was noticeably more responsive.

Another comparative anecdote I have is between Windows XP and OS X on the same hardware, wherein the latter was less responsive. After seeing what GUI apps on a Mac actually involve, I'm not too surprised: https://news.ycombinator.com/item?id=11638367


Bare C GUI applications with somewhat modern integration are always that 'verbose' when compared to older or framework-based ones.

I do wonder about MS-DOS on different machines; technically it would be the BIOS and VBIOS doing much of the heavy lifting, so vendor variations might have a real impact here. Same as a CSM on UEFI; DOS wouldn't know the difference, but going through firmware that is as complex (or more complex) from DOS would cause a whole lot of extra latency.


PowerShell isn't a terminal (it's a shell, obviously), so the Windows results are most likely tested in conhost. If it's on Windows 11 it might be Windows Terminal, which may be more likely since I think cmd is still the default on Windows 10.


It might still be a valid test, because PowerShell needs to have a bunch of code in the stack between the keypress event and the call into the console API that actually displays the character. Among other things, the entire command line is getting lexically parsed every time you press a key.


If you think "parsing the command line" should or does take appreciable time on a human timescale when executed by a modern superscalar processor, then your mental model of computer performance is "off" by at least 4 or 5 orders of magnitude. Not four or five times incorrect, but many thousands of times incorrect.


Just for context, I worked on the feature team for a lot of early versions of PowerShell, so I kind of know where the bodies are buried. Here are some empirical numbers, just for the parsing part:

10K iterations of $null=$null took 38 milliseconds on my laptop.

10K parses of the letter "a" took 110 milliseconds.

10K parses of a 100K character comment took 7405 milliseconds.

10K parses of a complex nested expression took just over six minutes.

You're probably imagining a lexer written in C that tokenizes a context free language and does nothing else. In PowerShell, you can't run the tokenizer directly, you have to use the parser, which also builds an AST. The language itself is a blend of two different paradigms, so a token can have a totally different meaning depending on whether it's part of an expression or a command, meaning more state to track during the tokenizer pass.

On top of that, while it was being developed, language performance wasn't a priority until around version 3 or 4, and the main perf advancement then was to compile from AST to dynamic code for code blocks that get run a minimum number of times. The parser itself was never subject to any deep perf testing, IIRC.

Plus it does a bunch of other stuff when you press a key, not just the parsing. All of the host code that listens for the keyboard event and ultimately puts the character on the screen, for example, is probably half a dozen layers of managed and unmanaged abstractions around the Win32 console API.
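
(Dividing those quoted totals by the 10K iterations gives a rough per-keypress parsing cost; the six-minute figure is approximated as 360 s here.)

  totals_ms = {
      "$null=$null":               38,
      'letter "a"':                110,
      "100K character comment":    7405,
      "complex nested expression": 6 * 60 * 1000,   # "just over six minutes"
  }
  for name, total in totals_ms.items():
      print(f"{name:28s} ≈ {total / 10_000:8.3f} ms per parse")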


All of those work out to be microseconds per parse, a far cry from the ten+ milliseconds that would be noticeable to humans.


The test is valid for any combo of shell and terminal, it's just a matter of figuring out which methodology was used so it can be better understood.

But yeah, I agree with the other comment that powershell is likely adding less than 1ms.


I just measured 3ms from a simulated keyboard event (through COM) to presence of the character in the console buffer, so that's OS time + PowerShell time without keyboard or screen time. Unfortunately measuring the same through CMD or a custom console app is more work than I care to put in tonight, so who knows what the real delta would be.


I always thought that Apple ][ + was as good as it gets. It's been downhill from there, for Apple and for the rest of us.


Once I got good at typing on it, my Acorn Electron (we couldn't afford the whizzy BBC Master!) was an extension of my brain.

Instant response. A full reboot was a control break away. Instant access to the interpreter. Easy assembly access.

I thought, it executed.

I remember our school moving from the networked BBCs to the PCs and it was a huge downgrade for us as kids. Computer class became operating a word processor or learning Win 3.11, just more drudgery, rather than the exciting and sometimes adversarial system (remote messaging other terminals, spoofing, etc.) that made us want to learn.


I agree with all of this except for one point:

Having an ordinary key on the keyboard that would effectively kill -9 the current program and clear the screen was a crazy design decision, especially for a machine where saving data meant using a cassette tape!


The Break key, unless you held down Control, was only a soft break though.

Your program would still be in memory with an >OLD command.

As long as it was a BASIC prog; machine code loaded with *RUN was lost and had to be reloaded from tape, yes.

A pain for games, but I don't really recall accidentally pressing the Break key much; it was out of the way, up at the top right.

I could talk about this all day!


It's true that you could get a BASIC program back with OLD.

But any data was lost and I saw break get pressed accidentally fairly often at school and amongst friends.


Fair. Not everyone spent 12 hours per day on their computer like me! They probably had friends and stuff. :)


I was shocked to see the TI-99/4a so high up. Just listing a BASIC program on a TI-99 is about as slow as a 300 baud modem.

Example: https://youtu.be/ls-PxqRQ35Q?t=178


iPads predict user input https://developer.apple.com/documentation/uikit/touches_pres... . Did they do this back when this article was written or is this a newer thing that lets them get to even lower user perceived latencies than 30ms?

In general, predicting user input to reduce latency is a great idea and we should do more of it, as long as you have a good system for rolling back mispredictions. Branch prediction is such a fundamental thing for CPUs that it's surprising to me that it doesn't exist at every level of computing. The JavaScript REPL's (V8's REPL) "eager evaluation" where it shows you the result of side-effect free expressions before you execute them is the kind of thing I'm thinking about https://developer.chrome.com/blog/new-in-devtools-68/#eagere...
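
(A toy sketch of the predict-then-roll-back idea, purely illustrative and not how UIKit or V8 actually implement it: show a guessed character immediately, keep it separate from committed state, and correct the display when the real input arrives.)

  class PredictiveEditor:
      def __init__(self):
          self.committed = ""    # text confirmed by real input events
          self.predicted = ""    # speculative suffix, shown but not yet confirmed

      def display(self):
          return self.committed + self.predicted

      def predict(self, guess):
          # Show the guess immediately; nothing is committed, so "rolling back"
          # a wrong guess is just replacing this suffix on the next frame.
          self.predicted = guess

      def commit(self, actual):
          if self.predicted and self.predicted != actual:
              print("(mispredict: display corrected this frame)")
          self.committed += actual
          self.predicted = ""

  ed = PredictiveEditor()
  ed.predict("e")           # e.g. guessed from touch trajectory / key travel
  print(ed.display())       # the user sees "e" a frame or two early
  ed.commit("e")            # the real event arrives and confirms it
  print(ed.display())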


Yet more proof we should have just stopped with the SGI Indy


There is hardware input (keyboard and mouse) latency as well as output latency, such as display latency. Unfortunately the market, and the industry as a whole, doesn't care about latency at all.

While I am not a fan or proponent of AR/VR, one thing that will definitely be an issue there is latency. Hopefully there will be enough incentive for companies to look into it.


Isn't this experiment a bit bogus? Extrapolating a terminal emulator's behavior to represent a machine's latency /in general/... what if the terminal emulator just sucks? Dan Luu is of course aware of this but he's willing to swallow it as noise:

> Computer results were taken using the “default” terminal for the system (e.g., powershell on windows, lxterminal on lubuntu), which could easily cause 20 ms to 30 ms difference between a fast terminal and a slow terminal.

If that was the only source of noise in the measurements then ok, maybe, but compounded with other stuff? For example, I was thinking: the more time passes, the further we drift from the command-line being the primary interface through which we interact with our computer. So naturally older computers would take more care in optimizing their terminal emulator to work well, as it's the face of the computer, right? Somebody's anecdote about PowerShell performance in this thread makes me feel more comfortable assuming that maybe modern vendors don't care so much about terminal latency.

Using the "default browser" as the metric for mobile devices worries me even more...

I like Dan Luu and I SupportThismessage™ but I feel funny trying to take anything away from this post...


Should it be personal computer latency?

I wonder about that, as we just talked about the importance of sub-second response time in the 1990s (full screen 3270 after hitting Enter; even with no IMS or DB2, how can it be done …). The terminal keyboard response is fine (on 3270). Network (SNA) …

1977 still had mainframes and workstations.


On my state-of-the-art desktop PC, Visual Studio has very noticeable cursor and scrolling lag. My C64 had the latter as well, but I used to assume the cursor moved as fast as I could type / tap the arrow keys.


I really found this valuable, particularly the slide at the top that enables you to visualize low level latency times (Jeff Dean numbers) over the years. tl;dr: not much has changed in the processor hardware numbers since 2012. So everything right of the processor is where the action is. And sounds like people are starting to actually make progress.

https://colin-scott.github.io/personal_website/research/inte...


I didn’t quite catch why we have 2.5 frames of latency and not just up to one frame of latency.


So much added sluggishness and they still cannot bring themselves to show us a current dynamic keyboard mapping to this day.


What's a current dynamic keyboard mapping?


Things (a dialog/popup box?) that let you see what each key is mapped to, based on the current window and/or the mouse position.


I wonder how this was all measured.

I didn't dig into the text blob to ferret that out.

Did anybody?

Because this doesn't pass the sniff test for data I want to trust


He used the 240 FPS camera on an iphone to measure time from start of key movement to screen update several times and rounded to the nearest 10ms. He also used a 1000 FPS Sony camera to recheck the 40ms and under results.

He does mention that he includes the keyboard latency; other latency test results he found exclude that step.

I find it fascinating that you think no one would bother to read details about a nerdy subject on Hacker News. Why else are we here?


I find it fascinating that a nerd wouldn't open with a description of their methods, which is what nerd readers want to know right up front. It's buried on the last page.


That is because latency on its own is an often useless metric.


Cynic comment ahead, beware!

---

Does this actually even matter today, when every click or key-press triggers dozens of fat network requests going around the globe on top of a maximally inefficient protocol?

Or to summarize what we see here: We've built layers of madness. Now we just have to deal with the fallout…

The result is in no way surprising given we haven't refactored our systems for over 50 years and just put new things on top.


If you aren't familiar, check out Winning Run [1], a 3D arcade racing game from 1988, about the best possible with custom hardware at the time. Graphics quality is primitive by modern standards. But make sure to watch the video in 60 fps. If there are any hiccups, it's your device playing the video. Smooth and continuous 60 frames per second rendering, with some tens of milliseconds of delay to respond to game inputs. It's still very hard to pull that off today, yet it's fundamental to that type of game's overall quality.

[1] https://youtu.be/NBiD-v-YGIA?t=85


WipEout HD on the PS3 managed to get super stable 60FPS at 1080p. It dynamically scales the horizontal rendering resolution for every frame and then scales it to 1920 pixels using hardware. So the resolution might vary a bit but at that framerate and such speeds in races it's not noticeable. The controls were super smooth at any speed, only achievement popups caused the whole game to freeze for half a second.


RPCS3 emulator can play it at 120Hz, I recommend.


I guess keyboard latency is also the biggest problem if you play old games in emulators. I feel it is often very difficult to play old action games, because you can't hit the buttons precisely enough.


I'm using a Razer Huntsman V2 keyboard, which has 8kHz polling and optical switches. I do not notice any obvious latency from it, and the specification claims sub-millisecond latency from switch activation point. This is better performance than is possible from a PS/2 keyboard, because the PS/2 interface is bottlenecked by the slow serial link.


Just for my own reference, per the link [1] below, 8000 Hz is 0.125 ms, and they show 0.2 ms as a reference. I believe (need to check up on this) that even with USB 3.0 you will require a custom driver to achieve 30-50 µs per packet transfer.

So that is about 0.175ms.

I know people might think this is over the top, but AFAIK this is far cheaper than, say, working around latency within the software system. Next we need to work on display latency and display connection latency.

[1] https://www.razer.com/gaming-keyboards/razer-huntsman-v2


This video is not a steady 60FPS. Lots of frames are duplicated or torn. Maybe this was originally 60FPS and got mangled by the recording process.


Yes. Upon close re-watching, I notice several sections of relative frame-drop. Still, there are long sections of low-complexity animation that are > 30 fps. My point was that, for certain interactive tasks that are governed by human visual response time, there is a hard limit of maybe 50 ms. Such a period was just barely adequate to render enough for a convincing 3D animation ~40 years ago. But even today, only so much can be done in that much time.

For completely imperceptible computing, display refresh must be dealt with within roughly 50 ms. Input must be sampled, all relevant computations for the currently rendered display must be computed, or be in easily accessed storage, and all updates to the display propagated to the framebuffer. Sensitive humans can notice as little as roughly 50 ms of lag or jitter.

This means for a program dealing with highly interactive graphics that are linked to an input device, the core event loop must execute in less than 50 ms or so. Even with current blazing-fast machines, this is a tough challenge for anything complex. If this deadline is not met, potentially perceptible lag in rendering will occur. In a 3D rendered scene, the graphics may perceptibly hang or tear for a couple frames. This is perceptible by a human, though with sustained suspension of disbelief, we can mostly ignore it, much as we can ignore the various visual artifacts in 24 fps cinema.


That inefficient network has better latency than your computer when trying to show you a pixel: <http://newstmobilephone.blogspot.com/2012/05/john-carmack-ex...>


Only that such a network call can't replace the pixel output.

It just adds to the overall latency.

Also, the real latency of web pages is measured in seconds these days. People are happy when they're able to serve a request in under 0.2 sec.


Fifteen years ago I used to target 15ms as seen in the browser F12 network trace (not as recorded on the server!) and if I mention such a thing these days people are flabbergasted.

For example, I had a support call with Azure asking them why the latency between Azure App Service and Azure SQL was as high as 13ms, and they asked me if my target user base was "high frequency traders" or somesuch.

They just could not believe that I was expecting sub-1ms latencies as a normal thing for a database response.


> if I mention such a thing these days people are flabbergasted

I think I'm just learning this the hard way, given the down-votes of the initial comment. :-)

Maybe people really don't see the issue with adding layers upon layers of stuff, and that we've reached, no, even surpassed, some tragicomic point? Computers are thousands of times faster, yet the end-user experience becomes more sluggish with every passing year. We have an issue, I would say. And it's actually not even funny any more.


I worked at a customer that upgraded to a major new MySQL version and boosted the hardware to be 4x more performant (RAM + cores). Result: their average transaction time climbed from 0.1 ms to 0.2 ms, and all support personnel started complaining since their software started taking 4 seconds to load a new screen instead of 2 seconds. The support systems operated on raw data with no cache to show the actual state.

Managed to fix the problem and restore performance by suggesting we disable the MySQL query cache which was slowing down the system with more cores.

And this had nothing to do with banking or high frequency anything. Just code that did lots of small queries to fill a screen full of data for support persons.


> Managed to fix the problem and restore performance by suggesting we disable the MySQL query cache which was slowing down the system with more cores.

This approach for a fix sounds very unintuitive.

Did you manage to understand why this was like that? The explanation is likely interesting!

What I could think of: context switches across cores constantly invalidated data in the CPU caches. In such a case CPU pinning would maybe help. (Just speculating! I'm not an expert on such things. But I know that the CPU cache, and memory I/O in general, is the single most important topic when dealing with typical performance issues on modern CPUs. Today's CPUs are only fast when they have their data available in their local cache(s). Fetching from RAM is by now like fetching from spinning rust a decade ago; it will kill your performance no matter how fast your CPU is).

Tangentially related: https://news.ycombinator.com/item?id=14888360


The query cache was behind a single lock. Thus while fast, it serialized all DB operations. And the more CPUs you have the slower the performance. Especially since any write has to go through the cache and invalidate all affected queries.

It was meant to accelerate PHP code in the 90s with single core CPUs. It was never designed to be scalable and in-development (back then) versions had already disabled it, which gave me the idea that it was something to try.
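
(An illustrative sketch, not MySQL code: when every operation has to take one global lock, adding workers adds contention instead of throughput, which is roughly what the query cache was doing on that box.)

  import threading, time

  cache_lock = threading.Lock()

  def run_queries(n):
      for _ in range(n):
          with cache_lock:        # every lookup/invalidation serializes here
              time.sleep(0.0002)  # stand-in for cache bookkeeping

  def timed(workers, total=1000):
      threads = [threading.Thread(target=run_queries, args=(total // workers,))
                 for _ in range(workers)]
      start = time.perf_counter()
      for t in threads: t.start()
      for t in threads: t.join()
      return time.perf_counter() - start

  for workers in (1, 4, 16):
      print(f"{workers:2d} workers: {timed(workers):.2f} s for the same total work")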


Thanks!

Indeed unexpected. But interesting to know.



