The Hallucinated Rows Incident (medium.com/epsio-blog)
20 points by dkgs1998 on Aug 20, 2023 | 14 comments



I hope this counts as productive feedback if the author of the blog is reading this - the post you put so much effort into writing truly deserves a better presentation experience than this: https://cdn.fosstodon.org/media_attachments/files/110/923/56...

Using a static site generator is surprisingly simple!


I actually wrote the post in Markdown, so even getting it into Medium was a superficially complicated process.


Author is a Brandon Sanderson fan, good taste. https://en.m.wikipedia.org/wiki/Yumi_and_the_Nightmare_Paint...


You might even say I have an incredible taste in books.


So, they got a crash because one part of their system made a distinction that another didn't. OK.

What's really left unexplained here is where the need came about for precision to be represented in this way. Like -- at first the article made it seem like they were using (presumably IEEE) floating-point decimals, and then merely serializing them as strings. But floating-point decimals don't include precision; the precision can't vary from one to another in the way described here.

This means that the decimal strings described here are not merely serializations of the decimal floats, rather they're prior to the decimal floats. That leaves the question, then, of where they're coming from. So where, then?


Like I wrote in the article, the engine uses `rust_decimal` to represent arbitrary precision floating point numbers; this is how the engine supports SQL types like Postgres's `decimal`.
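To illustrate how a decimal type can carry precision that a binary float cannot, here is a minimal sketch. `ToyDecimal` and its methods are hypothetical and do not reflect `rust_decimal`'s actual layout or API. It assumes a value stored as an integer mantissa plus a scale, so 1.0 and 1.00 compare equal numerically yet serialize to different text.

```rust
/// Hypothetical decimal for illustration: value = mantissa / 10^scale.
#[derive(Debug, Clone, Copy)]
struct ToyDecimal {
    mantissa: i64,
    scale: u32,
}

impl ToyDecimal {
    /// Strip trailing zeros so numerically equal values share one representation.
    fn normalized(self) -> ToyDecimal {
        let mut d = self;
        while d.scale > 0 && d.mantissa % 10 == 0 {
            d.mantissa /= 10;
            d.scale -= 1;
        }
        d
    }

    /// Naive text serialization that preserves the scale (trailing zeros included).
    /// Toy limitation: scales above 19 would overflow the power of ten.
    fn to_text(self) -> String {
        let sign = if self.mantissa < 0 { "-" } else { "" };
        let abs = self.mantissa.unsigned_abs();
        if self.scale == 0 {
            return format!("{}{}", sign, abs);
        }
        let pow = 10u64.pow(self.scale);
        format!(
            "{}{}.{:0width$}",
            sign,
            abs / pow,
            abs % pow,
            width = self.scale as usize
        )
    }
}

/// Equality is numeric: it ignores how many trailing zeros were stored.
impl PartialEq for ToyDecimal {
    fn eq(&self, other: &Self) -> bool {
        let (a, b) = (self.normalized(), other.normalized());
        a.mantissa == b.mantissa && a.scale == b.scale
    }
}
```

If one component keys rows by `to_text()` and another compares with `==`, the two can disagree about whether 1.0 and 1.00 are the same row. Serializing `normalized()` values is one way to make them agree.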

We support IEEE 754 floating-point numbers as well, but those have no special serialization requirements: the structure of an IEEE 754 floating-point number was designed so that lexicographical comparisons and numeric comparisons give the same results (I took inspiration from that to write the serialization code that solves the issue, which I'll lay out when I write part 2 of this story).
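A minimal sketch of the order-preserving idea, not the author's serialization code. Raw IEEE 754 bit patterns compare correctly as unsigned integers only for non-negative values, so a widely used trick sets the sign bit for non-negative numbers and flips every bit for negative ones. The function names are hypothetical, and NaN handling is deliberately left out.

```rust
/// Map an f64 to a u64 whose unsigned order matches the float's numeric order
/// (for non-NaN inputs).
fn f64_to_ordered_bits(x: f64) -> u64 {
    let bits = x.to_bits();
    if bits >> 63 == 1 {
        // Negative: flip every bit so larger magnitudes sort lower.
        !bits
    } else {
        // Non-negative: set the sign bit so these sort above all negatives.
        bits | (1u64 << 63)
    }
}

/// Big-endian bytes, so lexicographic byte comparison matches numeric order.
fn f64_sort_key(x: f64) -> [u8; 8] {
    f64_to_ordered_bits(x).to_be_bytes()
}
```

Sorting a slice with `sort_by_key(|v| f64_sort_key(*v))` should then give the same order as a numeric sort, which is the property a byte-ordered key-value store needs.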


That also suffers from integer overflow if they implement it naively as they seem to show in the blog with a `pub trait Modification: PartialEq + Default + Add<Output = Self>` and using primitive integers.


With `isize` on a 64-bit target you can support up to 2^63 - 1 duplicate rows, which means that even if a row is as short as one byte, a single diff can represent several exabytes of rows.
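For the overflow concern, a minimal sketch assuming the trait bound quoted above and an `isize` row count. `merge_counts` and `cancels_out` are hypothetical helpers, not part of the engine. `checked_add` reports overflow as `None` instead of panicking in debug builds or wrapping in release builds.

```rust
use std::ops::Add;

/// Trait bound as quoted in the thread; the empty body is an assumption.
pub trait Modification: PartialEq + Default + Add<Output = Self> {}

impl Modification for isize {}

/// Hypothetical helper: merge two row-count diffs, returning None on overflow.
fn merge_counts(a: isize, b: isize) -> Option<isize> {
    a.checked_add(b)
}

/// Hypothetical helper: two diffs cancel when their sum is the default (zero).
fn cancels_out<M: Modification>(a: M, b: M) -> bool {
    a + b == M::default()
}
```

Whether the extra branch is worth it depends on how plausible a count near `isize::MAX` is in practice, which is the point being argued above.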


Why on earth should I have to make an account to read a free article? Why is Medium not banned from HN?


Your website's home page slows my entire computer down to a crawl.


Wow, what's wrong with your computer?


404?


Fixed now. Our software switched the URL because the page contains this:

  <link data-rh="true" rel="canonical" href="https://gist.github.com/ThinkRedstone/79302fcd932659e076842259da9619c8"/>
... but that URL doesn't exist.


As part of importing my Markdown into Medium, I created a GitHub gist with the Markdown of the post. After I imported the post into Medium, I deleted the gist, but it seems Medium keeps this tag in the page HTML.



