
I see you are using strlen in the inner loop for simdjson, which is an extra O(n) operation.

UPDATE: your test data is also quite short. You may be mostly measuring startup overhead.




I tried to make the test realistic. Agreed that in a benchmark where the string is the same every time the length could be hard-coded, but in a real-world scenario the length of a string is not known in advance; you have to compute it or get it from wherever the data came from.

As for the length of the test, I also tried with 10 times the number of iterations with the same results, but then the benchmark takes almost a minute to complete. I figured 5 seconds was good enough to get the results, and someone can bump up the iterations if they want to.


As discussed elsewhere, simdjson is comfortable with the notion that JSON files might be of different sizes, but does rather better if you can pad the buffers that it reads data into. This is required to avoid the small but real risk that it reads over a page boundary in its SIMD loops.
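To make that concrete, satisfying the padding requirement can look roughly like this, using the documented dom API and padded_string type (just a sketch; error handling omitted):

  #include <cstddef>
  #include "simdjson.h"

  // padded_string copies the input into a buffer that already has the extra
  // addressable bytes the SIMD loops may touch, so the parser never needs to
  // re-copy the data itself.
  simdjson::dom::element parse_copy_padded(simdjson::dom::parser &parser,
                                           const char *json, size_t len) {
    simdjson::padded_string padded(json, len);  // allocates len + SIMDJSON_PADDING
    return parser.parse(padded);                // padding requirement already met
  }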

The test is not realistic. Measuring performance on small cases is a legitimate use case, of course, but using the same data every time means that both ojc and simdjson will train the branch predictor almost perfectly on any modern architecture. This will help either parser during its "branchy" code, which is all of ojc and a significant portion of simdjson (despite all our yelling about how great SIMD is, once we have tokens, we go to a fairly traditional state machine).

On a tiny message, startup costs are probably more significant. With buffer alignment fixed, -O3, and larger messages, I would not be shocked to see a factor of 20x. There's nothing wrong with ojc, but it uses a typical parsing strategy that isn't all that different from the other 'normal' JSON parsing libraries that we evaluate (RapidJSON, sajson, dropbox, fastjson, gason, ultrajson, jsmn, cJSON, and JSON for Modern C++).

It's not clear to me why you were so convinced that your library, by extension, would run 10-25x faster than all these other libraries. With the exception of RapidJSON and sajson - which have some performance tricks to go faster - most of these libraries read input in the same way.


If you read the docs, you can see that the string must be allocated to len+SIMDJSON_PADDING bytes. I suppose that still works by accident because it's followed by other static data.

EDIT: Actually, there is a parameter 'realloc_if_needed' that defaults to true and that will reallocate the string because it assumes you haven't padded it. You're likely just measuring malloc performance.
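Roughly, the two paths look like this (just a sketch; the function and buffer names are made up, but the parse() overload is the one documented):

  #include <cstddef>
  #include "simdjson.h"

  // unpadded: an ordinary C string; padded_buf: a buffer the caller promises
  // has json_len + SIMDJSON_PADDING addressable bytes.
  void parse_both_ways(simdjson::dom::parser &parser, const char *unpadded,
                       const char *padded_buf, size_t json_len) {
    // Default realloc_if_needed = true: simdjson copies the input into its own
    // padded buffer first, so this is safe but pays an allocation + memcpy.
    simdjson::dom::element a = parser.parse(unpadded, json_len);

    // With the padding guaranteed by the caller, false skips that copy.
    simdjson::dom::element b = parser.parse(padded_buf, json_len, false);
    (void)a; (void)b;
  }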


I suspect the need to reallocate in simdjson is a factor in performance, but that is really part of the benchmark. When using a parser it would be a very rare case to have the luxury of embedding the string to be parsed in the code. Usually a string comes in through some other means, such as over HTTP or maybe a simple socket; then again, it could be passed from some other part of an application. If the parser has a requirement to expand the string in order to modify it in some fashion, then a realloc, or really a malloc and copy, is part of the parsing process.


simdjson doesn't care what is in the padding and won't modify it; it just needs the buffer the string lives in to have 32 extra addressable (allocated) bytes. It doesn't ever use those bytes to make decisions, but it may read them, as certain critical algorithms run faster when you overshoot a tiny bit and correct afterwards.

Most real-world applications, such as those reading from sockets, read into a buffer and can easily meet this requirement.

If you are interested to know what it's for, the place where it parses/unescapes a string is a good example. Instead of copying byte by byte, it generally copies 16-32 raw bytes at a time and just sort of caps it off at the end quote, even though it might have copied a little extra. Here's some pseudocode (note this isn't the full algorithm; I left out some error conditions and escape parsing for clarity):

  // Check the quote
  if (in[i] == '"') {
    i++;
    len = 0;
    while (true) {
      // Use simd to copy 32 bytes from input to output
      chunk = in[i..i+32];
      out[len..len+32] = chunk;

      // Note we already wrote 32 bytes, and NOW check if there was a quote in there
      int quote = chunk.find('"');  // index of '"' in chunk, or -1 if not found
      if (quote >= 0) {
        len += quote;               // only the bytes before the closing quote count
        break;
      }
      len += 32; // No quote, so keep parsing the string
      i += 32;
    }
  }
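
And if you want something closer to real code, here's the same loop written with AVX2 intrinsics, purely as an illustration (this is not simdjson's actual implementation; escapes and error handling are still omitted, and both in and out need 32 bytes of slack past the end):

  #include <cstddef>
  #include <cstdint>
  #include <immintrin.h>  // AVX2; build with -mavx2

  // Copy the contents of the quoted string starting at in[i] (the opening '"')
  // into out, 32 bytes per iteration. Returns the unquoted length.
  static size_t copy_quoted(const uint8_t *in, size_t i, uint8_t *out) {
    if (in[i] != '"') return 0;
    i++;
    size_t len = 0;
    const __m256i quote = _mm256_set1_epi8('"');
    while (true) {
      // One 32-byte load and store, even though the closing quote may sit
      // anywhere inside the chunk (we may write a few bytes too many).
      __m256i chunk = _mm256_loadu_si256((const __m256i *)(in + i));
      _mm256_storeu_si256((__m256i *)(out + len), chunk);

      // NOW check whether a quote was among the 32 bytes we just copied.
      uint32_t mask = (uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, quote));
      if (mask != 0) {
        len += (size_t)__builtin_ctz(mask);  // bytes before the closing quote
        break;
      }
      len += 32;  // no quote in this chunk, keep going
      i += 32;
    }
    return len;
  }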


Yes, this is exactly why simdjson wants padding. It certainly doesn't need the string to be embedded in source or any such nonsense.

I wish there were a standardized attribute that C++ knew about that pretty much just said "hey, we're not right next to some memory-managed disaster, and if you read off the end of this buffer, you promise not to use the results".

It is awful practice to read off the end of a buffer and let those bytes affect your behavior, but it is almost always harmless to read extra bytes (and mask them off or ignore them) unless you're next to a page boundary or in some dangerous region of memory that's mapped to some device.

This attribute would also need to be understood by tools like Valgrind (to the extent that valgrind can/can't track whether you're feeding this nonsense into a computation, which it handles pretty well).


You don't have to embed the string in the code, obviously. If you want to use a certain lib, I'm sure you can arrange for the buffer to be slightly bigger.


Sorry, this is not realistic in C. The moment you get the JSON data, you also have to get the length, or you could not have allocated enough memory for it.

Also, the number of iterations is not the issue. The length of the input is. What happens if your JSON is a few KB, or bigger than the L1/L2 cache, ...?


Have you tried running the tests with a pre-calculated length? When I made that change there was no noticeable difference. The iterations/msec went from 195 to 192, which is not statistically significant.

I have no doubt a more complete set of benchmarks could be made with various sizes of JSON, and from a file as well as from a string. This was put together quickly to get an answer to the questions raised in this post. Since it does seem to be a topic of interest, I'll expand and clean up the tests in the future, but for now it does give at least one data point.

You might notice that the simdjson::dom::parser is reused. Without that optimization to allow warming up, the iterations/msec was only 160.


Nope. You've got a code review from an interested random stranger on the net who spent 10 seconds looking at your code and noticed something suspect going on.

Given your response, I spent an extra 10 seconds reading the simdjson docs and noticed that you violate the SIMDJSON_PADDING requirement. So either your code is a crash waiting to happen, or you are using a very non-optimal code path in simdjson that requires it to re-copy all the data.

That's also the maximum amount of attention you'll get from this random stranger. My time is up ;-)


The code example followed was the one in the simdjson basics.md. The error-handling example left off the realloc_if_needed argument, which does default to true. I have updated the tests to set that third argument explicitly for clarity.

As for the path in simdjson being non-optimal and having to copy bytes, that should be expected if the string is to be modified. The buf argument type is const, after all, so it should not be modified. In any case, glangdale's code is clean and solid, so I doubt his code is anything but optimized or the examples anything but correct.


You can thank the cat on my lap for some bonus attention. I'll try to answer some points in this and other threads of yours. All this from a simdjson non-expert with a reasonable amount of C experience.

Suppose there is a socket with JSON data. What you do is: at init, you create one or more buffers of, say, 64 KB plus the required padding. Then, when epoll or whatever says there is data, you call read on one of these buffers. This gives you a length.

At this point you have a padded buffer and a length, so the requirements for simdjson with realloc_if_needed=false are met, and you can parse at full speed. When done, reuse the buffer for the next epoll/read cycle.

There are complications, of course. Data is not guaranteed to arrive all together in one read call, and there might be an HTTP library feeding you. All of this amounts mostly to buffering and chunking. If you are careful, it can be solved in a mostly zero-copy way, ready for optimal simdjson.
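
In code, that loop might look something like this (just a sketch: it pretends each read() returns one complete document, which is exactly the chunking caveat above, and the buffer size is arbitrary):

  #include <cstddef>
  #include <unistd.h>   // read()
  #include <vector>
  #include "simdjson.h"

  // One pre-padded buffer, reused across messages, plus a reused parser.
  void serve(int fd) {
    constexpr size_t kBufSize = 64 * 1024;
    std::vector<uint8_t> buf(kBufSize + simdjson::SIMDJSON_PADDING);
    simdjson::dom::parser parser;

    while (true) {
      ssize_t n = read(fd, buf.data(), kBufSize);  // read() gives us the length
      if (n <= 0) break;
      // The buffer already has the required padding past kBufSize, so no
      // realloc/copy is needed inside simdjson.
      simdjson::dom::element doc = parser.parse(buf.data(), (size_t)n, false);
      (void)doc;  // ... use the document ...
    }
  }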

The example simdjson code you refer to is a kind of demo mode. It gets you off the ground quickly, but it is far from optimal.

I assume simdjson does not write to the buffer. It just reads more than one byte at once, so if it reads, say, 8 bytes and there is only 1 left, it will read 7 bytes of random junk and discard them when it notices its misplaced optimism. However, it needs to be allowed to read these extra bytes without causing a SIGSEGV, hence the padding.

UPDATE: all of this is an educated guess; the simdjson authors are welcome to fix/fine-tune whatever I said.


Seeing the downvotes, I assume something is wrong here. Feel free to add a correction.


The issue is you've been pretty arrogant/rude over multiple comments, talking about your time like it's some kind of gift from the heavens.


OK, sorry. Apologies to everyone involved, the author first of all. It wasn't meant like that, for what it's worth, but it clearly came out like that.



