I see it the other way around: tracing is essentially logging with hierarchy. If you can keep the context of a "parent" span around, then you can log all of the entries out and build up the entire trace from spans, although you can get pretty confused if you don't write them out in the correct order.
However, if you don't have context, then the log entry is the atomic unit -- metrics are log entries with numbers that can be aggregated, spans are log entries with a trace id and a parent span, and "events" in Honeycomb terminology are logs with attributes covering an entire business operation / HTTP request.
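To make the "spans are just log entries" point concrete, here's a minimal sketch that rebuilds a trace tree from span-shaped log entries by grouping them on the parent span id. The record and its field names are purely illustrative and not tied to any particular tracing library:

```java
import java.util.*;

// Illustrative only: a "span" is just a log entry that carries a trace id
// and a parent span id, so a trace can be rebuilt by grouping and linking.
record SpanEntry(String traceId, String spanId, String parentSpanId, String message) {}

public class TraceFromLogs {
    public static void main(String[] args) {
        List<SpanEntry> entries = List.of(
            new SpanEntry("t1", "a", null, "handle request"),
            new SpanEntry("t1", "b", "a", "query database"),
            new SpanEntry("t1", "c", "a", "render response")
        );

        // Index children by parent span id, then walk down from the root span.
        Map<String, List<SpanEntry>> children = new HashMap<>();
        SpanEntry root = null;
        for (SpanEntry e : entries) {
            if (e.parentSpanId() == null) {
                root = e;
            } else {
                children.computeIfAbsent(e.parentSpanId(), k -> new ArrayList<>()).add(e);
            }
        }
        if (root != null) {
            print(root, children, 0);
        }
    }

    static void print(SpanEntry span, Map<String, List<SpanEntry>> children, int depth) {
        System.out.println("  ".repeat(depth) + span.message());
        for (SpanEntry child : children.getOrDefault(span.spanId(), List.of())) {
            print(child, children, depth + 1);
        }
    }
}
```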
MDC/NDC only works reliably when you don't have asynchronous code (Akka/Vert.x/CompletionStage) running through your application. As soon as you start using multiple threads, it becomes significantly more complex to carry MDC context from one thread to the next.
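The usual workaround is to capture the context map on the calling thread and re-establish it inside the task before logging; a minimal SLF4J sketch (the key name and executor setup are just for illustration):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class MdcAcrossThreads {
    private static final Logger logger = LoggerFactory.getLogger(MdcAcrossThreads.class);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        MDC.put("correlationId", "abc-123"); // set on the calling thread

        // MDC is thread-local, so capture it here and restore it in the worker.
        Map<String, String> context = MDC.getCopyOfContextMap();

        CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            if (context != null) {
                MDC.setContextMap(context);
            }
            try {
                logger.info("correlationId is visible on the worker thread");
            } finally {
                MDC.clear(); // don't leak context into pooled threads
            }
        }, pool);

        future.join();
        pool.shutdown();
    }
}
```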
I thought we would have different debugging tools.
Specifically, I thought languages would be designed around debugging use cases. The fact that we are still using debuggers and loggers and printlns to debug blows my mind.
This is neat! I have a tool blacklite[1] that logs to SQLite, so using this I can export those logs automatically to S3 and keep persistent logs even if the instance goes away. I can see this being really useful for k8s and Docker containers.
I've published a logging appender called Blacklite that writes logging events to SQLite databases. It has good throughput and low latency, and comes with archivers that can delete old entries, roll over databases, and compress log entries using dictionary compression, which looks for common elements across small messages and extracts them into a shared dictionary.
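Blacklite's archiver internals aren't reproduced here, but to illustrate what a shared dictionary buys you for small, repetitive messages, here's a sketch using java.util.zip.Deflater with a preset dictionary (the dictionary and message contents are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DictionaryCompressionSketch {
    public static void main(String[] args) {
        // Strings that show up in almost every log entry make a good dictionary.
        byte[] dictionary = "{\"level\":\"INFO\",\"logger\":\"com.example.Service\",\"message\":"
                .getBytes(StandardCharsets.UTF_8);
        byte[] message = "{\"level\":\"INFO\",\"logger\":\"com.example.Service\",\"message\":\"user logged in\"}"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("plain:      " + compressedSize(message, null));
        System.out.println("dictionary: " + compressedSize(message, dictionary));
    }

    // Deflate a single small message, optionally seeding a preset dictionary.
    static int compressedSize(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null) {
            deflater.setDictionary(dictionary);
        }
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 64];
        int size = deflater.deflate(out);
        deflater.end();
        return size;
    }
}
```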
So that's the sales pitch. Now let's do the fun question – how and why did it get here?
I started off this blog post by writing out the requirements for a forensic logger as if I had total knowledge of what the goal was. But that's not what happened. The real story is messy: I discovered the requirements piecemeal, with lots of backtracking over several months. I think it's more interesting and human to talk about it that way.
I looked into logging with protobuf while checking whether there was a better binary encoding for ring-buffer logging, along the same lines as nanolog:
https://tersesystems.com/blog/2020/11/26/queryable-logging-w...
What I found was that it's typically not binary vs. string encoding that makes the difference. The biggest factors are "is there a predefined schema", "is there a precompiler that will generate code for this schema", and "what is the complexity of the output format". With that in mind, if you are dealing with chaotic semi-structured data, JSON is pretty good, and actually faster than some binary encodings:
https://github.com/eishay/jvm-serializers/wiki/Newer-Results...
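To make the "no schema, no precompiler" point concrete, a map of arbitrary attributes can go straight to a JSON encoder; a small Jackson sketch (the field names are illustrative):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemalessJson {
    public static void main(String[] args) throws Exception {
        // No schema, no generated code: arbitrary attributes go straight to JSON.
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("level", "INFO");
        event.put("message", "user logged in");
        event.put("durationMs", 42);
        event.put("tags", List.of("auth", "web"));

        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.writeValueAsString(event));
    }
}
```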