I see it the other way around: tracing is essentially logging with hierarchy. If you can keep the context of a "parent" span around, then you can log all of the entries out and build up the entire trace from spans, although you can get pretty confused if you don't write them out in the correct order.
However, if you don't have context, then the log entry is the atomic unit -- metrics are log entries with numbers that can be aggregated, spans are log entries with a trace id and a parent span, and "events" in Honeycomb terminology are logs with attributes covering an entire business operation / HTTP request.
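To make the "spans are just log entries" point concrete, here's a minimal sketch that rebuilds a trace tree from span-shaped log entries by grouping them on the parent span id. The record and its field names are purely illustrative and not tied to any particular tracing library:

```java
import java.util.*;

// Illustrative only: a "span" is just a log entry that carries a trace id
// and a parent span id, so a trace can be rebuilt by grouping and linking.
record SpanEntry(String traceId, String spanId, String parentSpanId, String message) {}

public class TraceFromLogs {
    public static void main(String[] args) {
        List<SpanEntry> entries = List.of(
            new SpanEntry("t1", "a", null, "handle request"),
            new SpanEntry("t1", "b", "a", "query database"),
            new SpanEntry("t1", "c", "a", "render response")
        );

        // Index children by parent span id, then walk down from the root span.
        Map<String, List<SpanEntry>> children = new HashMap<>();
        SpanEntry root = null;
        for (SpanEntry e : entries) {
            if (e.parentSpanId() == null) {
                root = e;
            } else {
                children.computeIfAbsent(e.parentSpanId(), k -> new ArrayList<>()).add(e);
            }
        }
        if (root != null) {
            print(root, children, 0);
        }
    }

    static void print(SpanEntry span, Map<String, List<SpanEntry>> children, int depth) {
        System.out.println("  ".repeat(depth) + span.message());
        for (SpanEntry child : children.getOrDefault(span.spanId(), List.of())) {
            print(child, children, depth + 1);
        }
    }
}
```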
MDC/NDC only works reliably when you don't have asynchronous code (Akka/Vert.x/CompletionStage) running through your application. As soon as you start using multiple threads, it becomes significantly more complex to carry MDC context from one thread to the next.
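The usual workaround is to capture the context map on the calling thread and re-establish it inside the task before logging; a minimal SLF4J sketch (the key name and executor setup are just for illustration):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class MdcAcrossThreads {
    private static final Logger logger = LoggerFactory.getLogger(MdcAcrossThreads.class);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        MDC.put("correlationId", "abc-123"); // set on the calling thread

        // MDC is thread-local, so capture it here and restore it in the worker.
        Map<String, String> context = MDC.getCopyOfContextMap();

        CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            if (context != null) {
                MDC.setContextMap(context);
            }
            try {
                logger.info("correlationId is visible on the worker thread");
            } finally {
                MDC.clear(); // don't leak context into pooled threads
            }
        }, pool);

        future.join();
        pool.shutdown();
    }
}
```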
I thought we would have different debugging tools.
Specifically, I thought languages would be designed around debugging use cases. The fact that we are still using debuggers and loggers and printlns to debug blows my mind.
This is neat! I have a tool blacklite[1] that logs to SQLite, so using this I can export those logs automatically to S3 and keep persistent logs even if the instance goes away. I can see this being really useful for k8s and Docker containers.
I've published a logging appender called Blacklite that writes logging events to SQLite databases. It has good throughput and low latency, and comes with archivers that can delete old entries, roll over databases, and compress log entries using dictionary compression, which looks for common elements across small messages and extracts them into a shared dictionary.
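Blacklite's archiver internals aren't reproduced here, but to illustrate what a shared dictionary buys you for small, repetitive messages, here's a sketch using java.util.zip.Deflater with a preset dictionary (the dictionary and message contents are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DictionaryCompressionSketch {
    public static void main(String[] args) {
        // Strings that show up in almost every log entry make a good dictionary.
        byte[] dictionary = "{\"level\":\"INFO\",\"logger\":\"com.example.Service\",\"message\":"
                .getBytes(StandardCharsets.UTF_8);
        byte[] message = "{\"level\":\"INFO\",\"logger\":\"com.example.Service\",\"message\":\"user logged in\"}"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("plain:      " + compressedSize(message, null));
        System.out.println("dictionary: " + compressedSize(message, dictionary));
    }

    // Deflate a single small message, optionally seeding a preset dictionary.
    static int compressedSize(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null) {
            deflater.setDictionary(dictionary);
        }
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 64];
        int size = deflater.deflate(out);
        deflater.end();
        return size;
    }
}
```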
So that's the sales pitch. Now let's do the fun question – how and why did it get here?
I started off this blog post by writing out the requirements for a forensic logger as if I had total knowledge of what the goal was. But that's not what happened. The real story is messy: I discovered the requirements piecemeal, with lots of backtracking over several months. I think it's more interesting and human to talk about it that way.
I looked into logging with protobuf while checking whether there was a better binary encoding for ring-buffer logging, along the same lines as nanolog:
https://tersesystems.com/blog/2020/11/26/queryable-logging-w...
What I found was that it's typically not binary vs. string encoding that makes the difference. The biggest factors are "is there a predefined schema", "is there a precompiler that will generate code for this schema", and "what is the complexity of the output format". With that in mind, if you are dealing with chaotic semi-structured data, JSON is pretty good, and actually faster than some binary encodings:
https://github.com/eishay/jvm-serializers/wiki/Newer-Results...
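To make the "no schema, no precompiler" point concrete, a map of arbitrary attributes can go straight to a JSON encoder; a small Jackson sketch (the field names are illustrative):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemalessJson {
    public static void main(String[] args) throws Exception {
        // No schema, no generated code: arbitrary attributes go straight to JSON.
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("level", "INFO");
        event.put("message", "user logged in");
        event.put("durationMs", 42);
        event.put("tags", List.of("auth", "web"));

        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.writeValueAsString(event));
    }
}
```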