How does the author know their website has hundreds of subscribers? AFAIK it is not possible to identify subscribers to RSS feeds, and counting hits won't help. Am I missing something here?
Hi! Some feed aggregators include the subscriber count in the User-Agent header, so I can pick these counts from the access logs and add them up. This is how the logs look:
Picking a few days of logs where the subscriber count has not changed much, I get a rough estimate of the total count of subscribers reported by the feed readers like this:
In case anyone is wondering why we see multiple entries for Feedly and Feedbin in the first log snippet, that's because in an older design of my website, I had multiple sections each serving its own feed at paths like /blog/feed.xml, /maze/feed.xml, etc. Later I consolidated all of them into a unified feed at /feed.xml. So the feed readers still hit the old feed URLs and then get redirected to the unified feed URL.
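Roughly, the adding-up step could look like this in Python. This is a minimal sketch, not the exact script, and it assumes nginx combined-format logs where the User-Agent is the last quoted field; the file path is illustrative:

```python
# Minimal sketch: sum the per-reader subscriber counts reported in User-Agents.
# Assumes nginx "combined" format logs; "access.log" is an illustrative path.
import re

COUNT_RE = re.compile(r"(\d+) subscribers")

# reader (User-Agent with the changing number stripped) -> last reported count;
# readers like Feedbin include a feed-id in the UA, so separate feeds stay separate
latest = {}
with open("access.log") as log:
    for line in log:
        try:
            user_agent = line.rsplit('"', 2)[-2]  # last quoted field in combined format
        except IndexError:
            continue  # skip malformed lines
        match = COUNT_RE.search(user_agent)
        if match:
            # "Feedbin feed-id:2688376 - 9 subscribers" and the same UA reporting
            # 10 subscribers should collapse into a single reader entry
            reader = COUNT_RE.sub("subscribers", user_agent)
            latest[reader] = int(match.group(1))

print(sum(latest.values()))
```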
You can get a rough estimate based on unique IPs hitting the RSS feed. Moreover, some of the online feed readers report the number of subscribers of your feed as part of their User-Agent. An example from my blog logs: `"Feedbin feed-id:2688376 - 9 subscribers"`
In my own logs, the ones that show up are Feedly, Inoreader, Newsblur, Feedbin, The Old Reader, and a few small/personal ones.
Of course, they only report the subscriber count for their own platform. And then you can also pool together all the separate requests fetching /feed/ and add them all up.
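A minimal sketch of that unique-IP estimate, assuming the same nginx combined log format (the file path and feed URL are placeholders):

```python
# Count distinct client IPs fetching the feed; a floor rather than an exact
# figure, since readers behind aggregators or CDNs collapse into a few IPs.
ips = set()
with open("access.log") as log:            # illustrative path
    for line in log:
        if '"GET /feed/' in line:          # adjust to your feed URL(s)
            ips.add(line.split()[0])       # first field = client IP
print(len(ips))
```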
> Whatever eventually supplants Postgres is quite likely going to be based on Arrow - polyglot zero-copy vector processing is the future.
Can you elaborate on this? I understand it's a very opinionated statement, but I still don't see how "polyglot" and "vector processing" could be considered the future of OLTP and general-purpose DBMSs.
Polyglot means not having to fight with marshaling overheads when integrating bespoke compute functions into SQL, or when producing input to / consuming the output from queries. This could radically change the way in which non-expert people construct complex queries and efficiently push more logic into the database layer, and open the door to bypassing SQL as the main interface to the DBMS altogether.
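To make the zero-copy part concrete, here's a toy PyArrow sketch (nothing Postgres-specific, and the file name is made up): the columns written below can be memory-mapped and consumed as-is by pandas, polars, DuckDB, or a Rust/Java process, without re-serializing anything row by row.

```python
# Toy illustration of Arrow's zero-copy promise; "scores.arrow" is made up.
import pyarrow as pa
import pyarrow.ipc as ipc

table = pa.table({"id": list(range(5)), "score": [0.1, 0.2, 0.3, 0.4, 0.5]})

# Arrow IPC: the on-disk/on-the-wire layout is the in-memory layout,
# so writing is mostly a matter of dumping the column buffers.
with pa.OSFile("scores.arrow", "wb") as sink:
    with ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# A reader (in any Arrow-capable language) can memory-map the file and use
# the columns directly instead of deserializing row by row.
with pa.memory_map("scores.arrow", "rb") as source:
    loaded = ipc.open_file(source).read_all()
print(loaded.column("score"))
```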
Vector processing means improved mechanical sympathy. Even for OLTP the row-at-a-time execution model of Postgres is leaving a decent chunk of performance on the table because it doesn't align with how CPU & memory architectures have evolved.
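A loose analogy in Python/NumPy: the interpreted loop exaggerates the gap, since most of its cost is interpreter overhead rather than the CPU and cache effects that matter inside a database, but the shape of the problem is the same, per-row dispatch versus one tight pass over contiguous memory.

```python
# Loose analogy: per-row processing vs. one vectorized pass over a column.
import time
import numpy as np

column = np.random.rand(5_000_000)

start = time.perf_counter()
total = 0.0
for value in column:        # "row at a time": one dispatch per value
    total += value
row_time = time.perf_counter() - start

start = time.perf_counter()
vec_total = column.sum()    # one tight loop over contiguous memory
vec_time = time.perf_counter() - start

print(f"row-at-a-time: {row_time:.2f}s, vectorized: {vec_time:.4f}s")
```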
Honestly, I can't envision a near future where SQL is not the main interface. Happy to see the future proving me wrong here though!
While I can buy the arguments about how having a better data structure to communicate between processes (on the same server) could help, it's a bit difficult to wrap my mind around how Arrow will help in distributed systems (compared to any other performant data structure). Do you have any resources to understand the value proposition in that area?
Same for vector processing: it would be great to read a bit more about optimizations that would help improve Postgres, leaving out pure analytical use cases.
> it's a bit difficult to wrap my mind around how Arrow will help in distributed systems
Comparing it with the role of Protobuf is perhaps easiest; there's a good FAQ entry [0] which concludes: "Arrow and Protobuf complement each other well. For example, Arrow Flight uses gRPC and Protobuf to serialize its commands, while data is serialized using the binary Arrow IPC protocol".
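A hedged sketch of how that division of labour looks from a client's point of view (the endpoint and ticket name are made up): gRPC/Protobuf carry the command, the Arrow IPC stream carries the data.

```python
# Sketch of an Arrow Flight fetch; the endpoint and ticket name are made up.
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")
# The DoGet command itself travels as Protobuf over gRPC...
reader = client.do_get(flight.Ticket(b"my_dataset"))
# ...but the result streams back as Arrow record batches, not serialized rows.
table = reader.read_all()
print(table.num_rows)
```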
This will be increasingly significant due to the hardware trends in network & memory (and ultimately storage too) compared with CPUs. I posted about that in a comment a few days ago [1], but it's worth sharing again:
> here’s a chart comparing the throughputs of typical memory, I/O and networking technologies used in servers in 2020 against those technologies in 2023
> Everything got faster, but the relative ratios also completely flipped
> memory located remotely across a network link can now be accessed with no penalty in throughput
I am no expert on Postgres, but the thread seems to suggest the default out-of-the-box JIT is actually more efficient than the custom vectorized executor that was built for the PoC. That probably rules out any low-hanging optimizations based purely on vectorization for OLTP specifically, but there are undoubtedly many wider ideas that could in principle be adopted to bring OLTP performance in line with a state-of-the-art research database like Umbra (memory-first design, low-latency query compilation, adaptive execution, etc.). As usual with databases, though, if the cost estimation is off and your query plan sucks, then worrying about fine-tuning the peak performance is ~irrelevant.
The idea is nice, but it requires trusting their users a lot. How do they prevent users from setting a super high "linker_workers" value that gets consumed by every agent? This could open the door for malicious users to saturate the entire system...
_peregrine_ is right, we do have control over the linker_workers value for every agent. Agents are not exposed to our users; everything is transparent to them.
As a user, you just need to create a new connection to your Kafka cluster providing your credentials, and then choose the topic you want to ingest into Tinybird. We take care of everything for you; we balance the workers using the approach explained in the blog post.
We can also fine-tune it for specific users, or for relevant events such as Black Friday.
It's a bit different for Enterprise customers; in those cases we can set up dedicated agents and fine-tune linker_workers and other parameters to optimize for their use case.
If you have any further questions you can reach us on our public Slack channel or via email.
You bring up a topic I have been concerned about myself but never managed to articulate. I usually perform quite well at my job and easily get "special" attention and recognition, which is good. At jobs, I tend to start out motivated just by the work itself, but at some point, after a few victories, recognitions, salary increases, or promotions, I find myself being that guy seeking attention and recognition, and I start feeling demotivated if I'm not getting it.
Perhaps that's the big thing to solve. Being attention/recognition dependent doesn't look like being a good professional.
You were supposed to show some form of anger, even a slight one, if I was right. You say "doesn't look like being a good professional" (good boy); who are you trying to please? Are your parents proud of you?
Simply follow things along those lines. Pay attention to your vocabulary: __all__, special, good professional, "I've been doing everything right", etc. That's how you do basic psychology. Your subconscious is talking, just listen :) Hope that helps.