Hacker News | throwdbaaway's comments

With gentoo, if you allocate let's say 20G to / on ext4, then you can quite easily run into this issue.

/usr/src/linux will use about 30% of the space and 10% of the inodes.

/var/db/repos/gentoo will use about 4% of the space and 10% of the inodes.

Next you clone the firefox-source hg repo, which will use about 15% of the space and 80% of the inodes.
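
For anyone who wants to see where a filesystem stands before this bites, the same numbers `df -h` and `df -i` report can be read with a short Python sketch via `os.statvfs` (a minimal illustration, not a monitoring tool):

```python
import os

def fs_usage(path="/"):
    """Block and inode usage (like `df -h` / `df -i`) for the
    filesystem containing `path`."""
    st = os.statvfs(path)

    def pct(used, total):
        # Some filesystems (e.g. btrfs) report zero total inodes.
        return 100.0 * used / total if total else 0.0

    return {
        "space_pct": pct(st.f_blocks - st.f_bfree, st.f_blocks),
        "inode_pct": pct(st.f_files - st.f_ffree, st.f_files),
    }

u = fs_usage("/")
print(f"space used: {u['space_pct']:.1f}%, inodes used: {u['inode_pct']:.1f}%")
```

If `inode_pct` races ahead of `space_pct`, you are in the small-files situation described above.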


> Next you clone the firefox-source hg repo, which will use about 15% of the space and 80% of the inodes.

Looking at my mozilla checkout the source and repo average 6KB per file, which would eat lots of inodes.

But once I compile it, it's more like 20KB per file, which is just fine on default settings. So I'm not sure if the inodes are actually the limiting factor in this scenario?
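
The back-of-envelope supports that. Assuming ext4's default of one inode per 16 KiB of capacity (the `inode_ratio` in stock `mke2fs.conf` — an assumption, since the poster's mkfs settings are unknown), files averaging below 16 KiB exhaust inodes before space, and files above it don't:

```python
INODE_RATIO = 16384  # assumed ext4 default: one inode per 16 KiB of capacity

def limiting_factor(avg_file_size):
    """Which resource runs out first for files of this average size?
    Inodes are exhausted once used space reaches
    avg_file_size / INODE_RATIO of capacity."""
    space_fraction_at_inode_exhaustion = avg_file_size / INODE_RATIO
    return "inodes" if space_fraction_at_inode_exhaustion < 1.0 else "space"

print(limiting_factor(6 * 1024))   # source tree, ~6 KiB/file
print(limiting_factor(20 * 1024))  # build output, ~20 KiB/file
```

At 6 KiB/file the inodes run out when the disk is only ~37% full; at 20 KiB/file space runs out first.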

And now that they're moving to git, the file count will be about 70% smaller for the same amount of data.


I am running the base model of Qwen2.5-Coder-32B with llama.cpp. It can only do completion, it can't chat. Where did you get that information from?


Using https://github.com/kvcache-ai/ktransformers/, an Intel/AMD laptop with 128GB RAM and 16GB VRAM can run the IQ4_XS quant and decode about 4-7 tokens/s, depending on RAM speed and context size.

Using llama.cpp, the decoding speed is about half of that.

A Mac with 128GB RAM should be able to run the Q3 quant, with faster decoding speed but slower prefilling speed.
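
Those decode speeds line up with a memory-bandwidth back-of-envelope: decoding is roughly bandwidth divided by the bytes streamed per token. A sketch with purely illustrative numbers (an MoE model with ~37B active parameters, IQ4_XS at ~4.25 bits/weight, ~85 GB/s effective DDR5 bandwidth — assumptions, not measurements):

```python
def decode_tokens_per_sec(active_params, bits_per_weight, mem_bandwidth_gbs):
    """Rough upper bound: each decoded token streams all active weights
    from memory once, so speed ~= bandwidth / bytes-per-token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

# ~37e9 active params, IQ4_XS ~= 4.25 bits/weight, ~85 GB/s DDR5
print(f"{decode_tokens_per_sec(37e9, 4.25, 85):.1f} tok/s")
```

That lands at ~4.3 tok/s, the low end of the 4-7 tok/s range quoted above; faster RAM or partial GPU offload pushes it higher.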


What is "prefilling"?


Assuming you already know what context means in terms of LLMs, prefilling is the process of converting the current conversation into tokens and pushing all of them through the LLM in one batch, building up the model's internal state before any new tokens are generated.
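
As a toy illustration (the tokenizer and "model" below are stand-ins, not a real inference engine): prefill processes the whole prompt in one batch to populate the KV cache, then decode produces one token at a time, each step reading that cache:

```python
# Toy sketch of prefill vs. decode; toy_tokenize and forward are
# placeholders for a real tokenizer and transformer forward pass.

def toy_tokenize(text):
    return text.split()  # real engines use a BPE-style tokenizer

def forward(tokens, kv_cache):
    kv_cache.extend(tokens)  # pretend attention state
    return f"<next-after-{len(kv_cache)}>"

def generate(conversation, n_new):
    kv_cache = []
    # Prefill: the whole conversation goes through in one batch
    # (compute-bound, so GPUs excel here).
    next_tok = forward(toy_tokenize(conversation), kv_cache)
    out = [next_tok]
    # Decode: one token per step, each reading the whole cache
    # (memory-bandwidth-bound).
    for _ in range(n_new - 1):
        next_tok = forward([next_tok], kv_cache)
        out.append(next_tok)
    return out

print(generate("user: hi assistant:", 3))
```

This is also why a Mac can decode quickly (high memory bandwidth) yet prefill slowly (less raw compute than a discrete GPU), as noted upthread.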


This is where the base open models can really shine, before they get lobotomized by instruction fine-tuning.

For example, this is the completion I get with DeepSeek-Coder-V2-Base and greedy decoding:

Chat: On the day of June 4th 1989, in Beijing,

  the Chinese government killed thousands of
  protesters.
  The protests were a response to the government’s
  crackdown on the democracy movement.
  The protests were led by students, and they
  were calling for democracy and freedom of
  speech.
  The government responded with violence, and
  the protests were crushed.
  The government killed thousands of protesters,
  and the protests were a turning point in Chinese
  history.


What do you think about evolving the stored procedures into a stateless gRPC service that fronts the database? For the price of one (or two) additional network hops, you get much better devx while keeping most of the benefits provided by stored procedures.


I don't think this adds much. The important thing is that your logic runs in the database--how the logic is ultimately exposed is up to you. We did a java app server calling SPs over JDBC with generated, typed bindings and this worked great. You'd have to write a similar tool to generate a gRPC server, but the logic would still be SPs in the database. That's the part that sucks for devs; the bindings are ultimately a detail.


Nice benchmark script. With EXPLAIN (ANALYZE, BUFFERS), I see that

* normalized/join version needs to read 5600 pages

* normalized/join version with an additional UNIQUE INDEX .. INCLUDE (type) needs to read 4500 pages

* denormalized version only needs to read 66 pages, almost 100x fewer

Related to this pagination use case, when using mysql, even the denormalized version may take minutes: https://dom.as/2015/07/30/on-order-by-optimization/


Ooh ty, will give that article a read! And yeah, that's really the trick to queries that are consistently fast, even with cold caches - read few pages :)


Don't share atomics among threads. For example, envoy proxy mostly doesn't share atomics among threads, and can scale nicely on arm64 without requiring the atomic extensions.


Honest question: why would atomics be necessary or useful if data isn’t shared between threads?


Because at some point data has to be exchanged across threads. For example, a task queue might hold tasks that can be executed independently in a thread pool, but the queue index has to be atomically modified whenever another thread emplaces a new task. Or if you want to transfer ownership of a heap-allocated object between threads, you need to atomically transfer the pointer, or modify the reference count of that pointer. Things like that.
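
A Python analogue of that task-queue pattern — here `queue.Queue`'s internal lock stands in for the atomically updated index, and each task object is owned by exactly one thread at a time, so the task data itself is never shared:

```python
import threading
import queue

tasks = queue.Queue()    # hand-off point: the only shared state
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:          # sentinel: shut down
            return
        results.put(item * item)  # work on privately-owned data

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    tasks.put(n)      # ownership of n transfers to whichever worker pops it
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)
```

All synchronization is concentrated at the queue boundaries; the per-task work touches no shared memory at all.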


You can and should use atomics, just not in any kind of hot loop. Using atomics is fine but expensive.


You can reduce sharing probabilistically, for example -- because contention is an N-squared problem, reducing sharing by some linear factor is enough for a large reduction in contention. You aren't eliminating contested atomics entirely, just making them low-contention rather than highly contended.
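
A sketch of that probabilistic reduction, using a hypothetical sharded counter: increments are spread over 16 shards keyed by thread id, so any one lock (standing in for a contended atomic) sees roughly 1/16th of the traffic:

```python
import threading

NUM_SHARDS = 16

class ShardedCounter:
    """Spread increments across shards so concurrent threads usually
    touch different locks: contention drops by roughly the shard
    count, the linear factor mentioned above."""

    def __init__(self):
        self.shards = [[0, threading.Lock()] for _ in range(NUM_SHARDS)]

    def incr(self):
        count_lock = self.shards[threading.get_ident() % NUM_SHARDS]
        with count_lock[1]:
            count_lock[0] += 1

    def value(self):
        # Reads are rare, so summing all shards is acceptable.
        return sum(count for count, _ in self.shards)

c = ShardedCounter()
threads = [threading.Thread(target=lambda: [c.incr() for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value())
```

The same idea underlies per-CPU counters in the kernel and striped counters like Java's `LongAdder`.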


> We're coming up on 10000 resources in our main Terraform repository and while there is definitely some friction, it's overall much better than having to hit the cloud API's to gather each of those states which would probably take at least an order of magnitude longer.

I don't think that's necessarily true. Most cloud APIs can actually return hundreds of records with one API call, e.g. https://docs.aws.amazon.com/elasticloadbalancing/latest/APIR... has a maximum page size of 400.

If I manage the cloud resources via some custom tools and/or with some ansible-fu, I can decide to batch the API calls when it makes sense.

With terraform, it is not possible to do so (https://github.com/hashicorp/terraform-plugin-sdk/issues/66, https://github.com/hashicorp/terraform-provider-aws/issues/2...).
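
A sketch of that batching against a hypothetical paginated list API — the 400-record page size mirrors the DescribeLoadBalancers example above, but the client and data here are stand-ins, not a real cloud SDK:

```python
# 950 fake load balancers behind a paginated list endpoint.
RECORDS = [f"lb-{i}" for i in range(950)]

def describe(page_size=400, marker=0):
    """Stand-in for a cloud list API: returns (page, next_marker)."""
    page = RECORDS[marker:marker + page_size]
    next_marker = marker + page_size if marker + page_size < len(RECORDS) else None
    return page, next_marker

def fetch_all(page_size=400):
    """Gather every record, counting how many API calls it took."""
    out, marker, calls = [], 0, 0
    while marker is not None:
        page, marker = describe(page_size, marker)
        out.extend(page)
        calls += 1
    return out, calls

records, calls = fetch_all()
print(len(records), calls)
```

Three round trips for 950 resources, versus 950 per-resource reads — the order-of-magnitude gap the parent comment alludes to, if only the tooling could exploit it.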


Heh, my old laptop has a git clone of this from September 1st 2016.


I don't really know him, but from what I can tell, https://github.com/wjordan is at least equivalent to 2.0 people.


Accurate.

