Having worked in many languages and debuggers across many kinds of backend and front end systems, I think what some folks miss here is that some debuggers are great and fast, and some suck and are extremely slow. For example, using LLDB with Swift is hot garbage. It lies to you and frequently takes 30 seconds or more to evaluate statements or show you local variable values. But e.g. JavaScript debuggers tend to be fantastic and very fast. In addition, some kinds of systems are very easy to exercise in a debugger, and some are very difficult. Some bugs resist debugging, and must be printf’d.
In short, which is better? It depends, and varies wildly by domain.
What's your prompt processing speed? That's more important in this situation than output TPS. If you have to wait minutes to start getting an answer, that makes it much worse than a cloud-hosted version.
Prompt eval time varies a lot with context, but it feels real-time for short prompts: roughly 20 tokens per second, though I haven't done much benchmarking of this. When there is a lot of re-prompting in a long back-and-forth it is still quite fast; I do use the KV cache, which I assume helps, and I also quantize the KV cache to Q8 if I am running contexts above 16k. However, if I want it to summarize a document of, say, 15,000 words, it does take a long time: I walk away, come back in about 20 minutes, and it will be complete.
If he is doing multi-turn conversations, he can reuse the KV cache from the last turn and skip prompt processing on the history (which is what would make time to first token too slow), only running prompt processing on his actual prompt for the current turn. This turns a quadratic number of tokens processed into a linear one. I am not sure if this is what he is doing, but that is what I would do if I had his hardware.
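To make the quadratic-vs-linear point concrete, here is a small sketch that just counts tokens processed over a conversation. The turn counts and token sizes are made-up numbers for illustration, not measurements from any particular setup:

```typescript
// Hypothetical conversation: each turn adds a 50-token user prompt
// and a 200-token model reply to the history.
const turns = 10;
const promptTokens = 50;
const replyTokens = 200;

let withoutCache = 0; // re-process the entire history every turn
let withCache = 0;    // reuse the KV cache; process only the new prompt
let history = 0;

for (let t = 0; t < turns; t++) {
  withoutCache += history + promptTokens; // grows quadratically with turns
  withCache += promptTokens;              // grows linearly with turns
  history += promptTokens + replyTokens;
}

console.log(withoutCache); // 11750 tokens processed in total
console.log(withCache);    // 500 tokens processed in total
```

Over just ten turns, cache reuse cuts prompt processing by more than 20x here, and the gap keeps widening as the conversation grows.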
Usually, when people think about the prompt tokens for a chat model, the initial system prompt is the vast majority of the tokens, and it's the same across many usage modes. You might have a slightly different system prompt for code than for English or for chatting, but that is three prompts, which you can permanently put in some sort of persistent KV cache. After that, only your specific request in that mode is uncached.
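The persistent per-mode cache described above is essentially memoization keyed by the system prompt. A minimal sketch, assuming the runtime exposes some expensive prefix-processing step that yields a reusable handle (all names here are hypothetical, not a real inference API):

```typescript
// Opaque stand-in for a reusable KV-cache handle.
type KvHandle = { prefix: string };

// Persistent cache: one entry per system prompt ("code", "chat", ...).
const prefixCache = new Map<string, KvHandle>();

function processPrefix(systemPrompt: string): KvHandle {
  // Stand-in for the expensive prompt-processing pass over the prefix.
  return { prefix: systemPrompt };
}

function getCachedPrefix(systemPrompt: string): KvHandle {
  let handle = prefixCache.get(systemPrompt);
  if (!handle) {
    handle = processPrefix(systemPrompt); // paid once per mode
    prefixCache.set(systemPrompt, handle);
  }
  return handle; // later requests in this mode skip the prefix pass
}
```

With three modes, only three prefix passes are ever paid; every subsequent request processes just the user's own (short) message.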
Maybe possible; I did not look into that much for Coqui XTTS. What I know is that the quantized versions of Orpheus sound noticeably worse. I feel audio models are quite sensitive to quantization.
Seems like they would've recorded both in the field, no? If they were recording sap freezing in the field, presumably the mics would pick up on other parts of the tree undergoing stresses and making audible sounds.
For that not to have been the case, either they would've had to freeze sap in the lab, or they would've had to go way out of their way to isolate recordings of just the sap in the field without the rest of the tree (is that even possible with normal recording tech?)
Many seem a bit confused, but I have only skimmed the comments.
I don't understand the point of the thing described in the OP (I have not watched the talk, just skimmed the notes), myself. Linux kernels can EFI-load themselves; if you want more flexibility than a precompiled kernel command line, or to load from ext4 or other non-FAT filesystems, rEFInd exists, fits on the ESP, and is very high quality. (Kernel + initramfs can get big; I keep mine on the ESP, but wanting to keep them on a larger ext4 filesystem is very understandable.)
Bootloaders are obsolete in this sense: every OS provides an EFI stub loader (except Linux, where kernels are their own EFI stub); nevertheless, distros continue to install GRUB alongside themselves on UEFI systems out of inertia. If Red Hat wants to supplant it... okay, but it can be supplanted today with very good components, even if they weren't invented there.
If I had a nickel for every time the Red Hat ecosystem overengineered itself into a corner and decided the only possible solution was more overengineering, I could probably buy IBM.
Many people in this thread are extremely cynical and also ignorant of the actual security guarantees. If you don’t think Apple is doing what they say they’re doing, you can go audit the code and prove it doesn’t work. Apple is open sourcing all of it to prove it’s secure and private. If you don’t believe them, the code is right there.
This is what the “attestation” bit is supposed to take care of—if it works, which I’m assuming it will, because they’re open sourcing it for security auditing.
I would agree, except the seller seems to have made a new forgery of their receipt on the fly in response to Cabel's inquiry, which leads me to believe they probably made the original forgery as well.
It means you compile-in a direct reference to the node that needs to be updated when some property changes, so instead of searching the tree of n nodes to find it, you already have the reference.
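A minimal sketch of that idea, using plain objects in place of real DOM nodes (the structure and names here are illustrative, not any particular framework's compiled output):

```typescript
// A "compiled" component holds a direct reference to the one node a
// property affects, so an update is O(1) instead of an O(n) tree search.
type VNode = { id: string; textContent: string; children: VNode[] };

function makeCounter() {
  const countNode: VNode = { id: "count", textContent: "0", children: [] };
  const root: VNode = {
    id: "root",
    textContent: "",
    children: [
      { id: "label", textContent: "Count:", children: [] },
      countNode, // the compiler already knows exactly where this lives
    ],
  };
  // The compiled setter writes straight through the captured reference:
  // no tree walk, no selector lookup, no diffing pass.
  const setCount = (value: number) => {
    countNode.textContent = String(value);
  };
  return { root, setCount };
}

const { root, setCount } = makeCounter();
setCount(42);
console.log(root.children[1].textContent); // "42"
```

The tradeoff is that the set of nodes and their bindings must be known at compile time; dynamic structure still needs some form of lookup or list reconciliation.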
The fact that the data exists somewhere is small comfort if users cannot read the privacy implications in the app store itself when they're deciding whether to download an app.