Can you easily chain these, though? (gzcat some.txt|grep foo|sort -u|head -10 etc?). Especially lazily, if the uncompressed stream is of modest size, like a couple of gigabytes?
I'm not sure what you mean by lazily here, but internally[0] it creates real anonymous pipes[1] between the spawned processes, so the data does not go through the ruby process at all.
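To make that concrete, here's a minimal sketch using stdlib Open3.pipeline_r (the file name and contents below are made up for illustration) — it wires the `gzcat | grep | sort -u | head` chain together with real pipes between the children, and Ruby only reads the final stream:

```ruby
require 'open3'
require 'tempfile'
require 'zlib'

# Build a small gzipped sample so the chain has input.
tmp = Tempfile.new(['sample', '.gz'])
Zlib::GzipWriter.open(tmp.path) { |gz| gz.write("foo 2\nbar\nfoo 1\nfoo 2\n") }

# gzcat | grep foo | sort -u | head -10: data flows child-to-child
# through anonymous pipes; Ruby consumes only the last stdout.
lines = Open3.pipeline_r(['gzip', '-cd', tmp.path],
                         ['grep', 'foo'],
                         ['sort', '-u'],
                         ['head', '-10']) do |out, _wait_threads|
  out.each_line.map(&:chomp)
end
# lines == ["foo 1", "foo 2"]
```

Because Ruby reads the last stdout incrementally, you can stop early (e.g. `out.each_line.first(10)`) and the upstream processes will get SIGPIPE and terminate, which is about as lazy as a shell pipeline gets.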
I'm currently working with 150MB worth of gzipped JSON - marshalling the full file from JSON into a Ruby hash eats up a lot of memory. One tweak that allows easier lazy iteration over the file (while keeping temporary disk I/O reasonable) is to pipe it through zcat, then jq in stream mode to convert it to ndjson, then gzip again - producing a temp file that Ruby's zlib can wrap in a stream convenient for lazy iteration per readline...
Generally, marshalling a gig or more of JSON (non-lazily) takes a lot of resources in Ruby.
Some do, some don't. JSON is a special case, as a valid JSON file needs to be a single array or object literal - event-driven (SAX-style) parsing ends up being a hack (like jq's stream mode). In theory json_streamer or yajl should help, but I couldn't get any combination to return a proper lazy iterator.
With the file as ndjson it was easier, if a little sparsely documented (Zlib::GzipReader.new or .wrap?):
my_it = Zlib::GzipReader.wrap(some_ndfile)
obs = my_it.each_line.lazy.map do |line|
  JSON.parse line
end.first(4)
When we can get a line at a time, marshalling a single line isn't an issue.
My issue is more that it's tricky to nest Ruby IO objects and return a lazy iterator - especially when nesting custom filters along the way - at least trickier than it should be.
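One workaround I've seen is to stop nesting IO objects and instead treat each custom filter as a step on an Enumerator::Lazy. A hedged sketch (the sample data is fabricated, and any IO - File, IO.popen, etc. - works in place of the StringIO):

```ruby
require 'json'
require 'stringio'
require 'zlib'

# Fabricate a small gzipped ndjson stream in memory.
buf = StringIO.new
Zlib::GzipWriter.wrap(buf) do |gz|
  gz.write(%({"a":1}\n{"a":2}\n{"a":3}\n))
  gz.finish # finish (not close) keeps the underlying StringIO open
end
buf.rewind

# Custom "filters" compose as ordinary lazy enumerator steps;
# only as many lines are decompressed as first(2) demands.
first_two = Zlib::GzipReader.wrap(buf)
                            .each_line.lazy
                            .map { |line| JSON.parse(line) }
                            .select { |h| h['a'] > 1 }
                            .first(2)
# first_two == [{"a"=>2}, {"a"=>3}]
```

It doesn't give you a composable IO object back, but for per-line processing the lazy chain covers most of what nested filter IOs would.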
Apparently there's a third-party framework that does seem promising:
Didn't realize that! That's one snippet I can maybe eliminate now. (As to why I didn't know: the first thing in the RDoc for Kernel#system is still "see the docs for Kernel#spawn for options" — and then Kernel#spawn doesn't actually have that one, because it doesn't block until the process quits, and so returns you a pid, not a Process::Status. I stopped looking at the docs for Kernel#system itself a long time ago, just jumping directly to Kernel#spawn...)
But come to think of it, if Kernel#system is just doing a blocking version of Kernel#spawn → Process#wait, then shouldn't Process#wait also take an exception: kwarg now?
And also-also, sadly IO.popen doesn't take this kwarg. (And IO.popen is what I'm actually using most of the time. The system! function above is greatly simplified from the version of the snippet I actually use these days — which involves a DSL for hierarchical serial task execution that logs steps with nesting, and reflects command output from an isolated PTY.)
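For anyone following along, a minimal sketch of such a system! helper on top of the Ruby 2.6+ exception: keyword (the real version described above, with logging and PTY handling, is obviously much more involved):

```ruby
# system! raises instead of returning false/nil: RuntimeError on
# non-zero exit, Errno::ENOENT if the command doesn't exist.
def system!(*cmd)
  system(*cmd, exception: true)
end

system!('ruby', '-e', 'exit 0')   # => true
begin
  system!('ruby', '-e', 'exit 1')
rescue RuntimeError => e
  # e.message names the failing command and its exit status
end
```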
> How do you "prove" that other people are conscious?
For sentience scientists mainly look at behavioral cues:
> For example, "if a dog with an injured paw whimpers, licks the wound, limps, lowers pressure on the paw while walking, learns to avoid the place where the injury happened and seeks out analgesics when offered, we have reasonable grounds to assume that the dog is indeed experiencing something unpleasant." Avoiding painful stimuli unless the reward is significant can also provide evidence that pain avoidance is not merely an unconscious reflex (similarly to how humans "can choose to press a hot door handle to escape a burning building").
Exactly. All of that is reasonable, and the behaviors described are obviously present, as anyone who's ever had a dog can tell you. So I don't understand why "are animals conscious" is still being debated at this point.
I’m not saying that this is the case, but all the mentioned behaviors are only indicators and could also be reflexive actions which the dog is genetically programmed to do because they work. If a beetle is flipped, it also has a “program” to get upright again, but that doesn’t mean it’s aware of its situation and is actively deciding something. I’m pretty sure dogs are conscious, but you can’t really tell from the outside. LLMs also appear to reason and make arguments but I wouldn’t call them conscious.
You’re right, you also can’t tell for other people. You can make an assumption because they are very similar to you and you yourself appear to be conscious to yourself. But you can’t really disprove solipsism as far as I know.
> Google currently shares data across its services for the purposes described in its Privacy Policy at g.co/privacypolicy and depending on the previous choices you’ve made about your privacy settings, such as Web & App Activity, YouTube History, and Personalized ads
> As of March 6, 2024, new laws in Europe will require Google to get your consent to link certain services if you want them to continue to share data with each other and other Google services as they do today. For example, linked Google services might work together to help personalize your content and ads, depending on your settings.
This is going to be interesting if I do it to my account.
I remember having to somehow contact support back in the day when Google made me merge my YouTube account and my Gmail account, since they had the same name and email address. Lots of strange buggy behaviour.
> We estimate that 99% of US farmed animals are living in factory farms at present. By species, we estimate that 70.4% of cows, 98.3% of pigs, 99.8% of turkeys, 98.2% of chickens raised for eggs, and over 99.9% of chickens raised for meat are living in factory farms.
I've read that ZFS is less safe than other Linux filesystems if you don't use ECC RAM, because it assumes that there are no memory errors and therefore doesn't provide a tool to repair a filesystem corrupted by such errors. Is this true?
It's not true. That's basically ancient forum myth, alongside the also incorrect "ZFS needs 1GB memory per TB of HDD" nonsense that has thankfully mostly died out finally. ZFS makes no additional assumptions when using ECC vs non-ECC memory.
It is theoretically possible to construct a scenario where evil RAM does exactly the right things needed to fool ZFS and corrupt your filesystem. Any pearl-clutching about this thing that has never happened also ignores that RAM bad enough to do that is going to corrupt any filesystem.
In reality, while ECC memory is always nice to have, it's no more required for ZFS than for any other filesystem. Though personally, now that 32GB+ of RAM is common, I generally prefer error correction/detection over ultimate speed these days. And ironically, ECC memory is actually really nice to overclock, because I can just check my logs and prove whether my system is actually stable.
There are so many actual dangers to your data in comparison that it's laughable. The biggest one being you, followed by hardware failure, malware, and genuine ZFS bugs. I'd stay far away from raw sends of encrypted datasets in ZFS for a while; there are edge cases there that haven't been resolved yet.
I was once summoned by the police because of a "dot-zero" address on one of our servers.
Someone had been buying stuff online with a stolen card and the shop admins provided a list of the IP addresses used, including our server's. All the addresses were dot-zero addresses, so I assume it was just some kind of unfortunate obfuscation.
For example:
https://ruby-doc.org/3.2.2/stdlibs/open3/Open3.html