> It is strictly less likely than Earth-bound abiogenesis
> all panspermia does is layer on an additional series of long-shot coincidences
Disagree. There is a lot more of not-Earth than Earth. I agree that the probability density (per unit volume of the universe) of life originating on Earth is much higher than anywhere else, but there's just so much more room out there for life to originate that the probabilistic cost of traveling to Earth is tiny in comparison.
All you need is a proto-life that's stable in an inert environment with sufficient radiation shielding. It could've originated billions of light years away and still have had enough time to arrive on Earth 4.2 billion years ago. That's a mind-boggling number of Earth-like environments.
In fact, proto-life doesn't even need to look like Earth life, so even environments that are hostile to current Earth life today could've been the cradle of origin (as a wise man once said, "life, uh, finds a way"). Additionally, environments that used to be Earth-like but eroded away are candidates too since all we need is for life to have escaped before the erosion.
In my opinion, panspermia is strictly more likely than Earth-bound abiogenesis.
The paper evaluates its approach on the GAIA benchmark. It was my first time hearing about GAIA, so I tried evaluating myself, as a human, against it.
Here's a level 3 question from the GAIA paper (level 3 = hardest):
>In NASA’s Astronomy Picture of the Day on 2006 January 21, two astronauts are visible, with one appearing much smaller than the other. As of August 2023, out of the astronauts in the NASA Astronaut Group that the smaller astronaut was a member of, which one spent the least time in space, and how many minutes did he spend in space, rounded to the nearest minute? Exclude any astronauts who did not spend any time in space. Give the last name of the astronaut, separated from the number of minutes by a semicolon. Use commas as thousands separators in the number of minutes.
I timed myself solving the problem. It took me 9 minutes, 5 Google searches, 14 web pages, multiple Ctrl+F searches within those pages, and 1 calculator use to figure out the answer.
DynaSaur seems to have a 10% to 20% success rate at this level.
Try for yourself. This is one of the few empirically grounded reference levels for how far we are from AGI.
That seems similar to a ~7th grade reading comprehension question, if all the facts were at hand.
Out of curiosity, if anyone knows: what's the SOTA for how well LLMs actually parse (English) grammar, in the sense of how they read the prompt?
A lot of the correctness on the challenge questions seems to come down to identifying key phrases and requests, i.e. reading comprehension.
And multi-step tool use sets a higher bar than straight summarization, since one must differentiate more precisely between the alternative pieces of information to focus on.
The question above was not preceded by anything; that was the whole question. The facts are at hand in the sense that you have the internet and you're allowed to use it. The hard part is knowing what to search and recognising the answer when you see it. This is much harder than any 7th grade comprehension test I've done :)
A much more useful trick I learnt from Tyler Cowen's podcast is to ask what they think is the most underrated / overrated thing in the category. Everyone understands that the answer is going to be subjective, so there's no pressure to be diplomatic. And in my experience, the answers are also high variance, which leads to more interesting conversations (most people agree that Messi is the greatest of all time, but everyone has a different opinion on who is the most underrated / overrated).
In your opinion, what fraction of the small restaurants in a typical city are engaging in money laundering at this scale? And if weighted by volume, what fraction?
I think it's not so much restaurants, since their setup and operating costs are higher, but rather countertop takeaways. I'm not aware of any data, but from conversations and experience I'd guess it's just a fraction, single-digit percentages? But I have no clue really.
For a good money-laundering operation you need agility: easy, quick, and cheap to set up, tear down, and move. Hence takeaways, hand car washes, etc.
Funnily enough, the way I used to check Keybase profiles was to check Twitter, because a blue checkmark there was usually a good indication of the account belonging to "the famous person". Thanks to Twitter Blue, that check is no longer usable.
I understand Keybase allows you to link up a bunch of accounts, but it doesn't prevent you from making all of those accounts say you are the CEO/CTO of some company unfortunately.
> but it doesn't prevent you from making all of those accounts say you are the CEO/CTO of some company unfortunately
At least a GitHub profile link can usually be used to validate that the account has write access to a GitHub organization, so you can be somewhat confident it's the right person. It does require them to have pushed public commits within that organization, though.
Is that true tho? During training, the model predicts {"wall": 0.65, "fence": 0.25, "river": 0.03}. Then backprop modifies the weights such that it produces {"wall": 0.67, "fence": 0.24, "river": 0.02} next time.
But it does that with much richer feedback than a flat WRONG!, because we're also indirectly telling the model how much closer "fence" was than "river". It's likely that most of the neurons that supported "wall" also supported "fence", so the average neuron that supported "river" gets penalised much more than a neuron that supported "fence".
I agree that distillation is more efficient for exactly the same reason, but I think even models as old as GPT-3 use this trick to work as well as they do.
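The mechanics behind this are easy to make concrete. Here's a toy numpy sketch (the hidden-to-logit layer is assumed linear, and the weight matrix and all numbers are made up for illustration): with a hard label, the cross-entropy gradient on the logits is simply p - y, so every wrong class is penalised in proportion to how much probability it was given, and a neuron whose output weights support both "wall" and "fence" can end up rewarded even while "fence" alone is penalised.

```python
import numpy as np

# Illustrative numbers from the comment above, plus a catch-all "other" class
# so the probabilities sum to 1.
classes = ["wall", "fence", "river", "other"]
p = np.array([0.65, 0.25, 0.03, 0.07])  # model's predicted probabilities
y = np.array([1.0, 0.0, 0.0, 0.0])      # hard label: the answer was "wall"

# Cross-entropy gradient w.r.t. the logits is p - y: wrong classes are
# penalised in proportion to the probability they were assigned.
logit_grad = p - y  # [-0.35, 0.25, 0.03, 0.07]

# Hypothetical output weights W[class, neuron]: neuron 0 supports both
# "wall" and "fence", neuron 1 supports only "river", neuron 2 only "fence".
W = np.array([
    [1.0, 0.0, 0.0],  # wall
    [1.0, 0.0, 1.0],  # fence
    [0.0, 1.0, 0.0],  # river
    [0.0, 0.0, 0.0],  # other
])

# Gradient w.r.t. each hidden activation: dL/dh = W^T (p - y).
neuron_grad = W.T @ logit_grad
print(dict(zip(["wall+fence", "river-only", "fence-only"], neuron_grad)))
```

In this toy, the wall+fence neuron's gradient comes out negative (-0.10), so it is actually rewarded because its wall support dominates, while the river-only neuron (+0.03) is penalised. That's the sense in which a hard label still carries graded information through shared features.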