> It is strictly less likely than Earth-bound abiogenesis
> all panspermia does is layer on an additional series of long-shot coincidences
Disagree. There is a lot more of not-Earth than Earth. I agree that the probability density (per unit volume of the universe) of life originating on Earth is much higher than anywhere else, but there's just so much more room out there for life to originate that the probabilistic cost of traveling to Earth is tiny in comparison.
All you need is a proto-life that's stable in an inert environment with sufficient radiation shielding. It could've originated billions of light years away and still have had enough time to arrive on Earth 4.2 billion years ago. That's a mind-boggling number of Earth-like environments.
In fact, proto-life doesn't even need to look like Earth life, so even environments that are hostile to current Earth life today could've been the cradle of origin (as a wise man once said, "life, uh, finds a way"). Additionally, environments that used to be Earth-like but eroded away are candidates too since all we need is for life to have escaped before the erosion.
In my opinion, panspermia is strictly more likely than Earth-bound abiogenesis.
The paper evaluates its approach on the GAIA benchmark. It was my first time hearing about GAIA, so I tried evaluating myself, as a human, against it.
Here's a level 3 question from the GAIA paper (level 3 = hardest):
>In NASA’s Astronomy Picture of the Day on 2006 January 21, two astronauts are visible, with one appearing much smaller than the other. As of August 2023, out of the astronauts in the NASA Astronaut Group that the smaller astronaut was a member of, which one spent the least time in space, and how many minutes did he spend in space, rounded to the nearest minute? Exclude any astronauts who did not spend any time in space. Give the last name of the astronaut, separated from the number of minutes by a semicolon. Use commas as thousands separators in the number of minutes.
I timed myself solving the problem. It took me 9 minutes, 5 Google searches, 14 web pages, multiple Ctrl+F searches within those pages, and 1 calculator use to figure out the answer.
DynaSaur seems to have a 10% to 20% success rate at this level.
Try for yourself. This is one of the few empirically grounded reference levels for how far we are from AGI.
That seems similar to a ~7th grade reading comprehension question, if all the facts were at hand.
Out of curiosity, if anyone knows: what's the SOTA for how well LLMs actually parse (English) grammar, in the sense of how they read the prompt?
A lot of the correctness on the challenge questions seems to come down to identifying key phrases and requests, i.e. reading comprehension.
And multi-step tool use sets a higher bar than straight summarization, since one must differentiate more precisely between the alternative pieces of information to focus on.
The question above was not preceded by anything; that was the whole question. The facts are at hand in the sense that you have the internet and you're allowed to use it. The hard part is knowing what to search and recognising the answer when you see it. This is much harder than any 7th grade comprehension test I've done :)
A much more useful trick I learnt from Tyler Cowen's podcast is to ask what they think is the most underrated / overrated thing in the category. Everyone understands that the answer is going to be subjective, so there's no pressure to be diplomatic. And in my experience, the answers are also high variance, which leads to more interesting conversations (most people agree that Messi is the greatest of all time, but everyone has a different opinion on who is the most underrated / overrated).
In your opinion, what fraction of the small restaurants in a typical city are engaging in money laundering at this scale? And if weighted by volume, what fraction?
I think it's not so much restaurants, since their setup and operating costs are higher, but rather countertop takeaways. I'm not aware of any data, but from conversations and experience I'd guess it's just a fraction, single-digit percentages? But I have no clue really.
For a good money-laundering operation you need agility: easy, quick, and cheap to set up, tear down, and move. Hence takeaways, hand car washes, etc.
Funnily enough, the way I used to check Keybase profiles was to check Twitter, because a blue checkmark there was usually a good indication of the account belonging to "the famous person". Thanks to Twitter Blue, that check is no longer usable.
I understand Keybase allows you to link up a bunch of accounts, but it doesn't prevent you from making all of those accounts say you are the CEO/CTO of some company unfortunately.
> but it doesn't prevent you from making all of those accounts say you are the CEO/CTO of some company unfortunately
At least a GitHub profile link can usually be used to validate that the account has write access to a GitHub organization, so you can be somewhat confident it's the right person. It does require them to have pushed public commits within that organization, though.
Is that true tho? During training, the model predicts {"wall": 0.65, "fence": 0.25, "river": 0.03}. Then backprop modifies the weights such that it produces {"wall": 0.67, "fence": 0.24, "river": 0.02} next time.
But it does that with much richer feedback than a flat WRONG!, because we're also indirectly telling the model how much closer "fence" was than "river". It's likely that most of the neurons that supported "wall" also supported "fence", so the average neuron that supported "river" gets penalised much more than a neuron that supported "fence".
I agree that distillation is more efficient for exactly the same reason, but I think even models as old as GPT-3 use this trick to work as well as they do.
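The mechanics behind this are easy to make concrete. Here's a toy numpy sketch (the hidden-to-logit layer is assumed linear, and the weight matrix and all numbers are made up for illustration): with a hard label, the cross-entropy gradient on the logits is simply p - y, so every wrong class is penalised in proportion to how much probability it was given, and a neuron whose output weights support both "wall" and "fence" can end up rewarded even while "fence" alone is penalised.

```python
import numpy as np

# Illustrative numbers from the comment above, plus a catch-all "other" class
# so the probabilities sum to 1.
classes = ["wall", "fence", "river", "other"]
p = np.array([0.65, 0.25, 0.03, 0.07])  # model's predicted probabilities
y = np.array([1.0, 0.0, 0.0, 0.0])      # hard label: the answer was "wall"

# Cross-entropy gradient w.r.t. the logits is p - y: wrong classes are
# penalised in proportion to the probability they were assigned.
logit_grad = p - y  # [-0.35, 0.25, 0.03, 0.07]

# Hypothetical output weights W[class, neuron]: neuron 0 supports both
# "wall" and "fence", neuron 1 supports only "river", neuron 2 only "fence".
W = np.array([
    [1.0, 0.0, 0.0],  # wall
    [1.0, 0.0, 1.0],  # fence
    [0.0, 1.0, 0.0],  # river
    [0.0, 0.0, 0.0],  # other
])

# Gradient w.r.t. each hidden activation: dL/dh = W^T (p - y).
neuron_grad = W.T @ logit_grad
print(dict(zip(["wall+fence", "river-only", "fence-only"], neuron_grad)))
```

In this toy, the wall+fence neuron's gradient comes out negative (-0.10), so it is actually rewarded because its wall support dominates, while the river-only neuron (+0.03) is penalised. That's the sense in which a hard label still carries graded information through shared features.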