I really don't think this is entirely spurious considering Elon has been using LLMs to fire government employees. It seems like a very plausible explanation for how they came up with this policy.
From Mike Johnson, about DOGE's use of algorithms: "Elon has cracked the code. He is now inside the agencies. He's created these algorithms that are constantly crawling through the data. And as he told me in his office, the data doesn't lie. We're going to be able to get the information."
This is my speculation, but it's very likely that one of the pieces of information those algorithms analyzed was the five bullet points each federal employee had to submit justifying their job.
He doesn't care about Tesla anymore. He's making fat money off AI, SpaceX is a money-printing machine now that he has the ear of Trump, and he's spending almost all of his daytime (and nighttime) hours shitposting and boosting racists on Twitter.
Funny that this is the exact calculation that ChatGPT suggests. Seems like another genius Musk idea. Madagascar definitely deserves 47% tariffs on US imports.
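For reference, a minimal sketch of the widely reported formula behind those numbers. The trade figures are rough 2024 values and the rounding convention is my assumption, so treat this as illustrative rather than authoritative:

```python
# Widely reported "reciprocal tariff" formula:
# rate = max(10%, (trade deficit / imports) / 2).
# Trade figures below are rough 2024 US-Madagascar numbers (assumption).

def reciprocal_tariff(us_imports: float, us_exports: float) -> float:
    """Tariff rate implied by the reported deficit-ratio formula."""
    deficit = us_imports - us_exports
    return max(0.10, (deficit / us_imports) / 2)

madagascar_imports = 733.2e6  # approx. US imports from Madagascar
madagascar_exports = 53.4e6   # approx. US exports to Madagascar

rate = reciprocal_tariff(madagascar_imports, madagascar_exports)
print(f"{rate:.1%}")  # -> 46.4%, close to the announced 47% depending on rounding
```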
Genuinely, how hard is it for you to call a person what they want to be called? What difference does gender even make in the workplace?
I understand it's hard for you to use different pronouns, but it's probably much harder for the person who doesn't identify with the gender they were assigned at birth.
> Requiring me to call someone who was born with a penis, and who has sired children a "she" is telling me to say that 2 + 2 = 5.
It's more like someone telling you to call them John and you call them Tyler instead. It's literally just a word.
> It's more like someone telling you to call them John and you call them Tyler instead. It's literally just a word.
That's an absurd comparison. Names have always been personally chosen. "He" and "she", "man" and "woman" are words that create common knowledge about the nature of reality.
------------
Zhao Gao was contemplating treason but was afraid the other officials would not heed his commands, so he decided to test them first. He brought a deer and presented it to the Second Emperor but called it a horse. The Second Emperor laughed and said, "Is the chancellor perhaps mistaken, calling a deer a horse?" Then the emperor questioned those around him. Some remained silent, while some, hoping to ingratiate themselves with Zhao Gao, said it was a horse, and others said it was a deer. Zhao Gao secretly arranged for all those who said it was a deer to be brought before the law and had them executed instantly. Thereafter the officials were all terrified of Zhao Gao. Zhao Gao gained military power as a result of that.
---
How hard would it be for you to call a "deer" a "horse"? It's just words.
Must be hard to have to deal with that. I'm so sorry you have to use a different word when referring to someone.
It's obviously much harder for you to use a different pronoun than it is for them to have their entire gender identity invalidated. Would you like it if people misgendered you constantly and purposefully?
Your life must be so hard. I feel bad that you have to use a different word to refer to someone. Maybe have some basic decency and respect for others?
In my experience, at this point all the flagship multi-modal LLMs provide about the same accuracy. I see very little, if any, drift in output between them, especially if you have your prompts dialed in.
For the Gemini 1.5 Flash model, GCP pricing[0] treats each PDF page as an image, so you're looking at a price per image ($0.00002) plus the character count ($0.00001875 per 1k characters) of the base64 string encoding of the entire PDF and the context you provide.
10-page PDF ($0.0002) + ~3,000 characters of context/base64 ($0.00005625) = $0.00025625
Cut that in half if you use Batch Prediction jobs[1], and even at scale you're looking at a rounding error in costs.
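As a sanity check, here's that arithmetic as a quick sketch. The rates are copied from above, and `pdf_cost` is just an illustrative helper, not a GCP API:

```python
# Back-of-the-envelope cost estimate using the Gemini 1.5 Flash rates above.
# Rates and the 50% batch discount are taken from this comment; treat them
# as illustrative, not authoritative pricing.

PRICE_PER_PAGE = 0.00002         # each PDF page billed as one image
PRICE_PER_1K_CHARS = 0.00001875  # base64/context input pricing

def pdf_cost(pages: int, chars: int, batch: bool = False) -> float:
    cost = pages * PRICE_PER_PAGE + (chars / 1000) * PRICE_PER_1K_CHARS
    return cost / 2 if batch else cost

print(pdf_cost(pages=10, chars=3000))              # 0.00025625
print(pdf_cost(pages=10, chars=3000, batch=True))  # 0.000128125
```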
For ongoing accuracy tracking, I take a fixed proportion of the generations (say 1%, or 10 PDFs for every 1,000) and run them through an evaluation[2] workflow, as sketched below. Depending on how and what you're extracting from the PDFs, the eval method will change, but I find that for "unstructured to structured" use cases the fulfillment evaluation is a fair test.
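The sampling step itself amounts to something like this. It's a sketch: `generations` and `sample_for_eval` are hypothetical names, and the actual eval workflow is whatever [2] describes:

```python
# Pull a reproducible ~1% sample of generations for the eval workflow.
# `generations` is assumed to be a list of (pdf_id, extracted_output) records.
import random

def sample_for_eval(generations, proportion=0.01, seed=42):
    """Return a reproducible random sample of ~`proportion` of generations."""
    rng = random.Random(seed)
    k = max(1, int(len(generations) * proportion))
    return rng.sample(generations, k)

batch = [(f"pdf_{i}", f"output_{i}") for i in range(1000)]
eval_set = sample_for_eval(batch)  # ~10 of every 1,000, as described above
print(len(eval_set))               # 10
```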
Love the idea! We're doing something similar to parse rubrics and student submissions at https://automark.io - great to see an open-source library exploring the space! Like you said, I think iteratively adding explicit layers of LLM understanding on top of the raw extraction will allow a lot more control over what information gets extracted. Also interested to see an integration with GPT-4V as an additional aid. I'd love to chat if you have time - my email is in my bio.