Not a full solution, but one thing I've learned not to do is tell Cursor "you got that wrong, fix it like this". Instead, I go back to the previous prompt and click "Restore Checkpoint", edit the prompt and possibly the Cursor rules to steer it in the right direction.
When the model has the wrong solution in its context, it will use it when generating new code, and my feeling is that it doesn't handle the idea of a "negative example" very well. Instead, delete the bad code and give it positive examples of the right approach.
You probably mean the USAMO 2025 paper. They updated their comparison with Gemini 2.5 Pro, which did get a nontrivial score. That Gemini version was released five days after USAMO, so while it's not entirely impossible for the data to be in its training set, it would seem kind of unlikely.
The claim is that these models are trained on data that includes the problems and explanations. The fact that the first model trained after the public release of the questions (and crowdsourced answers) performs best is not a counterexample; it is exactly what the claim predicts.
I was noodling with Gemini 2.5 Pro a couple days ago and it was convinced Donald Trump didn’t win the 2024 election and that he conceded to Kamala Harris, so I’m not entirely sure how much weight I’d put behind it.
> it seems not a huge computational task to figure out that ] is a `\right]`
TeX's design was finalised in 1982, when the computational resources were different by a few orders of magnitude. There is a very strong culture of backward compatibility (search for "A torture test for TeX") and there are so many equations written in TeX documents that it would be impossible to change the parsing now.
That said, something like MathJax would be free to create their own TeX-like syntax where parentheses are automatically paired.
I for one often find myself wanting more control, writing `\bigr]` or `\biggr]` instead of `\right]` to get the rendered equation to look good.
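A minimal sketch of the difference, using a made-up sum as the bracket content: `\left`/`\right` choose the delimiter size from what they enclose, while `\bigl`/`\bigr` and `\biggl`/`\biggr` fix the size by hand.

```latex
% Automatic sizing: the brackets grow to cover the whole sum, limits included
\[ \left[ \sum_{i=1}^{n} x_i \right]^2 \]

% Manual sizing: pick the size that looks right for this equation
\[ \bigl[ \sum_{i=1}^{n} x_i \bigr]^2
   \qquad
   \biggl[ \sum_{i=1}^{n} x_i \biggr]^2 \]
```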
> And while rendering `(a+b)/a` as a `\frac` is opinionated, honestly, the only reason why this occurs in my notes is when I was too lazy to type `\frac{}{}`.
Plain TeX has the much nicer syntax `{a+b \over a}` but for some obscure reason LaTeX recommends against that.
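For comparison, here are the two spellings side by side. As far as I know, one practical reason for the recommendation is that the amsmath package emits a warning when it sees `\over`; treat that as my understanding rather than the full story.

```latex
% Plain TeX style: \over splits the enclosing group into numerator and denominator
$$ {a+b \over a} $$

% LaTeX style: numerator and denominator are explicit arguments
\[ \frac{a+b}{a} \]
```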
He used to send out real cheques for $2.56 but apparently they contained codes that could be used to transfer money out of his account in excess of the sum of the cheque. Now he uses the made-up Bank of San Serriffe, which naturally understands hexadecimal.
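The hexadecimal connection, as I understand it, is that $2.56 is 256 cents, and 256 is 100 in base 16, i.e. one "hexadecimal dollar". A quick check of the arithmetic (the variable name is just for illustration):

```python
# 0x100 is the hexadecimal literal for 256
cents = 0x100

# Format the amount in dollars with two decimal places
print(f"${cents / 100:.2f}")  # prints $2.56
```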
Routing and account numbers at the bottom of a check are semi-private and nefarious actors can abuse them. See the movie/book Catch Me If You Can about Frank Abagnale.
Yes, but the process of getting Gmail, Outlook etc to receive your emails and put them in recipients' inboxes is far from painless or quick. An IP address with a clean history and SPF/DKIM/DMARC are table stakes, but then you get to play the "my emails are randomly dropped today while everything looked fine yesterday" game.
OK, well, it hasn't been MY experience at all. Hosting your own legit email with a 100% score on mail-tester, SPF, DKIM and DMARC does NOT work fine, because Microsoft still ends up marking all your emails as spam. So maybe you could consider that your experience is not universal, and that just because it happens to work with your IP addresses doesn't mean that's the case for everyone else? Jeez...
My experience is that Gmail accepted my emails fine... until one day it didn't. Then some time later it worked again.
I registered for their Postmaster Tools, which says:
> No data to display at this time. Please come back later.
> Postmaster Tools requires that your domain satisfies certain conditions before data is visible for this chart.
> Refer to the help page for more details.
The help page has no useful information. I suspect that I sent too little mail for it to register in their systems at all.
Outlook was even worse, and I just told my Outlook users to change providers.
Eventually I capitulated and got Google Workspace, and now everything gets delivered perfectly.
> At 15+ years of hosting my own email through multiple IP changes this has not been my experience at all.
At 25+ years of hosting email through multiple hosting providers, this has been my experience multiple times. To be fair, it happens less often with DKIM et al., but those are relatively new inventions.
15+ years hosting email on the same IP space with a strict security process. Numerous, numerous, numerous blocks, black holes, and spam routing. This was personal.
Worked for a company self-hosting famous brand emails. They would get blocked too. Imagine telling the band manager of a famous classic rock band that their email to their label was being rejected due to being blacklisted for spam (cc’ing the manager's team).
Stop fooling yourself, it does not work fine. If it did, you would not rely on that Google, Outlook, or Yahoo account.
EDIT time is over. I don't want to be misunderstood. I am not claiming to send MASS emails and having them delivered without issues or anything. If we have to do mass emails, they are done with services that provide the GUIs for them etc. There's no way you won't end up in spam lists even if you sign up each individual email address in person yourself.
That's true even when sending email from my MS Outlook box to my own Gmail. At some point, it comes down to just doing the best you can and not stressing too hard.
Getting a dedicated server with an ISP that does a decent job at keeping their IP blocks clean for email is about the best you can expect. Set up the appropriate SPF/DKIM/DMARC and get along. There's really not too much more to be done these days. Even the big guys don't always get along.
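For anyone checking their own setup: DMARC and DKIM TXT records are `tag=value` pairs separated by semicolons (SPF uses a different, space-separated syntax). Below is a rough sketch of a parser for sanity-checking a record you have already fetched. The function name and the example.com record are made up for illustration, and it does no DNS lookup or policy validation.

```python
def parse_tag_record(record: str) -> dict[str, str]:
    """Split a 'tag=value; tag=value' TXT record into a dict of tags."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing semicolon
        key, _, value = part.partition("=")
        tags[key.strip().lower()] = value.strip()
    return tags

# Hypothetical DMARC record for example.com, not taken from the thread
dmarc = parse_tag_record("v=DMARC1; p=quarantine; rua=mailto:postmaster@example.com;")
print(dmarc["p"])  # the policy receivers are asked to apply
```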
Anecdotally, we have hosted email servers for old games on Hetzner without issue, as the IP pool is generally not as popular with spammers given the time cost of bringing up the server OS images. It is far from perfect, but generally performs well as reporting asshats on your local network block is easy.
Almost all cloud providers with dynamic-load ephemeral IPs will show up on ban lists eventually due to vulnerability scanners, bad spiders, and spam/voip drops. However, it is far more common for Spamhaus free tiers to quietly go sideways when no one is looking.
Gmail/Outlook have their own peer policies that serve their own business posture. Google does require that administrators register in their clown system as a user to exchange email, but it is an effective policy that adds nuisance cost to people spinning up 30 servers a day to spam people.
Firewall rate limits are effective on small single-domain servers. A modern email server in Go that is isolated from each user space greatly simplifies the possible setups. =3
which has the curious property that as you substitute nonnegative integers for the variables, the positive values of the polynomial are exactly the set of prime numbers. (The polynomial also yields negative values.)
When put like this, it sounds like the polynomial must reveal something deep about the primes... but it's another cool magic trick. The MRDP theorem (famous for solving Hilbert's 10th problem negatively) implies that this kind of multivariate polynomial exists for exactly those sets of natural numbers that are computably enumerable, so the polynomials could be seen as a really esoteric programming language for set-enumeration algorithms.
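To make the "magic trick" concrete, here is a toy version with a much easier set than the primes: the composite numbers. This is not the polynomial from the thread, just a sketch of the same positive-values construction. It assumes the usual trick of taking a Diophantine equation `q = 0` that characterises the set and multiplying `n` by `1 - q^2`; all names are made up.

```python
from itertools import product

def q(n, x, y):
    # Zero exactly when n = (x + 2)(y + 2), i.e. when n is composite
    return n - (x + 2) * (y + 2)

def p(n, x, y):
    # 1 - q^2 equals 1 when q == 0 and is <= 0 otherwise, so p is
    # positive only when q == 0 and n > 0, in which case p == n
    return n * (1 - q(n, x, y) ** 2)

def positive_values(bound):
    # Brute-force search over a finite box of nonnegative integers
    values = set()
    for n, x, y in product(range(bound), repeat=3):
        v = p(n, x, y)
        if v > 0:
            values.add(v)
    return sorted(values)

composites = positive_values(30)
print(composites)  # the composite numbers below 30
```

The polynomial does no real work here: it just encodes "guess the witnesses x and y, then check them", which is why the programming-language analogy fits.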
Oliver Burkeman wrote a book about this, named "Four Thousand Weeks: Time Management for Mortals". The main point (at least as I remember it) is that there are way more books to read, links to click, and things to do than you can fit in your lifetime, so it's a delusion that you could ever get to the end of your to-do list.