The topic of human vision and perception is complex enough that I very much doubt it's scientists who are making the claim that we can't perceive anything higher than 30-60 fps. There are various other effects like fluidity of motion, the flicker fusion threshold, persistence of vision, and strobing effects (think phantom array/wagon wheel effects), etc., which all have different answers. For example, the flicker fusion threshold can be as high as 500 Hz [0], and similarly strobing effects like dragging your mouse across the screen are still perceivable on 144 Hz+ and reportedly even 480 Hz monitors.
As far as perceiving images goes, there's a study at [1] which shows people can reliably identify images shown on screen for 13ms (one frame at 75 Hz, the refresh rate of the monitor they were using). That is, subjects were shown a sequence of 6-12 distinct images 13ms apart and were still able to reliably identify a particular image in that sequence. What's noteworthy is this study is commonly cited for the claims that humans can only perceive 30-60 fps, despite the study addressing a completely separate issue from perception of framerates, and it's a massive improvement over previous studies which show figures as high as 80-100ms, which seems like a believable figure if they were using a similar or worse methodology. I can easily see this and similar studies being the source of the claims that people can only process 10-13 images a second, or perceive 30-60 fps, if science 'journalists' are lazily plugging something like 1000/80 into a calculator without having read the study.
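To make that arithmetic explicit, here's a minimal sketch of the naive duration-to-rate conversion I suspect is behind those numbers; the durations are just the figures quoted above, not from any additional source:

    # Naively invert a presentation duration (ms) into a "rate" -- the
    # calculation that likely produces the "10-13 images per second"
    # and similar claims.
    def implied_rate_hz(duration_ms: float) -> float:
        return 1000.0 / duration_ms

    for duration in (13, 80, 100):
        print(f"{duration} ms per picture -> {implied_rate_hz(duration):.1f} per second")

    # 13 ms  -> 76.9 per second  (the RSVP study at [1])
    # 80 ms  -> 12.5 per second  (older studies; roughly the
    # 100 ms -> 10.0 per second   "10-13 images/sec" figure)

Of course, none of these numbers say anything about how smooth motion looks at a given framerate, which is the point: the inversion is easy to do and easy to misread.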
There's also the old claim [2] from at least 2001 that the USAF studied fighter pilots and found that they can identify planes shown on screen for 4.5ms, 1/220th of a second, 1/225th of a second, or various other figures, but I can't find the source for this and I'm fairly sure it's more of an urban legend that circulated on gaming forums in the early 2000s than anything. If it was an actual study, I'm almost certain persistence of vision played a role in it, something the study at [1] avoids entirely.
[0] 'Humans perceive flicker artifacts at 500 Hz' https://pubmed.ncbi.nlm.nih.gov/25644611/
[1] 'Detecting meaning in RSVP at 13 ms per picture' https://dspace.mit.edu/bitstream/handle/1721.1/107157/13414_...
[2] http://amo.net/nt/02-21-01fps.html