Really, I am not that impressed. It is not radically different from doing the same thing with a still photo, which by now is trivial for these models.
What is being tested here doesn't require a video. It doesn't show that the model can derive any meaning from a short clip.
It is fucking doing very fancy OCR, that's all.
What would impress me is if, shown a clip of an open-chest surgery, it could say what surgery is being done and which technique is being used. Or if, shown a video of construction workers, it could figure out the building technique and what they are actually doing, and point out that the guy in the yellow shirt is not following safety regulations because he is not wearing a helmet.