
It would be fun to try to add a predictive layer to this:

- Given the score and what each person has just played, predict what the next few sounds are going to be
- Given a high-frame-rate video stream of a person, predict the next note to be played

In the same way that Nvidia gets extremely low-bandwidth but high-resolution video through face keypoint tracking and facial reconstruction / puppeteering, maybe there's a place for prediction and/or sound reconstruction from extremely low-bitrate streams.*

* Obviously not the exact use case here, since a premium is being placed on not processing, but still fun to think about.

We're doing exactly this to teleoperate humanoid robots on high-latency networks!

Paper: https://arxiv.org/abs/2107.01281

Video: https://www.youtube.com/watch?v=N3u4ot3aIyQ

"We introduce a system in which a humanoid robot executes commands before it actually receives them, so that the visual feedback appears to be synchronized to the operator, whereas the robot executed the commands in the past. To do so, the robot continuously predicts future commands by querying a machine learning model that is trained on past trajectories and conditioned on the last received commands. In our experiments, an operator was able to successfully control a humanoid robot (32 degrees of freedom) with stochastic delays up to 2 seconds in several whole-body manipulation tasks, including reaching different targets, picking up, and placing a box at distinct locations."

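To make the "execute before you receive" idea concrete, here is a toy sketch of predicting ahead of the last received command. It is not the paper's method: the abstract describes a learned model trained on past trajectories, whereas this stand-in just does constant-velocity extrapolation. The name `predict_command`, the `(timestamp, joint targets)` sample format, and the `horizon` parameter are all made up for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample format: (timestamp in seconds, joint targets).
Sample = Tuple[float, Sequence[float]]


def predict_command(history: List[Sample], horizon: float) -> List[float]:
    """Extrapolate the latest command `horizon` seconds ahead.

    Stand-in for a learned predictor: estimates a per-joint velocity from
    the last two received commands and assumes it stays constant.
    """
    if not history:
        raise ValueError("need at least one received command")
    t1, q1 = history[-1]
    if len(history) < 2:
        return list(q1)  # nothing to estimate a velocity from: hold position
    t0, q0 = history[-2]
    dt = t1 - t0
    if dt <= 0:
        return list(q1)  # out-of-order or duplicate timestamps: hold position
    return [b + (b - a) / dt * horizon for a, b in zip(q0, q1)]


# Two commands received 0.1 s apart; predict 0.5 s past the latest one.
received = [(0.0, [0.0, 1.0]), (0.1, [0.1, 1.0])]
predicted = predict_command(received, horizon=0.5)
```

In a real system the horizon would track the measured network delay, and the predictor would have to cope with delays that vary from packet to packet, which is where a model conditioned on recent commands earns its keep over plain extrapolation.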
