I 90% agree with your opinion. Video does add some useful information, but 90% of the benefit could come from a small window showing the person(s) you are talking to, maybe 90px by 90px.
Unfortunately, that's not going to work with how Jitsi does their Web API through an iframe. Ol' OpenTok, before they got bought out by Vonage, could have been used because they injected new DOM elements into the current page. But with the Jitsi Meet interface living in an iframe, I just don't have enough access to composite things like that.
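For anyone curious what that limitation looks like in practice, here is a rough sketch, not my actual code. It assumes Jitsi's `external_api.js` script has been loaded so that `JitsiMeetExternalAPI` exists as a global, and that the conference is served from a different origin than the host page. `buildEmbedOptions` and `probeEmbed` are hypothetical helper names made up for illustration; `roomName`, `width`, `height`, `parentNode`, and `getIFrame()` are the parts I believe come from the iframe API itself.

```typescript
// Hypothetical option shape for illustration; only a few iframe API options are shown.
interface EmbedOptions {
  roomName: string;
  width: number;   // a number is treated as pixels
  height: number;
  parentNode?: unknown; // the DOM element that will contain the iframe
}

// Hypothetical helper: build options for a small square embed.
function buildEmbedOptions(roomName: string, sizePx: number, parentNode?: unknown): EmbedOptions {
  return { roomName, width: sizePx, height: sizePx, parentNode };
}

type ProbeResult = "unavailable" | "isolated" | "accessible";

// Hypothetical helper: create the embed and check whether the host page
// can reach into the iframe's document at all.
function probeEmbed(domain: string, roomName: string, parentNode: unknown): ProbeResult {
  // Assumption: external_api.js defines this global in the browser.
  const Ctor = (globalThis as any).JitsiMeetExternalAPI;
  if (typeof Ctor !== "function") {
    return "unavailable"; // script not loaded, or not running in a browser
  }
  const api = new Ctor(domain, buildEmbedOptions(roomName, 90, parentNode));
  const iframe = api.getIFrame();
  // For a cross-origin iframe the browser reports contentDocument as null,
  // so the host page has no way to pull out or restyle individual video elements.
  return iframe.contentDocument === null ? "isolated" : "accessible";
}
```

You can shrink the whole iframe to 90px by 90px this way, but that shrinks the entire Jitsi Meet interface, toolbar and all, rather than giving you one participant's video to place where you want.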
Cool, I was looking for something like this but didn't find it.
Also, that reminds me that I need to submit a pull request for their iframe API, because I stumbled on some missing information in it. Luckily, it was pretty easy to guess at the right values.
I'm generally of the opinion that the video feed is useless in 90% of teleconferencing situations.