Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
computerex
24 days ago
|
parent
|
context
|
favorite
| on:
RWKV Language Model
New multimodal models take raw speech input and provide raw speech output, no tts in the middle.
benob
24 days ago
|
next
[–]
A relatively detailed description of such systems:
https://arxiv.org/abs/2410.00037
Closi
24 days ago
|
prev
|
next
[–]
Seems like the future - so much meaning and context is lost otherwise.
intalentive
24 days ago
|
prev
[–]
Very cool. Logical next step. Would be interested to know what the dataset looks like.
moffkalast
24 days ago
|
parent
[–]
Youtube. Youtube is the dataset.
Consider applying for YC's Spring batch! Applications are open till Feb 11.
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: