
Depends on who is trying to sell you what.

Currently all the tools on the market just use a different prompt and call it an agent.

I've built a tool that uses a different model for each agent: Whisper for audio transcription, LLaVA to detect slides on the screen, OpenCV to crop the image, OCR to read the slide content, and an LLM to summarize everything that happens during the earnings call.
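Here is a minimal sketch of that kind of multi-model pipeline. It assumes each model is wrapped as a plain Python function and passed in as a stage. The stage names, the `Frame` type, the duplicate-slide check and the `stub_*` functions are all hypothetical stand-ins so the example runs without any model installed. This is not the tool's actual code or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Frame:
    timestamp: float  # seconds into the call
    image: str        # placeholder for pixel data; a real system would hold an image array

def run_pipeline(
    audio: bytes,
    frames: List[Frame],
    transcribe: Callable[[bytes], str],          # speech-to-text stage (e.g. Whisper)
    is_slide: Callable[[str], bool],             # slide-detection stage (e.g. a vision model)
    crop: Callable[[str], str],                  # cropping stage (e.g. OpenCV)
    ocr: Callable[[str], str],                   # text-extraction stage
    summarize: Callable[[str, List[str]], str],  # final LLM summary stage
) -> str:
    """Route each input through its dedicated model, then merge the results for the summary."""
    transcript = transcribe(audio)
    slide_texts: List[str] = []
    for frame in frames:
        # Only frames the detector flags as slides go on to cropping and OCR.
        if is_slide(frame.image):
            text = ocr(crop(frame.image))
            # Skip consecutive duplicates, since one slide usually stays on screen for many frames.
            if not slide_texts or slide_texts[-1] != text:
                slide_texts.append(text)
    return summarize(transcript, slide_texts)

# Hypothetical stand-ins for the real models.
def stub_transcribe(audio: bytes) -> str:
    return "Revenue grew this quarter."

def stub_is_slide(image: str) -> bool:
    return image.startswith("slide:")

def stub_crop(image: str) -> str:
    return image[len("slide:"):]

def stub_ocr(region: str) -> str:
    return region.strip()

def stub_summarize(transcript: str, slides: List[str]) -> str:
    return f"{transcript} Slides: {'; '.join(slides)}"

if __name__ == "__main__":
    frames = [
        Frame(0.0, "speaker"),
        Frame(5.0, "slide: Q3 results"),
        Frame(6.0, "slide: Q3 results"),
    ]
    print(run_pipeline(b"", frames, stub_transcribe, stub_is_slide,
                       stub_crop, stub_ocr, stub_summarize))
```

Each stage can be swapped for a different model without changing the orchestration code. That is the main difference from a setup where every "agent" is the same LLM with a different prompt.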
