If I recall correctly, Clippy’s most famous feature was interrupting you to offer advice. The advice was usually basic/useless/annoying, hence Clippy’s reputation, but a powerful LLM could actually make the original concept work. It would not be simply a chatbot that responds to text, but rather would observe your screen, understand it through a vision model, and give appropriate advice. Things like “did you know there’s an easier way to do what you’re doing”. I don’t think the necessary trust exists yet to do this using public LLM APIs, nor does the hardware to do it locally, but crack either of those and I could see ClipGPT being genuinely useful.
The way I remember it, a lot of software in the late 1980s and early 1990s shipped "help" documentation with full-text search, but the common denominator was that it didn't work: you got useful answers less than 10% of the time. Until Google came along, users had been trained to avoid full-text search facilities.
The full-text search facility attached to Clippy really was helpful, getting useful answers around 50% of the time. I thought the whole point of making him an engaging cartoon character was to overcome the prejudice mid-1990s users had towards full-text search in help.
It looks like you're one of the 1% of humans who still write letters themselves! Dear me, imagine that, what do you think this is, the 90s or something?! Would you like to join the other 99% of humans and doomscroll and shytpost while I write that letter for you?
We are probably getting closer to that with the newer multimodal LLMs, but you'd almost need to take screenshots at intervals and feed them directly to the LLM, giving it a sort of chronological context to help it understand what the user is trying to do and gauge the user's intentions.
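You could sketch that loop with off-the-shelf pieces today, something like the following (Python, with Pillow for screen capture and the OpenAI client for the vision call; the model name, interval, prompt, and frame count are just placeholders, not a recommendation):

```python
# Minimal sketch: capture the screen at a fixed interval and send the last few
# frames to a vision-capable model so it has chronological context for what the
# user is doing. Assumes OPENAI_API_KEY is set in the environment.
import base64
import io
import time
from collections import deque

from openai import OpenAI   # pip install openai
from PIL import ImageGrab   # pip install pillow

client = OpenAI()
frames = deque(maxlen=5)    # short rolling history of screenshots

def grab_frame() -> str:
    """Screenshot the desktop and return it as a base64-encoded PNG."""
    buf = io.BytesIO()
    ImageGrab.grab().save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

def ask_for_advice() -> str:
    """Send the rolling frame history plus a prompt to the model."""
    content = [{"type": "text",
                "text": "These screenshots were taken a minute apart, oldest first. "
                        "If there is an obviously easier way to do what the user is "
                        "doing, describe it in one sentence; otherwise say nothing."}]
    for frame in frames:
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{frame}"}})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any vision-capable model
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    while True:
        frames.append(grab_frame())
        if len(frames) == frames.maxlen:
            print(ask_for_advice())
        time.sleep(60)        # one frame per minute
```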
As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
> As you say though, I don't know how many people would be comfortable having screenshots of their computer sent arbitrarily to a non-local LLM.
Of the technical, hang-out-on-HN crowd? Ya, probably not many.
Of the other 99.99% of computer users? The majority of them wouldn't even think about it, let alone care. To quote a phrase, “the user is going to pick dancing pigs over security every time”.
Even without the nonchalant attitude towards security, the majority of the population has been so conditioned to the idea that everything they do on a computer is already being sent to 1) Apple, 2) Google, 3) Microsoft, or 4) their employer that they're burnt out on caring.
All that is to say that if you can make a widely-available real-time LLM assistant that appeals to non-technical users, please invite me to your private-island-celebrity-filled-yacht-parties.
I think we're well into the paradigm of "hidden employee activity monitoring software" already taking periodic screenshots and sending them to an LLM somewhere, which then generates aggregate performance metrics and dashboards for managers. I've heard of multiple companies working on this for $bigcorp environments, customer service/call center workstation PCs, etc.
Models with native video understanding would do the trick: Advanced Voice Mode on the ChatGPT iOS/Android app lets you use your camera and works pretty well; there's also https://aistudio.google.com/live (AFAIK there are no open-source models with similar capabilities).
I added a minutely scrot cronjob about a year ago and haven't used it once. Remembering "that website I was on last week" is apparently not a real problem I was having.
> Things like “did you know there’s an easier way to do what you’re doing”
That could come off just as patronizing as the original Clippy. If it said things like "Would you like me to generate you a letter for X?" it would be miles ahead of the original.