I'm currently researching unique, cutting-edge and honestly kick-ass user experiences that just work with AI. This can be a feature or service, anywhere from open source to mainstream.
A while back, someone pulled out right in front of me from a side street while I was driving. The car slammed on the brakes, and before I had fully registered what was going on, I found myself stopped safely, only inches away from having t-boned that guy.
That is the level of experience AI needs to get to. Not buttons that basically say: "Use AI!!" but features so fully integrated and smooth that you don't even think about whether or not AI is behind it... it just does what it does when it needs to do it.
(And I know, my anecdote wasn't about LLMs, but that is kinda the point.)
Not in any 2+ dimensional situation. Radar-based AEB is awful. Reliable collision avoidance requires robust prediction of future trajectories for everything around you. (Technically it requires robust lower bounds on minimum time to impact for every object around you, which requires perceiving and identifying every object around you, plus reasonable predictions based on the identification and behaviour history of each object. If you're willing to allow "nah, that was BS and not your fault" style collisions, like the "stopped for a paper bag sitting on the road" / "hit a stack of bricks in a paper bag" dichotomy, then it's a little more forgiving, but it's still much harder than a rangefinder plus second-order equations of motion.)
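For reference, that "rangefinder + second-order equations of motion" baseline is literally just a quadratic. A toy sketch (illustrative only, made-up numbers, not anyone's production AEB code):

```typescript
// Toy time-to-impact estimate from a single rangefinder track.
// range (m) is the gap to the object, rangeRate (m/s) is its derivative
// (negative when closing), rangeAccel (m/s^2) is its second derivative.
// Returns the smallest positive root of r0 + v*t + 0.5*a*t^2 = 0,
// or Infinity if the gap never closes under this constant-acceleration model.
function timeToImpact(range: number, rangeRate: number, rangeAccel: number): number {
  const a = 0.5 * rangeAccel;
  const b = rangeRate;
  const c = range;

  if (Math.abs(a) < 1e-9) {
    // Constant closing speed: impact only if we're actually closing.
    return b < 0 ? -c / b : Infinity;
  }

  const disc = b * b - 4 * a * c;
  if (disc < 0) return Infinity; // the gap never reaches zero

  const sqrtDisc = Math.sqrt(disc);
  const roots = [(-b - sqrtDisc) / (2 * a), (-b + sqrtDisc) / (2 * a)];
  const positive = roots.filter((t) => t > 0);
  return positive.length ? Math.min(...positive) : Infinity;
}

// e.g. 20 m gap, closing at 15 m/s, closure decelerating at 2 m/s^2 (ego car braking):
console.log(timeToImpact(20, -15, 2).toFixed(2), "s until impact");
```

And that's exactly the point: this only covers the one thing directly in the beam. Everything beyond it needs perception, identification, and behaviour prediction.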
Personally I think the user experience is interesting because we show you very specific questions for low-confidence decisions. Some example screenshots are on that link above. Over time, the number of manual questions has gone down, as our models have gotten smarter about the (seemingly endless!) edge cases in music notation.
Once music is uploaded and scanned, you can use our bespoke notation editor to make any edits. The original image is tightly integrated into our editor, so you can cross-reference.
Gonna drop my own link here, because I really think the UX I'm working on is truly novel. Inspired by the commonplace book format, I take highlights from Kindle and embed them in a DB [1]. From there I build (multiple) downstream apps, but the central one, Commonplace Bot [2], is a bot that serves as a retrieval and transformation layer for said highlights. It has changed the way I read books. I now get to link ideas from books I read in 2018 to books I read last week. I don't always need a query either, as I added hypothetical questions as an entry point, so the UX of finding an idea can be as simple as typing "wander". Finally, since quotes are dense, short, and generally context-free, I enable a bunch of transformations: Anki quizzes, art from quotes, using the quote itself as a centroid to search its neighbors, etc.
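If you're curious what the "quote as centroid" trick looks like, here's a stripped-down sketch of the idea (not the production code, just the shape of it; the in-memory array stands in for whatever embedding model and DB you actually use):

```typescript
// Sketch: use one highlight's embedding as a centroid and pull its nearest neighbors.
type Quote = { text: string; source: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Given a quote you're currently looking at, surface the k closest highlights
// from the whole library -- possibly from a book read years earlier.
function neighborsOf(centroid: Quote, library: Quote[], k = 5): Quote[] {
  return library
    .filter((q) => q !== centroid)
    .map((q) => ({ q, score: cosineSimilarity(centroid.embedding, q.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ q }) => q);
}
```

In practice the vector database does the sorting for you, but the idea is the same: no query needed, the quote itself is the query.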
I love this. I have my commonplace book in Roam Research. Search in Roam is not perfect and I have wondered lately if there was a way to get all of the content into a graph DB and then query using LLMs. But I haven't had time to tinker with it - I am sure open source libraries exist that do exactly this.
Can your library take all highlights from Readwise or just Kindle? I use Readwise Reader quite a bit and would love something that takes everything I save + all highlights + other places (Roam Research, email, calendar) etc. and lets me just ask it questions.
You definitely could! Funnily enough, I have a function named "justBooks()" [1] that filters the Readwise export to just book-type tags, but you could use the entire export, or whatever upstream method you want. Much like journaling, everyone's use case will be catered to their own tasks/quotes/ideas, but allow me to share some general advice. You'll definitely need: 1) a database that supports vectors (I use Postgres), 2) a low-friction way to get your "new" highlights from your reading practice (I use Readwise), and 3) an LLM to "cache" transformations [2]. That transformation does an insane amount of work and takes the whole thing to the next level in terms of utility; I wouldn't skip it.
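Roughly, 1) and 3) wired together look like this. A sketch only, assuming Postgres with the pgvector extension and an OpenAI-style embedding/chat API; the table, column, and model names here are just examples, not my actual schema:

```typescript
import { Client } from "pg";
import OpenAI from "openai";

const db = new Client({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI();

// One-time setup: a highlights table with an embedding column (pgvector) and a
// cached "hypothetical question" produced once per highlight by the LLM.
// (Call db.connect() once at startup and run SCHEMA before ingesting.)
const SCHEMA = `
  CREATE EXTENSION IF NOT EXISTS vector;
  CREATE TABLE IF NOT EXISTS highlights (
    id serial PRIMARY KEY,
    text text NOT NULL,
    book text,
    hypothetical_question text,
    embedding vector(1536)
  );
`;

async function ingestHighlight(text: string, book: string) {
  // "Cache" the LLM transformation at ingest time so it's never recomputed.
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Write one question this passage would answer." },
      { role: "user", content: text },
    ],
  });
  const question = completion.choices[0].message.content ?? "";

  // Embed the highlight together with its hypothetical question.
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: `${text}\n${question}`,
  });

  await db.query(
    "INSERT INTO highlights (text, book, hypothetical_question, embedding) VALUES ($1, $2, $3, $4)",
    [text, book, question, JSON.stringify(embedding.data[0].embedding)]
  );
}
```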
This is really cool! I could see it being excellent for anyone who writes or gives speeches very often, a great way to quickly access the knowledge one builds up over a lifetime of reading. Love it!
Well, as a blind user, I'd like to point at the OpenAI Vision integration of BeMyEyes! Being able to get fully detailed scene descriptions including OCR and translation all in one package was pretty much a game changer for me.
Not so much kick-ass, but still works nicely: https://github.com/mlang/tracktales -- My MPD track announcer with support for describing album art...
Can I ask if you think this might make alt text on images obsolete? Do you use the alt text where it’s available, or BeMyEyes (I presume you have a choice)?
Those are two different topics. BeMyEyes is a smartphone app which brings your camera and OpenAI vision models together. It is meant to be used to describe/OCR things in your real-world environment.
While alt texts could theoretically be replaced by a browser/screen reader functionality that asks a vision model to describe the image, it is a waste of time and energy to have each and every user do it over and over again.
Ah, sorry, got you. The aspect I think about with alt text is that AI is often better than a mediocre human effort, and it is improving all the time. Improving AI will improve the description of all images, even older ones, and therefore you might want to run the current AI on all images, even if they have existing alt text.
No one? In my opinion we are not there yet. Right now "chat" is the killer UX, and I think conversational inputs will take over for a time... right now my killer AI app is a home-brew Slack bot RAG pipeline for my own knowledge-base building and searching... Why share it? It's free now... I think this will also be a trend: those who can, and those who can't.
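For the curious, the smallest version of that kind of Slack RAG bot is roughly this shape. Heavily simplified sketch, not my actual pipeline: retrieval is stubbed out, and it assumes @slack/bolt plus an OpenAI-style chat API:

```typescript
import { App } from "@slack/bolt";
import OpenAI from "openai";

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
});
const openai = new OpenAI();

// Stand-in for whatever vector search you run over your knowledge base,
// e.g. top-k chunks from pgvector or a local index.
async function retrieveNotes(query: string): Promise<string[]> {
  return [];
}

app.message(async ({ message, say }) => {
  if (!("text" in message) || !message.text) return;

  const context = (await retrieveNotes(message.text)).join("\n---\n");
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `Answer using only these notes:\n${context}` },
      { role: "user", content: message.text },
    ],
  });

  await say(completion.choices[0].message.content ?? "No answer found.");
});

(async () => {
  await app.start(Number(process.env.PORT) || 3000);
})();
```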
Chat sucks and will be replaced in almost all cases, I suspect. Speaking is much easier for many tasks, traditional UIs for others. Mediating everything through a keyboard is clunky and slow.
I very much doubt that speaking is slower than typing, given that basically only stenographers can transcribe things in real time, and that's by using a special technique.
Talking is still faster, and I can do it while my hands and eyes are occupied, so I definitely don't see typing ever being the preferred interface. You'd never type to interact with an assistant while driving.
The voice chat in the ChatGPT app. The present has never been more futuristic than that, I really feel like I'm talking to another person. It's not just what the responses are, but also how, the voice really brings the whole interaction to life.
A year ago I cancelled my policy with Churchill, and I found the entire process pretty painless: I called the phone line, was greeted by the robot, and said "I want to cancel my policy". From the phone number I was calling, it already knew who I was; I confirmed my identity, it outlined when my policy would be cancelled, and it confirmed with me that I wanted to proceed. The entire experience was self-serve, and I'd like to see that everywhere.
Compare this to yesterday's adventure with another service (my package got lost), where the bot couldn't decipher what a WRITTEN "my package got lost" or "where is my delivery" meant.
Talking with an LLM feels very different to me from text-based chat interactions.
I used the spoken interface with ChatGPT 4 a lot a few months ago after it was released on the iPhone app, and it was pretty immersive. The latency was a bit long, though, and even when prompted to reply briefly the bot tended to ramble on, often with numbered lists, which sound awkward in speech.
For the past couple of weeks, I’ve been experimenting with Inflection AI’s Pi. Its voices are very natural—the American female voice I use even has vocal fry [1]—and the latency is short. It will talk about serious topics (sometimes with numbered lists), but it seems prompted mainly for friendly conversation. It calls me by my name and remembers our previous conversations. I can easily see people becoming emotionally attached to bots like that.
A man named Chris Cappetta has created some open-source software for talking with Claude 3. His conversations with the bot about AI are pretty remarkable [2, 3].
The current spoken interfaces all seem to run what the user says through a speech-to-text converter, so the bot does not perceive pronunciation, intonation, hesitation, etc. After multimodal models that can hear and respond to the speaker’s tone become available, the experience will become even stickier.
If neural networks count as AI, then translation tools; DeepL is one of my favorites.
There is also AI in video editing apps: fast autofocus on faces, face detection, and modifiers that follow you. It's really incredible and intuitive enough that many people use it (too much, maybe).
I was recently working on a UI system that attempts to let AI build websites autonomously. When you start working on it, you quickly realize how many tradeoffs you have to think through. Most of the questions arise from the limits of the context window; for example, Claude 3 supports 200k tokens.
You also have to take into consideration that, since the chat history is sent with every new message, the price of the conversation grows roughly as n^2 in the number of messages (message i resends all i-1 earlier messages, so the total tokens sum to about n^2/2 times the average message size). So do you send the whole codebase? Or do you let the AI run commands like ls and cat to read the files it needs? Do you want a file in the directory with a quick history of what's already done and what still needs to be done?
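For the "let the AI run ls and cat" option, it boils down to exposing a couple of read-only tools. A rough sketch of what those could look like (the schema follows the usual function-calling pattern; the exact wire format depends on which model API you're using, and the names here are just examples):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Read-only "ls" and "cat" tools exposed to the model instead of
// stuffing the whole codebase into the context window.
const tools = [
  {
    name: "list_files",
    description: "List files in a directory of the project.",
    input_schema: {
      type: "object",
      properties: { dir: { type: "string", description: "Path relative to project root" } },
      required: ["dir"],
    },
  },
  {
    name: "read_file",
    description: "Read one file's contents.",
    input_schema: {
      type: "object",
      properties: { file: { type: "string" } },
      required: ["file"],
    },
  },
];

const PROJECT_ROOT = process.cwd();

// Executed locally whenever the model emits a tool call.
function runTool(name: string, input: Record<string, string>): string {
  const target = path.resolve(PROJECT_ROOT, input.dir ?? input.file ?? ".");
  if (!target.startsWith(PROJECT_ROOT)) return "Error: path escapes project root.";

  if (name === "list_files") return fs.readdirSync(target).join("\n");
  if (name === "read_file") return fs.readFileSync(target, "utf8");
  return `Unknown tool: ${name}`;
}
```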
Another thing I find interesting is how microservices become a natural choice vs. monolithic apps when building with AI, again due to the limits of the context window. So you focus on thinking through all the components and their APIs, and then let the AI build each component. If a component can be built in isolation without any knowledge of the other components, that's better.
Also, it quickly becomes obvious that a fully autonomous builder doesn't make practical sense. A real person still needs to look at the progress and give guidance. Not even because the AI can't do it; it probably can. But your own understanding of what you are building changes over time. So it should be semi-automatic, with real users able to change course at any moment.
How do you build the autonomous loop?
One thing I find useful is to let the AI write tests first, and then run those automatically on each new chat message. TypeScript types also help catch broken code early. In that case an automatic message is sent: "Hey, you broke the tests. Here are the error messages. Go ahead and fix those." The operator doesn't have to get involved until it's fixed.
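Wiring that up is simple. Roughly (a sketch only, assuming an npm test script and whatever sendMessageToModel function your loop already has):

```typescript
import { execSync } from "node:child_process";

// After each model turn, type-check and run the test suite. If anything breaks,
// feed the errors straight back as the next message -- no operator involved.
function runChecks(): { ok: boolean; output: string } {
  try {
    const output = execSync("npx tsc --noEmit && npm test", {
      encoding: "utf8",
      stdio: "pipe",
    });
    return { ok: true, output };
  } catch (err: any) {
    // execSync throws on a non-zero exit code; stdout/stderr carry the errors.
    return { ok: false, output: `${err.stdout ?? ""}\n${err.stderr ?? ""}` };
  }
}

async function afterModelTurn(sendMessageToModel: (msg: string) => Promise<void>) {
  const { ok, output } = runChecks();
  if (!ok) {
    await sendMessageToModel(
      `Hey, you broke the tests. Here are the error messages:\n${output}\nGo ahead and fix those.`
    );
  }
}
```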
Another loop can be built with the ability to send screenshots. At any moment the system can send a screenshot to the AI and ask whether it's good enough and whether it wants to make any changes. That also improves the quality.
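The screenshot loop, sketched with Puppeteer and a vision-capable chat endpoint (again just the shape of it, not a drop-in implementation; the model name is only an example):

```typescript
import puppeteer from "puppeteer";
import OpenAI from "openai";

const openai = new OpenAI();

// Grab a screenshot of the work-in-progress site and ask a vision model
// whether it looks good enough or needs changes.
async function reviewScreenshot(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle0" });
  const screenshot = await page.screenshot({ encoding: "base64" });
  await browser.close();

  const completion = await openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Is this page good enough, or what would you change?" },
          { type: "image_url", image_url: { url: `data:image/png;base64,${screenshot}` } },
        ],
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```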
Well, you get the idea. It's an interesting task to ponder.
Klaviyo. I know they have been around a long while but I just implemented it for the first time today on a new ecommerce project.
The signup and setup flow is quite lengthy because they need to implement email flows, abandoned-cart reminders, SMS flows, and push messaging, all of which of course need to be highly customized. All of this is needed just to unlock some of the basic features of the tool.
I was surprised and delighted to begin setting up an email series, only to discover it had already scanned my website and used AI to write the content of all the messages to match our tone and messaging.
Highly impressive and it makes getting it up and running super fast.
There is a segment in a recent Linus Tech Tips video [1] that showcased media asset management software [2] where you can search for portions of locally stored videos via natural language. e.g. Person X holding object Y, working on task Z. If this type of AI video tagging comes to mobile I think it will be a game changer.
I would say OpenAI's Whisper just works; a nice GUI wrapper that leverages Metal/GPU/co-processors is "Hello Transcribe".
Whisper transcribes conversations from audio files. Hello Transcribe is a GUI wrapper by someone else that uses Whisper under the hood to create subtitle files with timestamps.
Does not distinguish between speakers though
I wish other use cases were considered by that application, as I pretty much never want subtitle files and don't plan to start.
There are some techniques to distinguish between speakers, but I haven't seen anyone put the combination together in a nice GPU-leveraging app.
Which AI do you mean? It's so ubiquitous that it's hard to pick. Categorization and labelling existed before LLMs, but LLMs made them much better. Imagine not having to manually label support tickets when you get 1,000 new ones every day.
3D model generation is also a good example, you can save tons of time.
AI is everywhere, but most of the time you simply don't notice it since it's so well integrated, or it's driving backend logic, anti-fraud systems, etc.
"AI" is a moving target but a few months ago I was blown away to find out that phones are WAY better at semantically tagging photos now, to a point where it's actually useful. I can search my phone photos for 'cat' or 'car' (or more usefully, that one scanned pic of my drivers' license if I forgot my ID). Sure, resnet has been around for a while and this is still not great but it's there in production right now and has been legitimately useful to me.
Recently I got to drive a rental Corolla with lanekeep assist and smart-ish cruise control which could sorta keep itself on the road by itself. It was definitely a fun toy for a controls engineer, kind of like riding a narcoleptic horse with ADHD but still on the cusp of being a net positive.
The best UX at the moment is doing role play with ChatGPT's voice interaction.
My son (5y) loves to play "holiday". So I tell GPT to pretend to be a hotel receptionist. And he will then have a conversation about rooms, events, costs, etc. with the bot.
I have not used it, but every review of the "AI software engineer" Devin has pointed out that the marketing claim is absurd but the user interface is pretty cool.
It's difficult to spot, because - similar to your sysadmin - if you know it's AI, it's probably because it's doing a bad job. It's when it just blends into the overall experience so you don't even notice it's there that it's great. There are cases where AI is helping a company be more profitable, by allowing them to provide a substandard service with fewer humans. From a company perspective, it's doing great, but the end user experience sucks.
So, my list:
* Spotify's weekly picks used to be pretty good at recommending new music, although it's actually got worse in the last 2-3 years.
* AI filtering out things like fraudulent transactions and virus-laden web pages. They're a long way from 100%, but it's got better, even as the challenge has got bigger.
* Some games have started making good use of AI - Red Dead Redemption 2 is probably the best I've played. Makes the in-game world feel a bit more dynamic, rather than the same procedural world.
* Google Maps does a lot with AI, redirecting based on traffic. It doesn't go out of its way to tell you how clever it's being, so it's hard to spot. But 10 years ago, I used to get stuck in a lot more traffic than I do now.
* ChatGPT is awesome, even if the hype cycle is now turning and we're all eye-rolling at it. I've conversed with it to improve my understanding of all sorts of topics, and it is amazing.
> * Spotify's weekly picks used to be pretty good at recommending new music, although it's actually got worse in the last 2-3 years.
In a similar vein, I remember using Pandora for the first time in the late '00s and thinking "Is this software reading my mind??" as it picked songs I either already loved or new songs I liked immediately.
Amazon Music, YouTube, etc. all feel like some version of "this is what other people liked", but that just spirals me into some very specific genre.
I haven't had an experience similar to that first Pandora one in tech in a long time.
Yes! In 2010, it could say "Oh you like Leonard Cohen? Maybe you'll like Jonathan Richman.", which was a great jump. Now, it's more like "Oh, you like Simon and Garfunkel? Here's a bland cover of one of their songs."