This title really undersells the absolute insanity of the described solution. This is a beautiful example of "if it's stupid, but it works, it's not stupid." The justification is very convincing.
One thing I'm curious about: how did you build your corpus of meme images and videos?
If it works, it works. But it also speaks volumes about Apple's disregard for (or inexperience with) exposing their stuff via the web - https://www.icloud.com/ being the prime example: half the stuff the phone apps can do isn't available (you can't create a reminder with a due date...), and the things that are there are slow and buggy.
Have you tried a deep-learning approach based on Transformers?
https://github.com/roatienza/deep-text-recognition-benchmark (the available weights are for tasks that seem similar to OCR, so there's a good chance you can use it out of the box). With a good GPU it should process hundreds to thousands of images per second, so you could likely build your index in less than a day. (Maybe you can even port it to your iPhone stack :) )
There are tons of other freely available solutions you can find by searching for keywords like "image to text ocr", "transformers", "visual transformers"...
You can do better than a general image-to-text model for reading memes, because they all use the same handful of fonts - so you want something trained on synthetic data generated with those fonts.
Yeah, but lots of things that work are still stupid, because many other solutions work better. The greatness of this crazy solution is that it really does seem like the best one given the price requirements.
I feel like I’m taking crazy pills in this thread. Am I the only one who talks to Gen Z kids who explore around their iPhone apps? This definitely isn’t the best option given price requirements. It’s not even the most convenient option.
I’m around age 30, not 13, so similar to the article, my first instinct was also to create a database and OCR the image. But by total coincidence, yesterday I had a conversation with my 14 year old cousin on the topic of saving memes. Her response was along the lines of “yeah, everyone nowadays just saves the image to your iPhone photos, and then just search for it later from the photos app”.
Yeah. Everything this article builds is literally already part of the iOS UI, not just a hidden API. And kids all seem to know about it, apparently.
This article uses an example meme with the text “Sorry young man But the armband (red) stays on during solo raids”. I saved it in my iPhone photos app… and found it again through the search function in the photos app.
This is a solved problem already, by teenager standards.
I felt extremely old yesterday when I was talking to my cousin. And I felt extremely old today, reading this article. Looking back, the past few decades of CS cultural intuition established that text is text, and images are images. Strings and bitmaps don't mix.
This seems sort of obvious to anyone in tech, but I realized that from a clueless grandma's perspective, the inability to search for text in photos was never obvious. Well, the roles are reversed now. Ordinary people have access to software that, by default, treats text in photos as a first-class citizen.
No, it's not. The author even mentions private memes that are in-jokes. He built a service that can be used to explore memes, but people generally don't use a search engine to discover new memes. They tend to use it to re-find memes they've already seen.
How would you solve sourcing and distribution using just iOS though? Sure, it's built into iPhones, but if you want to create a comprehensive globally accessible meme search I don't think you can do that by saving memes to your iPhone.
A kid doesn't care about that. They just save the memes they like, including the custom ones their friend made that make no sense to anyone else. You don't need a comprehensive global search engine if you have a tool that gives exact, personalized answers over the images you've saved before. And kids these days save everything; it's like how people use Gmail - there's no point in deleting if you can just archive.
If you’re talking about kids without iPhones, then I don’t know, I’m assuming there’s probably some competitor apps on android now.
But I think you're thinking too narrowly. Don't worry about a meme database. What about a searchable visual database of everything you've ever seen?
The use case for the meme database is slightly different: it's to find that meme you saw somewhere else. Local search isn't enough then.
Though I share your feelings somewhat - I was completely surprised by what you wrote in the comment higher up about iPhone gallery search. I didn't realize this was possible in a reliable fashion, much less offline and deployed on a mass-market device.
Complete supposition: maybe teens' exposure to memes is through non-private messengers and apps, meaning all media is saved automatically on the phone and available through a search. I don't think the web itself is much used anymore.
Yes, and the database is useful for all the situations when you saw a meme on someone else's device, or embedded in some piece of content, so you had no way to save it.
Your cousin and her friends may not care about that, but I'm not sure it applies to all kids worldwide, honestly. I'm sure "meme collecting" is a common practice among many teens, but I don't think it means that every teen saves every meme/image they encounter.
You know some teens don't even save images at all? They store them on specific Instagram accounts they create for a specific category. My cousin had a close-friends Instagram account (5 people) where she only shared bad things that happened to her during the day. Another one for nice things, etc. All those memories were recorded in-app and never saved on the device, only stored as stories on the account. Guess what? She was sad a while ago because she somehow lost access to one of these accounts, and with it all those pictures and videos.
Also, the fact that some teens save a lot of pics on their devices doesn't mean much in the grand scheme of things. In 2008 I had folders upon folders of images downloaded from the internet. Now I'm not even sure where they are - probably on some hard drive in some closet. You can be sure that if I remember any of that content, I won't dig up my hard drive; I'll google it (i.e., use a comprehensive global search engine). We have no idea where all this data collected by teens will be in 15 years; it's not unlikely that it will be lost, or archived in hard-to-reach places and forgotten. I've stumbled a few times onto "meme dumps", where they upload all their memes to a service to free up space on their device/iCloud.
For sure, teens use technology in ways that might be unexpected and counter-intuitive to us, but I don't think that invalidates in the slightest the need for a global meme search engine. It's a good tool if I want to find a meme that I saw - a need I don't think will disappear anytime soon, and one that's not a millennial-and-older problem either.
I really want to emphasize how insane this situation is, because I think most tech people won’t realize what’s happening unless it’s pointed out.
If you're a typical tech person, you probably look at this and go "oh, iOS Photos now OCRs every photo. Cool, that's 2000s or 2010s era fancy tech, boring these days. And then a search engine on those strings, yeah, cool, nothing too mind-blowing". The sheer boringness of this by tech-people standards meant that this iOS UI change flew under the radar.
That’s not true for non tech people.
The people who discovered this put it to use immediately. You can search up anything from an image now. Old memes? Sure. Forgot the name of a restaurant you went to, but remember that you took a picture of the menu and the beef dish was amazing? Search for the word "beef" and it's probably in there. Took a screenshot of an article, remember one or two words from it, but can't find it on Google? Search for those one or two words to find the screenshot, then use the phrases in the screenshot to find the article on Google. Trying to find a picture of a cat you saved? Type in "cat" and search for it. Yes, the Photos app can do that too.
Screenshots are cheap and instant. Kids never delete them. It's like how Gmail's "archive" feature in 2005 revolutionized email because you never had to delete an email. Well, iCloud Photos' "optimize storage" means that you can effectively store infinite screenshots.
There's another UX revolution happening in how we save information. Photos became easily, instantly searchable, and nobody seems to have really noticed the implications this has for storing memories and boosting recollection. This could be the "you'll always have a calculator in your pocket" equivalent for memory techniques like spaced repetition.
> I really want to emphasize how insane this situation is, because I think most tech people won’t realize what’s happening unless it’s pointed out.
Count me in. If you asked me about OCR itself, I'd probably say "yeah, it's mostly been solved for a good decade for print books and articles, but it's still unreliable enough that you can't fully trust the output". I somehow never considered that OCR might have gotten better - possibly because my main exposure was through badly OCRed book scans and the built-in OCR in some PDF reader I used at one point.
It definitely didn't occur to me that OCR works well enough on arbitrary images, and it's cheap enough compute-wise that you could do it locally in a casual fashion.
Nice thing you have there in the Apple garden. Over here in Android land, I have the opposite problem. You say:
> This could be the "you'll always have a calculator in your pocket" equivalent for memory techniques like spaced repetition.
and all I can think of is how I recently became convinced that my Samsung flagship is losing my photos. There have been a couple of cases over the past few months when I felt really damn sure I took a set of photos of something (e.g. a remodeled kitchen), but when I checked on the phone, it turned out those photos don't exist, or there's maybe just one where I expected 5-10. They aren't in the gallery. They aren't in the filesystem. Poof, gone.
So either I'm getting senile in my 30s, or something is off with the way my phone stores photos. I did a web search for this the other day; there are relatively recent reports online complaining about the same thing, but no one has any evidence. I'm thinking about running an experiment now (basically, take extra photos every day and document them in a paper notebook, then check after half a year whether the photos match the notes) - but the point of me sharing this is: I no longer trust new tech, smartphones in particular, to handle the basics correctly. Much less to do something advanced like reliable text search on images.
There is a chance that your photos are being backed up by some cloud service and being removed from your gallery. The most likely suspect is Google Photos.
Note that Google Photos not only OCRs, it also does visual search over objects, faces, scenery, etc., and is extremely powerful.
> There is a chance that your photos are being backed up by some cloud service and being removed from your gallery. The most likely suspect is Google Photos.
I have Google Photos upload and backup both disabled.
But then, I'm pretty sure either the Google or the Samsung SMS app had a "feature" to automatically delete old messages (for a definition of "old" that was neither specified nor configurable), and it defaulted to ON on my current phone, likely costing me a significant chunk of my message archive (which I had dutifully transferred over from the previous phone) before I accidentally found and disabled the switch.
So yeah, could be Google Photos deleting it. Or someone else. I don't trust Android as a platform anymore.
BTW, about this "delete old messages" "feature" - most likely it was implemented for performance reasons. But the thing is, you're unlikely to send or receive enough SMS in your whole life for them to take up a noticeable amount of space. The irony here is, I do remember a case where a messaging app would become slow and laggy if you had enough texts stored on the phone - but that was solely because someone implemented the message list as a linked list, thus adding an O(N) multiplier to many GUI operations.
Nice project. I wanted to build a meme search engine myself one day, but figured Tesseract would fail on most memes because of how stylized they've become. So I narrowed my meme source to only /r/bertstrips, as those contain sane-looking text, and it's working quite alright - the project has no frontend yet, so I search from the CLI and click links.
> Initial testing with the Postgres Full Text Search indexing functionality proved unusably slow at the scale of anything over a million images, even when allocated the appropriate hardware resources.
I can guarantee you that a correctly set up PostgreSQL text search will be faster than ES, with much, much less hardware needed. It's just a matter of creating a tsvector column and a GIN index on it (and of course writing the queries so that the index is actually used). I can help you set up the Postgres schema and debug the queries if you're interested, for testing purposes at least.
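To make that concrete, here's a minimal sketch of the setup I mean (table and column names are hypothetical, and generated columns need Postgres 12+):

    -- Hypothetical schema: one row per indexed meme, OCR output in ocr_text.
    -- Precompute the tsvector so queries don't re-parse the text every time.
    ALTER TABLE memes ADD COLUMN ocr_tsv tsvector
      GENERATED ALWAYS AS (to_tsvector('english', coalesce(ocr_text, ''))) STORED;

    -- GIN index so @@ matches don't degrade into sequential scans.
    CREATE INDEX memes_ocr_tsv_idx ON memes USING gin (ocr_tsv);

    -- Query the indexed column directly; wrapping it in another function
    -- call would stop the planner from using the index.
    SELECT id, image_url
    FROM memes
    WHERE ocr_tsv @@ websearch_to_tsquery('english', 'armband stays on')
    ORDER BY ts_rank(ocr_tsv, websearch_to_tsquery('english', 'armband stays on')) DESC
    LIMIT 50;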
I recently worked on a project using lnx.rs. Simple to set up and use, and fast at the scale I was using it at. Built on Tantivy, with a custom fast fuzzy-search feature.
If you want to go beyond meme sites and possibly detect memes in the wild, Common Crawl might be something to start with.
This is really brilliant to see, and I've been surprised for quite a long time that nothing similar exists. I think it's a real shame that few people with interest in memes have interest in building solutions like this that help us engage with them.
People in the 21st century know a lot about the mistakes of the past century that led to much of the popular culture of the time being lost (especially terminally online people who've watched lots of YouTube documentaries about lost Doctor Who episodes and so on), so it surprises me how little we try to avoid those same mistakes with today's ephemeral pop culture in the form of memes. People like yourself, who want to help make the internet's huge corpus of memes tractable, are part of the solution in terms of meme archival and cultural memory.
(There's a good meme metadiscussion group on Discord, "The Philosopher's Meme," which you might be interested in joining. People there would be very keen to discuss what you've made.)
> My preliminary speed tests were fairly slow on my Macbook. However, once I deployed the app to an actual iPhone the speed of OCR was extremely promising (possibly due to the Vision framework using the GPU). I was then able to perform extremely accurate OCR on thousands of images in no time at all, even on the budget iPhone models like the 2nd gen SE.
It is already here, to be honest. I know BrowserStack and other mobile testing platforms (and Facebook and Amazon internally) host real devices, both Android and iPhones, in server farms like this. Meta wrote a blog post about it: https://engineering.fb.com/2016/07/13/android/the-mobile-dev...
At one of my previous workplaces, we discussed running the Z3 theorem prover on an iPhone cluster, because it runs so much faster on A-series processors than on a desktop Intel machine.
I had a friend in med school who wrote a very early note-taking app for the iPad. It turns out there was no way to render PowerPoint files when the iPad first came out. He realized that the iOS/macOS Quick Look function could be used to take screenshots of each PowerPoint slide. For a brief time, his was the only app that could display PowerPoints (albeit as screenshots!). There's a lot of hidden utility in Apple's libraries.
My question is about image distribution costs. All the memes on the site seem to be served straight off object storage; all that bandwidth consumption has got to add up(?). Some sort of CDN might help, depending on the search patterns.
Although not as elegant a solution as this, I've also tried my hand at indexing and categorizing memes. I wanted to save a very specific type of meme, though, since there are, in my mind, two main categories of memes. The first category is what I call "story" memes: they are standalone and typically what you see being shared on Facebook. They usually have text and can tell a story on their own with no additional context, presented as a single post, story, etc. (think 4-panel comics). The second type is reaction memes. These are used to respond to people and usually convey a feeling towards a post or tweet. They can also be standalone, so they should probably be considered a subset of the "story" memes. I've gravitated towards reaction memes, as I see more utility in them and they can be used in a more universal way. My site, if anyone is interested (it's still a work in progress):
These different approaches really complement each other - most of the memes you've categorised are used in a variety of situations and therefore not suited to text searching. Meanwhile, if you're looking for a specific meme that you've seen before, text search is the way to go.
Ideally there would be a best of both worlds where you could search memes by "characters" or "formats" in addition to text.
If you don't need advanced search features, you can use Sonic (https://github.com/valeriansaliou/sonic). It's blazing fast, and you can save a lot of money on servers.
I sat down literally last night and started sketching out the scratch-my-own-itch solution to more or less exactly this problem, because I too have meme-aphasia where I know there exists a meme that fits perfectly in a conversation, but I have about 5 seconds to find it before the moment passes.
I'm so, so glad to see that I'm not the only person in the world with the same "problem". Well done, mandatory.
I wonder how the performance of Vision.framework on desktop Mac hardware compares to a cluster of phones. (The author mentions that it was "fairly slow", but it sounds like they were running an iOS app in the simulator and not a macOS app.)
Does the Vision API call back to Apple's servers in any fashion? Like how Google's on-device voice recognition APIs call back to Google when you're online (unless you explicitly pass flags to force offline mode).
If so, is there any risk in getting your account suspended or ip range banned somehow because of this, for example?
Now, after reading the article, I gave your search engine a try. I was looking for that Futurama "it's a trap" meme (it pretty much pops up in any image search: https://www.google.com/search?q=futurama+its+a+trap)
The problem is, the search engine you built is very text-heavy, and the text is often disconnected from the actual meme. So searching for "its a trap" did not yield the results I was hoping for - which makes total sense given how the search is implemented.
Are you planning to implement some sort of tagging of the content? Maybe clustering of similar objects (like the iPhone clusters similar people's faces in the gallery) and then tagging those clusters with keywords somehow?
Yes, I definitely want to make the search better. It's currently very text-heavy, and I (only recently) got image similarity indexing working. I'm hoping to leverage this to do something like you mentioned!
I'd also like to figure out how to turn an image into a description of what's in it. My ML/TensorFlow knowledge is very weak, though, so I still have a lot to learn here.
This is great, I particularly like the part about using compute from old unwanted iPhones. Quite an inventive way to reuse/recycle otherwise obsolete hardware!
I have absolutely no experience in this area and I'm curious:
is there really no open-source text recognition software that's on par with, or close to, Apple's (presumably proprietary) implementation? The article mentions Tesseract. Is that the current best open-source option?
This is remarkable. I'd love to see that combined with some kind of sentiment analysis like Microsoft offers, just to see if something useful comes out of it.
Sometimes, I don’t know the exact words when looking for a meme, but once I see it, I know that’s the one.
Unfortunately semantic analysis barely works at the best of times, but it especially doesn't work here. Computers… they're just not good at irony.
IME CLIP embedding search can work strangely on memes as well, because it gets confused when images have words in them. Basically the same problem reported in the original CLIP paper where it thinks an apple and a piece of paper with "apple" written on it are the same thing.
> My preliminary speed tests were fairly slow on my Macbook. However, once I deployed the app to an actual iPhone the speed of OCR was extremely promising (possibly due to the Vision framework using the GPU). I was then able to perform extremely accurate OCR on thousands of images in no time at all, even on the budget iPhone models like the 2nd gen SE.
I suppose that’s an old Intel MacBook? I’d be very surprised if the Vision framework performs better on a 2nd gen iPhone SE than even the first M1 MacBook Air.
I have a "hackish but works for me" meme database: I use my Telegram "self chat" to send memes I like to myself, and I tag them with the kind of words I'm likely to search for when looking for them later.
Works great for me.
It's kind of like trying to come up with a good Google search phrase, based on how other people must have phrased something, but relying on knowledge of how you phrase things instead.
I do this for some, but most of the time my use case is to look for something very obscure from a long time ago that I didn't regard as interesting when I first saw it (so didn't make the effort to manually categorize it).
Very inventive. Admittedly when I read the first few paragraphs, I was thinking “he’s got to have $40K of iPhones doing image processing” but you made a good point about being able to use iPhones with screen and other damage.
What was your average price per iPhone, if you don’t mind disclosing?
Last time I looked into OCR stuff I came to a similar conclusion (though I didn't implement anything back then). It would be really nice to have "open source" models with similar performance, without having to deal with the iPhone-cluster hackery.
> Better yet, I don’t even want to use them as phones, so even iPhones that are IMEI banned or are locked to unpopular networks are perfectly fine for my use.
Fences worldwide will be overjoyed to hear of this novel application.
Very cool project! I'll try to remember it the next time I'm looking for a specific image. I noticed that repeated appearances of the search term rank an image higher, which isn't necessarily productive. Also, some kind of duplicate detection would be nice: searching for "SpongeBob" yields many copies of the same image, each mentioning "SpongeBob" several times.
I once tried installing it on a recent Ubuntu. After messing with dependency hell and pip downloading half of the internet, when I finally invoked the CLI, it complained _again_ about a missing runtime dependency.
I called it quits. DL people are simply not interested in bundled static binaries.
Pretty insane. If you don't want to use iPhones, I made macOCR a while back. It uses the same Vision APIs, with a very simple CLI interface. See: https://github.com/schappim/macOCR
You can do this on macOS as well; it has the same API for fast, high-quality OCR. I used it to create an OCR system that detects secrets and credentials in screencasts: https://github.com/peterc/videocr
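For anyone curious what the underlying call looks like, here's a minimal Swift sketch (the file path is a placeholder) that prints the recognized text for a single image:

    import Foundation
    import Vision

    // Placeholder path; point Vision at any image on disk.
    let handler = VNImageRequestHandler(
        url: URL(fileURLWithPath: "meme.jpg"), options: [:])

    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        // Each observation is one detected line; take its best candidate.
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        print(lines.joined(separator: "\n"))
    }
    // The slow-but-accurate path with language correction is what you want
    // for stylized meme text; .fast trades accuracy for throughput.
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    try handler.perform([request])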
I'm curious how well the iPhone OCR actually works. How do you deal with errors? Is the error rate low enough that you can accept the output from the iPhone OCR as is or do you also run it through a cleaning process (e.g. spell check)?
For a single data point, it works exceptionally well for me. I routinely copy-paste from images or screenshots, and it rarely fails (mostly for handwriting or obscure fonts).
I am not sure if the Photos app search also uses the same OCR. But sometimes I search for a word and iOS will find a photo that has that text in a 50x20 cluster of pixels somewhere far in the background...it's remarkably good.
I believe I will actually use it a lot if you keep the site up.
Minor feedback for the blog post: It deserves a better meta description (for link previews). The first paragraph doesn't advertise how good the article is going to be.
I tried a few memes. The results were quite poor, and vastly inferior to just using Google. In the case of text searches I had to scroll through dozens of results before finding the original meme images.
Out of curiosity, how does your image similarity search work? Are you also using some feature of Apple's Vision framework, or running some ML model on your Linode instance?
I think there is, right? At least assuming it's the same as with Macs, where the minimum billing period is one day because you're technically renting a physical machine (and one day is the minimum that qualifies).
My #1 recommendation for anyone considering this convoluted OCR solution: use a cheap OCR API and save yourself months of time, hassle, and upkeep. Google's OCR API is a good place to start, but AWS has one too, and there are dozens of others out there.
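If it helps anyone evaluate that route, here's a rough Swift sketch of a Google Cloud Vision text-detection call (the API key and file path are placeholders; this is just the single-image flow):

    import Foundation

    // Placeholders: your API key and an image read from disk.
    let apiKey = "YOUR_API_KEY"
    let imageData = try! Data(contentsOf: URL(fileURLWithPath: "meme.jpg"))

    // One images:annotate POST per image (the endpoint also supports batching).
    var request = URLRequest(url: URL(string:
        "https://vision.googleapis.com/v1/images:annotate?key=\(apiKey)")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try! JSONSerialization.data(withJSONObject: [
        "requests": [[
            "image": ["content": imageData.base64EncodedString()],
            "features": [["type": "TEXT_DETECTION"]]
        ]]
    ])

    // Block until the response arrives (fine for a one-off script).
    let done = DispatchSemaphore(value: 0)
    URLSession.shared.dataTask(with: request) { data, _, _ in
        // textAnnotations[0].description in the response holds the full text.
        if let data = data { print(String(data: data, encoding: .utf8) ?? "") }
        done.signal()
    }.resume()
    done.wait()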
Without this "convoluted OCR solution" it never would have been built. Mandatory would have easily had to spend hundreds of thousands of dollars to OCR his meme collection alone, even without scraping other meme sites.
On the contrary, this sort of creative thinking is what's needed instead of automatically reaching for that shiny Cloud Toy. It's easy to get a proof-of-concept working, sure, but at scale, you start torching through cash.
Many places keep adding cloud services to their stacks until one day someone in the C-suite notices the AWS bill.