Show HN: AI powered meme search, open-source (jina.ai)
45 points by okaydeveloper on Sept 3, 2021 | 33 comments



This is a text-to-image search built with deep learning and vector similarity search. Ask me anything.


What's the point of the deep learning model? Why not just search the metadata of the images?


What in your system is doing the text-to-vector encoding, and how did you train it?


We're using Transformers with the `sentence-transformers/paraphrase-distilroberta-base-v1` model.

The framework is Jina (https://github.com/jina-ai/jina/) so it's pretty high-level. You can see the indexing/search Flow on lines 37-52 of https://github.com/alexcg1/jina-meme-search-example/blob/mai...
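For readers unfamiliar with the idea, here is a minimal sketch of the vector similarity step. It is not the project's actual code: the 3-dimensional vectors are made-up stand-ins for the embeddings a sentence encoder would produce, and the `search` helper and captions are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k captions whose vectors are closest to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), caption)
              for caption, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [caption for _, caption in scored[:top_k]]


# Hypothetical toy "embeddings"; a real encoder outputs hundreds of dimensions.
index = {
    "dog eating a burger": [0.9, 0.8, 0.1],
    "cat knocks glass off table": [0.8, 0.1, 0.2],
    "boss fires intern": [0.1, 0.0, 0.9],
}

# Pretend this vector is what the encoder produced for the query "animal food".
query = [0.85, 0.9, 0.0]
print(search(query, index))
```

In a real setup the encoder turns both the captions and the query into vectors, and the search step only compares vectors, which is why a query can match captions that share none of its words.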


We rely on pre-trained models at the moment, since Jina supports loads of them out of the box.

For image search we use Big Transfer Encoder (https://github.com/jina-ai/executors/tree/main/jinahub/encod...) but may switch to CLIPImage encoder at some point


Searching Google for "old english meme" immediately pulls up a bunch of Joseph Ducreux memes, which is exactly what I was looking for. But this one does not return any. Are they not present in the dataset, or could it be because of the way the algorithm works? Interested to hear some details on this setup.


Since I helped build it, I'll explain :)

This example is quick and dirty, indexing only about 1,000 images from a dataset we pulled from Kaggle. So if results suck, that's due to either the meme you search for not being in the dataset, or it just didn't get indexed in the random batch of 1,000.
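To put a rough number on the "random batch" point, here is a small sketch. The dataset size of 50,000 used below is purely an assumption for illustration (the actual size of the Kaggle dataset is not stated here), and both helper names are hypothetical.

```python
import random


def inclusion_probability(sample_size, dataset_size):
    """Chance that one specific item lands in a uniform random sample."""
    return min(sample_size, dataset_size) / dataset_size


def index_random_batch(dataset, sample_size, seed=None):
    """Pick a random batch of items to index, without replacement."""
    rng = random.Random(seed)
    return rng.sample(dataset, min(sample_size, len(dataset)))


# Assumed dataset size, for illustration only.
assumed_dataset_size = 50_000
print(inclusion_probability(1_000, assumed_dataset_size))
```

Under that assumption, any single meme has only a small chance of being in the indexed batch, so a miss on a specific search says little about the search itself.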


> So if results suck ...

Not trying to be harsh here, but if you only index 1000 random images and call it a meme search then it's going to suck. So it sucks because you all really didn't think this through. What are we supposed to take away from this? I have no way to evaluate the usefulness of the tech because it isn't capable of doing anything atm.



I see. 1000 is probably too few to showcase a search concept, but this is definitely a very interesting problem.


I'm indexing 10k as we speak. At least for text. Indexing 10k images will take a bit longer.


10k memes are now indexed for text. So if you search for Wonka memes you'll get a lot better results.

Also swapped out the model


I wanted this meme : https://knowyourmeme.com/memes/for-five-minutes

"Shrek 5 minutes"

None of the results is Shrek.

Ok... maybe something simpler: "Shrek"

Same...


I tried finding this - https://knowyourmeme.com/memes/who-killed-hannibal

"Eric andre shoot" | "Eric andre kill"...I'm getting the drake meme.

But searching for "Who killed" works.

Also this might just be me but it would be nice if the search bar also worked by pressing "Enter" after a query.


That's a restriction in the front-end framework unfortunately :(

Really hoping Streamlit supports that in future.


Hey, Streamlit co-founder here.

This should just work out of the box in Streamlit:

    import streamlit as st

    caption = st.text_input("Meme subject or caption")

    if caption:
        pass  # do search here

If that doesn't work for you, do you mind posting in our forums? I'd love to get to the bottom of this!

Forums → https://discuss.streamlit.io

(Cool app, btw!)


You can see this link. It explains why the meme search still needs a bit of work: http://examples.jina.ai:8501/?tab=Dude,%20this%20meme%20sear...


Likely because: a) Only so many meme types in the full dataset b) We only indexed 1000 memes to build a toy example

Had I known it would blow up I would've indexed more!


For text search, this seems to be using trigrams or something? I tried “tired” and got a lot of “fired”, “required”, etc.

not useful for me, since I usually want a meme to match a mood or theme
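For context on the trigram guess, here is a minimal sketch of what character-trigram matching would look like. It is only an illustration of that hypothesis, not a claim about how the demo actually scores results, and the function names are hypothetical.

```python
def trigrams(word):
    """Set of all 3-character substrings of a word."""
    word = word.lower()
    return {word[i:i + 3] for i in range(len(word) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap between the trigram sets of two words."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


for candidate in ["fired", "required", "sleepy"]:
    print(candidate, trigram_similarity("tired", candidate))
```

A matcher like this ranks "fired" and "required" above "sleepy" for the query "tired" because they share the "ire" and "red" fragments, which is the opposite of what you want when searching by mood.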


If you search like "animal food" you'll get a lot of memes related to animals and food without specifically mentioning those words (e.g. "dogs" and "eating" in the caption) so it's definitely using a neural net to grok the semantics.

The trigram thing might just be a weird glitch in the pretrained model we're using (sentence-transformers/paraphrase-distilroberta-base-v1)


Hmmm...might be a model thing. We'll look into it


"limit 200mb per file" that seems excessive, those are some hq memes.


Haha, that's the standard upload limit for Streamlit[1] apps. We just slapped together a quick front-end using that framework.

[1] https://www.streamlit.io/


Search by text has interesting results. Try "jina.ai".


I went from 'well it's OK' to 'I quite like this' to 'it's great!' in about 10 minutes.

Creating a new web account loads the tutorial page, but it's a little confusing at first how to add a node. Also, there are quite a few spelling and grammar errors on that page which will make an unfair negative impression. If you clean those up you will get more conversions.

Examples:

  It's my job to keep your complicated brain neat and tidy and remember **things** for a long time!

  Here's how you can **create** a neat Note Garden.

  2. Structure what you have learned and put **it** in order.
  
  I even take care of your knowledge so that you won't forget **it** for the rest of your life!

  (The road to being a great gardener** was **not easy, but we did it!)

Note also in the last example how the bold markdown surrounds the whitespace. I highlighted this manually (and carelessly) but clicking on a word also selects trailing spaces. You should probably strip the whitespace.

On the landing page 'Write Smartly' is correct English, but people rarely use the word this way - although it is technically correct it feels weird, and you don't want to create that feeling on a landing page. 'Write Smart' would be better.

Also, you wrote 'Law students - People who study for a long time that should not be forgotten'. I suggest 'Law students - People who need to study and retain knowledge for a long time.'

These are small language errors, but they would be very quickly noticed by your target audience.

Finally, the desktop sign-in with Google seems not to work - it opens a blank window and then closes again. Maybe it is just from server load right now.

Anyway I like it a lot and will consider using it regularly. I am more of a pencil-and-paper note person but this is one of the nicest digital notebooks I've found.


Is this comment on the wrong post?


And it's on top. I wonder how often people actually read the entire article or comment. I think most often people read the first line, decide whether they agree with the comment, and vote accordingly.

More often than not, I have seen that the top comment on an article takes the completely opposite perspective to the article. For example, if the article is about why Flutter or Go is amazing, the top comment here will explain why both are the worst choices ever.

I think it just proves people feel the "need" to comment only if they disagree. If they agree with the article they don't bother explaining, hence the low comment count. In short, a controversial topic will generate a lot of engagement, and hence other social networks don't bother moderating and let people engage in behaviours harmful to society as a whole.



Yes, I was checking out both projects but got distracted and didn't notice which one I replied to. By the time someone kindly pointed it out the edit window had closed.



GPT6 escaped from Google!


Yes... ಠ_ಠ


ha! I saw this comment on one of the top posts and wondered if I was on the right page. I thought maybe I glitched



