Show HN: Bulletpapers – ArXiv AI paper summarizer, won Anthropic Hackathon (bulletpapers.ai)
178 points by mattfalconer on Nov 8, 2023 | 53 comments



Since the point of a paper's title and abstract is to be a useful summary of the whole paper, the existence of this tool is an indictment of researchers' failure to do this effectively.

One thing I could imagine being useful is to summarize it for a lay audience (rather than the intended audience of the paper).


> Since the point of a paper's title and abstract is to be a useful summary of the whole paper,

I'd like to nitpick this a little. The title and abstract of a paper are optimized for reviewers. There is an assumption that reviewers are aligned with researchers seeking to read new papers (after all, papers are simply the act of communicating from researcher to researcher), but I don't think this assumption actually holds.

I complain a lot about reviews, but I'll give an example that is simple and, I'm sure, true in almost every domain: paper length. There are far too many papers that could be a page or two but are 10 because that's what the journal/conference requires. If you don't fill the pages you're more likely to get rejected, as the reviewer can more plausibly claim your work was not thorough enough rather than your explanation simply being concise[0]. This is only exacerbated as the publish-or-perish paradigm pushes us to publish faster and in more competitive venues. But papers are like wizards: they should be precisely as long as they mean to be. No more, no less. (At least that's how they should be if you're targeting fellow researchers in your niche.)

It's an all too common mistake to believe that metrics are perfectly aligned with some well-defined but abstract goal. Rather, they are generally aligned with proxies that correlate with the desired goal. You'll find this everywhere, from trying to measure the quality of LLMs to trying to exterminate cobras in Colonial India. Pay close attention or Goodhart will be turning in his grave.

I'd say more about the science communication aspect but I don't want to rant too much and I think one could guess a much lengthier response extrapolating from my thoughts above.

[0] Similarly papers get cut to fit the length and tough decisions are made about what goes in the front matter vs the appendix because reviewers are not required to read the appendix and a large portion simply do not (https://twitter.com/sarahookr/status/1660250223745314819).


The real silver bullet would be not to summarize for a general layperson but somehow, personally, for the context of "me", whoever that is. For instance, the amount of approximation language and jargon could vary to make it optimally accessible.

I'm pretty sure this is achievable right now with just a lot of work.


I don't think it's that much work, if you have a general sense of your own knowledge domains. I described the platform I work on in a few sentences, along with my knowledge level and what I like in a response, and it's been pretty great for that. A rough sketch of what I mean is below.
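A minimal sketch of that setup, assuming the Anthropic Python SDK (this being the Anthropic hackathon thread); the reader profile, prompt, and model name are illustrative, not anything from the thread:

    import anthropic

    # Hand-written once: domain, knowledge level, and response preferences.
    READER_PROFILE = (
        "I work on a distributed job-scheduling platform in Go. "
        "Strong in systems, new to ML. Prefer concrete examples, "
        "minimal jargon, and a short 'why it matters' at the end."
    )

    def personalized_summary(paper_text: str) -> str:
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # placeholder model name
            max_tokens=1024,
            system="Summarize papers for this specific reader:\n" + READER_PROFILE,
            messages=[{"role": "user", "content": "Summarize:\n\n" + paper_text}],
        )
        return resp.content[0].text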


I was thinking of a more generalized one with "personas" which use things like embeddings and hypernetworks to "know" you.

Of course, the presumption here is that more useful results would emerge under the added training load. It might be no better than the simple use case.

For instance, ideally it would know your strengths, weaknesses, blind spots and misconceptions so it will know what you don't know you don't know.
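A hedged sketch of the embeddings half of that idea, assuming the sentence-transformers library (the persona list and threshold are made up for illustration): flag the parts of a paper furthest from everything the reader already knows, as candidates for extra explanation.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical "persona": concepts this reader already knows well.
    known = ["gradient descent", "convolutional networks", "attention"]
    known_emb = model.encode(known)

    def unfamiliar(paper_sentences, threshold=0.35):
        # Sentences dissimilar to every known concept likely need explaining.
        sent_emb = model.encode(paper_sentences)
        sims = util.cos_sim(sent_emb, known_emb)  # (n_sentences, n_concepts)
        return [s for s, row in zip(paper_sentences, sims) if row.max() < threshold]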


That is what it tries to do: the title, bullet points, summary, and 'FAQs' all try to simplify the paper, but LLMs are hard to tame when given an entire paper in the context window.
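A minimal sketch of that kind of structured extraction, assuming the Anthropic Python SDK; the prompt and field names are illustrative, not Bulletpapers' actual implementation:

    import json
    import anthropic

    PROMPT = (
        "Read the paper below and reply with JSON only, using the keys "
        '"title", "bullets" (5 key details), "summary" (one paragraph), '
        'and "faqs" (3 question/answer pairs).\n\nPAPER:\n'
    )

    def extract(paper_text: str) -> dict:
        client = anthropic.Anthropic()
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",  # placeholder model name
            max_tokens=1500,
            messages=[{"role": "user", "content": PROMPT + paper_text}],
        )
        # "Hard to tame": the model may still wrap the JSON in prose,
        # so real code would validate and retry rather than trust it.
        return json.loads(resp.content[0].text)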


I would assume that the existence of abstracts and titles is what allows tools like this to be effective at all. Abstracts are probably shorter than a useful summary of a paper should be, and this is by design. You would prefer that a summary have a few more things, like how experiments were actually conducted, but the abstract tells you what words to look for to find that.


Abstracts are often quite a bit shorter than proper summaries and function more like elevator pitches (mainly to reviewers and other people in the field). They are not intended to do what you say they should.

If you suggest to also add summaries, that is something I could get behind, but right now your criticism is a bit misplaced.


What I really want to see more than anything is an upfront explanation of what the paper's novel technique or idea is. Sometimes abstracts include that, but not always, and I wish it were SOP for all scientific or technical papers. If AI could make it SOP that would be an improvement, but it may take AGI to be able to make that distinction.


It's interesting you say that, because I've always found abstracts from academic papers incredibly dense and hard to follow. They tend to really not summarize very well.


I think needing LLM summarizers to read a paper at all only highlights the failure of paper authors writing the abstracts. Let's face it: abstracts are intentionally getting more complex and harder to read, or reviewers will question the paper's writing. If the LLM summaries were useful, the authors could have generated one and just used it as the paper's abstract. And indeed, there is no better person than the authors to edit the LLM-generated summary, because they know which parts of the summary are hallucinated and which are not, right?


An abstract generally has the following format: it starts by describing the background of the problem, then the problem the paper aims to solve, the method the paper uses, and finally a conclusion. The abstract doesn't assume much prior knowledge, and can probably still be understood 10 or 20 years from now. You can see how the LLM-summarized version totally skips the background and jumps straight to the problem and the method.

Now, I'm not saying there is no room for improvement. The fixed format of an academic paper, with an abstract and then the actual paper, may actually be replaced by what is shown here, and I genuinely hope to see more experimentation with the communication of scientific studies, but that is unfortunately not a focus in the academic world.


It makes sense to debate what should be included in the abstract: background, problem, method, or conclusion? My personal preference is to read the problem and method only, because that's what gives me inspiration and helps me decide whether the paper is relevant. I acknowledge everyone may have their own preference, and as mentioned in other comments, a major feature of LLMs is that you can steer them with instructions to get the level of detail you want. But I think the main contention is that the paper authors could have done just slightly more work, beyond getting the paper accepted, to have it reach a much wider audience than their specific field.


I think it matters whom we write papers for. Right now we write papers and abstracts for reviewers, because that's how we're measured and where we compete. But I'd say we generally believe papers are written for other researchers, and I agree that should be the goal. As competition increases, though, we're starting to write more for the media, since this can usually pass review and gathers lots of citations (these people tend to be from big schools, too, which have large media arms and are willing to pay for articles in news venues).

This is why I'm deeply frustrated with academia right now. Papers are supposed to be how I communicate with my fellow researchers working on the same or similar topics. They're not for communicating with someone in a different field, and not for communicating with the lay public (nor should they be!). It is the job of science communicators to act as the bridge between laymen and researchers, and a lot do a poor job because they're beholden to the YouTube algorithm, not accuracy. Hell, Quanta published a shit piece recently about quantum wormholes and machine learning, and what did they do when it was called out? Just wrote another article and added a note to their YouTube video. Nature is pulling similar shit. I get wanting to make science popular and exciting, but truth/accuracy has a lower bound in complexity whereas fantasy doesn't.

https://www.quantamagazine.org/physicists-create-a-wormhole-...

https://www.quantamagazine.org/wormhole-experiment-called-in...

https://www.youtube.com/watch?v=uOJCS1W1uzg


I really wish they would limit abstracts to those. I feel like abstracts should be like an index/jumping-off point, in spirit if not in structure.


There's a larger problem with how we write papers that results in the issue I discuss here[0], and it looks like it's what you're getting at too. What I'd say is that authors (being one myself) are writing papers to reviewers, not to other researchers in our niche (sometimes it feels like not even to those several levels of abstraction above).

I can confirm having to significantly tweak papers in ways I would not have if writing to other researchers. I have papers with hundreds of citations, as well as top benchmark scores, that could not pass review, with the most common complaints being "not novel" and "I don't know who would find this useful." This has been one of the most challenging aspects of my PhD and certainly one of the most frustrating.

But the larger problem I see is that everyone is simply hacking every metric they can. This goes beyond academics; I'm sure you see it in your work or in politics too. I think we need to have a serious discussion about the fact that metrics are proxies and are not always aligned with our goals, nor do they stay aligned, because anyone who optimizes toward the metric rather than the abstract goal we actually want gains an advantage.

It's not AI turning the world into a paperclip that we should be afraid of, it's humans doing that.

[0] https://news.ycombinator.com/item?id=38200598


In math, the job of writing a comprehensive summary is usually done in other places. Some of them are even accessible for free, such as zbMATH [1], a big database of comprehensive summaries of math articles.

[1]: https://zbmath.org/about


All of it is hallucinated; sometimes it happens to be accurate, other times it is not.


I am doing AI paper summaries for my Substack. My insight here is that the information needed to understand a paper is often not actually in the paper. For example, when writing about the DALL-E 3 paper, the insight is understanding the problem of image captions in Internet-scale data and how a captioner can solve it, but that's not necessarily in the paper.


This is such an amazing insight and hits the nail right on the head (and it's why the above project and 99% of "AI" fears are nonsense).

Reading any scientific paper usually takes me about a day if I actually want to understand it. I've been in my field a decade, but still, reading one paper usually means reading AT LEAST one other paper along the way, and I don't know which of the hundreds of citations I will need until I understand what I don't understand. AI can't do that for me.

AI is like the crypto hype but for the HN crowd, except with basically no real world use cases.


Do you think it is completely out of reach for AI to follow those rabbit holes automatically and tie in the useful information? Couldn't it also be personalized to the user's knowledge of the subject?

I'm actively working on the first problem. The second is on my todo list.


Currently? Yes. This is a challenging problem even for someone with decades of experience. I'm not sure you can train an LLM to do this appropriately, because I can't even begin to describe how one would generate an adequate cost function. I don't think even RLHF can resolve that aspect, because the truth of the matter is that I don't know what's important in that rabbit hole until I spend time working on the problem, replicating it, or have sufficient experience. All too often a single line can make or break an algorithm, and that line is 3 papers back. All too often there are nuances that radically change results that aren't even in the papers themselves.

I hope you succeed, but personally I don't know how this could be solved. It's not that I need better summarization; it's that I need more nuance and technical detail. The problem exists because we're writing to larger audiences as competition increases and the quality of reviewing decreases (we even have a reviewer shortage, which only exacerbates this). I'm not sure AI solves existential problems that are built around reward hacking; in fact, everything I've seen suggests it does the opposite. I mean, we literally train them to do that...


As another ML researcher I'll second this. When I review papers for conferences it takes me hours to review a work that's in my niche. It's because you can't skim a paper unless it is exceptionally good (lol) or has major flaws. A single sentence often holds the magic keys to making an algorithm work, and I don't think many would realize this unless they try to reproduce works from the paper alone (not using lucidrains or the official implementation). Even lucidrains makes some of these mistakes.

And yes, even in my own niche, I go back and reread papers that are key references to make sure I didn't forget key details and understand the exact limitations the authors are addressing. The main thing is that the closer a paper is to my exact niche, the fewer reference papers I have to read (because I know which ones matter) and the faster I can read them. It's ensuring I am not forgetting key nuances of the specific datasets or metrics used, because not keeping these in mind will trick me into wrong conclusions.

This is what's required if I want to give a quality review. It's what's required if I see myself as on the same team as the authors (team science) and helping them make the best work they can.

But I'll admit there's a lot of pressure on me to stop doing this. A big part is that it's very clear my own reviewers are not subscribing to this approach. Rather, I think many reviewers are not concerned with the rigor of their reviews; they do not see themselves on the same team but rather as antagonists (team conference/team journal) whose job is to filter.

An issue in ML, though, is that you get an advantage by being reject-heavy and lazy in reviewing. Not only do you save the time it takes to review, but since it is a zero-sum game you ever so slightly increase the odds of your own work being accepted. Honestly, I do not feel the process is very scientific. Even the new CVPR LLM rules are a joke: more signaling than solution. I just wonder if people care about the science anymore.


Have you ever used a summary service? Have you ever built or even done anything with ChatGPT or its competitors?

There are tons of real-world uses, and you're falling behind if you think there ain't.


One overlooked problem with summarization is that it relies on the context of the reader. Most summarization assumes some ambiguous level of context, but it would be far better if you knew exactly where the reader is coming from and used that knowledge to shape the summary.


EXACTLY! I said the same thing above. The very idea of this is nonsense. An AI can't tell me what I don't understand.


100%. If we had real personalized AI companions that actually had an idea of your knowledge state, summarization would be a game changer.


An AI can teach you something you don't understand :)


I love the prospect of summarizing papers in layman terms, tuned to my own definition of "layman", and never looking back to the pop-science clickbait world that I've grown to detest.


Anyone know how these images were created, or what type of prompt would be required? For example, this one...

https://www.bulletpapers.ai/paper/1edec37d-e8c5-43ab-bfec-90...

I really like the Japanese / anime style.


The model generates a prompt that's sent to SDXL, but it's "seeded" with some randomness: we have an array of adjectives and colours that we feed in to try to make the outputs different.

Primarily the model is trying to generate “abstract publication cover art for a research paper covering the following topics…”.
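For anyone curious, a minimal sketch of that flow, assuming Hugging Face diffusers for the SDXL step (the word lists and prompt template are illustrative; the actual Bulletpapers pipeline isn't public):

    import random
    import torch
    from diffusers import StableDiffusionXLPipeline

    # Hypothetical seed-word arrays, in the spirit of the adjectives/colours above.
    ADJECTIVES = ["minimalist", "surreal", "geometric", "watercolor"]
    COLOURS = ["teal", "crimson", "ochre", "indigo"]

    def cover_prompt(topics: str) -> str:
        return (
            f"{random.choice(ADJECTIVES)}, {random.choice(COLOURS)}, "
            f"abstract publication cover art for a research paper "
            f"covering the following topics: {topics}"
        )

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(cover_prompt("multi-agent robotics, 3D scanning")).images[0]
    image.save("cover.png")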


Seems like a LoRA, check out Civitai, here is the anime category for example: https://civitai.com/tag/anime


not crazy about some of the results, e.g., https://www.bulletpapers.ai/paper/1d002187-927d-6775-94e2-a4...

- Bulletpapers title: Using robots to map and digitize construction sites

- Paper title: Multi-agent robotic systems and exploration algorithms: Applications for data collection in construction sites

- Bullets / Key Details:
  + Proposes methodology for multi-robot systems in construction sites
  + Robots use exploration algorithms to navigate autonomously
  + Information from building plans guides exploration
  + Robots digitize environments by 3D scanning as they explore
  + System is robust, efficient, requires minimal human involvement

- Generated Summary: This paper proposes using multiple robots with different capabilities working together to map and digitize construction sites. The robots use exploration algorithms to autonomously navigate and scan the environment. Information from building plans helps guide the exploration. The multi-robot system is robust, efficient, and requires minimal human involvement.

This all reads like the info was gathered from the abstract instead of the paper itself... That said, this is good AI-generated info for IEEE Xplore to implement, I guess.


Agreed. It was built in 24 hours and isn't as polished as we'd like, but it does actually take the entire paper as context. The models aren't there yet, but we're going to keep refining this until it's super useful.


I built something similar for non-fiction books and articles: https://findsight.ai. It's nearing 10,000 users.


down for me


Back up :)


Reminds me of the more ML specific https://paperswithcode.com/


Did Google stop its book scanning? With the data they had, and now with these kinds of models, the ability to search those books by topic would be amazing. You forgot the book's name but remember a bit of the story: explain it and it's found for you. Or better yet, run it on the libgen library.


How did the abstract summarizations compare to other approaches (e.g. pointer-generator networks)? Any idea of the improvement, to warrant the setup?


Was the entire frontend built during the hackathon? It's incredibly polished! What did you use?


I competed against these guys in the hackathon, it was absurd how much more polished everything was compared to the rest of us!


How exactly does this work? Is it calling gpt-4 with a template you've created, or is something more fancy happening? And if it's using gpt-4, why would someone use your website over just asking gpt-4 themselves (assuming they have it available)?


Anthropic hackathon, so presumably they used Claude. Also a dramatically larger context window, until this week.


> You:

> what was your name before it was Bullet?

> Bullet:

> I don't have a previous name. I was created by Anthropic to be called Claude.


I'm guessing it's a plain old gpt wrapper - not exactly sure how novel this is


You should make a weekly newsletter with a few new / featured / hot papers.


how about a version that works on a custom/curated journal feed?


what's the tech stack?


The blur effect that shows up when you start chatting is very annoying -- I want to be able to see the details of the paper so I can ask about them!


Good point, we're updating now.


It also appears like the filter dropdown in the bottom right corner needs additional space from the bottom of the screen (right now, the dropdown is cut off by the page boundary).

That said, nice site! The interface feels very intentional.


The site is down for me.



