Sorry, i've some experience in this field: - This is text-extraction, NOT text-g...

radhakrsna · on July 28, 2019

Hi,

- Yes, you are totally right. TextRank is an extractive method.

- Ah right. But we aren't storing any information on our servers, just showing selected sentences to the user.

- We select the top 5 sentences which have the highest relevancy to the article. I am not an expert in this field so not too sure if that's the best way. Just started with NLP a few days back and wanted to test it out by developing a small application.
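For readers unfamiliar with the idea, here is a minimal sketch of TextRank-style extraction in plain Python. It is not the code behind the extension: the sentence splitter, the word-overlap similarity, and the names `split_sentences`, `similarity`, and `textrank_summary` are all assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count normalised by log sentence sizes (TextRank-style)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(text, top_n=5, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    # Weighted, undirected sentence graph (no self-loops).
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # PageRank-style power iteration over the sentence graph.
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Every returned sentence is copied verbatim from the input, which is why the method is extractive rather than generative.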

Yes, it works on quite a few articles, but there are also some articles where it fails to give accurate results.

Ah nice, I would like to hear more about what you are working on. Let me know if I could contribute to it in some way.

Thank you again for your feedback.

andreareina · on July 28, 2019

> But we aren't storing any information on our servers, just showing selected sentences to the user.

If this is in reference to the copyright comment, it doesn't matter -- you're still transmitting/redistributing the content, which is what matters. One way to get around this is to ship the code and have the code execute on the user's machine (i.e. what you're presumably doing with the extension).

radhakrsna · on July 28, 2019

Ah right, thank you for the detailed explanation. Currently, we are processing the text using a Python backend. In order to process it on the user's side, I guess we'll have to use Javascript. I will try to fix that in the next version. Thank you very much.

nkozyra · on July 28, 2019

You don't need to move anything to the client side, what you're doing is covered under fair use doctrine.

rrwright · on July 28, 2019

Maybe. Almost nothing is straightforward about fair use.

nkozyra · on July 29, 2019

Agreed, but this is about as close as you can get to safe enough

CodiePetersen · on July 29, 2019

Since this is the internet, you should think beyond the law of whichever country you are referring to. For example, Spain blocked Google News for aggregating the news as is, with little to no transformation.

Plus moving it to the client side would free up whatever resources they are currently using to feed summary info to us.

tutfbhuf · on July 28, 2019

yup, switching to js would fix this issue.

air7 · on July 28, 2019

That's such a ridiculous consequence of our field/times.

Edit: How about the backend just returns pointers to the text (word #x till word #y) and the js just (re)assembles it?
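A rough sketch of that idea, in Python for brevity (the client half would be JavaScript in a real extension). The function names and the length-based ranking are placeholders invented here, and the sketch assumes the backend can obtain the same article text on its own, for example by fetching the URL itself.

```python
def words(text):
    """Whitespace tokenizer; both sides must use exactly the same rule."""
    return text.split()


def sentence_spans(text):
    """Return (start, end) word indices, end exclusive, one pair per sentence."""
    tokens = words(text)
    spans, start = [], 0
    for i, tok in enumerate(tokens):
        if tok.endswith((".", "!", "?")):
            spans.append((start, i + 1))
            start = i + 1
    if start < len(tokens):
        spans.append((start, len(tokens)))  # trailing text without punctuation
    return spans


def server_pick_spans(text, top_n=2):
    """Hypothetical backend: returns only index pairs, never the words.

    Ranking by sentence length is a placeholder for a real ranker.
    """
    spans = sentence_spans(text)
    ranked = sorted(spans, key=lambda s: s[1] - s[0], reverse=True)[:top_n]
    return sorted(ranked)  # restore document order


def client_reassemble(text, spans):
    """Hypothetical client: rebuilds the sentences from its own copy of the page."""
    tokens = words(text)
    return [" ".join(tokens[a:b]) for a, b in spans]
```

The indices only line up if both sides tokenize byte-identical text, so any difference in how the page is fetched or cleaned would shift the spans.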

jadedtuna · on July 28, 2019

If I understand correctly, that would still require information to be transmitted to the server, ergo copyright infringement.

feanaro · on July 28, 2019

Which information? The server is then not reproducing/transmitting/redistributing the content, only indices into the content. I don't see why this would be copyright infringement.

rnicholus · on July 28, 2019

Hacker News is not a good place to get legal advice. Best to ignore anyone offering it. Talk to a lawyer for legal advice.

feanaro · on July 29, 2019

Sure, it's not, so do not take it as advice, especially not ultimate advice. However, I find the inevitable intellectual shutdown when discussing matters like this even more repugnant and unwarranted.

nothrabannosir · on July 28, 2019

This comment is so poignant it should be part of the site guidelines.

radhakrsna · on July 28, 2019

Right, thanks a lot.

HugThem · on July 28, 2019

    you're still transmitting/redistributing
    the content

Parts of it. Google does the same in their search results. The user can even decide which parts, because they show you the part that contains the search term.

So they provide a service that includes storing your content in its entirety.

Has this ever been tested in court?

donohoe · on July 28, 2019

Google is doing something different in regular search results.

They are showing a small extract for context OR a summary specified by the publisher.

That’s completely legit and fair use.

CodiePetersen · on July 29, 2019

Yes, it has been tested in Spanish courts, and Google News is blocked there.

nl · on July 28, 2019

Are you a lawyer?

Because there is plenty of precedent for this in available APIs and I've never heard of a case claiming this.

hacker089 · on July 28, 2019

Well, my copyright comment was targeted at a distinct case (like "redistributing the summary" on another website or in a book).

Though, just by copying & summarizing with your current implementation, there would be NO ONE to sue you, since you are just grabbing it and displaying it in the browser (sure, depending on the jurisdiction, one may rate this simple step already as some type of copyright issue)

In reality, this will not happen. (Except in North Korea ;-)

My comment regarding copyright was really about grabbing, summarizing and re-distributing it on another webpage, like a news aggregator.

nkozyra · on July 28, 2019

> If this is in reference to the copyright comment, it doesn't matter -- you're still transmitting/redistributing the content, which is what matters.

I'm pretty sure this would be covered by fair use.

duckqlz · on July 28, 2019

I think Google was only retransmitting lyrics, and they are getting sued now. Can’t imagine Google was actually storing the lyrics, although I may be wrong [1]. If someone could clarify this I would really appreciate it, as it has implications for a project I’m currently working on.

[1] https://www.theverge.com/platform/amp/2019/6/16/18681225/gen...

nkozyra · on July 28, 2019

That's not a snippet, though, it's wholesale copying.

When you Google a newspaper article you get a verbatim snippet, same concept.

LaurentD2 · on July 28, 2019

If this were true, then Evernote's web clipper would also be infringing copyright... (it is transmitting and redistributing the content)

steve19 · on July 28, 2019

I'm interested in learning about true generation algorithms. Can you point me in the right direction?

p1esk · on July 28, 2019

Google “gpt-2”.

steve19 · on July 28, 2019

Thanks, but that is not what the OP is claiming. That generates text from a seed; the OP is talking about an algorithm that generates a summary of an article without using existing sentences.

p1esk · on July 28, 2019

Did you read the paper? https://d4mucfpksywv.cloudfront.net/better-language-models/l...

You don’t need any seed, and can generate summaries (section 3.6).
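As far as I recall, section 3.6 induces a summary by appending `TL;DR:` to the article and keeping the first three generated sentences, so the article itself is the only context. A minimal sketch of that prompting scheme follows; the model call is left as a `generate` parameter because no particular library is named here, and every function name is made up for illustration.

```python
import re


def build_tldr_prompt(article):
    """Append the TL;DR: cue that nudges the model toward summarizing."""
    return article.strip() + "\nTL;DR:"


def first_sentences(generated, limit=3):
    """Keep only the first `limit` sentences of the generated continuation."""
    parts = re.split(r"(?<=[.!?])\s+", generated.strip())
    return " ".join(p for p in parts[:limit] if p)


def summarize(article, generate, limit=3):
    """`generate` is any callable mapping a prompt string to continuation text.

    It stands in for a GPT-2 sampling call, which is assumed, not shown.
    """
    return first_sentences(generate(build_tldr_prompt(article)), limit)
```

Unlike the extractive approach, the returned sentences are whatever the model writes, so they need not appear anywhere in the article.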

GPT-2 is the model to learn about if you’re interested in NLP.

applecrazy · on July 29, 2019

Are there any papers benchmarking a transformer NN architecture in comparison to something like a pointer-generator network? I'm doing a bit of work in this area (i.e. reimplementing papers), and I'm curious if GPT2-like models can derive greater semantic meaning.

p1esk · on July 29, 2019

Both GPT-2 and the pointer-generator network are open source, and pretrained models are available, so it should be straightforward to compare them.