> I don't think there is an ML which could extract the takeaways from each section
There are plenty of abstractive and extractive methods for text summarization. In fact, that's what I'm working on right now: developing a news article summarization system for longer articles.
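If it helps to see the extractive side concretely, here's a minimal sketch of the textbook frequency-based approach (not my actual system, just the basic idea): score each sentence by how frequent its words are in the whole text and keep the top-scoring ones.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Naive extractive summarizer: keep the sentences whose words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    # Score each sentence by the average frequency of its words.
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Return the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)

article = (
    "The council met on Tuesday to discuss the new transit plan. "
    "The transit plan would add three bus routes across the city. "
    "Critics argue the plan ignores cyclists entirely. "
    "Funding for the plan comes from a state grant."
)
print(extractive_summary(article))
```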
Is there any working demo to play with? I'm not into ML and won't read the whole paper... I expect these sorts of mechanisms to do some weighting of the morphemes and sentences and reproduce the top-scoring ones, whereas a human-made summary is based on context and semantics supplied by humans (for example, the relevant part might only be relevant during the current zeitgeist, or because of some people's biases).
I'm not aware of any working demos. In terms of the weighting, your intuition is correct: there's a way for a machine learning algorithm to learn automatically which tokens (what you call morphemes) are significant. That's called an attention mechanism. It learns which tokens are important and then distills that into a "context vector" of sorts, which is then passed into the generator network (which is some form of recurrent network).
The disadvantage of using only token-level attention is that it still doesn't provide enough context for long spans of text (> 1000 tokens), and I'm actually running into this issue in my work right now. That's where techniques like applying attention over chunks of text are helpful.
I'm still trying to fully understand attention mechanisms, so I hope this comment made sense.
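For anyone curious what that looks like mechanically, here's a rough dot-product attention sketch in plain NumPy (toy numbers, not a real model): each token gets a weight, and the weighted average of the token states becomes the context vector.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy encoder outputs: 5 tokens, each represented by a 4-dim hidden state.
encoder_states = np.random.rand(5, 4)

# The decoder's current hidden state (also made up).
decoder_state = np.random.rand(4)

# Dot-product attention: score each token against the decoder state...
scores = encoder_states @ decoder_state   # shape (5,)
weights = softmax(scores)                 # one weight per token, sums to 1

# ...then blend the token states into a single "context vector".
context_vector = weights @ encoder_states # shape (4,)

print("attention weights:", weights.round(3))
print("context vector:", context_vector.round(3))
```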
But attacking such a problem with a total lack of abstraction over the underlying ideas obviously can't work for any non-trivial example, can it? There must be some other steps, algorithms, or additions to this process. Language is a protocol for encoding a message, a representation of abstract concepts and ideas that are referenced semantically. An HTTP server's configuration values are not the same thing as a statistical analysis of the XML text that encodes them; and even that would be straightforward communication, not metaphoric speech or indirect references.
As a long-time language learner, I can tell you that the most significant words in a text are the least common ones (that's why learning a language from lists of common words doesn't work out of the box). In the sort of process you mention, the words that carry the most meaning for the idea being communicated could easily end up with less weight than other, more common tokens in the same text, and that depends heavily on the subject's communication style.
To me, this way of producing a summary, treating the language as a sum of interconnected tokens, sounds like trying to replicate a painting by measuring brush-stroke directions and pigmentation (vectors of values) rather than trying to understand the abstract concept or idea behind it (a vase with flowers, fruit) and recreating that, i.e. using the skill extracted via the vectors to recreate the abstract idea that was encoded.
I think I might end up reading the paper you mentioned.
To address how computers establish context, usually the tokens are sent through an embedding layer (basically turning a word into a dense vector). Word embeddings, surprisingly, reflect the real-world semantic relationships behind words (for instance, king - man + woman = queen). It also happens that the algorithms for creating word embeddings group words with similar meanings/semantics together (for example, king and queen would be closer together in n-dimensional space). Overall, embeddings allow a computer to look at a language as more than a bunch of interconnected tokens and actually incorporate the semantics of each word into its predictions. Of course, this isn't perfect, which is why we need attention mechanisms and various other methods to help a computer try to understand context long-term.
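If you want to poke at this yourself, a quick sketch with gensim and a small pre-trained GloVe model (downloaded the first time it runs) shows both effects: the analogy arithmetic and the clustering of related words.

```python
# pip install gensim
import gensim.downloader as api

# Small pre-trained GloVe embeddings.
vectors = api.load("glove-wiki-gigaword-50")

# Analogy arithmetic: king - man + woman ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# The top hit is typically 'queen'.

# Related words sit close together in the vector space; unrelated ones don't.
print(vectors.similarity("king", "queen"))   # relatively high cosine similarity
print(vectors.similarity("king", "banana"))  # much lower
```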
If you're interested in learning more, there's an excellent survey paper on ArXiv: https://arxiv.org/abs/1812.02303