Multidimensional analysis of plot arcs in thousands of TV and movie scripts (sappingattention.blogspot.com)
103 points by jvmiert on Dec 18, 2014 | 17 comments



Can someone explain how the author classified the topics? As I understood the article (and to be honest, I didn't understand it very well) he:

1. Takes a corpus of 'words often used in topic X'

2. Compares that corpus to the script, divided into 12 sections

3. Gives a value to how much the corpus corresponds to the script
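The three steps as I understand them could be sketched roughly like this. To be clear, this is only a minimal illustration of my reading, not the author's actual method; the function names and the simple "share of words that belong to the topic list" score are my own assumptions.

```python
import re


def split_into_sections(words, n_sections=12):
    """Split a list of words into n_sections consecutive, nearly equal chunks."""
    total = len(words)
    bounds = [round(i * total / n_sections) for i in range(n_sections + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(n_sections)]


def topic_share_by_section(script_text, topic_words, n_sections=12):
    """For each section, return the fraction of its words found in topic_words.

    Hypothetical scoring: a plain word-list match, assumed for illustration only.
    """
    words = re.findall(r"[a-z']+", script_text.lower())
    topic = {w.lower() for w in topic_words}
    shares = []
    for section in split_into_sections(words, n_sections):
        hits = sum(1 for w in section if w in topic)
        # An empty section (very short script) scores 0.0 rather than dividing by zero.
        shares.append(hits / len(section) if section else 0.0)
    return shares
```

Calling `topic_share_by_section(script, ["love", "kiss", "heart"])` would give twelve numbers, one per twelfth of the script, which could then be plotted as an arc.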

A couple of things which interested me:

* Finding original films - would it be possible to come up with a list of films which have been manually classified as 'romantic' but which don't follow the standard 'romance' plot arc?

* Unusual direction or editing - Are there films for which the dialogue can't be used to classify what's going on? Perhaps analysing the soundtrack (loudness, bpm, minor vs major keys) and the video (brightness, colouring, movement) and comparing it to the dialogue would show something interesting.

* Compare the 'deviation from the norm' to reviews, awards, box office takings, and press coverage.

Unfortunately, I have absolutely no idea how to do something like that. Just wondering if it's been done before.


Answering your first question:

Latent Dirichlet Allocation (http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)


He links to a good explanation about topic modeling and LDA: http://tedunderwood.com/2012/04/07/topic-modeling-made-just-...


Thank you both.


Answering your second question: "Quest for Fire".


Koyaanisqatsi


What is the "unknown" category that is included in the graphs? It is not explained. The title only mentions TV and movie scripts.


This analysis shows the already-known three-act structure of most plays. Is it a cross-cultural phenomenon?




This is very good. I wonder if this could be used to describe someone's personal taste in movies.


"twelfths"? Why isn't the word count measured in French metric quarts of ink, then? I'm disappointed.

P.S. Cool article though.


Any unit of measurement would have been arbitrary, and twelfths at least lets you refer to the first third, quarter or half. No idea whether that was the rationale, but I don't think any other division would have seriously affected the point the article is making.
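A quick way to see the divisibility argument, as a small sketch (the helper name is made up for illustration): list every coarser grouping that a given number of equal sections can be merged into without splitting a section.

```python
def coarser_divisions(n_parts=12):
    """Map each coarser division k to the number of original parts per group.

    Only divisions that split n_parts evenly are included.
    """
    return {k: n_parts // k for k in range(2, n_parts) if n_parts % k == 0}
```

`coarser_divisions(12)` returns `{2: 6, 3: 4, 4: 3, 6: 2}`, so halves, thirds, quarters and sixths all line up with section boundaries. `coarser_divisions(10)` returns only `{2: 5, 5: 2}`, and 100 sections would still not give exact thirds.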


Well, if the point of the study changed with the choice of units, then the study wouldn't be that good, would it? :)

I'm not arguing the point. I'm arguing that using twelfths has no utility over using per cent, for example, and is counter-intuitive.


Three act structure with each act divided into quarters.

Though, to be honest, three acts is kinda misleading since most second acts have a big turning point in the middle. So while yeah, there's a beginning, middle, and end, that middle is really two acts.


Good point. Splitting it up into quarters would make more sense.


12 parts can be quarters split into three as easily as thirds split into four.

Though in any case it would be nice for the article to state why that division was used (even if the reason is completely arbitrary - at least then we'd know to stop guessing!).



