Show HN: I made an extension to watch Netflix films with screenplays in sync (screenplaysubs.com)
289 points by justEgan on Aug 16, 2020 | 62 comments



Very interesting!

Fun tidbit: for TV actors, regularly reading pilot scripts and then watching the produced pilot for comparison is a hugely common educational technique. You get to imagine what kind of acting and directorial choices you'd make, and then see what was actually done. Oftentimes you'll realize you had totally misinterpreted what a scene was even about.

It's also fun to see how every script is filled with lines that are "unactable" -- there's just no way any real person would ever say anything like that. Then nine times out of ten, those lines are cut from the final product, because even the best actors couldn't make them work.


Fascinating! It can also be the other way around, where an actor miraculously interpreted an unactable script well. E.g. Joaquin Phoenix delivered "It vexes me. I am vexed." in Gladiator quite well.


I watched a video recently that touched on this subject when talking about the TV show House. If you read the script for a regular House episode, there's no conclusion to be had besides House being an insufferable, racist prick. Instead, Hugh Laurie delivers the obviously racist lines so sarcastically that it makes them work very well (which is obviously what the writers were going for).


I've recently watched House and I found this more uncomfy than I did the first time.

House (the character) is often being plainly racist and sexist. The fact that he presents it as sarcasm is a vehicle for him, used to make his racism more difficult to challenge.


But can’t shows be about racist people? I mean there are shows about murderers all day. Do people critique shows (think Dexter) as “uncomfy”? You can laugh at, and even find endearing, racist characters. It’s part of being a grown up I believe.


If there are more shows with racist people as the endearing title character than there are with black people as the endearing title character, is that not a problem?


For sure. The way it's delivered doesn't excuse the content in the slightest, but it still plays off the tone the show is going for.


Nitpick, but I think it was "It vexes me. I'm terribly vexed." Otherwise agree with you though. It's not something you imagine someone saying in the course of normal, or even quite abnormal, conversation, but Joaquin Phoenix pulled it off in spades - sounded totally natural, and entirely fit with the character of Commodus.


I feel like Seinfeld has a ton of those. Jason Alexander, Julia Louis-Dreyfus, and Michael Richards had a lot of moments where their acting ability carried what would otherwise be a mundane script.


FYI, English "to vex" is actually from the Latin vēxāre, meaning "to disturb, agitate, or annoy." I find it believable that Commodus would say "Hoc me vexat. Tantum hoc me vexat."


I think in a similar way this could also lower the barrier of entry into screenwriting.


This is great!

Also check out LanguageLearningWithNetflix [0] which lets you watch videos with two subs in different languages, displays the subs as HTML so select/copy/define will work (and it has a built-in dictionary too). It also allows you to quickly jump to the beginning of each sentence so you can hear it multiple times, which helps improve your listening skills. For me, it has been a fun way to improve my German.

On a side note, please notice how none of these great features are available to mobile users. iOS for example, is technically perfectly capable of supporting this kind of extensibility, but the App Store model limits it to a few narrow and specific use-cases.

[0] https://languagelearningwithnetflix.com


> iOS for example, is technically perfectly capable of supporting this kind of extensibility, but the App Store model limits it to a few narrow and specific use-cases

Injecting third-party code into a third-party app that has to deal with DRM sounds like a recipe for disaster.


Well don't even get me started on DRM :)

But this particular use-case would still play well with DRM as it's implemented today. Netflix in the browser still has DRM, but since the <video> element is standard, it can still be hooked into and decorated.


Sure... but "technically perfectly capable of supporting this kind of extensibility" is not true in my opinion; you'd need quite a few changes to allow this to happen safely. Apple doesn't even want apps to be sideloaded, so that's a stretch :)


App Store is not really the best environment for niche apps, I suppose. What about Android? I'd presume F-Droid could host something like this even if the Play Market doesn't?


LLN is good but seems to favor anti-features like disabling text selection on subtitles (except in the side pane) in order to push their in-house paid-for features.

Regarding your comment about extensibility, this applies to mostly every software platform other than the web, unfortunately. And even there, it feels like a happy accident. There's much work to be done in this area.


Text selection is a fiddly CSS issue that Ognjen hasn't gotten around to fixing yet. There are only two paid features (saving words and machine translation). About 3500 paying users at $3.50/mo after taxes and fees (from 800k total users), but before paying for servers/APIs, if you are curious. Free users are welcome.
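For anyone who wants the back-of-the-envelope numbers, here is the arithmetic implied by those figures. The variable names are just for illustration, and the result is gross of server/API costs as noted above.

```python
# Figures quoted in the comment above.
paying_users = 3_500
price_per_month = 3.50      # USD per user, after taxes and fees
total_users = 800_000

# Monthly revenue before servers/APIs, and the share of users who pay.
monthly_revenue = paying_users * price_per_month
conversion_rate = paying_users / total_users

print(f"${monthly_revenue:,.2f}/mo")   # $12,250.00/mo
print(f"{conversion_rate:.4%}")        # 0.4375%
```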

[ Dear Netflix, let's be friends. There's a lot of work to do still, and we can go further, faster, with a little help. Can we get a test account? Regards, David. languagelearningextension@gmail.com ]


LLWN is useful just for the a/s/d shortcuts like the ability to hit "S" to repeat a sentence until you understand it.


Thanks for the kind words. :)


Oh, someone has to manually time the movie. They only support about six movies. I expected that it would use closed captioning data, do the sync automatically, and support far more titles.


We had seamless synchronized open-standard multimedia (W3C SMIL) in 2000 but not 2020. There are now attempts to bring back a subset via epub standards. Meanwhile, most Internet traffic is now video and there's no standard mechanism to provide contextual timed commentary and other annotations.


The syncing is done automatically, at least mostly.

TL;DR: ScreenplaySubs fetches the subtitles from Netflix, parses the PDF-formatted screenplays into JSON, and syncs by calculating the sentence similarities between subtitle and screenplay dialogue.

In particular, we use the Universal Sentence Encoder to decide whether a subtitle matches a line of screenplay dialogue. If a line of screenplay dialogue is similar enough to a subtitle, the former is tagged with the timestamp provided by the latter.
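To make the tagging step concrete, here is a minimal sketch, not the extension's actual code. It swaps the Universal Sentence Encoder for `difflib.SequenceMatcher` from the standard library so it runs without any model download; the function names, the 0.6 threshold, and the sample lines are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in for embedding similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tag_dialogue(screenplay_lines, subtitles, threshold=0.6):
    """Give each screenplay line the timestamp of its best-matching subtitle.

    screenplay_lines: list of str
    subtitles: list of (start_seconds, text)
    Returns a list of (line, start_seconds), with None when nothing is
    similar enough.
    """
    tagged = []
    for line in screenplay_lines:
        best_score, best_start = 0.0, None
        for start, text in subtitles:
            score = similarity(line, text)
            if score > best_score:
                best_score, best_start = score, start
        tagged.append((line, best_start if best_score >= threshold else None))
    return tagged


# Hypothetical data: two subtitles and three screenplay lines.
subtitles = [
    (12.5, "Where did you put the keys?"),
    (15.0, "I left them on the table."),
]
screenplay = [
    "Where'd you put the keys?",          # reworded on screen
    "I left them on the table.",          # identical
    "She slams the door and walks out.",  # action line, never spoken
]

for line, start in tag_dialogue(screenplay, subtitles):
    print(start, line)
```

The first two lines pick up 12.5 and 15.0, and the action line stays untagged. A sentence-embedding model would presumably cope better with paraphrases than a character-level ratio does, which is the point of using one.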

A lot of the underlying problems in each step sound deceptively simple at first, but turn out to be quite challenging and fun to research. E.g., parsing PDFs in general is not straightforward (https://filingdb.com/b/pdf-text-extraction), and there are few resources on parsing PDF screenplays besides a handful of research papers (https://github.com/drwiner/ScreenPy/blob/master/INT17_screen...), which led us to create our own open source repo for this (https://github.com/SMASH-CUT/screenplay-pdf-to-json).

Our screenplay-pdf-to-JSON converter captures all the dialogue, transitions, and actions within each screenplay scene. With this, we treat scenes as atomic, so we can detect changes in scene ordering based on the tagged scene timestamps. This also means that if lines of dialogue are swapped within a scene in the movie, there will be some syncing inconsistencies.
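As a rough illustration of treating scenes as atomic, the sketch below gives each scene one representative time and sorts on it. Using the median of the tagged dialogue timestamps is my own assumption, as are the function names and the sample data; the real converter's JSON layout is not shown here.

```python
from statistics import median


def scene_times(scenes):
    """Map scene index (screenplay order) to the median tagged timestamp.

    scenes: list of lists of timestamps, with None for untagged lines.
    Scenes without any tagged dialogue are left out.
    """
    times = {}
    for index, stamps in enumerate(scenes):
        tagged = [t for t in stamps if t is not None]
        if tagged:
            times[index] = median(tagged)
    return times


def playback_order(scenes):
    """Scene indices sorted by when they appear in the film."""
    times = scene_times(scenes)
    return sorted(times, key=times.get)


def unsynced_scenes(scenes):
    """Scene indices with no tagged dialogue, which need manual syncing."""
    times = scene_times(scenes)
    return [i for i in range(len(scenes)) if i not in times]


# Hypothetical timestamps, one inner list per screenplay scene.
scenes = [
    [10.0, 14.0, None],  # scene 0
    [95.0, 99.0],        # scene 1, moved later in the film
    [40.0, None, 47.0],  # scene 2
    [None, None],        # scene 3, no matched dialogue
]

print(playback_order(scenes))   # [0, 2, 1]
print(unsynced_scenes(scenes))  # [3]
```

Because only one time per scene is compared, a swap of two lines inside a scene is invisible at this level, which matches the inconsistency mentioned above.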

Some scenes have little to no dialogue, which pretty much forces the extension to work on a best-effort basis. E.g., the opening scene of There Will Be Blood has very minimal dialogue, if any at all. This is the case where I need to jump in and sync up the screenplay manually. OTOH, the opening scene of Inglourious Basterds will work very well, since there is a ton of dialogue in it. This is the reason why I can’t just add movies and instantly upload them to the site.

Would you be interested in me going into more detail? I was thinking of writing a series of technical blog posts if there is enough interest!


Interesting work. Glad you've been able to chart a path through some tedious problems.

Over the last several years I've imagined a lot of projects (both serious utilities, and the absurd/artistic) in roughly the territory you're exploring...

- For my MFA thesis (2012) I used plaintext (thankfully, though they had plenty of their own problems) transcripts of a TV show as a corpus for generating poems from, and at the time I thought it would be an interesting follow-up project to turn them back into video clips.

- Mapping film quotes/citations back to the script/film and accuracy-checking movie quotes. (can imagine both of these being useful for film forums like the movies/sci-fi stack-exchange sites).

- Generating script-cuts of movies that re-order/drop scenes and just show the printed script on-screen where scenes were cut.

- A film-analysis/screenwriting-class sort of interface oriented around reading a segment and then playing it (could be particularly interesting when there happen to be multiple known script drafts?)

- Re-constructing a character monologue from lines spoken by an actor that turned down the role.

- Generating a super-cut of actor X saying Y.

- Generating focused cuts of a film that cover, say, every scene a given character does/doesn't appear in, or every scene that mentions X.


Please blog about the details! Are you following the W3C work on synchronized multimedia?

https://github.com/w3c/sync-media-pub

https://www.w3.org/community/sync-media-pub/


Will do! I am not aware of that, tell me more!


I'd definitely be interested to read more about the tech. I wonder if it can be used to time-sync audiobooks to their ebook counterparts.

This is my use-case:

Kindle has a feature called "Audible Narration." You buy a Kindle book, and the Audible audio book, which allows you to play the audio book while it highlights the words on the Kindle book as you're listening. This effortless switching between audio and text enables some interesting reading behavior. Certain books become easier to read. Note taking also gets much easier (Highlighting text is much easier than bookmarking timestamps on an audio book).

The problem is, getting your annotations and highlights and other data out of Kindle is very difficult, because Kindle does not have a public API. Same with Audible.

So I'm thinking of emulating Audible narration with a hybrid ebook/audiobook reader app. The ebook would be a simple HTML page (converted from epub, formatting be damned) and a simple audio player. As the audio plays, the HTML page would scroll and words would be highlighted.

The challenge is to tag the HTML with timestamps from the audio track. I'd guess I could run speech-to-text on the audio track and then somehow diff the generated text with the epub content. Given that some audiobooks are abridged, some read the footnotes on each mention, and some explain the visuals, I would assume diffing would not be very straightforward.
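Roughly what I have in mind, as a sketch only: assume a speech recognizer has already produced (word, start time) pairs, then diff the two word sequences with `difflib.SequenceMatcher` and copy timestamps across the matching runs. The function name and the sample transcript are made up, and real recognizer output would be much noisier.

```python
from difflib import SequenceMatcher


def align_words(book_words, transcript):
    """Return {book_word_index: start_seconds} for words found in the audio.

    book_words: list of words from the ebook text
    transcript: list of (word, start_seconds) from a speech recognizer
    """
    written = [w.lower().strip('.,;:!?"') for w in book_words]
    spoken = [w.lower() for w, _ in transcript]
    matcher = SequenceMatcher(None, written, spoken, autojunk=False)

    timings = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            timings[block.a + offset] = transcript[block.b + offset][1]
    return timings


book_words = "The quick brown fox jumps over the lazy dog.".split()

# Hypothetical recognizer output: an extra spoken intro and one misheard word.
transcript = [
    ("chapter", 0.0), ("one", 0.4),
    ("the", 1.0), ("quick", 1.3), ("brown", 1.6), ("fox", 1.9),
    ("jumped", 2.2),
    ("over", 2.6), ("the", 2.9), ("lazy", 3.1), ("dog", 3.5),
]

print(align_words(book_words, transcript))
# {0: 1.0, 1: 1.3, 2: 1.6, 3: 1.9, 5: 2.6, 6: 2.9, 7: 3.1, 8: 3.5}
```

Word 4 ("jumps") gets no timestamp because it was misheard, so a real reader would have to interpolate between neighbors. Abridged passages and read-aloud footnotes would show up as long unmatched runs on one side or the other.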

Do you know of any solutions I could look into?


The task is called 'forced alignment'; take a look at aeneas and other projects at https://www.readbeyond.it/ :) IIRC, aeneas has some features for handling extra text at the beginning/end of the book, while abridgement etc. isn't handled.


You just saved me weeks of work. Thank you :)


> ScreenplaySubs fetches the subtitles from Netflix

How is this done? Isn't everything on Netflix protected by DRM?


You can fetch them by recording the network requests, as explained in this repo: https://github.com/isaacbernat/netflix-to-srt


Is there any support for querying the actual timestamp of the video?


Very cool. I have been fascinated by this whole area of what I call “media stapling” since I spent about two years obsessively watching the Big Lebowski as a stress reliever. This film has no commentary track so people have recorded their own and you have to sort of just manually sync up the mp3. I also do a lot of interview transcription where text is stapled to audio.

Anyway I see you have a comment here where you say you use the closed captions to figure out where to staple in the script. Would be cool to be able to staple in arbitrary other media - text audio video whatever.


Sounds like Rifftrax, the successor to Mystery Science Theater 3000


Wow, thanks for the pointer, had no idea about this. They have an app that looks like it could be so cool if they opened it to other people to use as a platform. Although for now just using it for their own commentaries (which I'm sure are great). Sort of a Hollywood approach as opposed to a Silicon Valley approach (even if they aren't literally in Hollywood).


i'd be interested in hearing those commentaries


I've seen plugins like this from time to time and I always wonder to what extent using them with a secured service (like Netflix etc) means that you've opened yourself up to them doing all sorts of things with your account. You need to log in, and once that's done the plugin code effectively acts as you, doesn't it? I'm guessing there are Chrome/FF protections on the password field, but if the plugin can do anything on a site, might it not draw its own fake password box on top of the real one?

I'm certainly not suggesting this is done by this author and I applaud the creation of the tool, but I'd be interested to hear opinions as to whether my interpretation above is correct or if I'm overly cautious/overlooking something.


I mean, unsurprisingly, it looks like it requests permissions to execute arbitrary code on your behalf on netflix.com. So yeah, it can do... a lot. It could, for example, click the logout button on your behalf, wait for you to log back in again and keylog your password when you do (there aren't any special protections there -- you can access it like any other field from privileged JS, and other extensions like password managers depend on this being the case), then use that to, in the background, change your password and recovery email, and then log you back out again. Any use of Chrome extensions that can execute scripts requires some degree of trust, for better or for worse.


I believe when an extension requires the matches permission for, say, *://netflix.com/*, it asks for permission to load the content script into any browser tab that has a matching URL open. This means that even if the extension makes only the slightest modification to the UI, it still requires the same permission as one that handles the user's sensitive information. This page suggests that such an extension could also read usernames and passwords: https://support.mozilla.org/en-US/kb/permission-request-mess...

For what it's worth, we can confidently say that our extension only makes UI modifications and never touches sensitive user info. Regardless, we will definitely open source the extension. Hopefully this will win some users' trust. Stay tuned!


I am curious too. I wonder what the limits of an extension are.


ScreenplaySubs is a browser extension for Netflix that syncs up movies with screenplays, displaying them side by side. It's like having a subtitle that provides more insights to your films.

Demo: https://vimeo.com/447986440


I really like the Amazon Prime Video "X ray" feature that shows the actors, bios, etc, in the current scene. It's odd to me that other services like Netflix don't have something similar.


Most likely other services don't do this because of patents.


and they don't own IMDB where much of this information is already organized into a usable state.


As I checked the demo video and read the screenplay, I could actually imagine the shots, camera angle - the images basically appeared in my head with Tom Holland in them (without actually playing the video or remembering the movie). This is very interesting.


It would be interesting to have a TTS synth output the screenplay on another card, one which could be used by a blind person to plug some headphones in (not covering all the sound, in order to hear the environment and speeches). Maybe even optionally disable the spoken words, and only output the scene description, and emit a beep on a cut.

The demo on the page looks great, and this is stuff which should be automatable at some point by AI.


Where do you get the screenplays, out of curiosity?


I wondered this, too. Also--which draft, if there's more than one? Is it like, "latest available", or "easiest to parse", or do you have a policy like only shooting scripts?


not OP but I imagine from something like this https://www.imsdb.com


Nicely implemented, you just need to recruit people to add more movies!


It's not for me but it's a great extension!


much appreciated!


I don't use netflix and prefer subtitles, but it looks nice!

Maybe you could implement smooth scroll and some sort of an overlay mode.


Thank you, and that's a great idea I've considered for future releases. More specifically, the layout presented in this video looks ideal, with the screenplay replacing the letterbox at the bottom: https://www.youtube.com/watch?v=HybzbDBF7HQ

One of the reasons we decided not to implement that for now is to leave more room for error, since our algorithm is still not perfect. Sometimes the extension chooses to focus on a sentence that is 1 or 2 away from the accurate dialogue. Having an entire viewport height to show the screenplay means that even if some inconsistencies occur, the user may still be able to see the accurate dialogue.


makes sense :)

yea I like this example better, but thought of an actual overlay.


This is a Chrome extension that is asking to read browser history. Why?


Really cool! It seems like the data synchronization is done manually?


There’s some automation going on. Check out my reply to one of the skeptical comments!


This is cool. I did this exact thing in 2013 with DistanceFlix.com


Nice!

Nerd out with your word out.


Site is flagged from work PC


Data source for GPT-4?



