I have a close relative who is an aspiring movie screenplay writer. During one of our recent meetings, he shared a lot of interesting information with me about screenplay writing and its challenges. He even lent me a few good books to read further. One of the books is Essentials of Screenwriting by Richard Walter. I thoroughly enjoyed reading it!
After reading the book, I was eager to go through some actual film screenplays. I searched the net and found the script for the movie The Prestige (2006). I downloaded the PDF version and started reading it. It was then that another thought crossed my mind: wouldn’t it be nice if we could parse and analyse the script programmatically? Now, why would anyone want to do that? Computational analysis can throw up interesting statistics about the screenplay, and if you are into text mining, a screenplay can be an interesting source of data.
Properly formatted scripts have a well-defined structure. Instead of worrying about low-level parsing, I wanted to convert the script file to XML format so that I could process it without much difficulty. Fade In, a screenwriting application, can open script files in PDF format and export them as XML. I used it to convert the original PDF version of the script to XML format.
I then decided to use Mathematica for analysing the script. It is quite straightforward to work with XML files in Mathematica. I am not describing the code in this article, but you can download the source for the program using the link given later in the article.
Let us first load the XML file and list the top 20 characters that appear in the script.
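For readers without Mathematica, here is a minimal Python sketch of the same step. It is not the program used for this article. It assumes an FDX-like layout in which every element is a `Paragraph` with a `Type` attribute; the actual export may use different tag names. The sample XML is made up, and ranking characters by their number of cues is my assumption about what "top" means.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in an assumed FDX-like layout (not the real export).
SAMPLE = """<Script>
  <Paragraph Type="Scene Heading"><Text>INT. THEATRE - NIGHT</Text></Paragraph>
  <Paragraph Type="Character"><Text>CUTTER (V.O.)</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Placeholder line one.</Text></Paragraph>
  <Paragraph Type="Character"><Text>BORDEN</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Placeholder line two.</Text></Paragraph>
  <Paragraph Type="Character"><Text>CUTTER</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Placeholder line three.</Text></Paragraph>
</Script>"""


def top_characters(xml_text, n=20):
    """Return the n most frequent character cues as (name, count) pairs."""
    root = ET.fromstring(xml_text)
    cues = Counter()
    for para in root.iter("Paragraph"):
        if para.get("Type") == "Character":
            text = "".join(para.itertext()).strip()
            # Drop a trailing extension such as (V.O.) or (CONT'D).
            name = re.sub(r"\s*\([^)]*\)\s*$", "", text)
            cues[name] += 1
    return cues.most_common(n)


print(top_characters(SAMPLE))
```

To run it on a real file, read the file contents into a string and pass that string in place of `SAMPLE`.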
Remember that the script only talks about characters, not about the real actors who play these characters.
OK. Next, how many scenes are there in the script and what are some scene headings (also called slug lines)? The following table lists the first 25 scenes (there are 219 scenes overall).
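Counting scenes comes down to collecting the scene heading elements. The Python sketch below shows the idea under the same kind of assumption: an FDX-like layout where a slug line is a `Paragraph` whose `Type` is `Scene Heading`. The sample data is made up, and the real export may be structured differently.

```python
import xml.etree.ElementTree as ET

# Made-up sample in an assumed FDX-like layout (not the real export).
SAMPLE = """<Script>
  <Paragraph Type="Scene Heading"><Text>INT. THEATRE - NIGHT</Text></Paragraph>
  <Paragraph Type="Action"><Text>A crowd waits.</Text></Paragraph>
  <Paragraph Type="Scene Heading"><Text>EXT. STREET - DAY</Text></Paragraph>
  <Paragraph Type="Scene Heading"><Text>INT. WORKSHOP - DAY</Text></Paragraph>
</Script>"""


def scene_headings(xml_text):
    """Return all scene headings (slug lines) in document order."""
    root = ET.fromstring(xml_text)
    return [
        "".join(para.itertext()).strip()
        for para in root.iter("Paragraph")
        if para.get("Type") == "Scene Heading"
    ]


headings = scene_headings(SAMPLE)
print(len(headings), "scenes")
for heading in headings[:25]:  # first 25 headings, as in the table
    print(heading)
```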
Let us find out some of the locations where the scenes are set, and how many such locations there are.
When you study the various scene headings, you will notice prefixes such as INT. (Interior) and EXT. (Exterior), and some form of time specification such as DAY, NIGHT, etc. The following table shows the different types of scenes in this screenplay.
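Both the location list and the scene-type breakdown come from splitting each slug line into its parts. Here is a rough Python sketch of that split. It assumes the common `PREFIX LOCATION - TIME` convention, with the time of day after the last " - " separator. Real scripts contain headings that break this pattern, and the sample headings below are made up.

```python
from collections import Counter


def parse_slug(heading):
    """Split a slug line into (prefix, location, time of day)."""
    heading = heading.strip()
    prefix = ""
    # Check the combined prefix first so "INT./EXT." is not cut short.
    for candidate in ("INT./EXT.", "INT.", "EXT."):
        if heading.upper().startswith(candidate):
            prefix = candidate
            heading = heading[len(candidate):].strip()
            break
    # Treat whatever follows the last " - " as the time specification.
    location, sep, time = heading.rpartition(" - ")
    if not sep:  # no separator: the whole remainder is the location
        location, time = heading, ""
    return prefix, location.strip(), time.strip()


# Made-up headings for illustration.
headings = [
    "INT. THEATRE - NIGHT",
    "EXT. STREET - DAY",
    "INT. THEATRE - DAY",
    "INT. WORKSHOP - BACK ROOM - DAY",
]
parsed = [parse_slug(h) for h in headings]

locations = sorted({location for _, location, _ in parsed})
scene_types = Counter((prefix, time) for prefix, _, time in parsed)

print(len(locations), "distinct locations:", locations)
for (prefix, time), count in scene_types.most_common():
    print(prefix, time, count)
```

Splitting on the last separator keeps sub-locations such as "WORKSHOP - BACK ROOM" together, at the cost of misreading a heading that has a sub-location but no time of day.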
Looks good so far. How can we find the dialogues spoken by a specific character? That is easy. The following table lists the first 10 dialogues uttered by the character called Borden.
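In a screenplay, a dialogue belongs to the character cue that precedes it, so one way to pick out a character's lines is to walk the elements in order and remember the current speaker. The Python sketch below does that. It assumes an FDX-like layout with `Character`, `Parenthetical` and `Dialogue` paragraph types, which may not match the real export, and the sample lines are placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample in an assumed FDX-like layout (not the real export).
SAMPLE = """<Script>
  <Paragraph Type="Scene Heading"><Text>INT. WORKSHOP - DAY</Text></Paragraph>
  <Paragraph Type="Character"><Text>BORDEN</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Placeholder line one.</Text></Paragraph>
  <Paragraph Type="Character"><Text>ANGIER</Text></Paragraph>
  <Paragraph Type="Parenthetical"><Text>(quietly)</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Placeholder line two.</Text></Paragraph>
  <Paragraph Type="Action"><Text>A pause.</Text></Paragraph>
  <Paragraph Type="Character"><Text>BORDEN (CONT'D)</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Placeholder line three.</Text></Paragraph>
</Script>"""


def dialogues_by(xml_text, character):
    """Return the dialogue paragraphs spoken by one character, in order."""
    wanted = character.upper()
    root = ET.fromstring(xml_text)
    speaker = None
    lines = []
    for para in root.iter("Paragraph"):
        kind = para.get("Type")
        text = "".join(para.itertext()).strip()
        if kind == "Character":
            # Normalise the cue: drop (V.O.), (CONT'D) and similar.
            speaker = re.sub(r"\s*\([^)]*\)\s*$", "", text).upper()
        elif kind == "Dialogue":
            if speaker == wanted:
                lines.append(text)
        elif kind != "Parenthetical":
            # Action, scene headings, etc. end the current speech.
            speaker = None
    return lines


for line in dialogues_by(SAMPLE, "Borden")[:10]:
    print(line)
```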
And if you want to know the number of dialogues uttered by each of the characters in the screenplay, that is also easy to get:
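As a rough illustration in Python (again, not the Mathematica program used here), the counts can be gathered by attributing every dialogue paragraph to the most recent character cue. The FDX-like layout and the sample data are assumptions. Counting dialogue paragraphs rather than cues is also my choice, and it can give slightly different numbers when one speech is split across paragraphs.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in an assumed FDX-like layout (not the real export).
SAMPLE = """<Script>
  <Paragraph Type="Character"><Text>ANGIER</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Line.</Text></Paragraph>
  <Paragraph Type="Character"><Text>BORDEN</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Line.</Text></Paragraph>
  <Paragraph Type="Character"><Text>ANGIER (O.S.)</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Line.</Text></Paragraph>
  <Paragraph Type="Character"><Text>CUTTER</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Line.</Text></Paragraph>
  <Paragraph Type="Character"><Text>BORDEN</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Line.</Text></Paragraph>
  <Paragraph Type="Character"><Text>ANGIER</Text></Paragraph>
  <Paragraph Type="Dialogue"><Text>Line.</Text></Paragraph>
</Script>"""


def dialogue_counts(xml_text):
    """Count dialogue paragraphs per character."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    speaker = None
    for para in root.iter("Paragraph"):
        kind = para.get("Type")
        if kind == "Character":
            text = "".join(para.itertext()).strip()
            # Drop a trailing extension such as (O.S.) or (CONT'D).
            speaker = re.sub(r"\s*\([^)]*\)\s*$", "", text)
        elif kind == "Dialogue" and speaker:
            counts[speaker] += 1
    return counts


counts = dialogue_counts(SAMPLE)
for name, count in counts.most_common(20):
    print(f"{name:10s} {count}")
```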
The same information, plotted as a bar chart, appears below.
For ease of decoding, I am listing the top 20 dialogue counts by different characters in the following table.
The number of dialogues spoken by each character is likely to reveal the prominence of that character in the movie. Here Angier, Borden and Cutter seem to be key characters.
As you can see, it is quite easy to perform such interesting analysis on a screenplay once you have the right tool. If you are into deep text analysis, you can even scan the dialogues for patterns, sentiments, etc.
That was an interesting exercise for me. I learnt a fair amount about the technicalities of movie screenplay writing while doing this study. I hope you enjoyed reading this article! You can find the XML version of the screenplay here, and the Mathematica source here.
I want to take this opportunity to thank you for your continued support. Have a fantastic New Year! Let this year bring you lots of happiness and prosperity!