# Silly Kung Fu Titles with word2vec
Inspired by the ever-wonderful Lynn Cherny’s word2vec experimentation, I wanted to use this opportunity to experiment a bit with word embeddings and “word arithmetic”.
So I came up with an idea too silly to be useful, but fun for me to play with. For each noun detected in a title, I used Gensim’s word2vec implementation to find the word nearest to that noun’s vector after subtracting the vector for “Chinese” and adding the vector for another country.
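To make the “word arithmetic” concrete without downloading a pre-trained model, here is a minimal NumPy sketch of the idea. The tiny three-dimensional vectors in `TOY_VECTORS` are made up purely for illustration, and `analogy` is a hypothetical helper that approximates what Gensim’s `most_similar` does: it normalizes the vectors, adds the positive ones, subtracts the negative ones, and returns the remaining word with the highest cosine similarity to the result, skipping the query words themselves.

```python
import numpy as np

# Made-up toy embeddings for illustration only; a real run would use a
# pre-trained word2vec model with hundreds of dimensions.
TOY_VECTORS = {
    "buddha":   np.array([0.9, 0.1, 0.0]),
    "Chinese":  np.array([1.0, 0.0, 0.0]),
    "American": np.array([0.0, 1.0, 0.0]),
    "god":      np.array([0.1, 1.0, 0.1]),
    "temple":   np.array([0.8, 0.2, 0.1]),
    "sword":    np.array([0.0, 0.0, 1.0]),
}


def unit(v):
    """Scale a vector to length 1 so comparisons depend on direction only."""
    return v / np.linalg.norm(v)


def analogy(vectors, positive, negative):
    """Return the word most cosine-similar to sum(positive) - sum(negative).

    All vectors are unit-normalized first; the query words themselves are
    excluded from the candidates.
    """
    target = sum(unit(vectors[w]) for w in positive) - sum(
        unit(vectors[w]) for w in negative
    )
    target = unit(target)
    query = set(positive) | set(negative)
    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if word in query:
            continue
        score = float(np.dot(unit(vec), target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# buddha - Chinese + American, in toy-vector land
print(analogy(TOY_VECTORS, ["buddha", "American"], ["Chinese"]))
```

With a real model loaded in Gensim as `KeyedVectors` (here called `kv`, a placeholder name), the equivalent query would look something like `kv.most_similar(positive=["buddha", "American"], negative=["Chinese"], topn=1)`, which returns a list of `(word, similarity)` pairs. Which words exist, and in what capitalization, depends on the model’s vocabulary.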
Again, don’t take this seriously; I just wanted to see what would happen.
So, cherry-picking a comical example:
buddha - Chinese + American = God Fearin'
See? Kind of funny, at least in theory. I did this for every title, for a number of different countries. In practice, the results aren’t as hilarious as I had hoped, probably because adding and subtracting country vectors doesn’t shift most of these words very far. Also, I used a pre-trained model, which obviously shapes the embeddings and therefore the results.
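Running the substitution over every title and several countries is mostly bookkeeping. The sketch below assumes the nouns have already been detected (for example with a part-of-speech tagger) and that some `replace(noun, country)` function performs the vector arithmetic; both `rewrite_title` and `rewrite_all` are hypothetical helpers, and the lookup table in the usage example is a stub, not real model output.

```python
from typing import Callable, Dict, Iterable, Set


def rewrite_title(title: str, nouns: Set[str],
                  replace: Callable[[str, str], str], country: str) -> str:
    """Swap each detected noun for replace(noun, country); keep other words."""
    return " ".join(
        replace(word, country) if word.lower() in nouns else word
        for word in title.split()
    )


def rewrite_all(titles: Iterable[str], nouns: Set[str],
                replace: Callable[[str, str], str],
                countries: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Build {country: {original title: rewritten title}}."""
    titles = list(titles)
    return {
        country: {t: rewrite_title(t, nouns, replace, country) for t in titles}
        for country in countries
    }


# Stub standing in for the word2vec lookup; unknown pairs keep the original word.
fake_results = {("temple", "American"): "church"}


def stub_replace(word: str, country: str) -> str:
    return fake_results.get((word.lower(), country), word)


print(rewrite_all(["The Five Venoms", "Shaolin Temple"], {"temple", "venoms"},
                  stub_replace, ["American", "French"]))
```

Falling back to the original word when the lookup finds nothing keeps titles readable even when a noun is missing from the model’s vocabulary.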
Still, I made a little toy for exploring these new titles anyway. Check it out!
The raw results and the code I used to generate these are also available.
# Actor Troupes, Groups, and Clusters
Let’s end this exploration with a few insights into the actors in these films.
Even with my novice-level consumption of Shaw Brothers films, one thing you notice early on is that a lot of familiar faces show up over and over again.
We can see the extent of actor overuse with another simple chart counting how many movies each of the most frequently seen actors appears in.
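The counts behind a chart like this are a simple tally. Here is a minimal sketch, assuming the cast data is a dictionary mapping each movie title to its list of credited actors; the movie and actor names in the usage line are placeholders, not real credits.

```python
from collections import Counter


def movies_per_actor(cast_lists):
    """Count how many distinct movies each actor appears in.

    cast_lists maps movie title -> list of credited actors.
    """
    counts = Counter()
    for actors in cast_lists.values():
        # set() so an actor credited twice in one movie still counts once
        counts.update(set(actors))
    return counts


casts = {"Film 1": ["A", "B"], "Film 2": ["A", "C", "A"], "Film 3": ["A", "B"]}
print(movies_per_actor(casts).most_common(2))
```

`Counter.most_common(n)` then gives the top actors directly, ready for a bar chart.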
Wow! Ku Feng apparently appeared in 82 Kung Fu movies. That’s a lot of Kung Fu!
His Wikipedia page isn’t as impressed with this feat as I am and offers little information on this Martial Arts Maniac. Apparently his real name is Chan Sze-man, his first film was in 1959, and he is still acting today! The HKMDB, or Hong Kong Movie Database, provides just a bit more info:
> In 1965, Ku formally signed an acting contract with Shaw Brothers where he made around 100 films for them and became most notably known as one of their top character actors. He has worked with just about every top Hong Kong director in a variety of films.
OK then, well, props to you, Ku.
Did most of the top actors’ careers span multiple decades, or did actors come and go quickly?
We can graph the number of years in which each actor appeared in at least one movie, as a fraction of the total number of years in our dataset:
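One way to compute this ratio is sketched below, assuming the data has been flattened into `(actor, release_year)` pairs, one per credit. It also assumes that “total number of years” means every calendar year from the earliest to the latest release in the data, including years with no releases; counting only years that have releases would be an equally reasonable choice. The pairs in the usage line are placeholders.

```python
from collections import defaultdict


def active_year_fraction(appearances):
    """Map each actor to (distinct years with a credit) / (years spanned by the data).

    appearances: iterable of (actor, release_year) pairs.
    """
    appearances = list(appearances)
    years_by_actor = defaultdict(set)
    for actor, year in appearances:
        years_by_actor[actor].add(year)
    all_years = [year for _, year in appearances]
    total_years = max(all_years) - min(all_years) + 1  # inclusive span
    return {actor: len(years) / total_years for actor, years in years_by_actor.items()}


credits = [("A", 1970), ("A", 1971), ("A", 1971), ("B", 1979), ("A", 1979)]
print(active_year_fraction(credits))
```

Using a set per actor means several releases in the same year count as a single active year.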
For the top actors, we see that most were active for more than half of the entire period Shaw Brothers Studios was making Kung Fu movies.
Here’s another quick graph showing the beginning and ending of these actors’ tenures:
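The start and end points for a tenure chart come from the same `(actor, release_year)` pairs: the earliest and latest year each actor is credited. This is a minimal sketch under that assumed data layout, with placeholder names in the example.

```python
def tenures(appearances):
    """Return {actor: (first_year, last_year)} from (actor, release_year) pairs."""
    spans = {}
    for actor, year in appearances:
        if actor in spans:
            first, last = spans[actor]
            spans[actor] = (min(first, year), max(last, year))
        else:
            spans[actor] = (year, year)
    return spans


spans = tenures([("A", 1975), ("A", 1968), ("B", 1980), ("A", 1983)])
# Sort by first year so the chart reads chronologically from top to bottom.
print(sorted(spans.items(), key=lambda item: item[1]))
```

Each `(first_year, last_year)` tuple maps directly onto one horizontal bar in the graph.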
# Finding a Mob of Venoms
One phrase related to actor reuse that came up while researching these Shaw Brothers films is the Venom Mob: a group of actors who did indeed appear in a lot of Shaw Brothers films together. They became well known after the success of The Five Venoms, hence the catchy name.
So, can we find a Mob of Venoms in our data?
Inspired by David Robinson’s network analysis of Love Actually, I decided to try out the igraph package for a bit of network exploration.
After a lot of filtering and frustration, I ended up with a basic, but still fairly hairball-y network:
In this network, the nodes are actors who have appeared in many films. The edges connect actors who appeared in the same movies, and each edge’s width is proportional to the number of movies the two actors appeared in together.
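The weighted edge list behind a network like this can be built without any graph library. The sketch below assumes the same movie-to-cast dictionary as before; `min_movies` and `min_shared` are hypothetical filter thresholds standing in for the “lots of filtering” step, and the original analysis did this in igraph rather than with this code. The resulting pairs and weights could then be handed to a graph library such as igraph for layout and plotting.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(cast_lists, min_movies=1, min_shared=1):
    """Return {(actor_a, actor_b): number of movies they share}.

    Actors credited in fewer than `min_movies` films are dropped as nodes,
    and edges shared in fewer than `min_shared` films are dropped as edges.
    """
    movie_counts = Counter()
    for actors in cast_lists.values():
        movie_counts.update(set(actors))
    keep = {actor for actor, n in movie_counts.items() if n >= min_movies}

    weights = Counter()
    for actors in cast_lists.values():
        # Sorting gives each pair one canonical order, so (A, B) == (B, A).
        cast = sorted(set(actors) & keep)
        weights.update(combinations(cast, 2))
    return {pair: w for pair, w in weights.items() if w >= min_shared}


casts = {"F1": ["A", "B", "C"], "F2": ["A", "B"], "F3": ["B", "D"]}
print(co_occurrence_edges(casts))
print(co_occurrence_edges(casts, min_movies=2))
```

Raising either threshold is the quickest way to thin out a hairball before drawing it.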
You can see that a lot of these actors appear together.
The Venom Mob includes Chiang Sheng, so here, I’ve highlighted in red everyone he is connected with. In this sub-cluster, you can see Philip Kwok, Lu Feng, and many of the other Venoms.
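Highlighting everyone connected to one actor amounts to collecting that node’s neighbors. Here is a minimal sketch over a `{(actor_a, actor_b): weight}` edge dictionary like the one a co-occurrence step would produce; the edge data in the example uses placeholder names and made-up weights.

```python
def neighbors(edges, actor):
    """Return {neighbor: shared_movie_count} for every actor linked to `actor`."""
    linked = {}
    for (a, b), weight in edges.items():
        if a == actor:
            linked[b] = weight
        elif b == actor:
            linked[a] = weight
    return linked


edges = {("A", "B"): 2, ("A", "C"): 1, ("B", "D"): 1}
# Strongest connections first, handy for deciding what to colour red.
print(sorted(neighbors(edges, "A").items(), key=lambda item: -item[1]))
```

The returned names are exactly the set of nodes to recolor, and the weights show which of those ties are strongest.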
igraph is great for digging into properties of nodes, edges, and networks, and there is plenty more that could be done with it even for this simple dataset. I, however, wanted a more interactive exploratory tool that I could use to browse Kung Fu actor connections.
You did too? Great! That’s why I created the amazing Shaw Brothers Actors Network Visualization.
With it, you can clearly pick out the Venom Mob on the right side of the screenshot. But that’s not all: you can also browse all the movies actors appeared in together, and modify the network in lots of fun ways.
The code is based on my Flowing Data tutorial on interactive networks, which was recently updated to use plain old JavaScript and D3 v4.
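D3 force-directed layouts are commonly fed a JSON document with a `nodes` array and a `links` array, and in D3 v4 `d3.forceLink().id(...)` lets links refer to nodes by an id string. The sketch below converts a `{(actor_a, actor_b): weight}` edge dictionary into that shape; the field names `id`, `source`, `target`, and `value` are an assumption about the visualization’s input, and the tutorial’s actual format may differ.

```python
import json


def to_d3_json(edges):
    """Serialize {(a, b): weight} as {"nodes": [...], "links": [...]} JSON."""
    names = sorted({name for pair in edges for name in pair})
    nodes = [{"id": name} for name in names]
    links = [
        {"source": a, "target": b, "value": weight}
        for (a, b), weight in sorted(edges.items())
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


print(to_d3_json({("A", "B"): 2, ("A", "C"): 1}))
```

Keeping `value` on each link lets the front end scale edge width by shared movie count, matching the static igraph plot.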
That wraps up my little data-driven exploration of Shaw Brothers films. As with any analysis, there’s plenty more to explore, but hopefully this was fun for you too and inspires some data-driven exploration of your own.