
William Zhu

Data Scientist by Training
& Social Scientist at Heart


Network Visualization of The Office (US) Characters

[Python Code]

Introduction

Recently, I found The Office (US) transcript dataset on Kaggle, which lets me explore the network of relationships among The Office characters across all 9 seasons. The script contains 8157 scenes across 201 episodes. The following table shows the number of episodes and scenes in each season.

Season               1     2     3     4     5     6     7     8     9 Total
Total Episodes       6    22    25    19    28    26    26    24    25   201
Total Scenes       227   991  1115   726  1031  1022  1012   929  1104  8157

Figures 1 and 2 show the top 10 characters with the most lines and scenes across the 9 seasons of The Office. In total, 40 characters have more than 10 scenes, 30 have more than 50 scenes, and 18 have more than 200 scenes.

Figure 3 shows the top 10 characters with the most monologue scenes (scenes with only one character). As in Figures 1 and 2, the top five spots in Figure 3 go to Michael, Dwight, Jim, Pam, and Andy. Despite leaving The Office two seasons early, Michael has by far the most lines, scenes, and monologue scenes of any character. Figure 4 shows the top 10 character pairs with the greatest number of shared scenes. The pairings among Jim, Pam, Michael, and Dwight take the top 6 spots.
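The per-character counts behind Figures 1 to 3 can be sketched as follows. This is only a minimal illustration, not the project's actual code: it assumes each scene has already been reduced to a list of (speaker, line) tuples, and the toy `scenes` data and the function name `count_characters` are hypothetical.

```python
from collections import Counter

# Toy data: each scene is a list of (speaker, line) tuples.
# Both the structure and the contents are illustrative assumptions.
scenes = [
    [("Michael", "Hello."), ("Jim", "Hi."), ("Michael", "Sit down.")],
    [("Michael", "Talking head.")],
    [("Dwight", "Question."), ("Jim", "Answer.")],
    [("Dwight", "Alone.")],
    [("Michael", "More thoughts."), ("Michael", "Even more.")],
]

def count_characters(scenes):
    """Return (lines, scene_counts, monologues) Counters keyed by character."""
    lines, scene_counts, monologues = Counter(), Counter(), Counter()
    for scene in scenes:
        speakers = [speaker for speaker, _ in scene]
        lines.update(speakers)        # one count per spoken line
        cast = set(speakers)
        scene_counts.update(cast)     # one count per scene appearance
        if len(cast) == 1:            # only one character in the scene
            monologues.update(cast)
    return lines, scene_counts, monologues

lines, scene_counts, monologues = count_characters(scenes)
for name, label in [("lines", lines), ("scenes", scene_counts), ("monologues", monologues)]:
    print(name, label.most_common(3))
```

Calling `most_common(10)` on each counter would give the kind of top-10 ranking shown in the figures.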

Character Network by Season

The following graphs show the character network for each season. Each point (node) represents a character. Each line (edge) between two nodes indicates that the two characters have appeared in the same scene. The width of the edge represents the number of scenes that the two characters share. The graphs use force-directed graph drawing, which means that characters who share more scenes with a wider range of other characters (greater centrality) are more likely to appear at the center of the graph. Monologue scenes are excluded from the network graphs.
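The edge weights described above can be computed with a simple pair count. The sketch below is an assumption about how that might look rather than the project's own code: it takes each scene as a set of character names, and the `scene_casts` data and the `shared_scene_edges` name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy data: each scene is the set of characters appearing in it (assumed format).
scene_casts = [
    {"Michael", "Jim", "Pam"},
    {"Jim", "Dwight"},
    {"Michael"},              # monologue scene, contributes no edge
    {"Jim", "Dwight"},
    {"Michael", "Dwight"},
]

def shared_scene_edges(scene_casts):
    """Count shared scenes for every unordered pair of characters."""
    edges = Counter()
    for cast in scene_casts:
        if len(cast) < 2:
            continue  # monologue scenes are excluded from the network
        # sorted() gives each pair a single canonical (alphabetical) key
        for pair in combinations(sorted(cast), 2):
            edges[pair] += 1
    return edges

edges = shared_scene_edges(scene_casts)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Each (pair, weight) entry corresponds to one edge and its width. A graph library with a force-directed layout, for instance NetworkX's `spring_layout`, could then position the nodes, though the library actually used here is not stated.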

Most of the following tabs contain two graphs. The first graph shows characters with at least 10 scene interactions. For most seasons (except Seasons 1 and 4), including peripheral characters in the graph causes the core characters to cram together at the center. Therefore, the second graph shows only characters with at least 40 scene interactions, which gives a better sense of the relationships among the core characters. The "All Seasons" tab contains 3 graphs to help viewers zoom in from the periphery to the central characters.

Note: A character's total number of "scene interactions" is the sum of the number of other characters in every scene that the character appears in. For character A, interacting with character B in two separate scenes counts as two scene interactions. Interacting with character C and character D in one scene also counts as two scene interactions.
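To make the definition in the note concrete, here is a small sketch of the scene-interaction count and the threshold filter used to pick which characters appear in a graph. The function names `scene_interactions` and `filter_characters` and the toy scenes are hypothetical, and the input format (one set of character names per scene) is an assumption.

```python
from collections import Counter

def scene_interactions(scene_casts):
    """Sum, over every scene a character appears in, the number of other characters."""
    totals = Counter()
    for cast in scene_casts:
        cast = set(cast)
        for character in cast:
            totals[character] += len(cast) - 1  # everyone else in the scene
    return totals

def filter_characters(totals, minimum):
    """Keep only characters with at least `minimum` scene interactions."""
    return {name for name, count in totals.items() if count >= minimum}

# The example from the note: A shares two separate scenes with B,
# and one scene with both C and D.
scene_casts = [{"A", "B"}, {"A", "B"}, {"A", "C", "D"}]
totals = scene_interactions(scene_casts)
print(totals["A"])                    # 2 (from B) + 2 (from C and D) = 4
print(filter_characters(totals, 3))   # only A reaches the threshold
```

With real data, thresholds of 10 and 40 would correspond to the two graphs in each season tab.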

Observations from Season 1 Graphs

  • The Jim-Dwight-Pam-Michael relationships form the center of the graph.
  • Dwight and Jim have more scenes together than any other character pair.
  • The Angela-Oscar-Kevin triangle forms the accounting department.
  • The Pam-Angela-Phyllis triangle forms the party planning committee.
  • Creed does not have a single line in any Season 1 scene.
  • Kelly has only 3 scene interactions, and none with Ryan.
  • Roy has more scene interactions with Jim than with Pam.

Observations from Season 2 Graphs

  • Dwight and Michael have more scenes together than any other character pair.
  • Creed becomes a recurring character.

Observations from Season 3 Graphs

  • Jim moves away from the center of the graph because of his transfer to Stamford during the early part of the season. Jim has more scenes with Karen than with Pam.
  • The Dwight-Michael-Pam relationships carry this season.
  • Andy and Karen are introduced in this season as recurring characters.
  • Oscar moves away from the center of the graph because he was on vacation.

Observations from Season 4 Graphs

  • Ryan moves toward the periphery of the graph because of his corporate promotion.
  • Despite appearing only in the last episode of the season, Holly shows up on the network graph because of her central role in that episode.
  • Dwight moves away from the center of the graph. Though he shares plenty of scenes with Jim, Pam, Angela, and Michael, he does not share many scenes with other characters.

Observations from Season 5 Graphs

  • Erin is introduced in this season as a recurring character.
  • To Michael's horror, Toby is back.
  • This is Jan's last season as a recurring character.

Observations from Season 6 Graphs

  • In the first 5 seasons, the audience becomes familiar with the relationships among the core characters. As a result, Seasons 6 to 9 invest more time in exploring the greater Dunder Mifflin universe by introducing new peripheral characters. Consequently, the central characters are increasingly crammed into the center of the graphs.
  • Gabe and Jo are introduced in this season as recurring characters because of the Sabre buyout.

Observations from Season 7 Graphs

  • This is Michael's last season on The Office.
  • Andy becomes an increasingly important character to the show.

Observations from Season 8 Graphs

  • After Michael's departure, the Jim-Andy-Dwight triangle shoulders the burden of connecting the characters in this season. Andy moves to the center as the new regional manager.

Observations from Season 9 Graphs

  • Dwight moves toward the center of the graph as he (finally) becomes the new regional manager of the Scranton branch.

Observations from All Seasons Graphs

  • The Jim-Dwight-Pam-Michael relationships form the center of the graph.

Endnote:

This project is still a work in progress. The next step is a text analysis of the lines in The Office script. If you have new observations about the network graphs or suggestions for further visualization and analysis, please email me. Include your preferred name in the email if you'd like me to credit you in future updates.
