Illuminating Data and Visualizations:
The Illuminating 2016 project has been collecting Facebook and Twitter messages from the official campaign accounts of all of the major party presidential primary candidates.
We collect both candidate-generated messages and public commentary through Facebook’s and Twitter’s Application Programming Interfaces (APIs). We are still collecting and analyzing the messages in real time. The Illuminating 2016 website refreshes once an hour with the latest data that we’ve collected.
For the past year, we have been creating a system that automatically classifies each message into a category based on what the message is trying to do. For candidate-generated messages, the categories are: urging people to act, changing their opinions through persuasion, informing them about some activity or event, honoring or mourning people or holidays, or, on Twitter, having a conversation with members of the public.
For public-generated messages, we currently identify messages that are focused on a presidential candidate or surrogate, a political party, or another prominent politician. For these messages, we first identify whether the message criticizes or supports its primary subject on issues or image, and then we identify the target of the attack or support. The targets of messages are not currently featured on the site. Although we are collecting public commentary on both Facebook and Twitter, the Illuminating 2016 project currently provides analysis only of the public’s comments on political campaign Facebook walls.
Each message is categorized based on particular features of the message that we have identified using Machine Learning methods. Machine Learning is a computational process for classifying unstructured data, like Facebook messages. Developing the algorithm requires that humans first define the categories and then place a sample of messages into those categories. This categorized data is then fed through computer software that looks for patterns and features shared by the messages in the same category.
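To make the process above more concrete, here is a minimal sketch of supervised text classification. It is not the project's actual system: the algorithm (a small naive Bayes classifier over word counts), the function names, and the four sample messages are all assumptions invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude feature extractor."""
    return text.lower().split()


def train(labeled_messages):
    """Count word frequencies per category from human-labeled (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in labeled_messages:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        category_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, category_counts, vocabulary


def classify(text, model):
    """Return the category with the highest naive Bayes log-probability."""
    word_counts, category_counts, vocabulary = model
    total_messages = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, count in category_counts.items():
        # Start from the share of training messages in this category.
        score = math.log(count / total_messages)
        # Add-one smoothing so unseen words do not zero out the probability.
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for token in tokenize(text):
            score += math.log((word_counts[category][token] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical hand-labeled sample, invented for illustration only.
labeled = [
    ("donate now to support the campaign", "call-to-action"),
    ("vote tomorrow and bring a friend", "call-to-action"),
    ("join us at the rally tonight in ohio", "informative"),
    ("the town hall starts at seven tonight", "informative"),
]
model = train(labeled)
print(classify("please donate and vote", model))  # prints "call-to-action"
```

In this sketch the "features" are simply the words in each message. A production system would likely use a much larger labeled sample and richer features.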
The candidate-generated data currently presented on Illuminating 2016 is accurately categorized approximately 75% of the time. For some categories, such as call-to-action and persuasive messages (advocacy, attack), the accuracy is up to 80%. For the call-to-action sub-categories (digital engagement, media and debate appearances, giving money, and vote), the accuracy is about 80%. For image, issue, and endorsement, the accuracy is 76%. For the informative and conversational categories, the accuracy is 70%. For the ceremonial category, the accuracy is lower, at around 40%. The score is lower for this category because there are far fewer of these messages and they often express a wider range of features, making them harder to classify.
For public commentary on Facebook, for the category of politicians, the accuracy is up to 86%. For attacking messages, the accuracy is 74%. For support messages, the accuracy is 70%.
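As an illustration of how per-category figures like these can be computed, the sketch below compares human-assigned categories with machine-assigned ones. It assumes that "accuracy" means the share of human-labeled test messages whose predicted category matches the human label; the project's exact evaluation procedure is not described here, and the labels in the example are invented.

```python
from collections import Counter


def accuracy_by_category(true_labels, predicted_labels):
    """Return (overall accuracy, {category: accuracy}) against human labels."""
    total = Counter()
    correct = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_category = {category: correct[category] / total[category] for category in total}
    return overall, per_category


# Hypothetical test set: human labels versus classifier output.
truth = ["call-to-action"] * 4 + ["ceremonial"] * 2 + ["informative"] * 2
guess = [
    "call-to-action", "call-to-action", "call-to-action", "informative",
    "call-to-action", "ceremonial",
    "informative", "informative",
]
overall, per_category = accuracy_by_category(truth, guess)
print(overall)       # 0.75
print(per_category)  # call-to-action: 0.75, ceremonial: 0.5, informative: 1.0
```

With only two ceremonial messages in this toy test set, a single mistake halves that category's score, which mirrors why rare categories tend to be harder to classify and to evaluate.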
Note that some messages may contain multiple categories. For example, it is possible for messages to be both strategic messages and call-to-action. Currently, they are classified into one of those two categories based on the strength of the features that distinguish the categories. In the future, we hope to enable messages to receive multiple categories.
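The difference between the current single-category approach and a possible multi-category approach can be sketched as follows. The scores, the 0.5 threshold, and the function names are hypothetical; how the project measures the "strength of the features" is not specified here.

```python
def strongest_category(scores):
    """Single-label rule: keep only the category with the highest score."""
    return max(scores, key=scores.get)


def categories_above(scores, threshold=0.5):
    """Multi-label rule: keep every category whose score meets the threshold."""
    return sorted(category for category, score in scores.items() if score >= threshold)


# Hypothetical per-category scores for one message.
scores = {"strategic": 0.62, "call-to-action": 0.71, "informative": 0.10}

print(strongest_category(scores))  # prints "call-to-action"
print(categories_above(scores))    # prints ['call-to-action', 'strategic']
```

Under the single-label rule the message is reported only as call-to-action, even though its strategic score is also high; the threshold rule would report both.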
We are constantly working to improve the accuracy of the algorithms by using additional techniques available to computational social scientists. As we improve the algorithms, all the data are re-classified to give users of this site the most accurate view of the campaign possible.
We present the data using interactive visualizations that allow website visitors to explore the data in ways that interest them. Visitors can filter the data by platform (Twitter or Facebook), candidate, time frame, and message category. The graphs update automatically each time a visitor changes the filter settings. For example, a visitor might un-check all candidates except for Donald Trump, then ensure that both "Image" and "Issue" under "Attack" are checked. The main plot now shows a comparison of the types of political attack messages Trump has used over the last month. The bar plots lower on the page also update and show the numbers of these kinds of messages alongside the change in Trump’s Twitter followers and Facebook page likes. If the visitor now includes Clinton by checking the box next to her name, they can instantly compare the attack message strategies of the two candidates.
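The filtering described above can be thought of as a simple query over classified messages. The sketch below is an illustration only: the `Message` fields, the `filter_messages` function, and the sample records are assumptions, not the website's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    platform: str      # "Twitter" or "Facebook"
    candidate: str
    category: str      # e.g. "Attack"
    subcategory: str   # e.g. "Image" or "Issue"
    posted: date


def filter_messages(messages, platforms, candidates, types, start, end):
    """Keep messages matching every checked filter; types holds (category, subcategory) pairs."""
    return [
        m for m in messages
        if m.platform in platforms
        and m.candidate in candidates
        and (m.category, m.subcategory) in types
        and start <= m.posted <= end
    ]


def count_by_candidate_and_type(messages):
    """Tally messages per (candidate, subcategory), as a bar plot would need."""
    return Counter((m.candidate, m.subcategory) for m in messages)


# Hypothetical records, invented for illustration only.
messages = [
    Message("Twitter", "Trump", "Attack", "Image", date(2016, 3, 1)),
    Message("Twitter", "Trump", "Attack", "Issue", date(2016, 3, 2)),
    Message("Facebook", "Clinton", "Attack", "Issue", date(2016, 3, 3)),
    Message("Twitter", "Clinton", "Advocacy", "Issue", date(2016, 3, 4)),
    Message("Twitter", "Trump", "Attack", "Image", date(2016, 1, 15)),
]
attack_types = {("Attack", "Image"), ("Attack", "Issue")}
march = (date(2016, 3, 1), date(2016, 3, 31))

trump_only = filter_messages(messages, {"Twitter", "Facebook"}, {"Trump"}, attack_types, *march)
with_clinton = filter_messages(messages, {"Twitter", "Facebook"}, {"Trump", "Clinton"}, attack_types, *march)
print(count_by_candidate_and_type(trump_only))
print(count_by_candidate_and_type(with_clinton))
```

Adding Clinton to the set of checked candidates changes only one argument, which is the same idea as checking one more box on the page.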
If the visitor clicks on the name of a candidate on the main page, they can see a candidate summary, which allows them to check public responses to candidate-generated messages, including the number of re-tweets, shares, and likes. For Facebook, the visitor can also check categories of public commentary, e.g. the number of messages referring to politicians, surrogates, and parties, and the number of messages attacking or supporting the primary subject of the message.