Campaign 2020 Usage Guidelines

Our Data

Illuminating 2020 focuses on collecting presidential candidates’ political advertisements and categorizing them with our previously developed codebook. We use Apache Airflow to run the data pipelines that continuously collect ad data from the Facebook Ad Library API. The database is updated every four hours with new metrics and new ads, which are then labeled by our machine learning models. The data contain ads from Facebook and Instagram for all presidential candidates with valid Facebook and Instagram accounts. We pull data only from the official candidate pages, not from other entities advertising on the candidates’ behalf, and only for candidates who ran long enough to be included in the debates. The Facebook Ad Library API reports spending and impressions for each ad as ranges, each with a minimum and a maximum value. To estimate an ad’s spending and impressions, we take the midpoint of each range. The data we provide are therefore estimates and should be treated as such.
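The midpoint estimate can be sketched as follows. This is a minimal illustration, not our pipeline code: the record layout and the field names `spend`, `impressions`, `lower_bound`, and `upper_bound` are assumptions modeled on the Ad Library API's range objects, and the numbers are made up.

```python
from typing import Mapping


def range_midpoint(value_range: Mapping[str, str]) -> float:
    """Return the midpoint of a {lower_bound, upper_bound} range.

    Assumption: both bounds are present and arrive as numeric strings.
    """
    lower = float(value_range["lower_bound"])
    upper = float(value_range["upper_bound"])
    if upper < lower:
        raise ValueError(f"upper bound {upper} is below lower bound {lower}")
    return (lower + upper) / 2


# Hypothetical ad records; the values are illustrative, not real data.
ads = [
    {"spend": {"lower_bound": "100", "upper_bound": "199"},
     "impressions": {"lower_bound": "1000", "upper_bound": "4999"}},
    {"spend": {"lower_bound": "0", "upper_bound": "99"},
     "impressions": {"lower_bound": "5000", "upper_bound": "9999"}},
]

# Estimate each ad by its midpoint, then add the estimates up.
estimated_spend = sum(range_midpoint(ad["spend"]) for ad in ads)
estimated_impressions = sum(range_midpoint(ad["impressions"]) for ad in ads)
print(estimated_spend, estimated_impressions)  # 199.0 10499.0
```

Because every per-ad figure is a midpoint, an aggregate is also an estimate: the true total for a set of ads lies somewhere between the sum of their lower bounds and the sum of their upper bounds.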

Message Type

The Illuminating team developed a codebook to classify political speech in political candidates’ social media posts, which was then applied to political advertisements. Using this codebook, coders could apply up to five labels to a given ad. Before creating any gold-standard datasets, human coders trained in pairs on existing gold-standard sets of social media messages until they achieved consistent intercoder agreement. The gold-standard labels were then used to train machine learning (ML) models, which labeled the complete set of candidate ads used in the interactive dashboards.
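These guidelines do not name the statistic used to judge intercoder agreement. As one common choice, Cohen's kappa for two coders on a single binary label can be computed as sketched below; the use of kappa here and the coder data are assumptions for illustration only.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[int], coder_b: Sequence[int]) -> float:
    """Chance-corrected agreement between two coders on the same items."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same value.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labeled at random at their own label rates.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both coders used one identical value for every item; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pair of coders marking 10 ads for "Advocacy" (1 = label applied).
coder_1 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
coder_2 = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))  # 0.583
```

The two coders agree on 8 of 10 ads, but because both apply the label to 6 of 10 ads, about half of that agreement would be expected by chance, which is why kappa is well below 0.8.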

Below are the five categories used to classify candidates’ political advertisements. A message can receive as many of these labels as apply.

  • Advocacy messages advocate for a candidate, highlight their strengths as a leader, emphasize their popularity, describe their positive personality characteristics, and/or explain their prior/future policies.
  • Attack messages criticize a clear target, such as the opponent, the opposing administration or party, affiliated political institutions, social media platforms, newspapers, and/or politicized figures.
  • Persona messages primarily describe the candidate’s or the opponent’s character, personality, style, and/or values, such as their competence, popularity, moral character, charm, or level of benevolence.
  • Issue messages primarily describe the candidate’s or the opponent’s issue/policy positions, past or future. These can refer to economic issues, social programs, immigration and citizenship, environmental issues, safety, military, foreign policy, governance, social/cultural issues, and media issues. The Issue category also includes broad claims about the state of the country and/or national (rather than individual) values.
  • Call to Action messages include a clear directive for readers to act (e.g., watch, retweet, share).

Message Topic

Topics are not mutually exclusive.

  • Economic

    This category includes messages that deal with (non-exhaustive):

    • Federal spending (Taxes, IRS, National debt, pork barrel spending)
    • Wages (teacher pay)
    • Infrastructure (transportation, roads, housing, etc.)
    • Business policy, including doing business with other countries
    • Financial sector issues (mortgages, investments, banking, unions)
    • Discussions about socioeconomic classes (upper class, middle class, the 99%, gender pay gap)
    • U.S. trade policy and how it impacts us economically at home
    • Job creation (in the context of the Green New Deal/Blue New Deal)
    • Reparations (payments to descendants of former slaves)
  • Social Programs

    This category includes messages that deal with (non-exhaustive):

    • Health Care (health insurance, dealing with epidemics, PPE, costs, Obamacare)
    • Social Security (Medicaid, Medicare)
    • Welfare programs (affordable housing)
    • General solutions to social/cultural issues
  • Immigration

    This category includes messages that deal with (non-exhaustive):

    • Protecting our borders (border wall, crime and job issues related to illegal immigration, “safe haven cities”, Mexico, Middle East, closing borders, ICE)
    • Citizenship (Green cards, visas, green card lottery, anchor babies)
    • Immigrant students
  • Environment

    This category includes messages that deal with (non-exhaustive):

    • Global warming/Climate change (greenhouse gases, pollution, overpopulation, natural disasters, water pollution)
    • Renewable Energy (clean coal, wind, wind farms, electric cars)
    • Green New Deal, Blue New Deal (may also be Economic if the message references new jobs)
  • Safety

    This category includes messages that deal with (non-exhaustive):

    • Crime and General Safety in our country (crime rate, school violence, murder rates, suicide rate, death penalty, mass shooting, sexual assault, Patriot Act)
    • Drugs & Alcohol (war on drugs, increasing the drinking age), unless the message focuses on social programs
    • Gun Legislation (gun laws, second amendment, gun ownership, automatic weapons)
    • NSA (Snowden, surveillance of the public, cybersecurity, domestic terrorism)
    • Failure of government to keep people safe
  • Military

    This category includes messages that deal with (non-exhaustive):

    • Veterans’ affairs (healthcare, treatment, PTSD, housing)
    • Service (military preparedness, draft, selective service)
    • Military technology R&D
    • Military budget & spending
  • Foreign Policy

    This category includes messages that deal with (non-exhaustive):

    • Trade agreements or pacts with other nations, NATO
    • Policies related to foreign governments and institutions
    • Aid to other countries
    • War on Terror/Homeland Security (wars in Iraq and Afghanistan, policy relating to the war in Iraq, violations of civil liberties in the U.S., radical Islam, ISIS)
  • Governance

    This category includes messages that deal with (non-exhaustive):

    • Policies around how government works (the size of government, specific accusations of corruption, shutdowns)
    • Process of governance (elections and campaigns, executive action, congressional hearings)
    • Judicial matters (balance of powers, Supreme Court)
    • Campaign (finance, political parties, 527s, cabinet positions, FBI surveillance/monitoring of campaign (Clinton emails, Trump, Comey), transparency of tax returns)
    • Voting (elections, electoral college, right to vote, gerrymandering, purging voter rolls, voter fraud, issues related to voter turnout such as voter ID laws, how candidates compare to the current administration, access to debates for candidates, power of Super PACs in elections)
    • Citizen rights (Bill of Rights, civil asset forfeiture)
    • Media-related issues (freedom of speech, social media policies); does not include Trump complaining about #fakenews
  • Social and cultural

    This category includes messages that deal with (non-exhaustive):

    • Values on how we should treat all people, but especially minority and disenfranchised populations (Culture Wars)
    • Women’s Issues (including abstinence programs, contraception, stem cells, fetus heartbeat discussions, Planned Parenthood, reproductive rights)
    • LGBT Issues (Gay/Lesbian/Gay Marriage/Civil Unions)
    • Racial Issues (HBCUs)
    • Religious Issues (e.g. Muslims and their rights)
    • Disability rights
    • Aging population Issues
  • Health

    This category includes messages that deal with (non-exhaustive):

    • Virus outbreaks (Zika, Ebola, COVID-19)
    • Vaccines (Existing and new development)
    • Health Science (new technology, new treatments)
    • Illnesses (cancer, diabetes, asthma, Alzheimer’s, heart attacks, obesity, etc.)
    • Treatments (insulin, foreign/cheaper drugs, therapy including conversion therapy, counseling, etc.)
    • Access to healthy things or services (food, prison healthcare, abortion, big pharma raising prices, etc.)
    • Health system (Obamacare, health insurance, paying for treatment)
    • Mental health issues (PTSD, addiction, suicide, etc.)
  • Education

    This category includes messages that deal with (non-exhaustive):

    • Financing education (debt, debt-free college, loans, paying teachers, for-profit universities)
    • Education standards (Common Core)
    • Safety in schools (shootings, etc.)
    • Teacher education (training)
    • Subject-specific learning (English language learning)
    • Education for prisoners/training to rejoin society
    • HBCUs
  • COVID

    This category includes messages that deal with (non-exhaustive):

    • Origins of the virus (China, bats, etc.)
    • Government efforts to stop the spread of the virus (flatten the curve, lockdown, quarantine, social distancing, closing borders, etc.)
    • Medical professionals (hospital capacity, overworked doctors, etc.)
    • Medical supplies (PPE, ventilators, masks, disinfectants, etc.)
    • Possible treatments (vaccines, remdesivir, hydroxychloroquine, etc.)
    • Economic outcomes (job losses, recession, stimulus checks, panic shopping, shortages, etc.)

Message Tone

The Illuminating team developed a codebook to classify political speech acts in political candidates’ social media posts, which was then applied to political advertisements. Using this codebook, coders labeled each ad as either civil or uncivil. Before creating any gold-standard datasets, human coders trained in pairs on existing gold-standard sets of social media messages until they achieved consistent intercoder agreement. The gold-standard labels were then used to train machine learning (ML) models, which labeled the complete set of candidate ads used in the interactive dashboards. Unlike some other approaches to classifying civility, we do not use a lexicon (list-of-words) approach. Our approach yields a more nuanced classification of civility than one based only on the presence of hostile or derogatory words.

For this project, we define Uncivil messages as those with a disrespectful or rude tone: for instance, insulting language (demeaning, belittling, mean, or vulgar) toward opponents or groups, accusations of lying or deception, mockery, and misrepresentative exaggeration.

How we classify our data

We classify our data with deep learning models for natural language processing, specifically BERT. The pre-trained BERT models are fine-tuned for text classification on a sample of 2016 and 2020 Facebook posts labeled by annotators. The fine-tuned models then label the whole corpus, and the graphs on our website are based on these labels. Each label is treated as a separate binary classification task, so the labels are predicted independently of one another.
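What "independent binary labels" means at prediction time can be sketched as below. This is a hypothetical illustration, not our production code: it assumes each label's model outputs a probability that the label applies and that a fixed threshold of 0.5 is used; the probabilities are invented.

```python
from typing import Dict, List

# The five Message Type labels; each one has its own fine-tuned binary classifier.
MESSAGE_TYPES = ["Advocacy", "Attack", "Persona", "Issue", "Call to Action"]


def assign_labels(probabilities: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose own classifier clears the threshold.

    `probabilities` maps each label to the positive-class probability produced
    by that label's model. Each decision looks only at its own label's score,
    so an ad can receive none, some, or all five labels.
    """
    return [label for label in MESSAGE_TYPES if probabilities[label] >= threshold]


# Hypothetical model outputs for one ad; not real predictions.
example_probs = {"Advocacy": 0.91, "Attack": 0.12, "Persona": 0.55,
                 "Issue": 0.73, "Call to Action": 0.08}
print(assign_labels(example_probs))  # ['Advocacy', 'Persona', 'Issue']
```

A single softmax over all five labels would force exactly one label per ad; separate binary decisions instead let labels co-occur, as the codebook allows.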

Machine Learning training and data filtering

We used a pre-trained BERT (Bidirectional Encoder Representations from Transformers) Base model (12 layers, 768 hidden units, 12 attention heads, 110M parameters), which was pre-trained on English Wikipedia and the BooksCorpus.
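The 110M figure can be checked with a short parameter count. Values not stated above (a 30,522-token vocabulary, 512 positions, 2 segment types, a 3,072-unit feed-forward layer, and a pooler layer) are assumptions taken from the standard bert-base-uncased configuration.

```python
def bert_param_count(
    layers: int = 12,
    hidden: int = 768,
    intermediate: int = 3072,   # feed-forward size (4 x hidden in BERT Base)
    vocab: int = 30522,         # assumed: bert-base-uncased WordPiece vocabulary
    max_positions: int = 512,
    token_types: int = 2,
    include_pooler: bool = True,
) -> int:
    """Count trainable parameters of a BERT encoder from its configuration."""
    layer_norm = 2 * hidden  # LayerNorm scale and bias vectors
    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_positions + token_types) * hidden + layer_norm
    # Query, key, value, and output projections: hidden x hidden weights plus bias each.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two-layer feed-forward block: hidden -> intermediate -> hidden.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + layer_norm
    encoder = layers * (attention + feed_forward)
    pooler = hidden * hidden + hidden if include_pooler else 0
    return embeddings + encoder + pooler


print(bert_param_count())  # 109482240, i.e. about 110M
```

The number of attention heads does not appear in the count: the 12 heads split the 768-dimensional projections among themselves rather than adding new weight matrices.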

We fine-tuned the pre-trained BERT models on Facebook post data from 2016 and 2020 and then tested them on 700 unique ads from the 2020 Facebook ad data. These models were then used to label the whole data corpus. Hyperparameters (epochs, batch size, input length, class-weight balancing, etc.) were optimized separately for each label.
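Per-label settings with weight balancing might be organized as in the sketch below. Everything here is hypothetical: the `LabelConfig` fields mirror the hyperparameters listed above, but the values are placeholders rather than the ones we used, and the "balanced" weighting formula (n_samples / (n_classes × class_count)) is one common choice, assumed for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class LabelConfig:
    """Hypothetical per-label fine-tuning settings."""
    epochs: int
    batch_size: int
    max_input_length: int
    use_class_weights: bool


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes get larger weights, so mistakes on them cost more in the loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Placeholder settings, one entry per label; not the project's actual values.
configs = {
    "Attack": LabelConfig(epochs=3, batch_size=16, max_input_length=256, use_class_weights=True),
    "Call to Action": LabelConfig(epochs=4, batch_size=32, max_input_length=128, use_class_weights=False),
}

# Illustrative training labels for "Attack": 2 positives among 10 examples.
attack_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
cfg = configs["Attack"]
weights = balanced_class_weights(attack_labels) if cfg.use_class_weights else None
print(weights)  # {0: 0.625, 1: 2.5}
```

In PyTorch, for example, weights like these can be passed as the `weight` argument of `torch.nn.CrossEntropyLoss` so that the rare positive class is not drowned out during fine-tuning.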

Citation

To cite the categories and the machine learning work that created the 2020 data:

Gupta, S., Bolden, S.E., Kachhadia, J., Korsunska, A., & Stromer-Galley, J. (2020). PoliBERT: Classifying political social media messages with BERT. Paper presented at the Social, Cultural and Behavioral Modeling (SBP-BRIMS 2020) conference. Washington, DC, October 18-21, 2020.