Political Data Science

Political Data Science - 262F26

Authors

Tobias Widmann; Jesper Rasmussen


General Information

Course code: 262F26

Semester: Spring 2026

Instructors:

  • Tobias Widmann (widmann@ps.au.dk)
  • Jesper Rasmussen (jr@ps.au.dk)

Homework: Almost every week, there will be a group homework assignment for you to complete in your study groups. Please upload the rendered HTML version of your homework to the SharePoint site: Political Data Science Spring 2026


Course Content

This course introduces methods broadly termed “data science,” from automated data collection and the processing of unstructured data to visualization and methods for exploiting big data. The substantive focus is on applying these skills to the study of politics and, more broadly, the social sciences. The public sector produces vast quantities of data. Managing and analyzing this information is crucial for academics who study how government functions, for government employees who aim to improve government performance, and for media organizations and businesses that rely on public data for their work.

Learning Goals

  • Use the programming language R to explore, analyze, and communicate data.
  • Execute basic machine learning algorithms for data analysis and discuss their strengths and weaknesses.
  • Match analysis choices to different types of research designs and different types of data.
  • Build custom software to collect a variety of data types from the Internet.
  • Output attractive and informative graphics, tables, and reports from analyses.
  • Discuss the ethical implications and limitations of using data science methods to inform (political) decision making.

Course Structure

Week 1: Introduction

Date: February 6, 2026

Instructor: Tobias

Practice

  • Getting to know R & RStudio

Readings: None

Week 2: All (Some Things) About R

Date: February 13, 2026

Instructor: Jesper

Practice

  • Data wrangling
  • Transforming
  • Tidying
  • Functions
  • ggplot2
  • Exploring data

Required Reading

  • Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.), Chapters 1-9. Online book: https://r4ds.hadley.nz/ (approx. 70 pages).

Week 3: Working with Text 1

Date: February 20, 2026

Instructor: Jesper

Practice

  • Sentiment
  • Complexity
  • Applications in R (quanteda package, regular expressions)

Required Readings

  • Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. G. (2003). “Psychological Aspects of Natural Language Use: Our Words, Our Selves.” Annual Review of Psychology, 54(1), 547-577. https://doi.org/10.1146/annurev.psych.54.101601.145041

  • Proksch, S.-O., Lowe, W., Wäckerle, J., & Soroka, S. N. (2019). “Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Legislative Speeches.” Legislative Studies Quarterly, 44(1), 97-131.

Week 4: Working with Text 2

Date: February 27, 2026

Instructor: Jesper

Practice

  • Topic modeling
  • Applications in R

Required Readings

  • Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science, 58(4), 1064-1082. https://doi.org/10.1111/ajps.12103

  • Eshima, S., Imai, K., & Sasaki, T. (2023). “Keyword-Assisted Topic Models.” American Journal of Political Science. https://doi.org/10.1111/ajps.12779

Week 5: Working with Text 3

Date: March 6, 2026

Instructor: Jesper

Practice

  • Classification
  • Embeddings
  • LLMs
  • Applications in R (e.g., openai, rollama)

Required Readings

  • Gilardi, F., Alizadeh, M., & Kubli, M. (2023). “ChatGPT Outperforms Crowd Workers for Text-Annotation Tasks.” Proceedings of the National Academy of Sciences, 120(30). https://doi.org/10.1073/pnas.2305016120

  • Shanahan, M. (2024). “Talking about Large Language Models.” Communications of the ACM, 67(2), 68-79. https://doi.org/10.1145/3624724

  • Wankmüller, S. (2022). “Introduction to Neural Transfer Learning With Transformers for Social Science Text Analysis.” Sociological Methods & Research. https://doi.org/10.1177/00491241221134527

  • Törnberg, P. (2024). “Large Language Models Outperform Expert Coders and Supervised Classifiers at Annotating Political Social Media Messages.” Social Science Computer Review.

  • Rasmussen, J., Rathje, S., Van Bavel, J. J., & Robertson, C. (2025). “Negativity and Identity Language Have Additive Effects on Online News Consumption.” Preprint. https://doi.org/10.31234/osf.io/h3cn5_v1

Week 6: Collecting Data from the Web 1: Scraping Static Websites

Date: March 13, 2026

Instructor: Jesper

Practice

  • Applications in R
  • The rvest package (part of the tidyverse)

Readings

Optional Reading

  • Mancosu, M., & Vegetti, F. (2020). “What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data.” Social Media + Society. https://doi.org/10.1177/2056305120940703

Week 7: Collecting Data from the Web 2: APIs (httr)

Date: Monday, March 16, 2026

Instructor: Jesper

Practice

  • Bluesky / Manifesto Project / Pushshift / Google Search / Folketinget

Readings

Background Readings

  • Wickham, H. (2017). “Getting Started with httr.” httr: Tools for Working with URLs and HTTP. https://httr2.r-lib.org/articles/httr2.html

  • Stephens-Davidowitz, S. (2014). “The Cost of Racial Animus on a Black Candidate: Evidence Using Google Search Data.” Journal of Public Economics, 118, 26-40.

Week 8: Research Design Workshop

Date: March 27, 2026

Instructor: Jesper

Practice

  • Workshop research ideas

Required Reading

  • Grimmer, J. (2015). “We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together.” PS: Political Science & Politics, 48(1), 80-83.

Week 9: Modeling 1

Date: April 10, 2026

Instructor: Tobias

Practice

  • Modeling logic in data science (train/validation/test splits), cross-validation, hyperparameter tuning
  • Linear regression

Readings

Week 10: Modeling 2

Date: April 17, 2026

Instructor: Tobias

Practice

  • Random forest
  • Logistic regression
  • tidymodels

Readings

  • James, G., et al. (2013). An Introduction to Statistical Learning. Chapters 4 & 8.

Week 11: Working with Network Data

Date: Wednesday, April 22, 2026

Instructor: Tobias

Practice

  • Ideology estimation

Readings

  • Barberá, P. (2015). “Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data.” Political Analysis, 23(1), 76-91.

  • Eady, G., Bonneau, R., Tucker, J. A., & Nagler, J. (2025). “News Sharing on Social Media: Mapping the Ideology of News Media, Politicians, and the Mass Public.” Political Analysis, 33(2), 73-90.

Week 12: Synopsis Workshop

Date: May 1, 2026

Instructor: Tobias

Practice

  • No scheduled practice

Readings: None

Week 13: Ethics

Date: May 8, 2026

Instructor: Tobias

Practice

  • No scheduled practice

Preparation Materials

Background Readings

  • Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P. S., Mellor, J., et al. (2022). “Taxonomy of Risks Posed by Language Models.” In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency.

  • Bianchi, F., Kalluri, P., Durmus, E., Ladhak, F., Cheng, M., Nozza, D., et al. (2023). “Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale.” In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency.

  • Spirling, A. (2023). “Why Open-Source Generative AI Models Are an Ethical Way Forward for Science.” Nature, 616(7957), 413. https://doi.org/10.1038/d41586-023-01295-4

Week 14: Wrap Up

Date: May 15, 2026

Instructor: Tobias

Practice

  • No scheduled practice

Readings: None