Political Data Science
General Information
Course code: 262F26
Semester: Spring 2026
Instructors:
- Tobias Widmann (widmann@ps.au.dk)
- Jesper Rasmussen (jr@ps.au.dk)
Homework: Almost every week, there will be a group homework assignment to complete in your study group. Please upload the rendered HTML version of your homework to the SharePoint site: Political Data Science Spring 2026
Course Content
This course introduces methods broadly termed “data science”, ranging from automated data collection and the processing of unstructured data to visualization and methods for exploiting big data. The substantive focus is on applying these skills to the study of politics and social science more broadly. The public sector produces vast quantities of data. Managing and analyzing this information is crucial for academics who study how government functions, for government employees who aim to improve government performance, and for the media and businesses that rely on public data for their work.
Learning Goals
- Use the programming language R to explore, analyze, and communicate data.
- Execute basic machine learning algorithms for data analysis and discuss their strengths and weaknesses.
- Match analysis choices to different types of research designs and different types of data.
- Build custom software to collect a variety of data types from the Internet.
- Output attractive and informative graphics, tables, and reports from analyses.
- Discuss the ethical implications and limitations of using data science methods to inform (political) decision making.
Course Structure
Week 1: Introduction
Date: February 6, 2026
Instructor: Tobias
Practice
- Getting to know R & RStudio
Readings: None
Week 2: All (Some Things) About R
Date: February 13, 2026
Instructor: Jesper
Practice
- Data wrangling
- Transforming
- Tidying
- Functions
- ggplot2
- Exploring data
Required Reading
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.), Chapters 1-9. Online book: https://r4ds.hadley.nz/ (approx. 70 pages).
Week 3: Working with Text 1
Date: February 20, 2026
Instructor: Jesper
Practice
- Sentiment
- Complexity
- Applications in R (quanteda package, regular expressions)
Required Readings
- Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. G. (2003). “Psychological Aspects of Natural Language Use: Our Words, Our Selves.” Annual Review of Psychology, 54(1), 547-577. https://doi.org/10.1146/annurev.psych.54.101601.145041
- Proksch, S.-O., Lowe, W., Wäckerle, J., & Soroka, S. N. (2019). “Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Legislative Speeches.” Legislative Studies Quarterly, 44(1), 97-131.
Week 4: Working with Text 2
Date: February 27, 2026
Instructor: Jesper
Practice
- Topic modeling
- Applications in R
Required Readings
- Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science, 58(4), 1064-1082. https://doi.org/10.1111/ajps.12103
- Eshima, S., Imai, K., & Sasaki, T. (2023). “Keyword-Assisted Topic Models.” American Journal of Political Science. https://doi.org/10.1111/ajps.12779
Week 5: Working with Text 3
Date: March 6, 2026
Instructor: Jesper
Practice
- Classification
- Embeddings
- LLMs
- Applications in R (e.g., openai, rollama)
Required Readings
- Gilardi, F., et al. (2023). “ChatGPT Outperforms Crowd Workers for Text-Annotation Tasks.” Proceedings of the National Academy of Sciences, 120(30). https://doi.org/10.1073/pnas.2305016120
- Shanahan, M. (2024). “Talking about Large Language Models.” Communications of the ACM, 67(2), 68-79. https://doi.org/10.1145/3624724
- Wankmüller, S. (2022). “Introduction to Neural Transfer Learning With Transformers for Social Science Text Analysis.” Sociological Methods & Research. https://doi.org/10.1177/00491241221134527
- Törnberg, P. (2024). “Large Language Models Outperform Expert Coders and Supervised Classifiers at Annotating Political Social Media Messages.” Social Science Computer Review.
- Rasmussen, J., Rathje, S., Van Bavel, J. J., & Robertson, C. (2025). “Negativity and Identity Language Have Additive Effects on Online News Consumption.” Preprint. https://doi.org/10.31234/osf.io/h3cn5_v1
Week 6: Collecting Data from the Web 1: Scraping Static Websites
Date: March 13, 2026
Instructor: Jesper
Practice
- Applications in R
- rvest package from the tidyverse
Readings
- Freelon, D. (2018). “Computational Research in the Post-API Age.” Political Communication, 35(4), 665-668. https://doi.org/10.1080/10584609.2018.1477506
Optional Reading
- Mancosu, M., & Vegetti, F. (2020). “What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data.” Social Media + Society. https://doi.org/10.1177/2056305120940703
Week 7: Collecting Data from the Web 2: APIs (httr)
Date: Monday, March 16, 2026
Instructor: Jesper
Practice
- Bluesky / Manifesto Project / Pushshift / Google Search / Folketinget
Readings
- Cooksey, B. An Introduction to APIs (all chapters). https://zapier.com/learn/apis/
Background Readings
- Wickham, H. (2017). “Getting Started with httr.” httr: Tools for Working with URLs and HTTP. https://httr2.r-lib.org/articles/httr2.html
- Stephens-Davidowitz, S. (2014). “The Cost of Racial Animus on a Black Candidate: Evidence Using Google Search Data.” Journal of Public Economics, 118, 26-40.
Week 8: Research Design Workshop
Date: March 27, 2026
Instructor: Jesper
Practice
- Workshop research ideas
Required Reading
- Grimmer, J. (2015). “We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together.” PS: Political Science & Politics, 48(1), 80-83.
Week 9: Modeling 1
Date: April 10, 2026
Instructor: Tobias
Practice
- Modeling logic in data science (train/validate/test), cross-validation, hyperparameter tuning
- Linear regression
Readings
- Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Chapter 1.
- Urdinez, F., & Cruz, A. (2020). R for Political Data Science: A Practical Guide (1st ed.). Chapter 5. https://doi-org.ez.statsbiblioteket.dk/10.1201/9781003010623
Week 10: Modeling 2
Date: April 17, 2026
Instructor: Tobias
Practice
- Random forest
- Logistic regression
- tidymodels
Readings
- James, G., et al. (2013). An Introduction to Statistical Learning. Chapters 4 & 8.
Week 11: Working with Network Data
Date: April 22, 2026
Instructor: Tobias
Practice
- Ideology estimation
Readings
- Barberá, P. (2015). “Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data.” Political Analysis, 23(1), 76-91.
- Eady, G., Bonneau, R., Tucker, J. A., & Nagler, J. (2025). “News Sharing on Social Media: Mapping the Ideology of News Media, Politicians, and the Mass Public.” Political Analysis, 33(2), 73-90.
Week 12: Synopsis Workshop
Date: May 1, 2026
Instructor: Tobias
Practice
- No scheduled practice
Readings: None
Week 13: Ethics
Date: May 8, 2026
Instructor: Tobias
Practice
- No scheduled practice
Preparation Materials
- O’Neil, L. (2023). “These Women Tried to Warn Us About AI.” Rolling Stone. https://www.rollingstone.com/culture/culture-features/women-warnings-ai-danger-risk-before-chatgpt-1234804367/
- O’Neil, C. “Weapons of Math Destruction.” Talks at Google (video). https://www.youtube.com/watch?v=TQHs8SA1qpk&ab_channel=TalksatGoogle
Background Readings
- Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P. S., Mellor, J., et al. (2022). “Taxonomy of Risks Posed by Language Models.” In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency.
- Bianchi, F., Kalluri, P., Durmus, E., Ladhak, F., Cheng, M., Nozza, D., et al. (2023). “Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale.” In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency.
- Spirling, A. (2023). “Why Open-Source Generative AI Models Are an Ethical Way Forward for Science.” Nature, 616(7957), 413. https://doi.org/10.1038/d41586-023-01295-4
Week 14: Wrap Up
Date: May 15, 2026
Instructor: Tobias
Practice
- No scheduled practice
Readings: None