Time: Spring semester 2022, Tuesdays 11:00 - 14:00
Location: 1330-024 Undervisningslokale
Instructor: Tobias Widmann (Office: 1341-122)
Exam Format: 7-day take-home exam
With the rise of the internet, the range of available data sources has changed dramatically over the past decades. Social scientists can now draw on huge data sets of videos, images or text to answer pressing societal questions. In particular, the amount of available text data has exploded due to the growth of websites such as Twitter, Facebook, Google and Wikipedia. This increase has been reinforced by the digitisation of historical archives, journalistic corpora and administrative records. However, collecting and analysing large amounts of text data also pose new challenges for researchers and students.
The aim of this class is to introduce students to the computational analysis of text from a social science perspective, with a special focus on politics. The course covers both the theoretical foundations and the practical application of text analysis approaches, but it is predominantly practical in nature and aims to give students the tools to perform their own analyses. We therefore focus on the empirical questions that can be asked with text-as-data and learn how to answer them. Students work through hands-on exercises during class using the R statistical programming language. In addition, we discuss recent examples of empirical research that rely on text analysis techniques.
Overall, the course covers popular techniques for collecting, processing and analysing text-based data, from data collection to supervised and unsupervised approaches. Among others, the course will cover topics such as:
The individual lessons of this course cover different aspects of text analysis and build on each other. We start with a basic introduction, continue with data collection and data preparation, and finally move on to more complex forms of text analysis. Basic knowledge of R is beneficial for following this course, but advanced programming skills are not required. We will learn how to use R/RStudio and the necessary packages together in class.
Each class consists of two parts. In the lecture part, the instructor introduces new text analysis techniques and presents example studies. In the practical part, students work individually or in groups on weekly assignments. Students should finish these assignments at home between classes; the solutions will be discussed together in class the following week.
At the end of the course the student:
Readings Week 1:
Readings Week 2:
Readings Week 3:
Readings Week 4:
Readings Week 6:
Readings Week 7:
Readings Week 8:
Readings Week 9:
Readings Week 10:
Readings Week 11:
Readings Week 12:
Readings Week 13:
Readings Week 14: