DARIAH-Campus

ExploreCor - Using Programmable Corpora in Computational Literary Studies
EN
This three-day training school, organised by the CLS INFRA project, focuses on dynamic collections of literary texts that are manipulated programmatically. Participants learn to find, evaluate, and select corpora using tools like CLSCor and DraCor, and gain skills in Python, Jupyter Notebooks, API querying, Linked Open Data, and Digital Literary Network Analysis. The training also addresses reproducibility using Docker, promoting transparent, replicable research in Computational Literary Studies.
Authors, editors, and contributors
Julia Jennifer Beine
Ingo Börner
Floor Buschenhenke
Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza
EN
This lesson covers tokenization, part-of-speech tagging, and lemmatization, as well as automatic language detection, for non-English and multilingual text. You'll learn how to use the Python packages NLTK, spaCy, and Stanza to analyze a multilingual Russian and French text.
Authors, editors, and contributors
Ian Goodale
Laura Alice Chapot
Facial Recognition in Historical Photographs with Artificial Intelligence in Python
EN
In this lesson, you'll learn computer vision and machine learning principles for object recognition, and how to apply these principles using Python to recognize and classify smiling faces in historical photographs.
Authors, editors, and contributors
Charles Goldberg
Zach Haala
Giulia Taurino
Understanding and Creating Word Embeddings
EN
Word embeddings allow you to analyze the usage of different terms in a corpus of texts by capturing information about their contextual usage. Through a primarily theoretical lens, this lesson will teach you how to prepare a corpus and train a word embedding model. You will explore how word vectors work, how to interpret them, and how to answer humanities research questions using them.
Authors, editors, and contributors
Avery Blankenship
Sarah Connell
Quinn Dombrowski
Creating Interactive Visualizations with Plotly
EN
This lesson demonstrates how to create interactive data visualizations in Python with Plotly's open-source graphing libraries using materials from the Historical Violence Database.
Authors, editors, and contributors
Grace Di Méo
Scott Kleinman
Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision
EN
Tools for machine transcription of handwriting are practical and labour-saving if you need to analyse or present text in digital form. This lesson will explain how to write a Python program to transcribe handwritten documents using Microsoft's Azure Cognitive Services, a commercially available service that has a cost-free option for low volumes of use.
Authors, editors, and contributors
Jeff Blackadar
Giulia Taurino
Clustering and Visualising Documents Using Word Embeddings
EN
This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.
Authors, editors, and contributors
Jonathan Reades
Jennie Williams
Alex Wermer-Colan
Corpus Analysis with spaCy
EN
This lesson demonstrates how to use the Python library spaCy to analyze large collections of texts. It details the process of using spaCy to enrich a corpus via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how the linguistic annotations produced by spaCy can be analyzed to help researchers explore meaningful trends in language patterns across a set of texts.
Authors, editors, and contributors
Megan S. Kane
John R Ladd
OCR with Google Vision API and Tesseract
EN
Google Vision and Tesseract are both popular and powerful OCR tools, but they each have their weaknesses. In this lesson, you will learn how to combine the two to make the most of their individual strengths and achieve even more accurate OCR results.
Authors, editors, and contributors
Isabelle Gribomont
Liz Fischer
Creating GUIs in Python for Digital Humanities Projects
EN
In this lesson, you will use Qt Designer and Python to design and implement a simple graphical user interface and application to merge PDF files. This lesson also demonstrates how to package the application for distribution to other personal computers.
Authors, editors, and contributors
Christopher Goodwin
Yann Ryan
Interrogating a National Narrative with GPT-2
EN
In this lesson, you will learn how to apply a Generative Pre-trained Transformer language model to a large-scale corpus so that you can locate broad themes and trends within written text.
Authors, editors, and contributors
Chantal Brousseau
John R Ladd
Tiago Sousa Garcia
Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 1)
EN
This is the first of a two-part lesson introducing deep learning based computer vision methods for humanities research. Using a dataset of historical newspaper advertisements and the fastai Python library, the lesson walks through the pipeline of training a computer vision model to perform image classification.
Authors, editors, and contributors
Daniel van Strien
Kaspar Beelen
Melvin Wevers
Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 2)
EN
This is the second of a two-part lesson introducing deep learning based computer vision methods for humanities research. This lesson digs deeper into the details of training a deep learning based computer vision model. It covers some challenges one may face due to the training data used and the importance of choosing an appropriate metric for your model. It presents some methods for evaluating the performance of a model.
Authors, editors, and contributors
Daniel van Strien
Kaspar Beelen
Melvin Wevers
Regression Analysis with Scikit-learn (part 2 - Logistic)
EN
This is the second of a two-part lesson on regression analysis. It provides an overview of logistic regression, shows how to build a logistic regression model with Python (Scikit-learn), and discusses how to interpret the results of such analysis.
Authors, editors, and contributors
Matthew J Lavin
James Baker
Data Analysis with Python
EN
This course from dariahTeach introduces learners to the analysis of socio-cultural objects using Python through theoretical grounding and hands-on case studies. Students will work through several research use cases using basic machine learning, employ network analysis to split a small community network into groups and clusters, and finally learn more about visualisation and image analysis.
Authors, editors, and contributors
Zarah van Hout
Tobias Blanke
Giovanni Colavizza
Introduction to Programming for NLP with Python
EN
The aim of this virtual course is to offer basic knowledge and skills in programming in Python. Target audiences are undergraduate and graduate students in the Humanities and Social Sciences who want to acquire hands-on knowledge and skills in working with textual data or quantitative data in language and humanities research.
Authors, editors, and contributors
Koenraad De Smedt
