• Introduction

    Newspapers collect information about cultural, political and social events in a more detailed way than any other public record. Since their beginnings in the 17th century, they have recorded, day after day, billions of events, stories and personal names in almost every language and country. The importance of historical newspapers as cultural heritage is well understood. Recent progress by dedicated projects such as the H2020 NewsEye project (https://www.newseye.eu/) now offers new ways to read, access and analyse in depth extensive collections of European historical newspapers. This course presents such material and tools as a rich and diverse application framework for learning or teaching Digital Humanities (DH).

    The course is divided into four units:

    Unit I: Introduction to historical newspapers and the NewsEye Platform

    This unit introduces digitised historical newspapers and presents the methodological and technical issues of their exploitation for digital history. It also presents the research projects and tools on which the following units are based.

    Unit II: Searching with the NewsEye Platform

    This unit explains in detail how the platform functions and proposes concrete examples of research in a vast corpus of digitised historical newspapers. It also presents the general functioning of a search engine for historical documents and highlights the potential biases that the use of such engines introduces in digital humanities.

    Unit III: Information extraction and document understanding

    This unit presents the main methods for extracting information from textual documents. It gives the basic programming required to process documents and provides detailed examples of information extraction, focusing in particular on named entities.

    Unit IV: Practical projects

    This last unit offers concrete examples drawn from different case studies. It allows learners to mobilise the knowledge acquired in the previous units and reach a concrete result. It first explores methods for discovering a corpus, then develops analysis techniques up to the graphical representation of the data.


    Course developers:

    Axel Jean-Caurant is a research engineer at the L3i laboratory of the University of La Rochelle. He obtained his PhD in computer science in 2018. His thesis focused on the accessibility of documents inside digital libraries. The large number of digital documents available online has changed how researchers think about information and research. His work addressed two distinct aspects. The first is to understand how researchers use these online platforms and how to train them to understand the changes data has undergone during digitisation. The second is to study the impact of document quality on accessibility. Axel now focuses his work on the historical press. He is interested in all the potential processing steps, from layout analysis to named entity recognition, that help to better understand these documents.

    Antoine Doucet has been a full professor in computer science at the L3i laboratory of the University of La Rochelle since 2014. He leads the research group in document analysis, digital content and images at La Rochelle Université (about 50 people). Until January 2022, he coordinated the H2020 project NewsEye, which focused on augmenting access to historical newspapers across domains and languages. He also led the effort on semantic enrichment for low-resourced languages in the context of the H2020 project Embeddia. His main research interests lie in information retrieval, natural language processing, (text) data mining and artificial intelligence. Antoine Doucet obtained his PhD in computer science from the University of Helsinki (Finland) in 2005 and his French research supervision habilitation (HDR) in 2012.

    Nicolas Sidère is an associate professor in computer science at the L3i laboratory of the University of La Rochelle. His main focus is document analysis and understanding. Nicolas obtained his PhD in computer science from the University of Tours in 2012.

    Cyrille Suire is an associate professor at the L3i laboratory of the University of La Rochelle. He specialises in digital libraries and document analysis, focusing on understanding user behaviour and developing methods to improve data quality and reliability in digital humanities research. Cyrille has been involved in numerous digital humanities projects, including the analysis of digitised historical newspapers. His research provides insight into how users interact with digital libraries and how these resources can be made more useful and accessible to researchers. Cyrille obtained his PhD in computer science from the University of La Rochelle in 2018.