DIRECTOR
Marçal Rusiñol
RESEARCH TEAM
Dimosthenis Karatzas, Josep Lladós Canet and Ernest Valveny Llobet, Universidad Autónoma de Barcelona; Lluís Gómez Bigordà, Centro de Visión por Computador.
COLLABORATING INSTITUTIONS
DESCRIPTION
The goal is to apply the latest approaches in deep learning to digital newspaper archives for the first time, enhancing the value of the press as a historical repository.
The information contained in digital newspaper archives is of huge cultural, historical and anthropological value, as it can be used to help understand the past. In Spain, digital newspaper archives contain thousands of headlines drawn from millions of pages of digitized historic press, now accessible via the Internet. Normally, the digital publications are PDF files produced by OCR processes, which allow words to be searched for within the text of the publication. However, this search system has certain limitations.
This project will offer solutions for unlocking the semantic content, both text and images, to facilitate searches and provide advanced data visualization techniques that boost universal access to the humanistic and cultural knowledge held in digital newspaper archives.
The current state of the art allows natural language processing and computer vision tools to be used to analyze images and text, providing a semantic description of their content. The research will focus on the latest approaches in deep learning, applied to the historic press for the first time. The result of the project will be a platform for processing and analyzing the textual and visual information contained in digital newspaper archives.
This processing will enable semantic searches to be performed, a step beyond simple keyword searches, and allow advanced visualization of the content in digital newspaper archives.