A novel architecture for knowledge mining from digitised document libraries
  • Luca Malinverno,
  • Alessio Tugnoli,
  • Andrea Ficini,
  • Barbara Elvira Ventura,
  • Matteo Kirolos Beshara,
  • Flavio Fergonzi,
  • Francesco Ghisoni
Luca Malinverno, Porini SRL (Corresponding Author: [email protected])
Alessio Tugnoli, Porini SRL
Andrea Ficini, Scuola Normale Superiore
Barbara Elvira Ventura, Porini SRL
Matteo Kirolos Beshara, Porini SRL
Flavio Fergonzi, Scuola Normale Superiore
Francesco Ghisoni, Porini SRL

Abstract

This paper examines a novel knowledge mining architecture, based on Azure cloud data and AI services, for extracting data from the Emporium library, a modern art journal published between 1895 and 1964. The knowledge mining starts with Optical Character Recognition (OCR) and custom Named Entity Recognition (NER) on digitised images of the pages, and provides the end user with a user-friendly search portal for navigating the hundreds of pages in milliseconds through a semantic query. The study shows that this architecture meets the needs of art scholars and that it enables more comprehensive statistics and descriptions of the document corpus.