Authorea

Richard Smith-Unna added overview.md almost 10 years ago

Commit id: f4ed26732480ffa0704c2065538fb58ebbcee6b8

## Overview

We have developed image and vector processing technology which can reconstruct semantic data from a wide range of diagrams. Users may start with PDF documents, PNG or JPEG diagrams, or other sources of vectors (Word or PowerPoint EWF, PostScript, etc.). AMI is a work in progress and is being deployed to alpha testers, especially in chemistry and phylogenetics.

The overall process is:

* dissect and restructure PDFs and extract images.
* transform raw images into SVG.
* associate SVG with extracted captions to add semantics and classification.
* build domain-independent mid-level graphics objects (boxes, circles, grids, annotations, symbols, etc.) from the SVG primitives.
* use domain-specific heuristics from the classification to create high-level semantic objects (x-y plots, molecular structures, phylogenetic trees, maps, etc.).
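
To make the step from SVG primitives to mid-level graphics objects more concrete, here is a minimal sketch in Python. It is an illustration only and not AMI's actual code: the function names `read_segments` and `find_boxes`, the tolerance `tol`, and the restriction to axis-aligned boxes drawn as four separate `<line>` elements are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

SVG_NS = "{http://www.w3.org/2000/svg}"


def read_segments(svg_text):
    """Return every <line> primitive as a pair of (x, y) end points."""
    root = ET.fromstring(svg_text)
    segments = []
    for el in root.iter(SVG_NS + "line"):
        p = (float(el.get("x1", 0)), float(el.get("y1", 0)))
        q = (float(el.get("x2", 0)), float(el.get("y2", 0)))
        segments.append((p, q))
    return segments


def find_boxes(segments, tol=0.5):
    """Group line segments into axis-aligned boxes (hypothetical heuristic).

    A box is reported as (x_min, y_min, x_max, y_max) when two horizontal
    and two vertical segments meet at all four corners within `tol`.
    """
    horiz, vert = [], []
    for p, q in segments:
        dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
        if dy <= tol < dx:
            # Horizontal: store as (left end, right end).
            horiz.append(tuple(sorted((p, q))))
        elif dx <= tol < dy:
            # Vertical: store as (upper end, lower end) in SVG coordinates.
            vert.append(tuple(sorted((p, q), key=lambda pt: pt[1])))

    def close(a, b):
        return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

    horiz.sort(key=lambda s: s[0][1])  # top edges before bottom edges
    vert.sort(key=lambda s: s[0][0])   # left edges before right edges

    boxes = []
    for top, bottom in combinations(horiz, 2):
        for left, right in combinations(vert, 2):
            if (close(top[0], left[0]) and close(top[1], right[0])
                    and close(bottom[0], left[1])
                    and close(bottom[1], right[1])):
                boxes.append((top[0][0], top[0][1],
                              bottom[1][0], bottom[1][1]))
    return boxes


if __name__ == "__main__":
    sample = """<svg xmlns="http://www.w3.org/2000/svg">
      <line x1="10" y1="10" x2="110" y2="10"/>
      <line x1="110" y1="10" x2="110" y2="60"/>
      <line x1="110" y1="60" x2="10" y2="60"/>
      <line x1="10" y1="60" x2="10" y2="10"/>
      <line x1="0" y1="0" x2="5" y2="5"/>
    </svg>"""
    print(find_boxes(read_segments(sample)))
```

The sample contains four lines that form one box and one stray diagonal, which the heuristic ignores. A real pipeline would also need to handle `<path>` and `<rect>` primitives, rotated shapes, and broken or overlapping strokes; this sketch leaves those out to keep the idea visible.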