Richard Smith-Unna deleted Interpreting PDFs.md
almost 10 years ago
Commit id: 2258e5cb092721c8a84682c6105e50e03f77bd92
diff --git a/Interpreting PDFs.md b/Interpreting PDFs.md
deleted file mode 100644
index c9d9dd9..0000000
--- a/Interpreting PDFs.md
+++ /dev/null
...
PDFs provide three streams:
1. characters with code points or their glyphs
2. paths (lines and curves)
3. pixel images
We use PDFBox from Apache (pdfbox.org), which provides all three. However, most STM publishers do not use Unicode fonts, so it is formally impossible to identify many characters. We therefore use a per-journal lookup constructed by expert classification. Pixel images are often difficult to identify, and they may be layered with character codes or paths. We translate characters and paths to SVG, which is an excellent intermediate format. We generally keep the images as PNGs because their SVG representation is verbose.
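The per-journal lookup described above can be pictured as a table keyed on journal, font name, and character code. The following is only a minimal sketch of that idea, not the project's actual data structure: the journal name, the font names, the code-to-character assignments, and the functions `resolve_char` and `decode_run` are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key: (journal, font name, character code as reported by the PDF).
GlyphKey = Tuple[str, str, int]

# Hypothetical entries; in practice these would come from expert classification.
GLYPH_LOOKUP: Dict[GlyphKey, str] = {
    ("example-journal", "ExampleGreekFont", 0x61): "\u03b1",  # drawn as alpha
    ("example-journal", "ExampleGreekFont", 0x62): "\u03b2",  # drawn as beta
    ("example-journal", "ExampleMathFont", 0x01): "\u2212",   # drawn as minus sign
}


def resolve_char(journal: str, font: str, code: int) -> Optional[str]:
    """Return the Unicode string for a non-Unicode character code, or None if unknown."""
    return GLYPH_LOOKUP.get((journal, font, code))


def decode_run(journal: str, font: str, codes: Iterable[int],
               unknown: str = "\ufffd") -> str:
    """Decode a run of character codes, marking unclassified codes with `unknown`."""
    out = []
    for code in codes:
        ch = resolve_char(journal, font, code)
        out.append(unknown if ch is None else ch)
    return "".join(out)
```

Codes that have not yet been classified are returned as the Unicode replacement character rather than guessed, so gaps in the lookup stay visible downstream.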
diff --git a/layout.md b/layout.md
index 4409a22..4df8cf1 100644
--- a/layout.md
+++ b/layout.md
...
intro.md
background.md
overview.md
Interpreting PDFs.md
Interpreting pixel maps.md
Reconstructing Objects.md
figures/ami-diagram_figure1/ami-diagram_figure1.png