| |
| | Glyph & Cog: Text Extraction |
 | | A font is a collection of glyphs - Times-Roman, Helvetica, and Courier each have their own glyph for the letter 'A', for example. |
 | | A PDF text extractor converts each character code to a glyph name, using the font encoding, and then converts the glyph names to whatever output encoding was requested (ASCII, UTF-8, etc.). |
 | | For example, the glyph names often contain the original character codes: the ASCII code for 'T' is 84, and the font subset might use 'p84' as the glyph name for this character. |
| www.glyphandcog.com /textext.html (551 words) |
|
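The two-step conversion described in the excerpt (character code to glyph name, then glyph name to output encoding) can be sketched roughly as follows. This is a minimal illustration, not Glyph & Cog's implementation: `FONT_ENCODING`, `GLYPH_TO_UNICODE`, `glyph_to_char`, and `extract_text` are hypothetical names, the tables are made-up stand-ins for a real font encoding and glyph-name list, and the fallback assumes subset glyph names of the form letters followed by a decimal character code, as in the 'p84' example.

```python
import re

# Hypothetical font encoding for a subset font: character code -> glyph name.
FONT_ENCODING = {1: "p84", 2: "p101", 3: "p120", 4: "p116", 5: "space"}

# Tiny stand-in for a standard glyph-name table; a real one is far larger.
GLYPH_TO_UNICODE = {"space": " ", "A": "A", "T": "T"}


def glyph_to_char(name):
    """Map a glyph name to a character, falling back to an embedded code."""
    if name in GLYPH_TO_UNICODE:
        return GLYPH_TO_UNICODE[name]
    # Heuristic (assumption): names like 'p84' carry the original code.
    match = re.fullmatch(r"[A-Za-z]+(\d+)", name)
    if match:
        return chr(int(match.group(1)))
    return "\ufffd"  # Unicode replacement character for unknown glyphs


def extract_text(codes, encoding="utf-8"):
    """Convert character codes to glyph names, then to the output encoding."""
    names = [FONT_ENCODING.get(code, ".notdef") for code in codes]
    text = "".join(glyph_to_char(name) for name in names)
    return text.encode(encoding, errors="replace")


print(extract_text([1, 2, 3, 4]))
```

The named-glyph lookup runs first so that standard names are never misread as embedded codes; the digit heuristic only applies when the name is otherwise unknown.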