This is an exploration of data- and software-driven approaches to extracting useful data from semi-structured electronic text that may contain OCR errors and other noise. The methods we outline are more accurate, and more cost-effective in terms of human time, than previously existing methods and tools. The resulting data is highly structured and useful for many kinds of downstream applications.