docWorks

With its convenient, seamless workflow, docWorks has established itself as a global leader in conversion software: The pages of scanned newspaper archives and library holdings are converted, enriched with sustainable METS/ALTO metadata, preserved for the long term and made available for flexible further use.

Seamless Workflow

From importing the scans to exporting the METS/ALTO or IIIF files, docWorks runs through all the conversion steps (cropping, deskewing, zoning, layout analysis and OCR) in one seamless workflow. This all-in-one application, combined with continuously optimised processes, results in projects that are both cost- and time-effective.

Artificial Intelligence

docWorks’ proven layout analysis can now optionally be enhanced by machine learning. This automated step delivers visibly more precise results and significantly reduces the amount of additional manual work. For an even more precise layout analysis, we can use individual training data to tailor the analysis to your specific project materials.

Universally applicable

The flexible, machine-learning-supported layout analysis allows docWorks to process any publication type and layout format. A broad choice of OCR engines gives access to a huge variety of languages and writing systems. docWorks scales easily from projects of a few thousand pages to projects of many millions of pages.

Multiple file formats

Various import, export and metadata formats are supported. Import formats are TIF, JPG, JP2, GIF, PNG, BMP, CR2 and PDF. Export formats are METS (including both physical and logical structural maps), ALTO XML, image files, IIIF, PDF, PDF/A, custom XML formats (full-text, other), RTF and EPUB. Supported metadata schemes are MIX, MARC21, MODS and DC.
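As a small illustration of the import formats listed above, the following sketch pre-sorts a batch of scan files by extension before they are handed to a conversion workflow. It is not part of docWorks; the function names are hypothetical, and accepting the ".tiff" and ".jpeg" spellings is an assumption.

```python
from pathlib import PurePath

# Import formats named in the text, as lowercase file extensions.
# Assumption: the alternative spellings ".tiff" and ".jpeg" are accepted too.
IMPORT_EXTENSIONS = {
    ".tif", ".tiff", ".jpg", ".jpeg", ".jp2",
    ".gif", ".png", ".bmp", ".cr2", ".pdf",
}


def is_importable(filename: str) -> bool:
    """Return True if the file extension matches a listed import format."""
    return PurePath(filename).suffix.lower() in IMPORT_EXTENSIONS


def split_by_importability(filenames):
    """Split filenames into (importable, rejected), preserving order."""
    importable = [f for f in filenames if is_importable(f)]
    rejected = [f for f in filenames if not is_importable(f)]
    return importable, rejected
```

A check like this only looks at file names; it does not verify that a file actually contains valid image data.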

Premium Support

With over 40 years of success in implementing large and mass digitization projects for renowned libraries and service providers such as The British Library and Digital Divide Data (DDD), our CCS team offers first-class service and professional support worldwide.

Thanks to the efficient and robust conversion with docWorks, you will produce data with high information content for your sustainable, searchable digital archive.

1. Import

After the scanned print or microform document pages are imported, they undergo cropping and deskewing.

2. Zoning/Layout Analysis

Supported by artificial intelligence, the structural elements of a page are identified, for example article headings, photos, paragraphs and captions.

3. Structure Analysis

The structure analysis identifies the components of the entire publication, such as the table of contents, articles, chapters and appendices.

4. Text Recognition (OCR)

From the set of supported OCR systems, docWorks will automatically select the best engine based on language, font, and zone information.
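How docWorks makes this choice internally is not described here. As a rough sketch of the idea, the following assumes a simple rule: an engine must support the language, and among those the one matching the most of font and zone type wins. The Engine class, the engine names and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Hypothetical description of what an OCR engine handles well."""
    name: str
    languages: frozenset
    fonts: frozenset
    zone_types: frozenset


def score(engine, language, font, zone_type):
    """Return -1 if the language is unsupported, else the number of matches (0-2)."""
    if language not in engine.languages:
        return -1
    return int(font in engine.fonts) + int(zone_type in engine.zone_types)


def select_engine(engines, language, font, zone_type):
    """Pick the highest-scoring engine, or None if none supports the language."""
    best = None
    best_score = -1
    for engine in engines:
        s = score(engine, language, font, zone_type)
        if s > best_score:  # on ties, the earlier engine in the list is kept
            best, best_score = engine, s
    return best


# Two made-up engines for illustration only.
ENGINES = [
    Engine("engine_a", frozenset({"deu", "eng"}), frozenset({"antiqua"}),
           frozenset({"paragraph", "heading"})),
    Engine("engine_b", frozenset({"deu"}), frozenset({"fraktur"}),
           frozenset({"paragraph"})),
]
```

With these sample engines, a German Fraktur paragraph would go to engine_b and an English Antiqua heading to engine_a. A production system would more likely rely on measured recognition quality than on fixed rules.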

5. Export

In the final step, the data is output in METS/ALTO, the metadata standard format for libraries, and saved. It is then available for further use.
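To give an impression of what "further use" of the exported data can look like, the sketch below reads the recognised text out of an ALTO file using only the Python standard library. The sample document is invented for illustration and assumes the ALTO version 3 namespace; real exports may use a different ALTO version and contain many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v3 namespace. Adjust if the export uses another version.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Minimal, made-up ALTO fragment: one text block with two lines.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Daily" HPOS="100" VPOS="50" WIDTH="80" HEIGHT="20"/>
            <SP/>
            <String CONTENT="News" HPOS="190" VPOS="50" WIDTH="70" HEIGHT="20"/>
          </TextLine>
          <TextLine ID="TL2">
            <String CONTENT="Archive" HPOS="100" VPOS="80" WIDTH="110" HEIGHT="20"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def text_lines(alto_xml: str):
    """Return the text of each TextLine, with words joined by single spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in text_lines(SAMPLE_ALTO):
        print(line)
```

Because each String element also carries its position on the page (HPOS, VPOS, WIDTH, HEIGHT), the same data can be used to highlight search hits on the scanned image.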

docWorks is used by innovative, renowned customers around the globe and is the software of choice for many service providers. To date, some 200 million document pages have been successfully processed with docWorks, including collections from 15 national libraries.