With its seamless workflow, docWorks has established itself as a global leader in conversion software: the pages of scanned newspaper archives and library holdings are converted, enriched with sustainable METS/ALTO metadata, preserved for the long term and made available for flexible further use.
From importing the scans to exporting the METS/ALTO or IIIF files, docWorks runs through all the conversion steps (cropping, deskewing, zoning, layout analysis and OCR) in one seamless workflow. This all-in-one application, combined with continuously optimised processes, results in projects that are both cost- and time-effective.
Multiple file formats
Various import, export and metadata formats are supported. Import formats are TIF, JPG, JP2, GIF, PNG, BMP, CR2 and PDF. Export formats are METS (including both physical and logical structural maps), ALTO XML, image files, IIIF, PDF, PDF/A, custom XML formats (full-text, other), RTF and EPUB. Supported metadata schemes are MIX, MARC21, MODS and DC.
With over 40 years of success in implementing large and mass digitization projects for renowned libraries and service providers such as The British Library and Digital Divide Data (DDD), our CCS team offers first-class service and professional support worldwide.
Thanks to the efficient and robust conversion with docWorks, you will produce data with high information content for your sustainable, searchable digital archive.
After the scanned print or microform document pages are imported, they undergo cropping and deskewing.
2. Zoning / Layout Analysis
3. Structure Analysis
The structural analysis includes the identification of the components of the entire publication, such as table of contents, articles, chapters and appendix.
4. Text Recognition (OCR)
In the final step, the data is output in the METS/ALTO metadata standard format for libraries, saved, and is then available for further use.
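As an illustration of what "further use" of such output can look like, the following minimal Python sketch reads the recognised text back out of an ALTO XML file. It is not part of docWorks and does not use any docWorks API. The sample fragment, the function names `local_name` and `alto_text_lines`, and the choice of the ALTO v3 namespace are assumptions made for this example only; real exports contain considerably more detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced ALTO fragment (assumed v3 namespace).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Daily" HPOS="100" VPOS="50" WIDTH="80" HEIGHT="20"/>
            <SP/>
            <String CONTENT="News" HPOS="190" VPOS="50" WIDTH="70" HEIGHT="20"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Archive" HPOS="100" VPOS="80" WIDTH="110" HEIGHT="20"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(xml_text):
    """Return one string per ALTO TextLine, joining its String/@CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Only String children carry recognised words; SP marks a space.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_text_lines(SAMPLE_ALTO):
        print(line)
```

Matching on local element names rather than a fixed namespace is a deliberate simplification, so the same sketch can be tried against files that declare a different ALTO schema version.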
docWorks is used by innovative, renowned customers around the globe and is the software of choice for many service providers. To date, some 200 million document pages have been successfully processed with docWorks, including collections from 15 national libraries.