With its convenient, seamless workflow, CCS docWorks has established itself as a global leader in conversion software: the pages of scanned newspaper archives and library holdings are converted, enriched with sustainable METS/ALTO metadata, preserved for the long term and made available for flexible further use.
From importing the scans to exporting the METS/ALTO or IIIF files, CCS docWorks runs through all the conversion steps (cropping, deskewing, zoning, layout analysis and OCR) in one seamless workflow. This all-in-one application, combined with continuously optimised processes, results in projects that are both cost and time effective.
CCS docWorks’ proven layout analysis can now optionally be enhanced by machine learning. This automated step delivers visibly more precise results, which significantly reduces the amount of additional manual work. For an even more precise layout analysis, we can use individual training data to tailor the analysis to your specific project materials.
The flexible, machine-learning-supported layout analysis allows CCS docWorks to process any publication type and layout format. A broad choice of OCR engines gives access to a huge variety of languages and writing systems. CCS docWorks scales easily from projects of a few thousand pages to projects of many millions of pages.
Multiple file formats
Various import, export and metadata formats are supported. Import formats are TIF, JPG, JP2, GIF, PNG, BMP, CR2 and PDF. Export formats are METS (including both physical and logical structural maps) and ALTO XML, image files, IIIF, PDF, PDF/A, custom XML formats (full-text, other), RTF and EPUB. Supported metadata schemes are MIX, MARC21, MODS and DC.
With over 40 years of success in implementing large and mass digitization projects for renowned libraries and service providers such as The British Library and Digital Divide Data (DDD), our CCS team offers first-class service and professional support worldwide.
Thanks to the efficient and robust conversion with CCS docWorks, you will produce data with high information content for your sustainable, searchable digital archive.
After the scanned print or microform document pages are imported, they undergo cropping and deskewing.
2. Zoning/Layout Analysis
3. Structure Analysis
The structural analysis includes the identification of the components of the entire publication, such as table of contents, articles, chapters and appendix.
4. Text Recognition (OCR)
From the set of supported OCR systems, CCS docWorks will automatically select the best engine based on language, font, and zone information.
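The selection idea described above can be pictured with a small Python sketch. It is purely illustrative and does not show how CCS docWorks works internally: the engine names, the capability sets and the simple scoring rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    language: str  # e.g. "deu"
    font: str      # e.g. "fraktur" or "antiqua"
    kind: str      # e.g. "text", "table", "heading"


@dataclass(frozen=True)
class Engine:
    name: str            # hypothetical engine name
    languages: frozenset
    fonts: frozenset
    kinds: frozenset


def score(engine, zone):
    """Count how many properties of the zone the engine declares support for."""
    return (
        (zone.language in engine.languages)
        + (zone.font in engine.fonts)
        + (zone.kind in engine.kinds)
    )


def select_engine(engines, zone):
    """Return the best-scoring engine that supports the zone's language, or None."""
    candidates = [e for e in engines if zone.language in e.languages]
    if not candidates:
        return None
    return max(candidates, key=lambda e: score(e, zone))


# Hypothetical engine registry, invented for illustration only.
ENGINES = [
    Engine("engine_a", frozenset({"eng", "deu"}), frozenset({"antiqua"}),
           frozenset({"text", "table"})),
    Engine("engine_b", frozenset({"deu"}), frozenset({"fraktur"}),
           frozenset({"text"})),
]

best = select_engine(ENGINES, Zone("deu", "fraktur", "text"))
print(best.name if best else "no engine available")
```

A real selection would likely weigh measured recognition quality rather than simple capability flags; the sketch only shows how language, font and zone information can feed one decision.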
In the final step, the data is output in the METS/ALTO metadata standard format for libraries, saved, and made available for further use.
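To give a rough idea of what ALTO output looks like, the following Python sketch builds a heavily simplified ALTO fragment for a single text line using only the standard library. It is not actual CCS docWorks output: the function name `build_alto`, the IDs and the coordinates are assumptions for illustration, and the result has not been validated against the ALTO schema.

```python
import xml.etree.ElementTree as ET

# Namespace of ALTO version 4; other ALTO versions use a different namespace.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def build_alto(words, page_width, page_height):
    """Build a simplified ALTO document for one text line.

    words: list of (text, hpos, vpos, width, height) tuples.
    """
    ET.register_namespace("", ALTO_NS)

    def q(tag):
        # Qualify a tag name with the ALTO namespace.
        return f"{{{ALTO_NS}}}{tag}"

    alto = ET.Element(q("alto"))
    layout = ET.SubElement(alto, q("Layout"))
    page = ET.SubElement(layout, q("Page"), ID="P1",
                         WIDTH=str(page_width), HEIGHT=str(page_height))
    space = ET.SubElement(page, q("PrintSpace"))
    block = ET.SubElement(space, q("TextBlock"), ID="TB1")
    line = ET.SubElement(block, q("TextLine"), ID="TL1")
    for text, hpos, vpos, width, height in words:
        # Each recognised word carries its text and its position on the page.
        ET.SubElement(line, q("String"), CONTENT=text,
                      HPOS=str(hpos), VPOS=str(vpos),
                      WIDTH=str(width), HEIGHT=str(height))
    return ET.tostring(alto, encoding="unicode")


# Hypothetical OCR result for two words on one line.
xml_text = build_alto(
    [("Daily", 100, 50, 120, 30), ("News", 230, 50, 110, 30)],
    page_width=2000,
    page_height=3000,
)
print(xml_text)
```

Because every word keeps its coordinates, a viewer can highlight search hits directly on the page image, which is what makes the resulting archive searchable.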
CCS docWorks is used by innovative, renowned customers around the globe and is the software of choice for many service providers. To date, some 200 million document pages have been successfully processed with CCS docWorks, including collections from 15 national libraries.