With its convenient, seamless workflow, docWizz has established itself as a global leader in conversion software: the pages of scanned newspaper archives and library holdings are converted, enriched with sustainable METS/ALTO metadata, secured for the long term and made available for flexible further use.
From importing the scans to exporting the METS/ALTO or IIIF files, docWizz runs through all the conversion steps (cropping, deskewing, zoning, layout analysis and OCR) in one seamless workflow. This all-in-one application, combined with continuously optimised processes, results in projects that are both cost- and time-effective.
docWizz's proven layout analysis can now optionally be enhanced by machine learning. This automated step delivers visibly more precise results that significantly reduce the amount of additional manual work. For an even more precise layout analysis, we can use individual training data to tailor the analysis to your specific project materials.
The flexible machine-learning-supported layout analysis allows docWizz to process any publication type and layout format. A broad choice of OCR engines gives access to a huge variety of languages and writing systems. docWizz scales easily from projects of a few thousand pages to projects of many millions of pages.
Multiple file formats
Various import, export and metadata formats are supported. Import formats are TIF, JPG, JP2, GIF, PNG, BMP, CR2 and PDF. Export formats are METS (including both physical and logical structural maps), ALTO XML, image files, IIIF, PDF, PDF/A, custom XML formats (full-text, other), RTF and EPUB. Metadata schemes are MIX, MARC21, MODS and DC.
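As a rough illustration of how the import formats listed above might be checked before a batch is handed over for conversion, here is a minimal Python sketch. It is not part of docWizz: the function name `split_by_import_support` is hypothetical, the check looks at file extensions only, and the alternate spellings `.tiff` and `.jpeg` are an assumption.

```python
from pathlib import Path

# Import formats named in the text above. The alternate spellings
# (.tiff, .jpeg) are an assumption added for illustration.
SUPPORTED_IMPORT_EXTENSIONS = {
    ".tif", ".tiff", ".jpg", ".jpeg", ".jp2",
    ".gif", ".png", ".bmp", ".cr2", ".pdf",
}


def split_by_import_support(filenames):
    """Return (accepted, rejected) lists, judged by file extension only."""
    accepted, rejected = [], []
    for name in filenames:
        # Compare case-insensitively so "scan.TIF" matches ".tif".
        suffix = Path(name).suffix.lower()
        if suffix in SUPPORTED_IMPORT_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


if __name__ == "__main__":
    ok, skipped = split_by_import_support(["page_0001.TIF", "notes.docx"])
    print("accepted:", ok)
    print("rejected:", skipped)
```

A check like this only pre-sorts files by name; it does not verify that a file's content actually matches its extension.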
With over 40 years of success in implementing large and mass digitization projects for renowned libraries and service providers such as The British Library and Digital Divide Data (DDD), our CCS team offers first-class service and professional support worldwide.
Thanks to the efficient and robust conversion with docWizz, you will produce data with high information content for your sustainable, searchable digital archive.
After the scanned print or microform document pages are imported, they undergo cropping and deskewing.
2. Zoning/Layout Analysis
3. Structure Analysis
The structural analysis includes the identification of the components of the entire publication, such as table of contents, articles, chapters and appendix.
4. Text Recognition (OCR)
From the set of supported OCR systems, docWizz will automatically select the best engine based on language, font, and zone information.
In the final step, the data is output in the METS/ALTO metadata standard format for libraries, saved, and is then available for further use.
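To give a sense of what the ALTO part of such an export looks like, the following Python sketch writes recognized words with their page coordinates into a simplified ALTO-style XML document. It is an illustration only and does not reproduce docWizz's actual output: the function `build_alto_page` and the fixed IDs are hypothetical, all words are placed in a single text line, and the result omits elements (such as the `Description` section) that a schema-valid ALTO file would need.

```python
import xml.etree.ElementTree as ET

# Namespace of the ALTO v4 schema.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def build_alto_page(page_width, page_height, words):
    """Build a simplified ALTO-style document for one page.

    `words` is a list of (content, hpos, vpos, width, height) tuples.
    All words go into one TextLine inside one TextBlock (a simplification).
    """
    # Serialize ALTO elements without a prefix (default namespace).
    ET.register_namespace("", ALTO_NS)

    def tag(name):
        return f"{{{ALTO_NS}}}{name}"

    root = ET.Element(tag("alto"))
    layout = ET.SubElement(root, tag("Layout"))
    page = ET.SubElement(
        layout, tag("Page"),
        ID="P1", WIDTH=str(page_width), HEIGHT=str(page_height),
    )
    print_space = ET.SubElement(page, tag("PrintSpace"))
    block = ET.SubElement(print_space, tag("TextBlock"), ID="TB1")
    line = ET.SubElement(block, tag("TextLine"), ID="TL1")

    for content, hpos, vpos, width, height in words:
        # Each recognized word becomes a String element with its position.
        ET.SubElement(
            line, tag("String"),
            CONTENT=content,
            HPOS=str(hpos), VPOS=str(vpos),
            WIDTH=str(width), HEIGHT=str(height),
        )

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_alto_page(2000, 3000, [("Daily", 100, 200, 180, 50),
                                       ("News", 300, 200, 160, 50)]))
```

Keeping the word coordinates next to the text is what allows search hits to be highlighted on the page image in a digital archive.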
docWizz is used by innovative, renowned customers around the globe and is the software of choice for many service providers. To date, some 200 million document pages have been successfully processed with docWizz, including collections from 15 national libraries.