docWorks is the world-leading digitization and conversion software for preserving newspaper archives and library holdings and making them accessible for future use in digital libraries.

One Seamless Workflow

From importing scans to exporting metadata-rich METS/ALTO files, you handle all necessary conversion steps in one seamless workflow. docWorks ensures fast processing because data no longer needs to be transferred between separate conversion steps.

Machine Learning

docWorks lets you use machine-learning-based layout analysis in your projects. This noticeably increases the accuracy and precision of the layout analysis, especially with custom-trained models.

Premium Service

Our experienced team has specialized in OCR and large-scale digitization projects for over 40 years and serves some of the world's most renowned libraries and content providers, such as The British Library. We offer premium service and maintenance worldwide.

Sustainable Data for Long-term Storage

docWorks is the No. 1 software for converting valuable print holdings into professional digital libraries. This process consists of two steps: digitization, i.e. scanning the printed page, and conversion, i.e. recognizing all the text, image, layout and structural information the page contains.

The docWorks Conversion Process

  • Image Processing
  • Layout Analysis
  • Text Recognition
  • Structure Capture

The software provides layout analysis and offers multiple OCR engines to handle any type of publication, language or writing system.

docWorks supports a wide range of import and export formats as well as metadata schemas:

  • Import formats: TIF, JPG, JP2, GIF and PDF
  • Export formats (in a single step): METS and ALTO XML, image files, PDF, PDF/A-1, full-text XML, RTF and EPUB
  • Metadata schemas: MIX, MARC, MODS, DC, METS physical structural maps and METS logical structural maps

Thanks to the thorough conversion with docWorks, you receive sustainable data for a searchable, metadata-rich digital collection.

docWorks is used by the world's most innovative libraries and content providers and is also the preferred tool of many service providers. To date, more than 150 million book and newspaper pages have been successfully processed with docWorks, including collections from 15 national libraries.