Word converter

Word converter for WordPerfect

Conversion of thousands of documents to Word 2007, preparation of content and adaptation to the new CI

eWorks has developed software for batch conversion of WordPerfect 6 documents to Microsoft Word 2007 on behalf of a large German service provider.

When the order was placed, the client had more than 4,000 WordPerfect documents that needed to be converted to Word 2007. Manual conversion was ruled out for time and budget reasons: at around 8 hours per document, it would have taken a good 20 person-years. To make matters worse, the documents were security-relevant and would have required time-consuming and costly verification by an independent expert after conversion. eWorks was therefore commissioned to design and develop software for automatic conversion from WordPerfect to MS Word.
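The figure of roughly 20 person-years can be reproduced with a quick calculation. The document count and hours per document come from the text; the number of working hours in a person-year is an assumption made only for this sketch.

```python
# Back-of-the-envelope check of the manual-conversion estimate.
DOCUMENTS = 4_000              # "more than 4,000" in the text, used as a lower bound
HOURS_PER_DOCUMENT = 8         # figure given in the text
HOURS_PER_PERSON_YEAR = 1_600  # ASSUMPTION: about 200 working days of 8 hours

total_hours = DOCUMENTS * HOURS_PER_DOCUMENT
person_years = total_hours / HOURS_PER_PERSON_YEAR

print(f"{total_hours} hours = {person_years:.0f} person-years")
# prints "32000 hours = 20 person-years"
```

With a longer assumed working year the result drops slightly, but it stays in the same order of magnitude.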

Since the first WordPerfect documents were created, not only the technology but also the client's corporate design (CD) had changed, so layout adjustments were necessary in addition to the conversion. The document converter therefore also had to modernize the visual appearance of the documents during conversion. To achieve this, the converter exchanges the document template, replaces local formatting with named styles (format templates) and inserts new cover pages into the target documents. In this way, it renews the layout of the decades-old documents and automatically brings them back into line with the corporate identity (CI).
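The three layout steps named above (new template, named styles instead of local formatting, new cover page) might be pictured roughly as follows. This is a minimal sketch on a simplified in-memory model, not the converter's actual code: the `LocalFormat` class, the rule table, the template file name and the cover-page placeholder are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalFormat:
    """Direct formatting found on a paragraph in the source document."""
    font: str
    size_pt: int
    bold: bool


# HYPOTHETICAL rule set: which local formatting maps to which named style.
STYLE_RULES = {
    LocalFormat("Courier", 12, True): "Heading 1",
    LocalFormat("Courier", 12, False): "Body Text",
}


def to_style(fmt: LocalFormat, default: str = "Body Text") -> str:
    """Return the named style for a local format, or a default style."""
    return STYLE_RULES.get(fmt, default)


def modernize(paragraphs, template="corporate.dotx"):
    """Swap the template, map formats to styles and prepend a cover page.

    `paragraphs` is a list of (LocalFormat, text) pairs; the template name
    and the cover-page placeholder are illustrative assumptions.
    """
    body = [(to_style(fmt), text) for fmt, text in paragraphs]
    return {
        "template": template,
        "paragraphs": [("Cover Page", "<new cover page>")] + body,
    }


source = [
    (LocalFormat("Courier", 12, True), "1. Scope"),
    (LocalFormat("Courier", 12, False), "This document describes ..."),
]
result = modernize(source)
print(result["paragraphs"][1])  # ('Heading 1', '1. Scope')
```

Keeping the rules in a lookup table separates the corporate-design decisions from the conversion logic, so a later design change would only touch the table.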

It proved to be a particular challenge not only to convert the documents technically from one format to the other, but also to modernize their content. This requirement arose because the source documents dated back to the 1970s and in some cases had originally been written on typewriters. The converter had to turn these "typewriter documents", which were up to 35 years old, into modern word-processing documents by analyzing the source documents and replacing all "typewriter formatting" with MS Word formatting, for example by removing countless spaces and tabs and replacing them with "real" paragraph formatting such as indentation, right alignment or centering. Converting from a non-proportional typewriter font ("Courier") to a proportional font ("Arial" or "Times New Roman") makes columns shift beyond recognition; the converter has to identify these columns and rebuild them with tabs.

The biggest challenge was to recognize "typewriter tables" consisting of dashes, spaces, tabs and line feeds ("|---|---|------|"), to analyze their contents, cell structures, merged cells, borders and other formatting, and to convert them into equivalent MS Word tables. The converter meets these requirements with numerous algorithms and heuristics and automatically brings documents that are up to 35 years old up to date.
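Two of these heuristics can be illustrated in a few lines. The sketch below is not the converter's algorithm; it only shows the general idea under simplifying assumptions: a fixed line width, tab stops every 8 characters, and tables whose rows are delimited by "|" with rule lines made of dashes. Cells spanning several columns, borders and multi-line cells are not handled, and all function names are hypothetical.

```python
def infer_alignment(line: str, width: int = 80) -> str:
    """Guess paragraph alignment from the padding on a fixed-width line.

    `width` is the ASSUMED typewriter line width in characters.
    """
    expanded = line.expandtabs(8).rstrip()
    text = expanded.lstrip()
    if not text:
        return "empty"
    left = len(expanded) - len(text)   # leading padding
    right = width - len(expanded)      # free space up to the right margin
    if left == 0:
        return "left"
    if right <= 1:
        return "right"
    if abs(left - right) <= 1:
        return "center"
    return "indent"


def is_rule(line: str) -> bool:
    """True for horizontal rule lines such as |---|---|------|."""
    s = line.strip()
    return (
        bool(s)
        and s[0] in "|+"
        and set(s) <= set("|+-= ")
        and bool(set(s) & set("-="))
    )


def parse_table(lines: list[str]) -> list[list[str]]:
    """Turn a simple typewriter table into rows of cell texts."""
    rows = []
    for line in lines:
        s = line.strip()
        if is_rule(s):
            continue  # rule lines carry border information only
        if len(s) >= 2 and s.startswith("|") and s.endswith("|"):
            rows.append([cell.strip() for cell in s[1:-1].split("|")])
    return rows


table = [
    "|------|-----|",
    "| Item | Qty |",
    "|------|-----|",
    "| Bolt |  40 |",
    "|------|-----|",
]
print(parse_table(table))  # [['Item', 'Qty'], ['Bolt', '40']]
print(infer_alignment(" " * 7 + "Report", width=20))  # center
```

The resulting rows could then be written into a real word-processor table, and the inferred alignment applied as paragraph formatting instead of leading spaces.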

To minimize manual work and maximize the degree of automation, the aim was to achieve the highest possible conversion quality. Both the time needed to rework the result documents manually and the time needed to validate them against the source documents had to be kept to a minimum. To achieve these goals, an "electronic quality assurance" (eQA) stage was developed and placed downstream of the conversion process. During eQA, the converted document is subjected to more than 70 different tests that check the structure, content and appearance of the target document against a sophisticated set of rules, write a test log and output a rough traffic-light assessment (green, yellow, red). The user thus receives an assessment of the conversion quality immediately after the conversion has finished.
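The eQA principle, many independent checks whose results are logged and condensed into a single green/yellow/red assessment, could look roughly like this. It is a minimal sketch with three invented checks; the real rule set, its thresholds and the document model are not described in the text, so everything named here is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

GREEN, YELLOW, RED = "green", "yellow", "red"
_SEVERITY = {GREEN: 0, YELLOW: 1, RED: 2}


@dataclass
class Check:
    name: str
    run: Callable[[dict, dict], str]  # (source, target) -> rating


def check_word_count(source: dict, target: dict) -> str:
    """Content check: the conversion should not lose or add words."""
    diff = abs(len(source["text"].split()) - len(target["text"].split()))
    if diff == 0:
        return GREEN
    return YELLOW if diff <= 2 else RED  # HYPOTHETICAL threshold


def check_no_tab_runs(source: dict, target: dict) -> str:
    """Appearance check: leftover tab runs hint at typewriter layout."""
    return RED if "\t\t" in target["text"] else GREEN


def check_cover_page(source: dict, target: dict) -> str:
    """Structure check: the new cover page should be present."""
    return GREEN if target.get("has_cover_page") else YELLOW


CHECKS = [
    Check("word count", check_word_count),
    Check("no tab runs", check_no_tab_runs),
    Check("cover page", check_cover_page),
]


def run_eqa(source: dict, target: dict, checks=CHECKS):
    """Run all checks; return the worst rating and the per-check log."""
    log = [(check.name, check.run(source, target)) for check in checks]
    overall = max(
        (rating for _, rating in log),
        key=_SEVERITY.__getitem__,
        default=GREEN,
    )
    return overall, log


source_doc = {"text": "Safety instructions for unit A"}
target_doc = {"text": "Safety instructions for unit A", "has_cover_page": False}
overall, log = run_eqa(source_doc, target_doc)
print(overall)  # yellow
```

Taking the worst individual rating as the overall result is one simple aggregation choice; a weighted score would be an alternative.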

eWorks designed the document converter for single-document or batch processing and developed it as a Microsoft .NET solution. At the end of development, it comprised 46 different conversion functions and around 24,000 lines of source code, representing an effort of 1-2 person-years. The converter has been in daily use since it was handed over to the client.

Technologies used

.NET
C#
Microsoft Office
VBA
XML
