Translation Technology at the United Nations

Improving content delivery efficiency for 100 organizations in six official languages

Like the broader language industry, the United Nations (UN) is facing a looming knowledge management crisis. The rapidly growing gap between the demand for high-quality multilingual content and the lag in the supply of language professionals is driving the requirement for technology that can dramatically improve translation turnaround time while maintaining exceptionally high output quality.

Within the UN, the need for translation productivity and language management technology is poised to explode over the next couple of years. This explosion is being driven by two coinciding trends. On the one hand is the growing volume of new, high-quality multilingual content that must be delivered by UN offices, specialized agencies and affiliated organizations. On the other are the lagging supply of experienced language professionals and cost constraints. This rapidly growing gap between the demand for and the supply of multilingual content delivery capacity will have a profound impact on the ability of UN organizations to satisfy their mandates.

This article explores the nature and complexity of multilingual content delivery within the UN, the looming language knowledge management crisis and the language technology requirements that are emerging.

A Flood of Content

With 191 member states, 14,000 employees, principal offices in 15 countries and more than 100 agencies and affiliated organizations, the UN system is a very large, complex and distributed multilingual organization. In addition to the six official working UN languages (Arabic, Chinese, English, French, Russian and Spanish), more than 20 additional languages require translation services on a regular basis (Dutch, Farsi, German, Greek, Italian, Japanese, Thai, Turkish and others).

A very decentralized model of operation further complicates the UN’s multilingual content management value chain. Content creation and translation are managed within the headquarters (New York), the decentralized offices (Geneva, Vienna and Nairobi) and the agencies and affiliated organizations such as the United Nations Educational, Scientific and Cultural Organization (UNESCO), the World Trade Organization, the World Bank, the World Health Organization, the International Monetary Fund, the World Intellectual Property Organization (WIPO) and many others.

The overall volume of content that the UN system creates, which must be translated for delivery in the six official working languages, has increased significantly over the past few years and continues to grow. A good example of abrupt growth in translation requirements is found within WIPO. WIPO serves 179 member states by administering some 23 international treaties dealing with different aspects of intellectual property protection. Additional countries are joining the treaties, and the application volume that must be processed by WIPO is increasing — in some areas by more than 35% annually. In 2003, several new countries, including the United States, adopted the Madrid System for international registration of trademarks, which will result in a very large increase in the volume of trademark applications managed in multiple languages by WIPO.

Another example is UNESCO, which has expanded activity across its theme areas of education, science, culture, communications and information and has a growing set of special focus areas such as the rebuilding of the cultural, educational and scientific infrastructures of Afghanistan and Iraq.

These examples of the growing demand for the delivery of high-quality multilingual content are repeated across most UN agencies. It is also the trend in other multilateral organizations, most notably the European Union (EU). The EU is in the midst of a major expansion of its membership that may result in its number of official working languages increasing from the current 11 to more than 20. This will generate an overwhelming increase in content creation and translation activity for the European Commission Directorate-General for Translation, which is already the largest translation service in the world.

The Human Capital Drain

At the same time that demand for multilingual content is growing rapidly, the UN is facing a wave of staff retirements. The demographic profile of the average translator is that of an aging baby boomer within five to ten years of retirement. The number of young translators entering the profession has been insufficient to compensate for this attrition and satisfy the growing demand.

Nor is this a simple numbers game of replacing retiring language professionals. Writing and translation are knowledge-intensive activities. Translators must take information and concepts in one language and communicate them in a different language while maintaining the precise meaning, style and tone of the original communication. Achieving these objectives requires considerable creativity, linguistic skill and deep subject matter expertise. The level of domain expertise required of UN translators and the quality standard for UN content are particularly high due to the political and legal nature of much of the documentation delivered. Those capabilities develop as accumulated knowledge in the heads of experienced translators. When they retire or leave an organization, that valuable knowledge leaves with them. Without an efficient way to capture and transfer that institutional memory, which is an important enterprise linguistic asset, new professionals must start from scratch, slowly learning and accumulating that same knowledge for themselves.

With the coming wave of language professional retirements, many UN organizations will experience a massive loss of knowledge during the next few years. In some organizations the loss reaches crisis proportions: one actual UN translation unit of approximately 20 translators will lose about 80% of its existing team to retirement within about seven years.

The Knowledge Management Crisis

Translation quality and turnaround time are of paramount importance in the UN. The flood of content combined with the human capital drain will have a profound negative impact on the ability of UN translation organizations to satisfy those mandates. For the UN and other multilateral organizations, the failure to deliver high-quality content in a timely and cost-effective manner to their multilingual constituents leads to high-level political fallout. The situation is critical, and the opportunity is ripe for technology to bridge the gap, streamline collaboration in the face of supply chain complexity and solve the knowledge retention problem.

The Challenging Environment

The UN translation environment is unique due to its size, organizational complexity and the variety of the types of content that must be translated. This leads to significant challenges when attempting to apply translation technology.

The translation departments within many of the UN agencies are among the largest in the world. Language professionals working on common projects are often distributed geographically, and the level of collaboration across different UN agencies is quite high in terms of sharing linguistic assets, best practices and information on translation technology.

The UN must deal with a large number of languages. Also, in recent years there has been a dramatic increase in the volume of Arabic translation activity due to global security concerns and related UN activity. This has created challenges for UN organizations, both in terms of building high-quality Arabic translation capacity and in providing software tools that effectively support the bidirectional nature of Arabic text.

A large amount of the translation workload in the UN is extremely time-sensitive. During conference sessions, teams of translators work shifts through the night to translate important working papers for delivery first thing the next morning. When resources become stretched and large projects get split across teams of language professionals to improve turnaround, maintaining consistency and quality becomes an even greater challenge. Excessive editing and rework contribute to delivery delays and high communications costs.

Many UN translation units have a relatively high incidence of non-computer users among their professional staff. In those cases, translation is performed mainly by dictation onto audiotape, which is subsequently sent to a typing pool. Translation technologies may still be deployed to enhance this process, but the workflows are obviously different from the situation where a professional translator is interacting directly with a translation workbench on a PC.

The use of voice recognition software is common for those language professionals who do use computers but prefer not to type or are unable to use a keyboard due to repetitive strain injury.

Translation departments in the UN make relatively extensive use of centralized referencing groups. The referencing group is responsible for performing background research and preparing reference packages for translators before the job is dispatched. These packages include terminology extracts and samples of relevant previous translations.

The UN generates and maintains huge volumes of translated content representing many millions of words of high-quality previous translations that can provide tremendous value to everyone involved in the multilingual content delivery value chain. Unlike the technical documentation that forms a significant portion of the workload of translation units working within multinational product manufacturing corporations, the content generated by most UN agencies is not highly repetitive at the full-sentence level.

To illustrate the last point, an analysis was performed on three years’ worth of content from UN General Assemblies and four years’ worth of resolutions from the Security Council. The analysis was performed with MultiTrans, a corpus-based translation support workbench.

The figure compares the characteristics of UN content against the characteristics of private sector technical documentation. Unlike the technical documentation, a very small fraction (less than 2%) of the text volume of a typical UN document consists of full sentences that have been previously written and translated. Even at the fuzzy match level above 85%, less than 5% can be recycled. However, repetitive sub-sentence expressions account for over 30% of text volume. While the analysis confirms that conventional full-sentence translation memory (TM) has limited applicability for UN content, the relatively high rate of sub-sentence expressions does present an opportunity to unlock the value in the existing linguistic assets of the UN, enabling significant gains in productivity and consistency.
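The distinction between the two kinds of repetition can be made concrete with a short Python sketch. It is not the MultiTrans analysis, whose internals are not described here; it is a minimal illustration under stated assumptions. The function names, the naive sentence splitter and the choice of three-word expressions are all hypothetical, and fuzzy matching is left out entirely.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def tokens(sentence: str) -> List[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", sentence.lower())


def full_sentence_reuse(new_text: str, reference_text: str) -> float:
    """Share of words in new_text that sit in sentences already in the reference."""
    seen = {tuple(tokens(s)) for s in split_sentences(reference_text)}
    total = reused = 0
    for sentence in split_sentences(new_text):
        toks = tokens(sentence)
        total += len(toks)
        if tuple(toks) in seen:
            reused += len(toks)
    return reused / total if total else 0.0


def sub_sentence_reuse(new_text: str, reference_text: str, n: int = 3) -> float:
    """Share of words in new_text covered by an n-word expression found in the
    reference, ignoring sentences that are already exact full-sentence matches."""
    ref_sentences = [tokens(s) for s in split_sentences(reference_text)]
    seen_sentences = {tuple(t) for t in ref_sentences}
    ref_ngrams = {
        tuple(t[i:i + n]) for t in ref_sentences for i in range(len(t) - n + 1)
    }
    total = covered = 0
    for sentence in split_sentences(new_text):
        toks = tokens(sentence)
        total += len(toks)
        if tuple(toks) in seen_sentences:
            continue  # counted as full-sentence reuse instead
        hit = [False] * len(toks)
        for i in range(len(toks) - n + 1):
            if tuple(toks[i:i + n]) in ref_ngrams:
                for j in range(i, i + n):
                    hit[j] = True
        covered += sum(hit)
    return covered / total if total else 0.0


# Illustrative sentences, not quotations from real UN documents.
reference = ("The Security Council decides to remain seized of the matter. "
             "The meeting was adjourned.")
new_doc = ("The General Assembly decides to remain seized of the matter. "
           "The meeting was adjourned.")

print(full_sentence_reuse(new_doc, reference))
print(sub_sentence_reuse(new_doc, reference, n=3))
```

In this toy pair, only the short closing sentence recurs whole, while the long expression "decides to remain seized of the matter" is invisible to a full-sentence lookup and is picked up only by the sub-sentence measure.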

UN Requirements

As UN organizations look to technology and process reengineering to address the growing knowledge management problem, a number of requirements are emerging. Given the variety and complex nature of UN content and workflows, traditional technologies are insufficient. Many UN organizations have investigated and gained experience with a variety of translation support tools. Sentence-level TM systems have proven difficult to implement and not well-suited to the characteristics of most UN documents. Many available translation support tools are stand-alone desktop systems that do not support the real-time access and sharing of centralized linguistic assets by multiple users that enable collaboration and synergy along translation supply chains and across the entire UN system. Another common shortcoming is the lack of scalability to easily handle the huge volumes of multilingual content that form the linguistic assets of the UN.

Working closely with UN organizations on a number of projects over the past two years, MultiCorpora has identified some of the most critical translation support and language management platform requirements.

Broad language support. Obviously, language technologies must be able to efficiently handle all of the languages processed by the UN. This implies Unicode support to handle the text in any of the languages encountered. It also implies that translation technologies should be based purely on statistical algorithms (as opposed to linguistic algorithms that are specific to each language pair) to enable the effective processing of an unlimited number of languages. Also, since it is not practical to manage bilingual content or terminology repositories for every possible language combination, repository structures that manage multiple languages in a single scheme are important.
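The case for a single multilingual scheme can be sketched in Python. The example below is a hypothetical illustration, not the data model of any particular product: the class names, field names and language codes are assumptions. It stores one concept-oriented entry holding the term in every language, normalizes Unicode so that equivalent spellings compare equal, and shows how many separate bilingual repositories the six official languages would otherwise need.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def normalize(term: str) -> str:
    """NFC-normalize and casefold so equivalent Unicode spellings compare equal."""
    return unicodedata.normalize("NFC", term).casefold()


@dataclass
class ConceptEntry:
    """One concept, with its term in each language (hypothetical structure)."""
    concept_id: str
    terms: Dict[str, str] = field(default_factory=dict)  # language code -> term


class TermRepository:
    """Single repository that answers lookups for any language combination."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], ConceptEntry] = {}

    def add(self, entry: ConceptEntry) -> None:
        for lang, term in entry.terms.items():
            self._index[(lang, normalize(term))] = entry

    def translate(self, term: str, source: str, target: str) -> Optional[str]:
        entry = self._index.get((source, normalize(term)))
        return entry.terms.get(target) if entry else None


def bilingual_pair_count(n_languages: int) -> int:
    """Number of unordered language pairs, i.e. separate bilingual repositories."""
    return n_languages * (n_languages - 1) // 2


repo = TermRepository()
repo.add(ConceptEntry("C001", {
    "ar": "مجلس الأمن",
    "zh": "安全理事会",
    "en": "Security Council",
    "fr": "Conseil de sécurité",
    "ru": "Совет Безопасности",
    "es": "Consejo de Seguridad",
}))

print(repo.translate("security council", "en", "fr"))  # Conseil de sécurité
print(bilingual_pair_count(6))  # 15
```

With one entry per concept, adding a language means adding one field to each entry rather than creating a new repository for every existing language it must be paired with.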

A centralized collaboration platform. Given the size and distributed nature of UN translation departments, an integrated linguistic asset management environment must allow several language professionals to search, share, view and update the same multilingual terminology repository, or the same aligned multilingual full-text content repository, simultaneously and in real time over a network or the Internet. Also important is the ability to provide intranet, extranet and/or Internet users with easy access via standard Web browser interfaces to the centralized multilingual terminology repositories and the aligned multilingual full-text content repositories. Authors, translators, terminologists, editors, reviewers, content consumers and other members of the multilingual information management value chain realize greater productivity and quality as a result of anywhere, anytime Web-based access to centralized linguistic assets. A Web-based collaboration platform enables the same level of consistency gains for projects that are outsourced to external translation service providers as for those projects that are translated in-house.

The ability to efficiently recycle translation segments of any length from huge bodies of previously translated UN content. Since sub-sentence expressions recur much more frequently than whole sentences in UN content, a critical requirement is the ability to easily create large full-text multilingual repositories that can be searched, in manual or automated modes, at any level: word, expression, sentence and paragraph, always in full-text context. Given the large volume of reference material available, the automatically created full-text repository should require no manual alignment validation prior to use.
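A search of this kind can be sketched as a simple bilingual concordance lookup in Python. The sketch assumes the repository already holds aligned source and target segments; how that alignment is produced automatically is outside its scope. The names AlignedSegment and find_expression, the document labels and the sample segments are hypothetical and are not quoted from real UN documents.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlignedSegment:
    """A source segment and its translation, kept with the document it came from."""
    document: str
    source: str
    target: str


def _squash(text: str) -> str:
    """Lowercase and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def find_expression(repo: List[AlignedSegment], expression: str) -> List[AlignedSegment]:
    """Return every aligned segment whose source contains the expression as
    whole words, whatever its length: a word, a phrase or a full sentence."""
    needle = _squash(expression)
    if not needle:
        return []
    pattern = re.compile(r"(?<!\w)" + re.escape(needle) + r"(?!\w)")
    return [seg for seg in repo if pattern.search(_squash(seg.source))]


# Illustrative English-French sample data (an assumption, not real records).
repository = [
    AlignedSegment("doc-A", "Decides to remain seized of the matter.",
                   "Décide de rester saisi de la question."),
    AlignedSegment("doc-B", "The Council remains actively seized of the matter.",
                   "Le Conseil demeure activement saisi de la question."),
    AlignedSegment("doc-B", "The meeting was adjourned.",
                   "La séance est levée."),
]

for hit in find_expression(repository, "seized of the matter"):
    print(f"[{hit.document}] {hit.source} -> {hit.target}")
```

Because each hit carries its full source segment, its translation and its document of origin, the translator sees the expression in context rather than as an isolated term pair.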

“Our evaluation demonstrated that we needed a platform that could begin to deliver benefits immediately. Building a sentence-level database from our legacy content would have taken us more than one year,” says Veronica Battikha, Chief of Terminology, Documentation and Reference Unit at UNESCO. “With the unique approach of a full-text multilingual search agent, we have been able to rapidly build large repositories of previously translated content and to automate the referencing process, leading to greater translation reuse and consistency.”

An integrated industrial-strength terminology management system. The terminology management process of UN organizations requires a scalable platform for storing, tracking and sharing comprehensive terminology management information.

An integrated and automated translation support environment. Given the large volumes of terminology and translation reference material managed by the UN, a tightly integrated translation support environment is required where all of the language assets and search functions are available to a translator from within Microsoft Word or other popular editing environments. That support environment should be able to actively automate and aggregate searches and comparisons of new translation projects with all relevant sources of translation examples and terminology and to propose the set of matches that maximizes the suggested reuse of previous translation work. It should also support on-the-fly alignment correction and terminology capture so that the centralized linguistic assets improve continuously with use.

Automated document analysis, reference package creation and workflow management. With the use of centralized referencing, a variety of other workflows, and existing document management and workflow management systems in place, UN organizations require language technologies that can operate in a number of modes and integrate easily with existing systems. An application programming interface is important to allow the document analysis and multilingual text processing operations to be automatically launched by workflow management systems for the automated creation of reference packages and the determination of document routing based on user-defined business rules.
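Routing on user-defined business rules might look like the following Python sketch. It does not represent the programming interface of any actual product: DocumentAnalysis, route, the rule thresholds and the queue names are all hypothetical, chosen only to show how a workflow system could act on the results of an automated document analysis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DocumentAnalysis:
    """Result of an automated analysis step (hypothetical fields)."""
    name: str
    word_count: int
    reuse_ratio: float  # share of text matched in the repository, 0.0 to 1.0
    urgent: bool = False


# A rule pairs a test on the analysis with the destination it selects.
Rule = Tuple[Callable[[DocumentAnalysis], bool], str]


def route(analysis: DocumentAnalysis, rules: List[Rule],
          default: str = "standard queue") -> str:
    """Return the destination of the first rule that matches, else the default."""
    for predicate, destination in rules:
        if predicate(analysis):
            return destination
    return default


# Example rules; order matters, and every threshold is an assumption.
RULES: List[Rule] = [
    (lambda a: a.urgent, "overnight team"),
    (lambda a: a.reuse_ratio >= 0.30, "pre-referenced queue"),
    (lambda a: a.word_count > 20000, "split across teams"),
]

working_paper = DocumentAnalysis("working paper", 5000, 0.10, urgent=True)
annual_report = DocumentAnalysis("annual report", 45000, 0.12)

print(route(working_paper, RULES))
print(route(annual_report, RULES))
```

Keeping the rules as data rather than hard-coded logic is one way to let each organization adjust routing to its own workflows without changing the analysis step itself.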

Conclusion

The UN is facing a mounting knowledge management crisis in the area of its multilingual content delivery capability. In order to meet the challenge of delivering increasing quantities of high-quality content in at least six languages, UN organizations are looking to new language technologies to automate processes and efficiently leverage existing linguistic assets.

Several UN organizations have demonstrated that significant gains in productivity and consistency are possible with those technologies. UNESCO’s Battikha says, “We are already experiencing notable improvements now that we are using these technologies, and user acceptance has been extremely high.” Excited by early gains, other UN organizations are rapidly following.