TextBase TM | Multicorpora Website

The TextBase TM is very different from conventional translation memories, although it remains compatible with them.

The main differences are:

Fully automated alignment
Contextual, multilingual and multidirectional TextBase TM
More repetitions
Multilingual search engine

For more details, see the TextBase TM versus Traditional TM Comparison Table.

A conventional translation memory divides your document into out-of-context sentences and then tries to align them. It can be compared to an Excel worksheet in which, for instance, column A holds the English sentences and column B the French ones. Checking that each sentence or sentence group is perfectly aligned is very tedious and, because the documents are cut into individual sentences, you lose the context.

The TextBase TM indexes your complete documents in parallel. Since it does not divide your document into individual sentences, it can easily align paragraphs and sentences nearly perfectly without any human intervention. Furthermore, the context is preserved up to the document level, and in the rare cases of misalignment you can easily realign a sentence on the fly. Note that a TextBase TM can be multilingual and multidirectional, which eliminates the need to duplicate bilingual memories for every language pair and direction.
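To illustrate why bilingual memories multiply as languages are added, here is a minimal sketch that counts the one-way bilingual memories needed to cover every translation direction. The function name and the language list are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def directed_pairs(languages):
    """Return every (source, target) pair.

    Each pair stands for one one-way bilingual memory that would have to be
    built and maintained separately.
    """
    return list(permutations(languages, 2))


languages = ["en", "fr", "es", "de"]  # hypothetical language set
pairs = directed_pairs(languages)
print(len(pairs))  # 12 one-way bilingual memories for 4 languages
```

With n languages the count is n × (n − 1), so each added language makes the bilingual approach noticeably heavier, whereas a single multilingual, multidirectional memory covers all directions at once.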

This difference in philosophy means that, on top of being able to preserve the context, the TextBase TM approach enables you to rapidly build massive memories of legacy translations. There is no need for the costly manual verification of the alignments before they can be used productively.

All this means that you can create a much larger translation memory, more rapidly. On most systems, the TextBase TM can be created at the astonishing rate of 6-10 million words per hour! This means that instead of being limited to the size of a conventional TM, which often remains small because of the effort it takes to build, you can now index all of your legacy documentation. Having a larger pool of reliable, previously translated data to compare against means that you will find a lot more repetition. This increases quality, terminology cohesion, and productivity.
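As a rough back-of-the-envelope illustration of the rate quoted above, the sketch below estimates how long indexing a legacy corpus might take. The 30-million-word corpus is a hypothetical figure, and actual throughput depends on the system.

```python
def indexing_hours(word_count,
                   words_per_hour_low=6_000_000,
                   words_per_hour_high=10_000_000):
    """Return (fastest, slowest) indexing time in hours for a corpus."""
    fastest = word_count / words_per_hour_high
    slowest = word_count / words_per_hour_low
    return fastest, slowest


# Hypothetical legacy corpus of 30 million words.
fastest, slowest = indexing_hours(30_000_000)
print(f"{fastest:.1f} to {slowest:.1f} hours")  # 3.0 to 5.0 hours
```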

Because it is based on full texts, the TextBase TM approach also enables enhanced data mining when comparing a document against the TM. Since the TM is not segmented, mining can take place on full paragraphs. Instead of assembling a paragraph to be translated from disparate sentences that do not flow together, the TextBase TM identifies exact paragraph matches and returns the full paragraph as a single retrieved translation segment.
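The paragraph-first idea can be sketched as follows. This is not the actual TextBase TM implementation, only a simplified illustration: the dictionaries standing in for the memory, the naive sentence splitter, and all function names are assumptions.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def split_sentences(paragraph):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def lookup_paragraph(paragraph, paragraph_index, sentence_index):
    """Try an exact paragraph match first; fall back to sentence-level lookup."""
    key = normalize(paragraph)
    if key in paragraph_index:
        # The whole paragraph comes back as a single retrieved segment.
        return [("paragraph", paragraph_index[key])]
    results = []
    for sentence in split_sentences(paragraph):
        hit = sentence_index.get(normalize(sentence))
        if hit is not None:
            results.append(("sentence", hit))
        else:
            results.append(("untranslated", sentence))
    return results
```

In this sketch, a paragraph found whole in the memory is returned in one piece, so its sentences keep the flow of the original translation; only unmatched paragraphs are broken down further.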

TextBase TM also identifies and replaces full and fuzzy sentence matches, like any conventional TM system. In addition, it is designed to go beyond the segment and proactively identify sub-segment matches. This means that segments that fall below the fuzzy factor, which most conventional TM systems ignore, still get identified by MultiTrans. In other words, you get a lot more repetitions, more cohesiveness and more productivity gains.
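The difference between a fuzzy sentence score and a sub-segment match can be illustrated with Python's standard difflib module. This is only a sketch and not how MultiTrans computes matches: the word-level comparison, the 0.75 threshold, and the example sentences are all assumptions.

```python
import re
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed cut-off, for illustration only


def words(text):
    """Lowercase the text and keep only word tokens."""
    return re.findall(r"\w+", text.lower())


def fuzzy_score(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, words(a), words(b)).ratio()


def longest_shared_phrase(a, b):
    """Longest run of consecutive words that both sentences share."""
    aw, bw = words(a), words(b)
    m = SequenceMatcher(None, aw, bw).find_longest_match(0, len(aw), 0, len(bw))
    return " ".join(aw[m.a:m.a + m.size])


new = "Press the red button to start the hydraulic pump"
stored = "To start the hydraulic pump, open the main valve first"

print(fuzzy_score(new, stored) < FUZZY_THRESHOLD)  # True: rejected as a fuzzy match
print(longest_shared_phrase(new, stored))          # to start the hydraulic pump
```

Here the sentence pair scores too low to count as a fuzzy match, yet the two sentences still share a five-word phrase that a sub-segment search can surface for reuse.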