Stemming

What is stemming?

XTM Cloud supports stemming for terminology, which lets it recognize multiple forms of the same word. Stemming reduces inflected (or sometimes derived) forms of a term to their stem, base, or root form, which is generally a written word form. The current stemming implementation in XTM Cloud is based on Hunspell dictionaries (Hunspell: About).
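To show what "reducing a form to its stem" means in practice, here is a minimal sketch of Hunspell-style affix stripping. It is a toy model, not XTM Cloud's code: the `DICTIONARY`, `SUFFIX_RULES`, and `stems` names are hypothetical, and the rules only loosely imitate the stem flags of a Hunspell `.dic` file and the `SFX` entries of its `.aff` file. A word maps to a stem only if removing a suffix yields a dictionary stem that accepts that suffix.

```python
# Hypothetical miniature of a Hunspell-style dictionary: each stem lists the
# affix flags it accepts (real Hunspell keeps these in the .dic file).
DICTIONARY = {
    "color": {"S", "D", "G"},
    "code": {"S", "D", "G"},
    "translate": {"S", "D", "G"},
}

# Hypothetical suffix rules, loosely modeled on .aff SFX entries:
# flag -> list of (strip, add) pairs. To analyze a word, remove `add`
# from its end and restore `strip`.
SUFFIX_RULES = {
    "S": [("", "s")],
    "D": [("", "ed"), ("e", "ed")],
    "G": [("", "ing"), ("e", "ing")],
}


def stems(word: str) -> set[str]:
    """Return every dictionary stem that `word` can be reduced to."""
    word = word.lower()
    found = set()
    if word in DICTIONARY:  # the word is already a stem
        found.add(word)
    for flag, rules in SUFFIX_RULES.items():
        for strip, add in rules:
            if word.endswith(add) and len(word) > len(add):
                candidate = word[: -len(add)] + strip
                # Accept the candidate only if that stem allows this suffix.
                if flag in DICTIONARY.get(candidate, set()):
                    found.add(candidate)
    return found
```

For example, `stems("Coloring")` and `stems("colors")` both return `{'color'}`, and `stems("coded")` returns `{'code'}`, while `stems("colour")` returns an empty set because this toy dictionary only knows the US spelling.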

See the example below:


Guidelines

To activate stemming in the XTM UI, you must be logged in as a user with the Administrator role. Go to: Configuration → Settings → Translation → Terminology → Terminology options → Highlight all term variants.

Once the option is activated, if a source file open in XTM Workbench contains an inflected or derived form of a term whose root form already exists in the terminology base, that form is highlighted:

Root form in the terminology base
Inflected forms highlighted in XTM Workbench

Hyphenated compound words, such as color-coded, are another case worth noting. Consider the following example:

Suppose there is an English (USA) term, color, that we want to translate into its English (UK) variant, colour. To do so, we create the corresponding entry in the XTM Cloud Terminology module:

  • Term: color, English (USA);

  • Translation: colour, English (UK).

  1. When a project containing the term color-coded is created and the Highlight all term variants option (stemming) is disabled, color is not highlighted in blue (which would signify that a translation is available). This is because XTM treats a hyphenated compound as a single word, so color-coded is simply a different word from color.

  2. When a project containing the term color-coded is created and the Highlight all term variants option (stemming) is enabled, color is highlighted in blue (signifying that a translation is available), because XTM correctly recognizes color as the stemmed form common to both words.
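The difference between the two cases can be modeled roughly as follows. This is a hypothetical sketch, not XTM Cloud's implementation: it assumes that with stemming off a hyphenated compound is compared to the termbase as one whole token, and that with stemming on each part of the compound is reduced to its stem before lookup. `TERMBASE`, `STEMS`, `stem`, and `highlighted_terms` are illustrative names, and `STEMS` is a small lookup table standing in for a real Hunspell dictionary.

```python
import re

# Hypothetical terminology base: source term -> target translation.
TERMBASE = {"color": "colour"}

# Hypothetical stem table standing in for a Hunspell dictionary lookup.
STEMS = {"colors": "color", "colored": "color", "coded": "code"}


def stem(word: str) -> str:
    """Return the stem of `word`, or the word itself if none is known."""
    return STEMS.get(word, word)


def highlighted_terms(sentence: str, stemming: bool) -> list[str]:
    """Return the source tokens that would be highlighted as term matches."""
    hits = []
    # Hyphenated compounds such as "color-coded" are kept together as one token.
    for token in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence):
        lowered = token.lower()
        if not stemming:
            # Exact match only: "color-coded" is a different word from "color".
            if lowered in TERMBASE:
                hits.append(token)
        elif any(stem(part) in TERMBASE for part in lowered.split("-")):
            # Stem each part of the compound and look the stems up.
            hits.append(token)
    return hits
```

With the sentence "The color-coded charts use two colors.", `highlighted_terms(..., stemming=False)` returns an empty list, while `stemming=True` returns `['color-coded', 'colors']`, mirroring the two cases listed above.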

See a few example sentences in XTM Workbench that contain the word color in various combinations when stemming is enabled.