The Spin - for Linguistic Technology

A range of hints and tips for linguistic work

Apr 29, 2021 8:54:24 PM

Brushing Up Translations with CAT Tools

Project managers might find computer-aided or computer-assisted translation (CAT) tools most valuable during the preparation phase, when they see stored translations applied automatically to the new project they are working on. However, the major benefit of CAT tools lies in the main translation and review process.

Mar 31, 2021 7:18:41 PM

Exact Match Entries in Translation Memory

Finding out what your customer expects is always an important first step towards success. As a translator, you should check with your customer or project manager whether you are expected to update the translations of exact or context match entries.

Mar 1, 2021 7:44:18 AM

Traditional Approach to PDF

To translate PDF files, try to use the source IDML file when it is available. In InDesign, an INDD file can be saved as an IDML file, which CAT tools can process. This way, all the formatting is handled safely while you work on the translation in the CAT tool. According to memoQ, their tool can even import INDD files directly.

Today's CAT tools can also import PDF files directly, but the resulting DOCX file (the Microsoft Word format that the CAT tool generates internally) will contain many extra line breaks. These line breaks need to be removed so that translators can work on the document efficiently. Otherwise, the translator will have to use the join segments feature repeatedly while translating in the CAT tool.

So, if no source INDD file is available, or if Microsoft Word format is preferred as an intermediate file for some reason, here is a traditional, minimal pre-processing procedure for translators.

  1. Open the PDF in Adobe Reader. Select all and copy. This places the PDF content on the clipboard as rich text, although some formatting may be lost.
  2. Open Microsoft Word and paste the clipboard content. At this point, the Word file contains many unwanted line breaks. 
  3. Open Advanced Find and Replace in Word, and select the Use Wildcards option.
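  4. Find [^l^13]([!A-Z]) and replace with a space followed by \1. The pattern matches a manual line break (^l) or a paragraph mark (^13) that is followed by any character other than an uppercase letter, which usually means the break falls in the middle of a sentence. The \1 puts that character back after the space.
    Screenshot: Line break replacement in Word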
  5. Go through the document by clicking the Find Next button, and click the Replace button if the line break is not needed. 
Other formatting issues may remain, such as extra page breaks, repeated header and footer text, and inappropriate two-column paragraphs. However, the approach explained above is a starting point for making the body of the document translatable.
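
For readers who would rather pre-process the copied text outside Word, the same idea can be sketched in Python. This is only an illustration, not part of the Word procedure above: the function name join_soft_line_breaks is hypothetical, it works on plain text rather than on a Word document, and it replaces every match at once instead of letting you confirm each one with Find Next. It also adds one assumption that the Word pattern does not make: blank lines between paragraphs are left alone.

```python
import re

# A line break is treated as unwanted when the next character is not an
# uppercase ASCII letter. Assumption added for this sketch: breaks that are
# part of a blank line (two breaks in a row) are kept as paragraph separators.
_SOFT_BREAK = re.compile(r"(?<![\r\n])\r?\n([^A-Z\r\n])")


def join_soft_line_breaks(text: str) -> str:
    """Replace each unwanted line break with a single space."""
    return _SOFT_BREAK.sub(r" \1", text)


sample = "CAT tools store\ntranslations in a\nmemory.\nThey reuse them later."
print(join_soft_line_breaks(sample))
# CAT tools store translations in a memory.
# They reuse them later.
```

Like the Word pattern, this check is a rough heuristic. A line that continues a sentence but happens to start with a capitalized word (a product name, for example) will not be joined, so the result still needs a manual review.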

 
