
Blocks Localization

This is another example of using an LLM to translate natural-language strings that embed a DSL, similar to the Documentation Translation guide.

MakeCode uses a microformat to define the shape of coding blocks. When translating the format strings, it is critical to preserve the properties of the blocks, such as the number of arguments, their types, and their order.

Don’t break the blocks!

The localization strings for the buzzer library are:

jacdac-buzzer-strings.json
{
    "jacdac.BuzzerCmd.PlayNote": "Play a note at the given frequency and volume.",
    "jacdac.BuzzerCmd.PlayTone": "Play a PWM tone with given period and duty for given duration.\nThe duty is scaled down with `volume` register.\nTo play tone at frequency `F` Hz and volume `V` (in `0..1`) you will want\nto send `P = 1000000 / F` and `D = P * V / 2`.\n* ```\nconst [period, duty, duration] = jdunpack<[number, number, number]>(buf, \"u16 u16 u16\")\n```",
    "jacdac.BuzzerReg.Volume": "Read-write ratio u0.8 (uint8_t). The volume (duty cycle) of the buzzer.\n* ```\nconst [volume] = jdunpack<[number]>(buf, \"u0.8\")\n```",
    "modules.BuzzerClient.playTone|block": "play %music tone|at %note|for %duration",
    "{id:category}Jacdac": "Jacdac",
    "{id:category}Modules": "Modules",
    "{id:group}Music": "Music"
}

For example, the string for the Jacdac buzzer play tone block contains references to variables (such as %music) that must be kept unchanged in the translated string.

{
    ...
    "modules.BuzzerClient.playTone|block":
        "play %music tone|at %note|for %duration",
    ...
}

Bing Translator produces the following translation:

Bing Translator
%Musikton|bei %Note|für %Dauer abspielen

As one can see, Bing translated the %variable names, which breaks the block definition.

The GenAIScript translation leaves the variable names untouched, so the block stays valid.

GenAIScript
spiele %music Ton|bei %note|für %duration

If you look closely at the script source, you will find guidance in the prompt on how to handle the variables.

block-translator.genai.js
$`...
- Every variable name is prefixed with a '%' or a '$', like %foo or $bar.
- Do NOT translate variable names.
...
`
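
Prompt guidance can be complemented with a mechanical check after translation. The following is a minimal sketch, not part of the original script: the helper names `blockVariables` and `keepsVariables` are hypothetical, and the pattern assumes that variable names are plain identifiers and that an escaped `\%` or `\$` is not a variable.

```javascript
// Hypothetical helper, not part of GenAIScript or MakeCode:
// list the variable references in a block string, e.g. "%music" or "$bar".
function blockVariables(text) {
    // an unescaped '%' or '$' followed by an identifier;
    // a "=value" suffix such as "%foo=toggleOnOff" is not part of the name
    const matches = text.matchAll(/(?<!\\)[%$][A-Za-z_][A-Za-z0-9_]*/g)
    return Array.from(matches, (m) => m[0]).sort()
}

// true when the translation references exactly the same variables as the
// original; the order may differ because grammar can move words around
function keepsVariables(original, translated) {
    const a = blockVariables(original)
    const b = blockVariables(translated)
    return a.length === b.length && a.every((v, i) => v === b[i])
}

const original = "play %music tone|at %note|for %duration"
console.log(keepsVariables(original, "spiele %music Ton|bei %note|für %duration")) // true
console.log(keepsVariables(original, "%Musikton|bei %Note|für %Dauer abspielen")) // false
```

Applied to the two translations above, the check accepts the GenAIScript output and rejects the Bing output, because `%Musikton`, `%Note` and `%Dauer` do not appear in the original string.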

Custom data format

Another challenge with translations is that localized strings often contain escaped characters that break formats like JSON or YAML. To avoid encoding issues, we use a simple custom key=value format to encode the strings. We use the defFileMerge feature to parse the generated key-value file and merge it with the existing translations.

block-translator.genai.js
// register a callback to custom merge files
defFileMerge((filename, label, before, generated) => {
    if (!filename.endsWith("-strings.json")) return undefined
    // load existing translations
    const olds = JSON.parse(before || "{}")
    // parse out key-value lines into a JavaScript record object
    const news = generated
        .split(/\n/g)
        .map(line => /^([^=]+)=(.+)$/.exec(line))
        .filter(m => !!m)
        .reduce((o, m) => {
            const [, key, value] = m
            // assign
            o[key] = value
            return o
        }, {})
    // merge new translations with old ones
    Object.assign(olds, news)
    // return stringified json
    return JSON.stringify(olds, null, 2)
})
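
The parsing and merging logic can be tried outside GenAIScript. The following is a minimal sketch that applies the same regular expression in a plain function; the name `mergeKeyValues` and the sample translations are made up for illustration.

```javascript
// Standalone version of the merge logic, runnable in plain Node.js.
// `mergeKeyValues` is a hypothetical name; the sample data is made up.
function mergeKeyValues(before, generated) {
    // existing translations, or an empty record when there is no file yet
    const olds = JSON.parse(before || "{}")
    const news = {}
    for (const line of generated.split(/\n/g)) {
        // the key is everything before the first '='; the value is the rest
        const m = /^([^=]+)=(.+)$/.exec(line)
        if (!m) continue // skip blank lines and lines without key=value shape
        const [, key, value] = m
        news[key] = value
    }
    // new translations override old ones with the same key
    return { ...olds, ...news }
}

const before = JSON.stringify({ "{id:group}Music": "Musik" })
const generated = [
    "modules.BuzzerClient.playTone|block=spiele %music Ton|bei %note|für %duration",
    "",
    "some.key|block=setze %foo=toggleOnOff",
].join("\n")

console.log(JSON.stringify(mergeKeyValues(before, generated), null, 2))
```

Because the key pattern `[^=]+` stops at the first `=`, a value that itself contains `=` (such as `%foo=toggleOnOff`) is kept whole.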

Parameterization for Automation

The language code langCode is read from the script variables env.vars and defaults to de.

const langCode = env.vars.lang || "de"

This technique allows these variables to be reconfigured from the command line using the --vars lang=fr argument.

Script

The full script is shown below.

block-translator.genai.js
script({
    title: "MakeCode Blocks Localization",
    description: "Translate block strings that define blocks in MakeCode",
    group: "MakeCode",
    temperature: 0,
})
// language parameterization
const langCode = (env.vars.lang || "de") + ""
// given a language code, refer to the full name to help the LLM
const langName = {
    fr: "French",
    "es-ES": "Spanish",
    de: "German",
    sr: "Serbian",
    vi: "Vietnamese",
    it: "Italian",
}[langCode]
if (!langName) cancel("unknown language")
// assume we've been pointed at the .json file
const file = env.files[0]
if (!file) cancel("no strings file found")
const { filename, content } = file
const dir = path.dirname(filename)
// read the strings, which are stored as a JSON record
const strings = JSON.parse(content)
// find the existing translation file and drop the strings that are already translated
const trfn = path.join(dir, langCode, path.basename(filename))
const translated = parsers.JSON5(await workspace.readText(trfn))
if (translated)
    for (const k of Object.keys(strings)) if (translated[k]) delete strings[k]
// shortcut: all translation is done
if (Object.keys(strings).length === 0) cancel(`no strings to translate`)
// serialize as simple key=value lines, keeping each value up to its first period or newline
const contentToTranslate = Object.entries(strings)
    .map(([k, v]) => `${k}=${v.replace(/(\.|\n).*/s, ".").trim()}`)
    .join("\n")
// the prompt engineering piece
$`
## Role
You are an expert at Computer Science education.
You are an expert TypeScript coder.
You are an expert at Microsoft MakeCode.
You are an expert ${langName} translator.
## Task
Translate the content of ORIGINAL to ${langName} (lang-iso '${langCode}').
The ORIGINAL files are formatted with one key and localized value pair per line as follows.
\`\`\`
key1=en value1
key2=en value2
...
\`\`\`
Write the translation to file ${trfn} formatted with one key and localized value pair per line as follows (DO NOT use JSON).
\`\`\` file="${trfn}"
key1=${langCode} value1
key2=${langCode} value2
...
\`\`\`
## Recommendations
- DO NOT translate the keys
- DO translate the values to ${langName} (lang-iso '${langCode}')
- DO NOT use foul language.
### Block Strings
The value for keys ending with "|block" are MakeCode block strings (https://makecode.com/defining-blocks)
and should be translated following these rules:
- Every variable name is prefixed with a '%' or a '$', like %foo or $bar.
- Do NOT translate variable names.
- Some variable names have a value, like '%foo=toggleOnOff'. The value should be NOT translated.
- All variables in the original string should be in the translated string.
- Make sure to translate '\\%' to '\\%' and '\\$' to '\\$' if they are not variables.
- Event strings start with 'on', like 'on pressed'. Interpret 'on' as 'when', as in 'when pressed', when translating.
- The translations of "...|block" string should be short.
`
// add to prompt context
def(
    "ORIGINAL",
    {
        filename,
        content: contentToTranslate,
    },
    { language: "txt" }
)
// merge the translations with the old ones and marshal the key=value lines to JSON
defFileMerge((filename, label, before, generated) => {
    if (!filename.endsWith("-strings.json")) return undefined
    // existing translations
    const olds = JSON.parse(before || "{}")
    // parse out key-value pairs
    const news = generated
        .split(/\n/g)
        .map(line => /^([^=]+)=(.+)$/.exec(line))
        .filter(m => !!m)
        .reduce((o, m) => {
            const [, key, value] = m
            // assign
            o[key] = value
            return o
        }, {})
    // merge new translations with old ones
    Object.assign(olds, news)
    // return stringified json
    return JSON.stringify(olds, null, 2)
})
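
The key=value serialization that builds contentToTranslate can be examined in isolation. The following is a minimal sketch that wraps the same expression in a hypothetical helper, `toKeyValueLines`, and runs it on shortened entries from the buzzer strings file.

```javascript
// Same expression as in the script, wrapped in a hypothetical helper.
function toKeyValueLines(strings) {
    return Object.entries(strings)
        // cut each value at its first period or newline, then emit key=value
        .map(([k, v]) => `${k}=${v.replace(/(\.|\n).*/s, ".").trim()}`)
        .join("\n")
}

const sample = {
    "jacdac.BuzzerCmd.PlayNote": "Play a note at the given frequency and volume.",
    "jacdac.BuzzerReg.Volume":
        "Read-write ratio u0.8 (uint8_t). The volume (duty cycle) of the buzzer.",
    "modules.BuzzerClient.playTone|block": "play %music tone|at %note|for %duration",
}

console.log(toKeyValueLines(sample))
// jacdac.BuzzerCmd.PlayNote=Play a note at the given frequency and volume.
// jacdac.BuzzerReg.Volume=Read-write ratio u0.
// modules.BuzzerClient.playTone|block=play %music tone|at %note|for %duration
```

Block strings without a period pass through unchanged. The cut happens at the first period wherever it occurs, so a description that contains a decimal such as `u0.8` is shortened to `Read-write ratio u0.` before it reaches the model.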

The result from this script can be inspected in this pull request.