Blocks Localization
This is another instance of using the LLM to translate natural-language strings that embed a DSL, similar to the Documentation Translation guide.
MakeCode uses a microformat to define the shape of coding blocks. When translating the format strings, it is critical to preserve the properties of the blocks, such as the number of arguments, their types, and the order of the arguments.
Don’t break the blocks!
The localization strings for the buzzer library are:
{
    "jacdac.BuzzerCmd.PlayNote": "Play a note at the given frequency and volume.",
    "jacdac.BuzzerCmd.PlayTone": "Play a PWM tone with given period and duty for given duration.\nThe duty is scaled down with `volume` register.\nTo play tone at frequency `F` Hz and volume `V` (in `0..1`) you will want\nto send `P = 1000000 / F` and `D = P * V / 2`.\n* ```\nconst [period, duty, duration] = jdunpack<[number, number, number]>(buf, \"u16 u16 u16\")\n```",
    "jacdac.BuzzerReg.Volume": "Read-write ratio u0.8 (uint8_t). The volume (duty cycle) of the buzzer.\n* ```\nconst [volume] = jdunpack<[number]>(buf, \"u0.8\")\n```",
    "modules.BuzzerClient.playTone|block": "play %music tone|at %note|for %duration",
    "{id:category}Jacdac": "Jacdac",
    "{id:category}Modules": "Modules",
    "{id:group}Music": "Music"
}
For example, the string for the Jacdac buzzer play tone block contains references to variables (such as %music) that must be kept unchanged in the translated string.
{
    ...
    "modules.BuzzerClient.playTone|block": "play %music tone|at %note|for %duration",
    ...
}
Bing Translate gives us the following translation:
%Musikton|bei %Note|für %Dauer abspielen
As one can see, Bing translated the %variable names, which breaks the block definition.
The GenAIScript translation is correct.
spiele %music Ton|bei %note|für %duration
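A check like this can also be automated. The following is a minimal sketch, not part of the original script: the helper names `blockVariables` and `missingVariables` are hypothetical, and the regular expression assumes that variable names start with a letter or underscore and that escaped `\%` and `\$` are not variables.

```javascript
// Hypothetical helpers, not part of the original script.
// Collect MakeCode-style variables (%name or $name), skipping escaped \% and \$.
function blockVariables(text) {
    return (text.match(/(?<!\\)[%$][A-Za-z_]\w*/g) || []).sort()
}

// Return the variables of the original string that are absent from the translation.
function missingVariables(original, translated) {
    const found = new Set(blockVariables(translated))
    return blockVariables(original).filter((v) => !found.has(v))
}

const original = "play %music tone|at %note|for %duration"
const bing = "%Musikton|bei %Note|für %Dauer abspielen"
const genai = "spiele %music Ton|bei %note|für %duration"

console.log(missingVariables(original, bing)) // [ '%duration', '%music', '%note' ]
console.log(missingVariables(original, genai)) // []
```

An empty result means every variable of the original survived translation. The sketch does not verify the number of `|` separators or the `=value` suffixes of variables.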
If you look closely at the script source, you will find guidance in the prompt on how to handle the variables.
$`...
- Every variable name is prefixed with a '%' or a '$', like %foo or $bar.
- Do NOT translate variable names.
...`
Custom data format
Another challenge with translations is that localized strings often contain escaped characters that break formats like JSON or YAML.
Therefore, we use a simple custom key=value format to encode the strings and avoid encoding issues.
We use the defFileMerge feature to parse the generated key-value file and merge it with the existing translations.
// register a callback to custom merge files
defFileMerge((filename, label, before, generated) => {
    if (!filename.endsWith("-strings.json")) return undefined

    // load existing translations
    const olds = JSON.parse(before || "{}")

    // parse out key-value lines into a JavaScript record object
    const news = generated
        .split(/\n/g)
        .map((line) => /^([^=]+)=(.+)$/.exec(line))
        .filter((m) => !!m)
        .reduce((o, m) => {
            const [, key, value] = m
            // assign
            o[key] = value
            return o
        }, {})

    // merge new translations with old ones
    Object.assign(olds, news)

    // return stringified json
    return JSON.stringify(olds, null, 2)
})
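To see what the merge does with concrete data, here is a small sketch that reimplements the same parsing as a pure function. The name `mergeKeyValue` and the sample German values are illustrative assumptions, and the code runs outside of GenAIScript.

```javascript
// Hypothetical pure-function version of the merge callback, for illustration only.
function mergeKeyValue(before, generated) {
    // existing translations, or an empty record when the file does not exist yet
    const olds = JSON.parse(before || "{}")
    const news = {}
    for (const line of generated.split(/\n/g)) {
        // keep only lines shaped like key=value; anything else is ignored
        const m = /^([^=]+)=(.+)$/.exec(line)
        if (m) news[m[1]] = m[2]
    }
    // new translations override existing entries with the same key
    return { ...olds, ...news }
}

// illustrative sample data
const before = JSON.stringify({ "{id:group}Music": "Musik" })
const generated = [
    "Here are the translations:", // no '=' in this line, so it is skipped
    "modules.BuzzerClient.playTone|block=spiele %music Ton|bei %note|für %duration",
    "{id:category}Modules=Module",
].join("\n")

console.log(JSON.stringify(mergeKeyValue(before, generated), null, 2))
```

Because the key pattern is `[^=]+`, the line is split at the first `=`, so a value may itself contain `=` but a key may not. Each entry must also fit on a single line.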
Parameterization for Automation
The language code langCode is pulled from the env.vars variables and defaults to de.
const langCode = env.vars.lang || "de"
This technique lets you reconfigure these variables from the command line using the --vars lang=fr argument.
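The lookup can be sketched outside of GenAIScript as follows. The function `resolveLanguage` is a hypothetical name, its `vars` argument stands in for `env.vars`, and only part of the language table is reproduced.

```javascript
// Hypothetical stand-alone version of the language lookup; `vars` stands in for env.vars.
const LANGUAGE_NAMES = new Map([
    ["fr", "French"],
    ["es-ES", "Spanish"],
    ["de", "German"],
])

function resolveLanguage(vars, fallback = "de") {
    // variables may not be strings, so coerce before the lookup
    const langCode = String(vars.lang || fallback)
    const langName = LANGUAGE_NAMES.get(langCode)
    if (!langName) throw new Error(`unknown language: ${langCode}`)
    return { langCode, langName }
}

console.log(resolveLanguage({})) // { langCode: 'de', langName: 'German' }
console.log(resolveLanguage({ lang: "fr" })) // { langCode: 'fr', langName: 'French' }
```

A `Map` is used here instead of an object literal so that inherited property names such as `constructor` are rejected as unknown languages.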
Script
The full script is shown below.
script({
    title: "MakeCode Blocks Localization",
    description: "Translate block strings that define blocks in MakeCode",
    group: "MakeCode",
    temperature: 0,
})

// language parameterization
const langCode = (env.vars.lang || "de") + ""

// given a language code, refer to the full name to help the LLM
const langName = {
    fr: "French",
    "es-ES": "Spanish",
    de: "German",
    sr: "Serbian",
    vi: "Vietnamese",
    it: "Italian",
}[langCode]
if (!langName) cancel("unknown language")

// assume we've been pointed at the .json file
const file = env.files[0]
if (!file) cancel("no strings file found")

const { filename, content } = file
const dir = path.dirname(filename)

// read the strings, which are stored as a JSON record
const strings = JSON.parse(content)

// find the existing translation and remove already translated strings
const trfn = path.join(dir, langCode, path.basename(filename))
const translated = await workspace.readJSON(trfn)
if (translated)
    for (const k of Object.keys(strings)) if (translated[k]) delete strings[k]

// shortcut: all translation is done
if (Object.keys(strings).length === 0) cancel(`no strings to translate`)

// use a simple .env-style key=value format
const contentToTranslate = Object.entries(strings)
    .map(([k, v]) => `${k}=${v.replace(/(\.|\n).*/s, ".").trim()}`)
    .join("\n")
// the prompt engineering piece
$`## Role

You are an expert at Computer Science education.
You are an expert TypeScript coder.
You are an expert at Microsoft MakeCode.
You are an expert ${langName} translator.

## Task

Translate the content of ORIGINAL to ${langName} (lang-iso '${langCode}').
The ORIGINAL files are formatted with one key and localized value pair per line as follows.

\`\`\`
key1=en value1
key2=en value2
...
\`\`\`

Write the translation to file ${trfn} formatted with one key and localized value pair per line as follows (DO NOT use JSON).

\`\`\` file="${trfn}"
key1=${langCode} value1
key2=${langCode} value2
...
\`\`\`

## Recommendations

- DO NOT translate the keys
- DO translate the values to ${langName} (lang-iso '${langCode}')
- DO NOT use foul language.

### Block Strings

The value for keys ending with "|block" are MakeCode block strings (https://makecode.com/defining-blocks)
and should be translated following these rules:

- Every variable name is prefixed with a '%' or a '$', like %foo or $bar.
- Do NOT translate variable names.
- Some variable names have a value, like '%foo=toggleOnOff'. The value should be NOT translated.
- All variables in the original string should be in the translated string.
- Make sure to translate '\\%' to '\\%' and '\\$' to '\\$' if they are not variables.
- Event string starts with 'on', like 'on pressed'. Interpret 'on' as 'when' when, like 'when pressed', when translating.
- The translations of "...|block" string should be short.

`
// add to prompt context
def(
    "ORIGINAL",
    {
        filename,
        content: contentToTranslate,
    },
    { language: "txt" }
)

// merge the translations with the old ones and marshal key=value lines to json
defFileMerge((filename, label, before, generated) => {
    if (!filename.endsWith("-strings.json")) return undefined

    // existing translations
    const olds = JSON.parse(before || "{}")

    // parse out kv
    const news = generated
        .split(/\n/g)
        .map((line) => /^([^=]+)=(.+)$/.exec(line))
        .filter((m) => !!m)
        .reduce((o, m) => {
            const [, key, value] = m
            // assign
            o[key] = value
            return o
        }, {})

    // merge new translations with old ones
    Object.assign(olds, news)

    // return stringified json
    return JSON.stringify(olds, null, 2)
})
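Two steps of the script above, skipping already translated keys and encoding the remaining strings as key=value lines, can be tried in isolation. This is a sketch with hypothetical helper names (`pendingStrings`, `toKeyValue`) and a placeholder for the existing translation. The replace expression is copied from the script.

```javascript
// Hypothetical helpers that mirror two steps of the script, runnable without GenAIScript.
function pendingStrings(strings, translated) {
    const pending = { ...strings }
    // drop every key that already has a non-empty translation
    for (const k of Object.keys(pending)) if (translated?.[k]) delete pending[k]
    return pending
}

function toKeyValue(strings) {
    return Object.entries(strings)
        // cut each value at its first '.' or newline, then emit one key=value line
        .map(([k, v]) => `${k}=${v.replace(/(\.|\n).*/s, ".").trim()}`)
        .join("\n")
}

const strings = {
    "jacdac.BuzzerCmd.PlayNote": "Play a note at the given frequency and volume.",
    "jacdac.BuzzerReg.Volume":
        "Read-write ratio u0.8 (uint8_t). The volume (duty cycle) of the buzzer.",
    "modules.BuzzerClient.playTone|block": "play %music tone|at %note|for %duration",
}
// placeholder value standing in for an existing translation
const translated = { "jacdac.BuzzerCmd.PlayNote": "(already translated)" }

console.log(toKeyValue(pendingStrings(strings, translated)))
// jacdac.BuzzerReg.Volume=Read-write ratio u0.
// modules.BuzzerClient.playTone|block=play %music tone|at %note|for %duration
```

The output shows that the cut happens at the first period, even one inside a token such as `u0.8`, so some descriptions reach the model shorter than their first sentence. Block strings without a period or newline pass through unchanged.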
The result from this script can be inspected in this pull request.