Translating XML/Epub using DeepL & ChatGPT

I developed two Ruby gems last year:

- natsukantou for XML translation
- epub-translator for Epub translation

They started out as a single script that called the DeepL API. As my use cases grew, the script became unmaintainable, with lots of duplicated code and similar method names. I needed something flexible, so I restarted from scratch.

My goal was to make it very customizable to the user:

- Swappable components:
  - translation engine
  - plugins, e.g. glossary substitution
- Configuration of components at different levels:
  - static config which is reusable
  - overridable at call time
- User friendly

A typical UNIX console program is configured through many flags, but I wondered how to represent swappable components with flags.

What can be friendlier than command line flags? The middleware pattern came to mind. Rails developers should be familiar with it from Rack: it allows stacking components, which here can be filters or different translation engines. The Ruby gem middleware provides exactly this framework, so my configuration file ends up looking like this:

Middleware::Builder.new do
  use Natsukantou::Glossary, filepath: 'file.tsv'
  use Natsukantou::OtherPlugin, foo: 'bar'
  use Natsukantou::DeepL, auth_key: "123", host: 'http://example.com'
end

The XML DOM is accessible in a hash called env, which is passed through each layer of the stack. “Glossary” substitutes terms in the DOM and then passes env down to “OtherPlugin”, which does its own thing and finally passes env to “DeepL” for translation. Static configuration such as API keys can be specified on each layer.
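To make the flow of env concrete, here is a minimal, dependency-free sketch of the pattern. It is not the middleware gem's implementation and not natsukantou's code: TinyBuilder, TermSubstitution and FakeEngine are hypothetical stand-ins, and env carries a plain string under :text instead of an XML DOM.

```ruby
# Hypothetical plugin: replaces glossary terms, then hands env to the next layer.
class TermSubstitution
  def initialize(app, terms:)
    @app = app
    @terms = terms
  end

  def call(env)
    @terms.each { |from, to| env[:text] = env[:text].gsub(from, to) }
    @app.call(env)
  end
end

# Hypothetical engine: pretends to translate by appending a marker.
class FakeEngine
  def initialize(app, marker:)
    @app = app
    @marker = marker
  end

  def call(env)
    env[:text] = "#{env[:text]} #{@marker}"
    @app.call(env)
  end
end

# Tiny stand-in for a middleware builder: records `use` calls, then chains them.
class TinyBuilder
  def initialize(&block)
    @stack = []
    instance_eval(&block)
  end

  def use(klass, **options)
    @stack << [klass, options]
  end

  def call(env)
    terminal = ->(final_env) { final_env }
    # Build from the last layer backwards so the first `use` runs first.
    app = @stack.reverse.reduce(terminal) do |next_app, (klass, options)|
      klass.new(next_app, **options)
    end
    app.call(env)
  end
end

stack = TinyBuilder.new do
  use TermSubstitution, terms: { 'cat' => 'neko' }
  use FakeEngine, marker: '[translated]'
end

result = stack.call({ text: 'a cat' })
puts result[:text]
```

Each layer only needs to know about env and the next layer, which is what makes the components swappable.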

The benefit is that I can persist this configuration as a .rb file and reuse it every time I want to translate. It can also be used directly from other Ruby programs.

Plugins in action

I want to showcase one available plugin: HandleRubyMarkup. No, this “Ruby” does not refer to the Ruby programming language, but to the HTML Ruby markup often used to annotate Japanese text. It is often desirable to remove this markup from the XML before translation, because each individual character in a term can be tokenized and marked up by separate XML tags, and DeepL is not smart enough to handle that. This plugin flattens the text and removes the annotation.
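As an illustration of what flattening means, the sketch below strips ruby annotations with plain string substitution. This is an assumption-laden simplification: the actual plugin works on the XML DOM, and flatten_ruby_markup is a hypothetical helper, not part of natsukantou.

```ruby
# Hypothetical helper: removes ruby annotations and keeps only the base text.
# Regex-based for brevity; a DOM-based approach is more robust on real input.
def flatten_ruby_markup(xml)
  xml
    .gsub(%r{<rt\b[^>]*>.*?</rt>}m, '')    # drop the readings
    .gsub(%r{<rp\b[^>]*>.*?</rp>}m, '')    # drop the fallback parentheses
    .gsub(%r{</?(?:ruby|rb)\b[^>]*>}, '')  # unwrap <ruby> and <rb>, keeping their text
end

annotated = '<p><ruby><rb>漢</rb><rt>かん</rt><rb>字</rb><rt>じ</rt></ruby>を読む</p>'
puts flatten_ruby_markup(annotated)
```

After flattening, the term is a single run of text, so the translation engine sees it as one word rather than as separately tagged characters.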

Wizard Yard

The middleware configuration file is flexible, but it would still be a big ask for someone who does not know Ruby. Can we improve it further?

I decided to create a wizard that generates this config file. A few Ruby gems offer this kind of prompt, and I decided to use tty-prompt. Ideally the wizard asks the user to select the desired components, and then to input the parameters for each of them.

At first, I thought I could just access Ruby’s method object and use its parameters, e.g. method(:initialize).parameters. However, I soon realized that this is not enough. A good wizard needs to provide explanatory text and tell the user what type of input is expected. A Ruby method's parameter declaration does not encode this information. So where should I write it?
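For reference, this is what reflection alone provides. ExampleEngine is a hypothetical class used only for illustration:

```ruby
# Hypothetical component with one required and one optional keyword argument.
class ExampleEngine
  def initialize(app, auth_key:, host: 'https://example.com')
    @app = app
    @auth_key = auth_key
    @host = host
  end
end

# Each entry is [kind, name]: :req = required positional,
# :keyreq = required keyword, :key = optional keyword.
p ExampleEngine.instance_method(:initialize).parameters
# => [[:req, :app], [:keyreq, :auth_key], [:key, :host]]
```

The names and whether they are required are available, but there is no type, no description, and not even the default value itself.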

RBS type signatures could be one possibility, but YARD comments are the most comprehensive. They are easier to write, and YARD also offers an API to parse them (though not as convenient as I had wished). In the end the parser can obtain each parameter's name, type and description, and whether it is optional, mandatory or has a default value. This means that anyone implementing a plugin or translation engine must write YARD comments for the wizard to work.
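To show the kind of information a @param tag carries, here is a deliberately simplified sketch that extracts it with a regular expression. It does not use YARD's own parsing API, which is what the wizard relies on; PARAM_TAG and parse_param_tags are hypothetical names, and the pattern only handles single-line tags.

```ruby
# Matches lines such as: "# @param auth_key [String] API key for the service"
PARAM_TAG = /^\s*#\s*@param\s+(?<name>\w+)\s+\[(?<type>[^\]]+)\]\s*(?<description>.*)$/

# Returns one hash per @param tag found in the comment text.
def parse_param_tags(comment)
  comment.each_line.filter_map do |line|
    match = PARAM_TAG.match(line)
    next unless match

    {
      name: match[:name].to_sym,
      type: match[:type],
      description: match[:description].strip
    }
  end
end

comment = <<~DOC
  # Translates text through a remote service.
  # @param auth_key [String] API key for the service
  # @param host [String] Base URL of the API
DOC

parse_param_tags(comment).each { |tag| p tag }
```

Combined with the parameter kinds from reflection, this is enough for a wizard to show a prompt, a hint about the expected type, and whether the value can be skipped.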

Finally, I used an ERB template to generate the middleware configuration file.
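A minimal sketch of that last step, assuming the wizard has collected a list of components with their options. The template and the render_config helper are illustrative, not the ones shipped with the gem:

```ruby
require 'erb'

# Hypothetical template; "-%>" (with trim_mode '-') suppresses the newline after a tag.
CONFIG_TEMPLATE = <<~TEMPLATE
  Middleware::Builder.new do
  <% components.each do |component| -%>
    use <%= component[:class_name] %><% component[:options].each do |key, value| %>, <%= key %>: <%= value.inspect %><% end %>
  <% end -%>
  end
TEMPLATE

# Renders the config source for the selected components.
def render_config(components)
  ERB.new(CONFIG_TEMPLATE, trim_mode: '-').result_with_hash(components: components)
end

selected = [
  { class_name: 'Natsukantou::Glossary', options: { filepath: 'file.tsv' } },
  { class_name: 'Natsukantou::DeepL', options: { auth_key: '123' } }
]

puts render_config(selected)
```

Using inspect on each value keeps strings quoted in the generated Ruby source.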

XML and Epub

Originally I chose Oga for XML processing, thinking it was a better candidate to run on Windows than Nokogiri. This turned out not to matter, because the epub-maker gem I used relies on Nokogiri (although its partner epub-parser is agnostic about the XML gem). Anyway, I did encounter and fix a bug in node manipulation along the way. Contribution FTW.

How to use

To run it as a command:

$ epub-translator [EPUB_FILE]

The wizard would trigger and guide you through setting up a translator configuration.

Then it asks you how and what to translate. Enter the language codes (e.g. “en”), and select chapters to translate (by default all are selected).

During the process, the wizard will ask whether you want to save the configuration for later reuse. If you choose yes, the config will be saved as a translator_config.rb file.

You will be able to use this config later by using the -c flag:

$ epub-translator [EPUB_FILE] -c translator_config.rb

Interlace

Interlacing translations is a very useful feature, especially for proofreading the machine translation. I wrote an interlace method on Oga::XML::Document, which loops through the nodes of two documents, adding nodes from one document to the other. While doing this I also set the HTML lang attribute, so that CSS can style each language differently.
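The idea can be sketched as follows. Note the assumptions: this uses REXML instead of Oga, pairs only the direct children of the root element, and the interlace function here is a standalone illustration rather than the method added to Oga::XML::Document.

```ruby
require 'rexml/document'

# Inserts each translated node right after its source node and tags both with lang.
def interlace(original_xml, translated_xml, lang_from:, lang_to:)
  original = REXML::Document.new(original_xml)
  translated = REXML::Document.new(translated_xml)

  # Snapshot the children first, because the original tree is modified in the loop.
  source_nodes = original.root.elements.to_a
  target_nodes = translated.root.elements.to_a

  source_nodes.zip(target_nodes).each do |source, target|
    source.add_attribute('lang', lang_from)
    next if target.nil?

    copy = target.deep_clone # leave the translated document untouched
    copy.add_attribute('lang', lang_to)
    source.parent.insert_after(source, copy)
  end

  original
end

doc = interlace(
  '<body><p>こんにちは</p><p>さようなら</p></body>',
  '<body><p>Hello</p><p>Goodbye</p></body>',
  lang_from: 'ja',
  lang_to: 'en'
)
doc.root.elements.each { |node| puts "#{node.attributes['lang']}: #{node.text}" }
```

With the lang attribute in place, a selector such as p:lang(en) can, for example, render the translation in a lighter color.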

I made it into a separate command, to adhere to the UNIX philosophy. Theoretically, we could give it three files and ask it to interlace them together. I wonder if there is such a need.

ChatGPT and the future

ChatGPT is all the hype, but I didn’t think much about it because I just assumed it could not handle XML. I was wrong. Last week, I discovered that the following prompt works:

Translate the XML #{text} from #{env.lang_from.code} to #{env.lang_to.code}, maintaining XML structure.

I actually don’t know how well it works in different cases, but FOMO! I quickly implemented the prompt. Thanks to the existing architecture, writing this only took half a day.

One problem I still can’t resolve is making the glossary plugin work. My current glossary plugin works by replacing each term in the source language with its target-language equivalent, and then wrapping it in a tag. DeepL can then be configured to skip those tags. ChatGPT, however, is very inconsistent. Sometimes my modified request works on the first try, [sometimes I need to ask it again](https://twitter.com/lulalala_it/status/1633115456138534914). If you are a prompt guru and know how to resolve this, please enlighten me.
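For readers unfamiliar with the approach, here is a rough sketch of the substitute-and-wrap step. It works on a plain string rather than the DOM, and the tag name x-term as well as the parse_glossary and apply_glossary helpers are assumptions made for this example, not the plugin's actual names.

```ruby
# Parses tab-separated "source<TAB>target" lines into a hash.
def parse_glossary(tsv)
  tsv.each_line.map { |line| line.chomp.split("\t", 2) }.to_h
end

# Replaces every glossary term with its translation, wrapped in a protective tag.
def apply_glossary(text, glossary, tag: 'x-term')
  # Try longer terms first so that a short term does not split a longer one.
  pattern = Regexp.union(glossary.keys.sort_by { |term| -term.length })
  text.gsub(pattern) { |match| "<#{tag}>#{glossary.fetch(match)}</#{tag}>" }
end

glossary = parse_glossary("東京\tTokyo\n東京都\tTokyo Metropolis\n")
puts apply_glossary('東京都と東京', glossary)
```

The engine then has to be told to leave the content of that tag alone, which is the step that proved unreliable with a prompt.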

I also haven’t tackled the “overridable config at call time” goal yet. Currently all configuration is static.

All in all, DeepL currently does everything ChatGPT does, only better, and offers more features. Treat ChatGPT as a toy for now.

Anyway, thanks for reading, and I hope you find epub-translator and natsukantou useful. Happy translating.