Why are AppStream metainfo files XML data?

n7ipb's picture

This is a question raised quite quite often, the last time in a blogpost by Thomas, so I thought it is a good idea to give a slightly longer explanation (and also create an article to link to…).

There are basically three reasons for using XML as the default format for metainfo files:

1. XML is easily forward/backward compatible, while YAML is not

This is a matter of extending the AppStream metainfo files with new entries, or adapt existing entries to new needs.

Take this example XML line for defining an icon for an application:

<icon type="cached">foobar.png</icon>

and now the equivalent YAML:

Icons:
cached: foobar.png

Now consider we want to add a width and height property to the icons, because we started to allow more than one icon size. Easy for the XML:

<icon type="cached" width="128" height="128">foobar.png</icon>

This line of XML can be read correctly by both old parsers, which will just see the icon as before without reading the size information, and new parsers, which can make use of the additional information if they want. The change is both forward and backward compatible.

This looks differently with the YAML file. The “foobar.png” is a string-type, and parsers will expect a string as value for the cached key, while we would need a dictionary there to include the additional width/height information:

Icons:
cached: name: foobar.png
width: 128
height: 128

The change shown above will break existing parsers though. Of course, we could add a cached2 key, but that would require people to write two entries, to keep compatibility with older parsers:

Icons:
cached: foobar.png
cached2: name: foobar.png
width: 128
height: 128

Less than ideal.

While there are ways to break compatibility in XML documents too, as well as ways to design YAML documents in a way which minimizes the risk of breaking compatibility later, keeping the format future-proof is far easier with XML compared to YAML (and sometimes simply not possible with YAML documents). This makes XML a good choice for this usecase, since we can not do transitions with thousands of independent upstream projects easily, and need to care about backwards compatibility.

2. Translating YAML is not much fun

A property of AppStream metainfo files is that they can be easily translated into multiple languages. For that, tools like intltool and itstool exist to aid with translating XML using Gettext files. This can be done at project build-time, keeping a clean, minimal XML file, or before, storing the translated strings directly in the XML document. Generally, YAML files can be translated too. Take the following example (shamelessly copied from Dolphin):

<summary>File Manager</summary>
<summary xml:lang="bs">Upravitelj datoteka</summary>
<summary xml:lang="cs">Správce souborů</summary>
<summary xml:lang="da">Filhåndtering</summary>

This would become something like this in YAML:

Summary:
C: File Manager
sk: Správca súborov
sl: Upravljalnik datotek
sr-ijekavian: Менаџер фајлова

(different languages there, I know ^^ – but his is just an example)

Looks manageable, right? Now, AppStream also covers long descriptions, where individual paragraphs can be translated by the translators. This looks like this in XML:

<description>
  <p>Dolphin is a lightweight file manager. It has been designed with ease of use and simplicity in mind, while still allowing flexibility and customisation. This means that you can do your file management exactly the way you want to do it.</p>
  <p xml:lang="de">Dolphin ist ein schlankes Programm zur Dateiverwaltung. Es wurde mit dem Ziel entwickelt, einfach in der Anwendung, dabei aber auch flexibel und anpassungsfähig zu sein. Sie können daher Ihre Dateiverwaltungsaufgaben genau nach Ihren Bedürfnissen ausführen.</p>
  <p>Features:</p>
  <p xml:lang="de">Funktionen:</p>
  <p xml:lang="es">Características:</p>
  <ul>
    <li>Navigation (or breadcrumb) bar for URLs, allowing you to quickly navigate through the hierarchy of files and folders.</li>
    <li xml:lang="de">Navigationsleiste für Adressen (auch editierbar), mit der Sie schnell durch die Hierarchie der Dateien und Ordner navigieren können.</li>
    <li xml:lang="es">barra de navegación (o de ruta completa) para URL que permite navegar rápidamente a través de la jerarquía de archivos y carpetas.</li>
    <li>Supports several different kinds of view styles and properties and allows you to configure the view exactly how you want it.</li>
    ....
  </ul>
</description>

Now, how would you represent this in YAML? Since we need to preserve the paragraph and enumeration markup somehow, and creating a large chain of YAML dictionaries is not really a sane option, the only choices would be:

  • Embed the HTML markup in the file, and risk non-careful translators breaking the markup by e.g. not closing tags.
  • Use Markdown, and risk people not writing the markup correctly when translating a really long string in Gettext.

In both cases, we would loose the ability to translate individual paragraphs, which also means that as soon as the developer changes the original text in YAML, translators would need to translate the whole bunch again, which is inconvenient.

On top of that, there are no tools to translate YAML properly that I am aware of, so we would need to write those too.

3. Allowing XML and YAML makes a confusing story and adds complexity

While adding YAML as a format would not be too hard, given that we already support it for DEP-11 distro metadata (Debian uses this), it would make the business of creating metainfo files more confusing. At time, we have a clear story: Write the XML, store it in /usr/share/metainfo, use standard tools to translate the translatable entries. Adding YAML to the mix adds an additional choice that needs to be supported for eternity and also has the problems mentioned above.

I wanted to add YAML as format for AppStream, and we discussed this at the hackfest as well, but in the end I think it isn’t worth the pain of supporting it for upstream projects (remember, someone needs to maintain the parsers and specification too and keep XML and YAML in sync and updated). Don’t get me wrong, I love YAML, but for translated metadata which needs a guarantee on format stability it is not the ideal choice.

So yeah, XML isn’t fun to write by hand. But for this case, XML is a good choice.

See original: Planet KDE Why are AppStream metainfo files XML data?