Differences between revisions 7 and 8
Revision 7 as of 2009-02-27 02:49:50
Size: 7973
Editor: GreyCat
Comment: note on LANG and LC_ALL variables
Revision 8 as of 2009-02-27 20:49:30
Size: 10035
Editor: GreyCat
Comment: rewrite, almost completely ... still need to look into this security warning
Line 3: Line 3:
If you are looking for examples of how to add simple localization to your bash scripts, and how to test them, this is probably what you need.

{{{
# Tue Feb 24 19:59:40 CET 2009 <jelledejong@powercraft.nl> <http://www.tuxcrafter.net/>
}}}

{{{
# localization:
http://pgas.freeshell.org/mirror/ABSlocalization.html
http://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/localization.html
}}}

{{{
# gettext
http://www.gnu.org/software/gettext/manual/html_node/index.html
}}}

{{{
# step 1: update your strings

# before translation
function version()
{
   echo "Usage: $0 --help"
   echo "Version: $version"
   echo "Author: $author"
   echo "Donation: $donation"
}

# after translation
function version()
{
   echo $"Usage: $0 --help"
   echo $"Version: $version"
   echo $"Author: $author"
   echo $"Donation: $donation"
}
}}}

{{{
# step 2: create some sort of structure for your source
mkdir --parent --verbose locale/nl/LC_MESSAGES/
mkdir --parent --verbose lang
}}}

{{{
# step 3: check the strings
bash -D pct-scanner-script
}}}

{{{
# step 4: dump po strings to your directory
bash --dump-po-strings pct-scanner-script > lang/nl.pot
}}}
If you are looking for examples of how to add simple localization to your bash scripts, and how to test them, this is probably what you need.

<<TableOfContents>>

=== First, some variables you must understand ===

Before we can even begin, we have to understand all the [[http://www.gnu.org/software/hello/manual/gettext/Locale-Environment-Variables.html|locale environment variables]]. This is fundamental, and extremely under-documented in the places where people actually look for documentation (man pages, etc.). Some of these variables may not apply to your system, because there seem to be various competing standards and extensions....

On recent GNU systems, the variables are used in this order:

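 1. If the locale selected by LC_ALL, LC_MESSAGES or LANG is C or POSIX, LANGUAGE is ignored and messages are not translated.
 1. Otherwise, if LANGUAGE is set, use that.
 1. Otherwise, if LC_ALL is set, use that.
 1. Otherwise, if LC_MESSAGES is set, use that.
 1. Otherwise, use LANG.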

That means, you first have to check your current environment to see which of these, if any, are already set. If they are set, and you don't know about them, they may interfere with your testing, leaving you befuddled.

{{{
$ env | egrep 'LC|LANG'
LANG=en_US.UTF-8
LANGUAGE=en_US:en_GB:en
}}}

Here's an example from a Debian system. In this case, the `LANGUAGE` variable is set, which means any testing we do that involves changing `LANG` or `LC_ALL` is likely to fail, unless we also change `LANGUAGE`. Now here's another example from another Debian system:

{{{
$ env | egrep 'LC|LANG'
LANG=en_US.utf8
}}}

In that case, setting `LANG` or `LC_ALL` would actually work. A user using that system, then writing a document on how to perform localization testing, might create instructions that would fail to work for the user on the first system....

So, go ahead and play around with your own system and see what works and what doesn't. You may not have a `LANGUAGE` variable at all (especially if you are not on GNU/Linux), so setting it may do nothing for you. You may have to "generate locales" on your operating system (a process which is beyond the scope of this page, but which on Debian consists of running `dpkg-reconfigure locales` and answering questions) in order to make them work.
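As a starting point for that experimentation, here is a minimal sketch that prints each relevant variable and says explicitly when it is unset (which `env | egrep` cannot show). The function name `show_locale_vars` is made up for this page, and `LC_MESSAGES` is included on the assumption that your system's gettext consults it between `LC_ALL` and `LANG`, as GNU gettext does.

```shell
#!/bin/bash
# show_locale_vars is a hypothetical helper, not a standard command.
# It walks the variables in the order GNU gettext consults them.
show_locale_vars() {
    local name
    for name in LANGUAGE LC_ALL LC_MESSAGES LANG; do
        # ${!name+set} expands to "set" only if the variable named by $name exists.
        if [[ -n ${!name+set} ]]; then
            printf '%s=%s\n' "$name" "${!name}"
        else
            printf '%s is unset\n' "$name"
        fi
    done
}

show_locale_vars
```

Compare the values it prints against the names listed by `locale -a` before you start testing.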

Try to get to the point where you can produce error messages in at least two languages:
{{{
$ wc -q
wc: invalid option -- 'q'
Try `wc --help' for more information.
$ LANGUAGE=es_ES wc -q
wc: opción inválida -- q
Pruebe `wc --help' para más información.
}}}

Once you can do that reliably, you can begin the actual work of producing a bash script with localisation.

=== Marking strings as translatable ===

This is the simplest part, at least to understand. Any string in `$"..."` is translated using the system's native language support (NLS) facilities. Find all the constant strings in your program that you want to translate, and mark them accordingly. Don't mark strings that contain variables or other substitutions. For example,

{{{#!nl numbers=off
#!/bin/bash
echo $"Hello, world"
}}}

(As you can see, we're starting with ''very'' simple material here.)

If you wanted to translate strings that contain variable substitutions, the cleanest way to do that is to use `printf` with placeholders:

{{{
printf $"The answer is %s\n" "$answer"
# Instead of: echo $"The answer is $answer"
}}}

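The commented-out version is the one to avoid. Bash looks up the literal text inside `$"..."` (here including the characters `$answer`) and only then expands the result, so a translator would have to carry the variable reference over exactly, and whatever expansions the translated string contains would be performed by your script. With `printf`, the msgid is a constant string and the variable stays out of the catalog.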
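The same pattern extends to several values. The following is only a sketch: the function name `report` and the message text are invented for illustration, and with no catalog installed for the `hello` domain the string simply comes back untranslated.

```shell
#!/bin/bash
TEXTDOMAIN=hello

# report is a hypothetical function. The format string is the only part
# that is marked for translation; the values are passed as arguments.
report() {
    local answer=$1 tries=$2
    printf $"The answer is %s (after %d tries)\n" "$answer" "$tries"
}

report 42 3
```

Because the translator only ever sees the format string, the msgid stays the same no matter what values the script computes.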

=== Generating and/or merging PO files ===

Next, generate what are called "PO files" from your program. These contain the strings we've marked, and their translations (which we'll fill in later).

We start by creating a `*.pot` file, which is a template.

{{{
bash --dump-po-strings hello > hello.pot
}}}

This produces output which looks like:

{{{#!nl numbers=off
#: hello:5
msgid "Hello, world"
msgstr ""
}}}
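To try the extraction without touching a real project, something like the following should work. It is a sketch that writes a throwaway two-line script into a temporary directory (the directory and its contents are made up for the demonstration) and counts the `msgid` entries bash emits.

```shell
#!/bin/bash
# Write a tiny script with two marked strings into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/hello" <<'EOF'
#!/bin/bash
echo $"Hello, world"
echo $"How are you?"
EOF

# --dump-po-strings only parses the script; it does not execute it.
bash --dump-po-strings "$workdir/hello" > "$workdir/hello.pot"

# Count the msgid entries that ended up in the template.
grep -c '^msgid ' "$workdir/hello.pot"
```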

The name of your file (without the `.pot` extension) is called the ''domain'' of your translatable text. A ''domain'' in this context is similar to a package name. For example, the GNU `coreutils` package contains lots of little programs, but they're all distributed together; and so it makes sense for all their translations to be together as well. In our example, we're using a domain of `hello`. In a larger example containing lots of programs in a suite, we'd probably use the name of the whole suite.

This template will be copied once for each language we want to support. Let's suppose we wanted to support Spanish and French translations of our program. We'll be creating two PO files (one for each translation), so let's make two subdirectories, and copy the template into each one:

{{{
mkdir es fr
cp hello.pot es/hello.po
cp hello.pot fr/hello.po
}}}

This is what we do the ''first'' time through. If there were already some partially- or even fully-translated PO files in place, we wouldn't want to overwrite them. Instead, we would ''merge'' the new translatable material into the old PO file. We use a special tool for that called `msgmerge`. Let's suppose we add some more code (and translatable strings) to our program:

{{{
vi hello
bash --dump-po-strings hello > hello.pot
msgmerge --update es/hello.po hello.pot
msgmerge --update fr/hello.po hello.pot
}}}
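The "copy the first time, merge afterwards" rule can be folded into one helper so that you never overwrite an existing translation by accident. This is a sketch, not part of gettext: `update_po` is a hypothetical name, and it assumes the layout used on this page (`hello.pot` in the current directory, one subdirectory per language).

```shell
#!/bin/bash
# update_po DOMAIN LANG
# Hypothetical helper: create LANG/DOMAIN.po from the template on the first
# run, and merge new strings into it on every later run.
update_po() {
    local domain=$1 lang=$2
    local po=$lang/$domain.po
    mkdir -p "$lang" || return
    if [[ -e $po ]]; then
        # Keeps the existing translations and adds any new msgids.
        msgmerge --update "$po" "$domain.pot"
    else
        cp "$domain.pot" "$po"
    fi
}

# Typical use after regenerating hello.pot:
#   for lang in es fr; do update_po hello "$lang"; done
```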

The original author of this page created some notes which I am leaving intact here. Maybe they'll be helpful...?
Line 79: Line 129:
{{{
# step 6: create .mo file
msgfmt -o locale/nl/LC_MESSAGES/pct-scanner-script.mo lang/pct-scanner-script-nl.po
msgfmt -o locale/nl/LC_MESSAGES/pct-scanner-script-process.mo lang/pct-scanner-script-process-nl.po
}}}

{{{
# step 7: move single .mo file to system locales
sudo cp --verbose locale/nl/LC_MESSAGES/pct-scanner-script.mo /usr/share/locale/nl/LC_MESSAGES/
sudo chown root:root /usr/share/locale/nl/LC_MESSAGES/pct-scanner-script.mo
sudo chmod 644 /usr/share/locale/nl/LC_MESSAGES/pct-scanner-script.mo
sudo ls -hal /usr/share/locale/nl/LC_MESSAGES/pct-scanner-script.mo
}}}

{{{
# step 8: move all .mo files to system locales
sudo cp --verbose --recursive locale/* /usr/share/locale/
sudo chown root:root /usr/share/locale/nl/LC_MESSAGES/pct-scanner-*.mo
sudo chmod 644 /usr/share/locale/nl/LC_MESSAGES/pct-scanner-*.mo
sudo ls -hal /usr/share/locale/nl/LC_MESSAGES/pct-scanner-*.mo
}}}

{{{
# step 9: add the following to your bash script under #!/bin/bash
TEXTDOMAINDIR=/usr/share/locale
TEXTDOMAIN=pct-scanner-script
}}}

{{{
# step 10: start testing

LANG=nl_NL
bash pct-scanner-script --version

LANG=en_GB
bash pct-scanner-script --version
}}}

{{{
------------------------------------------------------------------------
}}}

{{{
# sudo dpkg-reconfigure locales

export LANG=en_GB
export TEXTDOMAINDIR=/usr/share/locale
export TEXTDOMAIN=pct-scanner-script
echo $"lineart: yes"
echo $"color: yes"
echo $"source: $SOURCE"
echo $"testing"

LANG=nl_NL
TEXTDOMAINDIR=/usr/share/locale
TEXTDOMAIN=pct-scanner-script
echo $"lineart: yes"
echo $"color: yes"
echo $"source: $SOURCE"
echo $"testing"
}}}

{{{
------------------------------------------------------------------------
}}}

{{{
jelle@debian-eeepc:~/packages-checkout/source/pct-scanner-scripts/pct-scanner-scripts-devel$ LANG=nl_NL
jelle@debian-eeepc:~/packages-checkout/source/pct-scanner-scripts/pct-scanner-scripts-devel$ bash pct-scanner-script --version
Gebruik: pct-scanner-script --help
Versie: 0.0.5
Maker: Jelle de Jong <jelledejong@powercraft.nl>
Donatie: http://www.tuxcrafter.net/pages/contact.html#paypal-donation
jelle@debian-eeepc:~/packages-checkout/source/pct-scanner-scripts/pct-scanner-scripts-devel$
jelle@debian-eeepc:~/packages-checkout/source/pct-scanner-scripts/pct-scanner-scripts-devel$ LANG=en_GB
jelle@debian-eeepc:~/packages-checkout/source/pct-scanner-scripts/pct-scanner-scripts-devel$ bash pct-scanner-script --version
Usage: pct-scanner-script --help
Version: 0.0.5
Author: Jelle de Jong <jelledejong@powercraft.nl>
Donation: http://www.tuxcrafter.net/pages/contact.html#paypal-donation
jelle@debian-eeepc:~/packages-checkout/source/pct-scanner-scripts/pct-scanner-scripts-devel$
}}}

{{{
------------------------------------------------------------------------
}}}

{{{
jelle@debian-eeepc:~$ export LANG=en_GB
jelle@debian-eeepc:~$ export TEXTDOMAINDIR=/usr/share/locale
jelle@debian-eeepc:~$ export TEXTDOMAIN=pct-scanner-script
jelle@debian-eeepc:~$ echo $"lineart: yes"
lineart: yes
jelle@debian-eeepc:~$ echo $"color: yes"
color: yes
jelle@debian-eeepc:~$ echo $"source: $SOURCE"
source:
jelle@debian-eeepc:~$ echo $"testing"
testing
jelle@debian-eeepc:~$
jelle@debian-eeepc:~$ LANG=nl_NL
jelle@debian-eeepc:~$ TEXTDOMAINDIR=/usr/share/locale
jelle@debian-eeepc:~$ TEXTDOMAIN=pct-scanner-script
jelle@debian-eeepc:~$ echo $"lineart: yes"
lineart: ja
jelle@debian-eeepc:~$ echo $"color: yes"
kleur: ja
jelle@debian-eeepc:~$ echo $"source: $SOURCE"
bron:
jelle@debian-eeepc:~$ echo $"testing"
dit is een test
jelle@debian-eeepc:~$
}}}

----

Note, systems get ''really bloody pissy'' about the exact settings of `LANG` and `LC_ALL` variables. It's hard as hell to figure out what you need to set things to, to make things work. I recommend brute force:

{{{
$ msgunfmt grep.mo
...
msgid "out of memory"
msgstr "memoria agotada"

$ LANG=es_ES TEXTDOMAIN=grep gettext -s "out of memory"
out of memory

$ LANG=es_ES LC_ALL=es_ES TEXTDOMAIN=grep gettext -s "out of memory"
out of memory

$ LANG=es_ES LC_ALL=es_ES gettext -d grep -s "out of memory"
out of memory

$ LANG=es_ES LC_ALL=es_ES gettext -s grep "out of memory"
grep out of memory

$ LANG=es_ES LC_ALL=es_ES gettext grep "out of memory"
out of memory

$ LANG=es_ES grep -dfsdf
grep: unknown directories method

$ LANG=es_ES LC_ALL=es_ES grep -z
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.

$ LANG=es_ES LC_ALL=es_ES bash -c 'grep -z'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.

$ sudo dpkg-reconfigure locales
[sudo] password for greg:
Generating locales (this might take a while)...
  en_US.ISO-8859-1... done
  en_US.ISO-8859-15... done
  en_US.UTF-8... done
  es_ES.UTF-8... done
Generation complete.

$ LANG=es_ES LC_ALL=es_ES bash -c 'grep -z'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.

$ locale -a
C
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
es_ES.utf8
POSIX

$ LANG=es_ES LC_ALL=es_ES.utf8 bash -c 'grep -z'
Modo de empleo: grep [OPCIÓN]... PATRÓN [FICHERO]...
Pruebe `grep --help' para más información.

$ LANG=es_ES LC_ALL=es_ES bash -c 'grep -z'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.

$ LANG=es_ES LC_ALL=es_ES gettext -d grep -s "out of memory"
out of memory

$ LANG=es_ES LC_ALL=es_ES.utf8 gettext -d grep -s "out of memory"
memoria agotada
}}}

Holy fucking hallelujah, ''finally'' it worked! Christ!
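The lesson from that transcript is that the value has to match a name from `locale -a` exactly (`es_ES.utf8` worked, `es_ES` did not). A small check along these lines can save some of the brute force; `locale_available` is a hypothetical helper name, and the locale used in the example is just the one from the transcript.

```shell
#!/bin/bash
# locale_available NAME
# Hypothetical helper: succeeds only when NAME matches a whole line of
# `locale -a`, so es_ES.utf8 can pass while plain es_ES fails.
locale_available() {
    locale -a 2>/dev/null | grep -Fxq -- "$1"
}

# Example: only rely on a locale the system really has.
if locale_available es_ES.utf8; then
    echo "es_ES.utf8 is installed"
else
    echo "es_ES.utf8 is missing; generate it first"
fi
```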
=== Translate the strings ===

This is a step which is 100% human labor. Edit each language's PO file and fill in the blanks.

{{{#!nl numbers=off
#: hello:5
msgid "Hello, world"
msgstr "Hola el mundo"

#: hello:6
msgid "How are you?"
msgstr ""
}}}
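To see how much work is left in a PO file, GNU `msgfmt` has a `--statistics` option. If you want a count without the gettext tools, a rough sketch like the one below may do. The function name `count_untranslated` is invented here, and the script is deliberately simple: it understands multi-line `msgstr` values but not plural forms.

```shell
#!/bin/bash
# count_untranslated FILE
# Hypothetical helper: print the number of entries whose msgstr is empty.
# Limitation: plural entries (msgstr[0], msgstr[1], ...) are not examined.
count_untranslated() {
    awk '
        /^msgstr ""$/   { pending = 1; next }     # possibly empty; look at the next line
        pending && /^"/ { pending = 0; next }     # continuation line, so there is text
        pending         { count++; pending = 0 }  # anything else: it really was empty
        END             { if (pending) count++; print count + 0 }
    ' "$1"
}

# Typical use:
#   count_untranslated es/hello.po
```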

=== Install MO files ===

Your operating system, if it has gotten you this far, probably already has some localized programs, with translation catalogs installed in some location such as `/usr/share/locale` (or elsewhere). If you want your translations to be installed there as well, you'll have to have superuser privileges, and you'll have to manage your translation ''domain'' (namespace) in such a way as to avoid collision with any OS packages.

If you're going to use the standard system location for your translations, then you only need to worry about making one change to your program: setting the `TEXTDOMAIN` variable.

{{{#!nl numbers=off
#!/bin/bash
TEXTDOMAIN=hello

echo $"Hello, world"
echo $"How are you?"
}}}

This tells bash and the system libraries which MO file to use, from the standard location. If you're going to use a nonstandard location, then you have to set that as well, in a variable called `TEXTDOMAINDIR`:

{{{#!nl numbers=off
#!/bin/bash
TEXTDOMAINDIR=/usr/local/share/locale
TEXTDOMAIN=hello

echo $"Hello, world"
echo $"How are you?"
}}}

Use one of these two depending on your needs.

Now, an MO file is essentially a compiled PO file. A program called `msgfmt` is responsible for this compilation. We just have to tell it where the PO file is, and where to write the MO file.

{{{
# Standard system location (requires superuser privileges):
msgfmt -o /usr/share/locale/es/LC_MESSAGES/hello.mo es/hello.po
msgfmt -o /usr/share/locale/fr/LC_MESSAGES/hello.mo fr/hello.po

# Or a nonstandard location, matching TEXTDOMAINDIR in the script:
mkdir -p /usr/local/share/locale/{es,fr}/LC_MESSAGES
msgfmt -o /usr/local/share/locale/es/LC_MESSAGES/hello.mo es/hello.po
msgfmt -o /usr/local/share/locale/fr/LC_MESSAGES/hello.mo fr/hello.po
}}}

(If we had more than two translations to support, we might choose to mimic the structure of `/usr/share/locale` in order to facilitate mass-copying of MO files from the local directory to the operating system's repository. This is left as an exercise.)
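One possible shape for that exercise is sketched below. The helper name `build_catalogs` and the `./locale` staging directory are assumptions made for illustration; the only tool it relies on is `msgfmt -o`, as used above.

```shell
#!/bin/bash
# build_catalogs DOMAIN DESTDIR LANG...
# Hypothetical helper: compile LANG/DOMAIN.po into
# DESTDIR/LANG/LC_MESSAGES/DOMAIN.mo for every language given.
build_catalogs() {
    local domain=$1 destdir=$2 lang
    shift 2
    for lang in "$@"; do
        mkdir -p "$destdir/$lang/LC_MESSAGES" || return
        msgfmt -o "$destdir/$lang/LC_MESSAGES/$domain.mo" "$lang/$domain.po" || return
    done
}

# Typical use: stage everything under ./locale, then copy the tree as root.
#   build_catalogs hello ./locale es fr
```

While testing, you can point `TEXTDOMAINDIR` at the staging directory and skip the copy into the system location entirely.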

=== Test! ===

Remember what we said earlier about setting locale environment variables... the examples here may or may not work for your system.

The `gettext` program can be used to retrieve individual translations from the catalog:

{{{
$ LANGUAGE=es_ES gettext -d hello -s "Hello, world"
Hola el mundo
}}}

Any untranslated strings will be left alone:

{{{
$ LANGUAGE=es_ES gettext -d hello -s "How are you?"
How are you?
}}}

And, finally, there is no substitute for actually running the program itself:

{{{
wooledg@wooledg:~$ LANGUAGE=es_ES ./hello
Hola el mundo
How are you?
}}}

As you can see, there's still some more translation to be done for our example. Back to work....
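When there are several translations to check, a loop saves retyping. The following is a sketch with a made-up helper name, `try_languages`; it only sets `LANGUAGE`, so on systems where that variable has no effect you would need to set `LC_ALL` or `LANG` instead.

```shell
#!/bin/bash
# try_languages "LANG1 LANG2 ..." COMMAND [ARGS...]
# Hypothetical helper: run the command once per language code, with
# LANGUAGE set for that run only, and label each run.
try_languages() {
    local langs=$1 lang
    shift
    for lang in $langs; do          # unquoted on purpose: split the list on spaces
        printf '== LANGUAGE=%s ==\n' "$lang"
        LANGUAGE=$lang "$@"
    done
}

# Typical use:
#   try_languages "es_ES fr_FR" ./hello
```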

=== References ===
 * [[http://pgas.freeshell.org/mirror/ABSlocalization.html|Original ABS appendix showing $"..." syntax]]
 * [[http://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/localization.html|New ABS appendix showing rubbish]]
 * [[http://www.gnu.org/software/hello/manual/gettext/Locale-Environment-Variables.html|Locale environment variables (GNU)]]
 * [[http://www.gnu.org/software/gettext/manual/html_node/index.html|GNU gettext manual top node]]
  * [[http://www.gnu.org/software/gettext/manual/html_node/bash.html|Warning about security issues...?]]

How to add localization support to your bash scripts

If you are looking for examples of how to add simple localization to your bash scripts, and how to test them, this is probably what you need.

First, some variables you must understand

Before we can even begin, we have to understand all the locale environment variables. This is fundamental, and extremely under-documented in the places where people actually look for documentation (man pages, etc.). Some of these variables may not apply to your system, because there seem to be various competing standards and extensions....

On recent GNU systems, the variables are used in this order:

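  1. If the locale selected by LC_ALL, LC_MESSAGES or LANG is C or POSIX, LANGUAGE is ignored and messages are not translated.
  2. Otherwise, if LANGUAGE is set, use that.
  3. Otherwise, if LC_ALL is set, use that.
  4. Otherwise, if LC_MESSAGES is set, use that.
  5. Otherwise, use LANG.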

That means, you first have to check your current environment to see which of these, if any, are already set. If they are set, and you don't know about them, they may interfere with your testing, leaving you befuddled.

$ env | egrep 'LC|LANG'
LANG=en_US.UTF-8
LANGUAGE=en_US:en_GB:en

Here's an example from a Debian system. In this case, the LANGUAGE variable is set, which means any testing we do that involves changing LANG or LC_ALL is likely to fail, unless we also change LANGUAGE. Now here's another example from another Debian system:

$ env | egrep 'LC|LANG'
LANG=en_US.utf8

In that case, setting LANG or LC_ALL would actually work. A user using that system, then writing a document on how to perform localization testing, might create instructions that would fail to work for the user on the first system....

So, go ahead and play around with your own system and see what works and what doesn't. You may not have a LANGUAGE variable at all (especially if you are not on GNU/Linux), so setting it may do nothing for you. You may have to "generate locales" on your operating system (a process which is beyond the scope of this page, but which on Debian consists of running dpkg-reconfigure locales and answering questions) in order to make them work.

Try to get to the point where you can produce error messages in at least two languages:

$ wc -q
wc: invalid option -- 'q'
Try `wc --help' for more information.
$ LANGUAGE=es_ES wc -q
wc: opción inválida -- q
Pruebe `wc --help' para más información.

Once you can do that reliably, you can begin the actual work of producing a bash script with localisation.

Marking strings as translatable

This is the simplest part, at least to understand. Any string in $"..." is translated using the system's native language support (NLS) facilities. Find all the constant strings in your program that you want to translate, and mark them accordingly. Don't mark strings that contain variables or other substitutions. For example,

#!/bin/bash
echo $"Hello, world"

(As you can see, we're starting with very simple material here.)

If you wanted to translate strings that contain variable substitutions, the cleanest way to do that is to use printf with placeholders:

printf $"The answer is %s\n" "$answer"
# Instead of: echo $"The answer is $answer"

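The commented-out version is the one to avoid. Bash looks up the literal text inside $"..." (here including the characters $answer) and only then expands the result, so a translator would have to carry the variable reference over exactly, and whatever expansions the translated string contains would be performed by your script. With printf, the msgid is a constant string and the variable stays out of the catalog.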

Generating and/or merging PO files

Next, generate what are called "PO files" from your program. These contain the strings we've marked, and their translations (which we'll fill in later).

We start by creating a *.pot file, which is a template.

bash --dump-po-strings hello > hello.pot

This produces output which looks like:

#: hello:5
msgid "Hello, world"
msgstr ""

The name of your file (without the .pot extension) is called the domain of your translatable text. A domain in this context is similar to a package name. For example, the GNU coreutils package contains lots of little programs, but they're all distributed together; and so it makes sense for all their translations to be together as well. In our example, we're using a domain of hello. In a larger example containing lots of programs in a suite, we'd probably use the name of the whole suite.

This template will be copied once for each language we want to support. Let's suppose we wanted to support Spanish and French translations of our program. We'll be creating two PO files (one for each translation), so let's make two subdirectories, and copy the template into each one:

mkdir es fr
cp hello.pot es/hello.po
cp hello.pot fr/hello.po

This is what we do the first time through. If there were already some partially- or even fully-translated PO files in place, we wouldn't want to overwrite them. Instead, we would merge the new translatable material into the old PO file. We use a special tool for that called msgmerge. Let's suppose we add some more code (and translatable strings) to our program:

vi hello
bash --dump-po-strings hello > hello.pot
msgmerge --update es/hello.po hello.pot
msgmerge --update fr/hello.po hello.pot

The original author of this page created some notes which I am leaving intact here. Maybe they'll be helpful...?

# step 5: try to merge existing po with new updates
# remove duplicated strings by hand or with sed or something else
# awk '/^msgid/&&!seen[$0]++;!/^msgid/' lang/nl.pot > lang/nl.pot.new
msgmerge lang/nl.po lang/nl.pot

# step 5.1: try to merge existing po with new updates
cp --verbose lang/pct-scanner-script-nl.po lang/pct-scanner-script-nl.po.old
awk '/^msgid/&&!seen[$0]++;!/^msgid/' lang/pct-scanner-script-nl.pot > lang/pct-scanner-script-nl.pot.new
msgmerge lang/pct-scanner-script-nl.po.old lang/pct-scanner-script-nl.pot.new > lang/pct-scanner-script-nl.po

# step 5.2: try to merge existing po with new updates
touch lang/pct-scanner-script-process-nl.po lang/pct-scanner-script-process-nl.po.old
awk '/^msgid/&&!seen[$0]++;!/^msgid/' lang/pct-scanner-script-process-nl.pot > lang/pct-scanner-script-process-nl.pot.new
msgmerge lang/pct-scanner-script-process-nl.po.old lang/pct-scanner-script-process-nl.pot.new > lang/pct-scanner-script-process-nl.po
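
For anyone reusing the awk one-liner from those notes, it helps to see what it actually does. The sketch below wraps it in a function (the name dedup_msgids is made up) and feeds it a template with one repeated msgid. The repeated msgid line is dropped, but the msgstr line that followed it is left behind, which is presumably why the notes say to clean up by hand. If GNU gettext is installed, msguniq is the purpose-built tool for this job.

```shell
#!/bin/bash
# dedup_msgids is a hypothetical wrapper around the one-liner from the notes:
# print a msgid line only the first time it is seen, and print every other line.
dedup_msgids() {
    awk '/^msgid/&&!seen[$0]++;!/^msgid/'
}

# A template in which "testing" was marked twice.
printf '%s\n' \
    'msgid "testing"' 'msgstr ""' \
    'msgid "color: yes"' 'msgstr ""' \
    'msgid "testing"' 'msgstr ""' |
    dedup_msgids
```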

Translate the strings

This is a step which is 100% human labor. Edit each language's PO file and fill in the blanks.

#: hello:5
msgid "Hello, world"
msgstr "Hola el mundo"

#: hello:6
msgid "How are you?"
msgstr ""

Install MO files

Your operating system, if it has gotten you this far, probably already has some localized programs, with translation catalogs installed in some location such as /usr/share/locale (or elsewhere). If you want your translations to be installed there as well, you'll have to have superuser privileges, and you'll have to manage your translation domain (namespace) in such a way as to avoid collision with any OS packages.

If you're going to use the standard system location for your translations, then you only need to worry about making one change to your program: setting the TEXTDOMAIN variable.

#!/bin/bash
TEXTDOMAIN=hello

echo $"Hello, world"
echo $"How are you?"

This tells bash and the system libraries which MO file to use, from the standard location. If you're going to use a nonstandard location, then you have to set that as well, in a variable called TEXTDOMAINDIR:

#!/bin/bash
TEXTDOMAINDIR=/usr/local/share/locale
TEXTDOMAIN=hello

echo $"Hello, world"
echo $"How are you?"

Use one of these two depending on your needs.

Now, an MO file is essentially a compiled PO file. A program called msgfmt is responsible for this compilation. We just have to tell it where the PO file is, and where to write the MO file.

# Standard system location (requires superuser privileges):
msgfmt -o /usr/share/locale/es/LC_MESSAGES/hello.mo es/hello.po
msgfmt -o /usr/share/locale/fr/LC_MESSAGES/hello.mo fr/hello.po

# Or a nonstandard location, matching TEXTDOMAINDIR in the script:
mkdir -p /usr/local/share/locale/{es,fr}/LC_MESSAGES
msgfmt -o /usr/local/share/locale/es/LC_MESSAGES/hello.mo es/hello.po
msgfmt -o /usr/local/share/locale/fr/LC_MESSAGES/hello.mo fr/hello.po

(If we had more than two translations to support, we might choose to mimic the structure of /usr/share/locale in order to facilitate mass-copying of MO files from the local directory to the operating system's repository. This is left as an exercise.)

Test!

Remember what we said earlier about setting locale environment variables... the examples here may or may not work for your system.

The gettext program can be used to retrieve individual translations from the catalog:

$ LANGUAGE=es_ES gettext -d hello -s "Hello, world"
Hola el mundo

Any untranslated strings will be left alone:

$ LANGUAGE=es_ES gettext -d hello -s "How are you?"
How are you?

And, finally, there is no substitute for actually running the program itself:

wooledg@wooledg:~$ LANGUAGE=es_ES ./hello
Hola el mundo
How are you?

As you can see, there's still some more translation to be done for our example. Back to work....

References

BashFAQ/098 (last edited 2014-05-13 18:34:11 by 93-103-91-248)