fred-wang · 067a9f4b
--- a/Localization-Proposal.md
+++ b/Localization-Proposal.md
@@ -8,6 +8,7 @@ The code must be able to handle the following:

 * expressions with substitution values (e.g., "file xxx not found")
 * plural forms (e.g. "loaded xx file" versus "loaded xx files")
+* number localization (e.g. "100%" versus "۱۰۰٪")
 * multiple forms for a word (e.g., "Post" as a verb versus "Post" as a noun)
 * HTML-snippets as defined in MathJax (since many dialogs are constructed from these)
 * fallback to English when translations are not available
@@ -117,6 +118,38 @@ Note that if one of the options for the plural forms requires a literal close br
 would produce `One {only}` when the first argument is 1, and `Two {or more}` otherwise.  If a string needs to include a literal string that looks like one of these selectors, the original `%` can be quoted.  So
 `"%%{plural:%%1|A|B}"` would be the literal string `%{plural:%1|A|B}`.

+***fred: The treatment for plurals is to use the value n of the argument %1 (an arbitrary real number) to determine which variation to use. In English, the singular form is used if n=1 and the plural form is used for any other value of n. This can be much more complex in other languages. For consistency with other formats, and to use something localizers are familiar with, we will follow the [CLDR rules](http://unicode.org/cldr/charts/supplemental/language_plural_rules.html).***
+
+Each language has mnemonic names for its plural forms and a way to map n to these names. For example:
+  * In English, n=1 maps to the singular form "one" and other values map to the plural form "other".
+  * French also has two plural forms, but the mapping is different: 0 <= n < 2 maps to the singular form "one" and other values map to the plural form "other".
+  * Welsh has six plural forms: "zero" (n=0), "one" (n=1), "two" (n=2), "few" (n=3), "many" (n=6) and "other".
+  * Polish has four plural forms: "one" (n=1), "few" (n mod 10 is 2, 3 or 4 and n mod 100 is not 12, 13 or 14), "many" (n is not 1 and n mod 10 is 0 or 1, or n mod 10 is 5, 6, 7, 8 or 9, or n mod 100 is 12, 13 or 14) and "other".
+  * and so on...
+
+It is up to the localizers to ensure that all the forms for their language are specified in the translated strings. The mapping from n to the form index will be implemented in the localization data (see below) of each language. If the index is out of range (perhaps because a plural form was forgotten), the plural rule is ignored so that localizers can notice the mistake. For example, the default is the English mapping
+
+    plural: function(n) {
+      if (n == 1) return 1;
+      return 2;
+    }
+
+while the French and Polish mappings would be
+
+    plural: function(n) {
+      if (0 <= n && n < 2) return 1;
+      return 2;
+    }
+
+    plural: function(n) {
+      if (n==1) return 1;
+      if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 12 || n % 100 > 14)) return 2;
+      if ((n % 10 >= 0 && n %10 <= 1) || (n%10 >= 5 && n%10 <= 9) || (n%100 >= 12 && n%100 <= 14)) return 3;
+      return 4;
+    }
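As a rough sketch of how such a mapping might be consumed, the snippet below pairs a Polish mapping written from the CLDR description in the list above with a helper that picks one option by its 1-based index. The names `selectPluralForm` and `polishPlural` are hypothetical and are not part of the proposal. Returning `null` for an out-of-range index is only one possible reading of "the plural rule is ignored".

```javascript
// Hypothetical helper (not part of the proposal): pick one option using the
// 1-based index returned by a language's plural(n) function.
function selectPluralForm(plural, n, options) {
  var index = plural(n);
  if (index < 1 || index > options.length) {
    // Out of range, e.g. a form is missing from the translated string.
    // Assumption: the caller then leaves the selector unprocessed.
    return null;
  }
  return options[index - 1];
}

// Polish mapping following the CLDR description: 1 = one, 2 = few,
// 3 = many, 4 = other.
var polishPlural = function (n) {
  if (n == 1) return 1;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 12 || n % 100 > 14)) return 2;
  if ((n % 10 >= 0 && n % 10 <= 1) || (n % 10 >= 5 && n % 10 <= 9) ||
      (n % 100 >= 12 && n % 100 <= 14)) return 3;
  return 4;
};

// The option labels here are just the CLDR category names.
var forms = ["one", "few", "many", "other"];
var picked = selectPluralForm(polishPlural, 22, forms); // 22 is in the "few" category
```

With only two options supplied, a value that maps to index 3 or 4 makes the helper return `null`, which is how a forgotten form would show up under this reading.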
+
+Below is Davide's initial proposal:
+
 The usual treatment for plurals is that the value after the colon is treated as an index into the array of options separated by vertical bars, and if the index is outside the range of the choices, the last choice is used.  So

    _("om","%{plural:%1|One|Many}",n)
@@ -133,7 +166,17 @@ That is, the specification for the value matches `%(\d+)([+-]\d+)?` as a regular

 Some languages have a more complex means of determining forms.  For instance, Polish has different forms for 1, 2 through 4, 5 through 21, 22 through 24, 25 through 31, and so on  (see the [gnu gettext documentation](http://www.gnu.org/software/gettext/manual/html_mono/gettext.html#Plural-forms) for more examples).  So the plural escape must be more complex for these languages.  One approach would be to allow the language files to provide their own routine that implements the selection of the form.  The routine would be passed the value and the array and would return the proper one.  That way, any special treatment could be done on a language-by-language basis.  Alternatively, there could be data describing the value-to-index transformation needed for the language.

+### Numbers ###
+
+Numbers must be localized in some languages, e.g. to use Arabic digits. As with plural forms, the localization data will contain a "number" function to do that conversion. This function will be called by MathJax when substituting numeric arguments. For example, for the French localization:

+    number: function (n)
+    {
+       return n.replace(".", ",");
+    }
+
+will allow the use of a comma instead of a period as the decimal separator. Then `_("sum","%1 + %2 = %3", 5.3, 2.45, 7.75)` will be localized as "5,3 + 2,45 = 7,75". See https://github.com/wikimedia/jquery.i18n/blob/master/src/jquery.i18n.language.js#L670 for other languages to consider.
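For a language that replaces the digits themselves, a "number" function might look like the sketch below. It maps ASCII digits to the Extended Arabic-Indic digits (U+06F0 to U+06F9) used for Persian. The name `persianNumber` is hypothetical. The decimal separator and the percent sign are deliberately left untouched here, although a real localization would probably handle them too.

```javascript
// Hypothetical "number" entry for a Persian localization (illustration only).
// Accepts a string or a number and replaces each ASCII digit with the
// corresponding Extended Arabic-Indic digit (U+06F0..U+06F9).
var persianNumber = function (n) {
  return String(n).replace(/[0-9]/g, function (digit) {
    return String.fromCharCode(0x06F0 + Number(digit));
  });
};

var localized = persianNumber("100"); // three Persian digits: one, zero, zero
```

Converting the argument with `String(n)` is a defensive choice in this sketch, so that the function also works if a caller passes a number instead of a string.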
+    
 ### HTML Snippets ###

 A number of the dialogs used in MathJax are defined using [HTML snippets](http://docs.mathjax.org/en/latest/HTML-snippets.html), which allow you to encode an HTML DOM fragment using JavaScript objects.  These can include things like bold and italic indicators, as well as other styling or layout.  While it is possible to break these into pieces to pass to `_()` separately, it may be better to allow the translator to translate the complete snippet, so that styling and layout can be properly adjusted for the target language.  Thus `_()` allows a complete HTML snippet in place of the message string (and will return an HTML snippet rather than a string literal).  E.g.,
@@ -188,8 +231,11 @@ The methods in `MathJax.Localization` include:
 <dt>fontFamily()</dt>
 <dd>Get the font-family needed to display text in the selected language.  Returns <code>null</code> if no special font is required.</dd>

-<dt>plural(n,str)</dt>
-<dd>The method that returns the correct plural form for the value <i>n</i> from an array of strings.  This is the <i>n</i>-th string in the array, if there is one, or the last entry if not.  Individual languages can override this function with one that properly handles the requirements for their plural forms.</dd>
+<dt>plural(n)</dt>
+<dd>The method that returns the index of the correct plural form for the value <i>n</i>. See the <a href="http://unicode.org/cldr/charts/supplemental/language_plural_rules.html">CLDR rules</a> above.</dd>
+
+<dt>number(n)</dt>
+<dd>The method that returns the localized version of the string <i>n</i> representing a number.</dd>

 </dl>