X Locale Database Definition
1. General

An X Locale Database contains the subset of a user's environment that depends on language in the X Window System. It is made up of one or more categories. Each category consists of classes and subclasses.

It is provided as a plain ASCII text file, so a user can change its contents easily. It allows a user to customize the behavior of the internationalized portion of Xlib without changing Xlib itself.

This document describes:

   - Database Format Definition
   - Contents of Database in sample implementation

Since it is hard to define the set of required information for all platforms, only the flexible database format is defined. The available entries in the database are implementation dependent.

2. Database Format Definition

The X Locale Database contains one or more category definitions. This section describes the format of each category definition.

A category definition consists of one or more class definitions. Each class definition has either a pair of class name and class value, or several subclasses enclosed by a left brace ({) and a right brace (}).

The X Locale Database accepts only XPCS, the X Portable Character Set.

Comments are introduced by the number sign character (#). A number sign at the beginning of a line makes the entire line a comment. A number sign preceded by a whitespace character makes the rest of the line, from the number sign to the end of the line, a comment. A line can be continued by placing a backslash (\) as the last character on the line; this continuation character is discarded from the input. Comment lines cannot be continued onto a subsequent line with an escaped newline character.
The reserved symbols are the quotation mark ("), the number sign (#), the semicolon (;), the backslash (\), the left brace ({) and the right brace (}).

The format of a category definition is:

Elements separated by a vertical bar (|) are alternatives. Curly braces ({...}) indicate zero or more repetitions of the enclosed elements. Square brackets ([...]) indicate that the enclosed element is optional. Quotes ("...") are used around literal characters.

A backslash, unless it is the first character of a NumericString, is recognized as an escape character, so that the character following it is treated as a literal character. For example, the two-character sequence \" (a backslash followed by a quotation mark) is recognized and replaced with a quotation mark character. Any whitespace character that is not a Delimiter and is neither quoted nor escaped is ignored.

3. Contents of Database

The available categories and classes depend on the implementation, because different platforms require different sets of information. For example, some platforms have a system locale and some do not. Furthermore, functionality may differ even among platforms that have a system locale.

In the current sample implementation, the categories listed below are available.

4. XLC_FONTSET Category

The XLC_FONTSET category defines information related to XFontSet. It contains the CHARSET_REGISTRY-CHARSET_ENCODING name and the character mapping side (GL, GR, etc.), and is used by the Output Method (OM).

fsN
   Includes encoding information for the Nth charset, where N is the index number (0, 1, 2, ...). If there are 4 charsets available in the current locale, 4 fontsets, fs0, fs1, fs2 and fs3, should be defined. This class has two subclasses, ‘charset’ and ‘font’.

charset
   Specifies encoding information to be used internally in Xlib for this fontset.
   The value is a CHARSET_REGISTRY-CHARSET_ENCODING name followed by a colon and the mapping side. For the detailed definition of CHARSET_REGISTRY-CHARSET_ENCODING, refer to the "X Logical Font Descriptions" document.

   Example: ISO8859-1:GL

font
   Specifies a list of encoding information used to search for an appropriate font for this fontset. Entries are separated by semicolons; the leftmost entry has the highest priority.

5. XLC_XLOCALE Category

The XLC_XLOCALE category defines character classification, conversion and other character attributes.

encoding_name
   Specifies the codeset name of the current locale.

mb_cur_max
   Specifies the maximum number of bytes in a multibyte character. It corresponds to MB_CUR_MAX of "ISO/IEC 9899:1990 C Language Standard".

state_depend_encoding
   Indicates whether the current locale is state dependent. The value should be "True" or "False".

wc_encoding_mask
   Specifies a bit mask for parsing wide-character strings. Each wide character is combined with this mask using a bitwise AND, and the result is matched against ‘wc_encoding’ to classify the character into a unique charset.

wc_shift_bits
   Specifies the number of bits to shift when converting a multibyte character to a wide character, and vice versa.

csN
   Includes character set information for the Nth charset, where N is the index number (0, 1, 2, ...). If there are 4 charsets available in the current locale, cs0, cs1, cs2 and cs3 should be defined. This class has five subclasses: ‘side’, ‘length’, ‘mb_encoding’, ‘wc_encoding’ and ‘ct_encoding’.

side
   Specifies the mapping side of this charset. The value is a side such as GL or GR, optionally followed by the suffix ":Default". The suffix indicates that, in the initial state, a character belonging to the specified side is mapped to this charset.

length
   Specifies the number of bytes in a multibyte character of this charset. It should not include the length of any single-shift sequence.

mb_encoding
   Specifies a list of shift sequences for parsing multibyte strings. Each entry is a shift type, such as <LSL> or <SS>, followed by the bytes of the shift sequence written as numeric strings; entries are separated by semicolons.

   Example: <LSL> \x1b \x28 \x4a; <LSL> \x1b \x28 \x42

wc_encoding
   Specifies an integer value for parsing wide-character strings.
   It is used to determine the charset of each wide character after the bitwise AND with ‘wc_encoding_mask’. This value should be unique among all csN classes.

ct_encoding
   Specifies a list of encoding information that can be used for Compound Text.

6. Sample of X Locale Database

The following is a sample X Locale Database file.

    # $Xorg: LocaleDB.ms,v 1.3 2000/08/17 19:42:49 cpqbld Exp $
    # XLocale Database Sample for ja_JP.euc
    #
    #
    #   XLC_FONTSET category
    #
    XLC_FONTSET
    # fs0 class (7 bit ASCII)
    fs0 {
        charset     ISO8859-1:GL
        font        ISO8859-1:GL; JISX0201.1976-0:GL
    }
    # fs1 class (Kanji)
    fs1 {
        charset     JISX0208.1983-0:GL
        font        JISX0208.1983-0:GL
    }
    # fs2 class (Half Kana)
    fs2 {
        charset     JISX0201.1976-0:GR
        font        JISX0201.1976-0:GR
    }
    # fs3 class (User Defined Character)
    # fs3 {
    #   charset     JISX0212.1990-0:GL
    #   font        JISX0212.1990-0:GL
    # }
    END XLC_FONTSET
    #
    #   XLC_XLOCALE category
    #
    XLC_XLOCALE
    encoding_name           ja.euc
    mb_cur_max              3
    state_depend_encoding   False
    wc_encoding_mask        \x00008080
    wc_shift_bits           8
    # cs0 class
    cs0 {
        side        GL:Default
        length      1
        wc_encoding \x00000000
        ct_encoding ISO8859-1:GL; JISX0201.1976-0:GL
    }
    # cs1 class
    cs1 {
        side        GR:Default
        length      2
        wc_encoding \x00008080
        ct_encoding JISX0208.1983-0:GL; JISX0208.1983-0:GR;\
                    JISX0208.1983-1:GL; JISX0208.1983-1:GR
    }
    # cs2 class
    cs2 {
        side        GR
        length      1
        mb_encoding <SS> \x8e
        wc_encoding \x00000080
        ct_encoding JISX0201.1976-0:GR
    }
    # cs3 class
    # cs3 {
    #   side        GL
    #   length      2
    #   mb_encoding <SS> \x8f
    # #if HasWChar32
    #   wc_encoding \x20000000
    # #else
    #   wc_encoding \x00008000
    # #endif
    #   ct_encoding JISX0212.1990-0:GL; JISX0212.1990-0:GR
    # }
    END XLC_XLOCALE

7. Reference

[1] ISO/IEC 9899:1990 C Language Standard
[2] X Logical Font Descriptions
Yoshio Horiuchi
IBM Japan
Copyright © IBM Corporation 1994
All Rights Reserved
License to use, copy, modify,
and distribute this software and its documentation for any
purpose and without fee is hereby granted, provided that the
above copyright notice appear in all copies and that both
that copyright notice and this permission notice appear in
supporting documentation, and that the name of IBM not be
used in advertising or publicity pertaining to distribution
of the software without specific, written prior
permission.
IBM DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS, AND NONINFRINGEMENT
OF THIRD PARTY RIGHTS, IN NO EVENT SHALL IBM BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH
THE USE OR PERFORMANCE OF THIS SOFTWARE.
Copyright © 1994 X Consortium
Permission is hereby granted,
free of charge, to any person obtaining a copy of this
software and associated documentation files (the
"Software"), to deal in the
Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and
this permission notice shall be included in all copies or
substantial portions of the Software.
THE SOFTWARE IS PROVIDED
"AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE X
CONSORTIUM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Except as contained in this
notice, the name of the X Consortium shall not be used in
advertising or otherwise to promote the sale, use or other
dealings in this Software without prior written
authorization from the X Consortium.
X Window System is a
trademark of The Open Group.