HIS Registry of Languages

Date: 2005-07-30
Abstract:

Documents the Registry of Languages (ROL) for the Harvest Information System (HIS). This registry defines the standardized codes used for identifying the languages spoken in the world.

Document URL: http://www.harvestinformationsystem.info/ROL.htm

The function of the Registry of Languages (ROL) within HIS is to provide standardized codes for identifying the languages spoken in the world today. Rather than devising its own system of codes, HIS uses the three-letter language codes already defined and published in the ISO 639-3 International Standard. Descriptions of the languages are found in the Ethnologue (now in its 15th edition; see http://www.ethnologue.com). In the HIS context, this code set is named ROL_Language.

Each three-letter code uniquely identifies one of the more than 6,800 living languages documented in the Ethnologue. Any database application that makes use of these language codes is just one click away from access to the full language descriptions that are available on the Ethnologue web site. That is, for any language identifier XXX that may be stored in a database, an application may present a link to the following URL in order to give the user access to the Ethnologue's description of that language:

http://www.ethnologue.com/show_language.asp?code=XXX
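As a minimal sketch of how an application might build that link, the following Python function substitutes a stored identifier into the URL pattern above. The function name and the validation rule (exactly three ASCII letters) are assumptions made for illustration; they are not part of HIS or the Ethnologue site.

```python
import re
from urllib.parse import urlencode

# Base address taken from the URL pattern given in this document.
ETHNOLOGUE_BASE = "http://www.ethnologue.com/show_language.asp"


def ethnologue_url(code: str) -> str:
    """Return the Ethnologue description URL for a three-letter language code.

    Hypothetical helper: it only checks the shape of the code, not whether
    the code is actually present in ROL_Language.
    """
    if not re.fullmatch(r"[A-Za-z]{3}", code):
        raise ValueError(f"not a three-letter language code: {code!r}")
    # urlencode builds the "code=XXX" query string.
    return f"{ETHNOLOGUE_BASE}?{urlencode({'code': code})}"


if __name__ == "__main__":
    print(ethnologue_url("eng"))
```

A stricter version could check the identifier against the ROL_Language code table before producing a link.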

The definition used in the Ethnologue for identifying languages is based on a criterion of shared literature. If varieties of speech are similar enough to use the same literature, they are considered the same language; if they would need different literatures, they are considered different languages. Where there is no existing literature, a judgement as to whether literature might be shared is based on the presence of shared identity and shared intelligibility.

The code set (with documentation) may be downloaded from:

http://www.ethnologue.com/codes

Any party is welcome to incorporate the downloaded tables into its own database application, provided that it does so in accordance with SIL's Terms of Use statement.

From the point of view of the HIS registry conventions, the download file contains a code table, a supplementary table, and a change history table as follows:

LanguageCodes.tab   The code table for the ROL_Language code set.
LanguageIndex.tab   A supplementary table that provides an index into ROL_Language based on alternate language names, dialect names, and countries.
ChangeHistory.tab   The change history table.
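The .tab files can be read with any tab-delimited parser. The sketch below uses Python's csv module and assumes that each file starts with a header row naming its columns. The column names "Code" and "Name" and the two sample rows are hypothetical placeholders, not the actual layout of LanguageCodes.tab; consult the documentation included in the download for the real column names.

```python
import csv
import io


def read_tab_table(stream):
    """Read a tab-delimited table with a header row into a list of dicts."""
    # QUOTE_NONE keeps any quote characters in names as literal text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical stand-in for a code table; columns and rows are invented.
sample = io.StringIO(
    "Code\tName\n"
    "xxa\tExample Language A\n"
    "xxb\tExample Language B\n"
)

rows = read_tab_table(sample)
# Index the rows by code for direct lookup.
by_code = {row["Code"]: row for row in rows}
```

For a real file, the in-memory sample would be replaced by an open file handle, for example open("LanguageCodes.tab", newline="", encoding="utf-8"), where the encoding is itself an assumption to verify against the download.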

The other data table in the distribution package, CountryCodes.tab, is superfluous with respect to HIS since the identification of countries is handled in the ROG_Political code set of the Registry of Geography. However, note that the country codes used in LanguageCodes.tab and LanguageIndex.tab are two-letter codes from ISO 3166-1, and are not ROG_Political codes. The mapping from ISO 3166-1 codes to ROG_Political codes is given in the supplementary table of the Registry of Geography named ROG_PoliticalXref.tab. Execute the following SQL query to generate a two-column table that converts from ISO codes to ROG codes:

SELECT DISTINCT AltCode AS IsoCode, Code AS RogCode
  FROM ROG_PoliticalXref
 WHERE Source = 'ISO_A2' AND CodeRel = '=' AND Code <> 'IP'

The test Code <> 'IP' in the WHERE clause eliminates one element from the results; that element arises from an error in the current version of ROG_PoliticalXref.tab.
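To try the query without a full database, one option is to load the cross-reference rows into an in-memory SQLite table. The sketch below is only an illustration: it assumes the table has just the four columns the query refers to (the real ROG_PoliticalXref.tab may have more), and the sample rows use invented placeholder codes rather than real ISO or ROG values.

```python
import sqlite3

# The query from this document, unchanged.
QUERY = """
SELECT DISTINCT AltCode AS IsoCode, Code AS RogCode
  FROM ROG_PoliticalXref
 WHERE Source = 'ISO_A2' AND CodeRel = '=' AND Code <> 'IP'
"""


def iso_to_rog(rows):
    """Return {IsoCode: RogCode} for (Code, AltCode, Source, CodeRel) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ROG_PoliticalXref "
            "(Code TEXT, AltCode TEXT, Source TEXT, CodeRel TEXT)"
        )
        conn.executemany(
            "INSERT INTO ROG_PoliticalXref VALUES (?, ?, ?, ?)", rows
        )
        # Each result row is (IsoCode, RogCode), so dict() builds the mapping.
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


# Invented placeholder rows, not real registry data.
SAMPLE_ROWS = [
    ("AA", "XA", "ISO_A2", "="),
    ("AA", "XA", "ISO_A2", "="),  # duplicate row, collapsed by DISTINCT
    ("BB", "XB", "ISO_A2", "<"),  # relation other than '=', filtered out
    ("CC", "XC", "OTHER", "="),   # different source, filtered out
    ("IP", "XD", "ISO_A2", "="),  # excluded by the Code <> 'IP' test
]

mapping = iso_to_rog(SAMPLE_ROWS)
```

With these sample rows, only the first pair survives the filters, so mapping contains a single entry from "XA" to "AA".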


Last modified: 01 July 2005