
The Web Testing Companion: The Insider's Guide to Efficient and Effective Tests

Lydia Ash

Code Pages

Pregenerated Windows code pages make excellent input data: they let you paste a broad set of test characters into almost any text entry field. They are presented here for reference, either to cut and paste through your browser or to download one code page at a time.

Note: Although every attempt has been made to preserve them, some characters may not be transmitted, received, or displayed correctly. Some code points may still need to be generated with a tool such as Character Map so that you can be sure you have the proper code point for your testing. You can view each code page in HTML form or download it in DOC format.

HTML PAGES DOCS
874 874.doc
932 932.doc
936 936.doc
950 950.doc
1250 1250.doc
1251 1251.doc
1252 1252.doc
1253 1253.doc
1254 1254.doc
1255 1255.doc
1256 1256.doc
1257 1257.doc
1258 1258.doc
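
If a pasted character does not survive the trip through your browser, the test data for a single-byte code page can also be regenerated locally. The following is a minimal sketch, not the method used to build the pages above. It assumes Python's built-in codecs (named "cp874", "cp1250" through "cp1258") are close enough to the Windows code pages for your testing, and the helper name single_byte_test_data is my own. It does not cover the multibyte pages (932, 936, 950), whose lead bytes cannot be decoded on their own.

```python
def single_byte_test_data(codepage):
    """Return every character the code page defines for bytes 0x20-0xFF."""
    chars = []
    for value in range(0x20, 0x100):
        try:
            chars.append(bytes([value]).decode(codepage))
        except UnicodeDecodeError:
            # This byte is undefined in the code page, or it is the lead
            # byte of a multibyte sequence, so there is nothing to paste.
            continue
    return "".join(chars)


# Report how many characters each single-byte page yields.
for cp in ("cp874", "cp1250", "cp1251", "cp1252", "cp1253"):
    print(cp, len(single_byte_test_data(cp)), "characters")
```

The returned string can be pasted into a text entry field or written to a file in UTF-8. Bytes that a code page leaves undefined are skipped rather than guessed at.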

The mapping between languages and the code pages that cover them is loose rather than direct. This is a general guide to which code pages to use when generating test data for applications that will be localized or globalized for various languages.

ISO 8859

The International Organization for Standardization (ISO) lays out many standards for the computing industry. Each part of ISO/IEC 8859 specifies a character set that is suitable both for data- and text-processing applications and for information interchange.

For information processing, ISO/IEC 8859 includes 8-bit single-byte coded graphic character sets as follows:

  • Part 1: Latin alphabet No.1 (1997) - second edition

  • Part 2: Latin alphabet No.2 (1998) - second edition

  • Part 3: Latin alphabet No.3 (1998) - second edition

  • Part 4: Latin alphabet No.4 (1998) - second edition

  • Part 5: Latin/Cyrillic alphabet (1998) - second edition

  • Part 6: Latin/Arabic alphabet (1998) - second edition

  • Part 7: Latin/Greek alphabet (1998) - second edition

  • Part 8: Latin/Hebrew alphabet (1998) - second edition

  • Part 9: Latin alphabet No.5 (1998) - second edition

  • Part 10: Latin alphabet No.6 (1998) - second edition

  • Part 11: Latin/Thai alphabet (1998)

  • Part 12: Unassigned

  • Part 13: Latin alphabet No.7 (1998)

  • Part 14: Latin alphabet No.8 (1998)

  • Part 15: Latin alphabet No.9 (1998)

Each part that specifies a Latin alphabet lists the languages for which it was designed. These are:

  • Latin Alphabet No. 1. Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faroese, Finnish, French (with restrictions), Frisian, Galician, German, Greenlandic, Icelandic, Irish Gaelic (new orthography), Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, and Swedish.

  • Latin Alphabet No. 2. Albanian, Croat, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovene, and Sorbian.

  • Latin Alphabet No. 3. Esperanto and Maltese, and if needed in conjunction with these, English, French (with restrictions), German, Italian, Latin, and Portuguese. Coding of Turkish characters is deprecated in this code.

  • Latin Alphabet No. 4. Danish, English, Estonian, Finnish, German, Greenlandic, Latin, Latvian, Lithuanian, Norwegian, Sámi (with restrictions), Slovene, and Swedish.

  • Latin Alphabet No. 5. Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faroese, Finnish, French (with restrictions), Frisian, Galician, German, Greenlandic, Irish Gaelic (new orthography), Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, and Turkish.

  • Latin Alphabet No. 6. Danish, English, Estonian, Faroese, Finnish, German, Greenlandic, Icelandic, Irish Gaelic (new orthography), Latin, Lithuanian, Norwegian, Sámi (with restrictions), Slovene, and Swedish.

  • Latin Alphabet No. 7. Danish, English, Estonian, Finnish, German, Latin, Latvian, Lithuanian, Norwegian, Polish, Slovene, and Swedish.

  • Latin Alphabet No. 8. Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, French (with restrictions), Frisian, Galician, German, Greenlandic, Irish Gaelic (old and new orthographies), Italian, Latin, Luxemburgish, Manx Gaelic, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, and Welsh.

  • Latin Alphabet No. 9. Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Frisian, Galician, German, Greenlandic, Icelandic, Irish Gaelic (new orthography), Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, and Swedish.

Note: For writing French, three characters not included in Latin Alphabets 1, 3, 5, and 8 are also needed. These are included in Latin Alphabet No. 9.
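
The restriction on French can be checked directly. The sketch below assumes the three characters are Œ (U+0152), œ (U+0153), and Ÿ (U+0178), which the note above does not name. It compares only Latin Alphabet No. 1 with No. 9, using Python's "iso8859_1" and "iso8859_15" codecs. The helper name can_encode is hypothetical.

```python
# Assumption: these are the three French characters the note refers to.
FRENCH_EXTRAS = "\u0152\u0153\u0178"  # OE ligature (upper, lower), Y with diaeresis


def can_encode(text, encoding):
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in FRENCH_EXTRAS:
    print(
        f"U+{ord(ch):04X}",
        "Latin-1:", can_encode(ch, "iso8859_1"),
        "Latin-9:", can_encode(ch, "iso8859_15"),
    )
```

A field that stores its data as ISO 8859-1 is therefore a good place to try these three characters and watch for substitution or data loss.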

ISO 8859 versus Windows Code Pages

While the ISO standards are very clear, their mappings to a Windows code page (or any other corporate interpretation) are sometimes less precise. Because of the slight differences that may occur in the interpretations, I refer to the relationship as a correlation rather than a direct mapping. These correlations will be useful when testing various languages and the globalization of your application. The code pages and more are available on the companion Web site to the book.

  • 1252 correlates with ISO 8859-1

  • 1250 correlates with ISO 8859-2

  • 1257 correlates with ISO 8859-4

  • 1256 correlates with ISO 8859-6

  • 1253 correlates with ISO 8859-7

  • 1255 correlates with ISO 8859-8

  • 874 correlates with ISO 8859-11
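
To see why these are correlations rather than direct mappings, it helps to list the byte values that two encodings decode differently. The following is a minimal sketch using Python's built-in codecs as stand-ins for the Windows code page and the ISO part. The function names decode_byte and differing_bytes are my own, and a byte that one side leaves undefined counts as a difference.

```python
from typing import List, Optional


def decode_byte(value: int, encoding: str) -> Optional[str]:
    """Decode one byte, or return None where the encoding leaves it undefined."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def differing_bytes(codepage: str, iso_part: str) -> List[int]:
    """List the byte values that the two encodings decode differently."""
    return [
        value
        for value in range(0x100)
        if decode_byte(value, codepage) != decode_byte(value, iso_part)
    ]


diffs = differing_bytes("cp1252", "iso8859_1")
print(f"{len(diffs)} bytes differ, from 0x{diffs[0]:02X} to 0x{diffs[-1]:02X}")
```

For 1252 against ISO 8859-1 the differences fall entirely in the range 0x80 to 0x9F, where the ISO part has control codes and the Windows code page has printable characters such as the euro sign. Those bytes are the ones most worth including in test data. Other pairs, such as "cp1250" and "iso8859_2", can be compared the same way.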

Additional Windows Code Pages

Other important Windows code pages are not strict interpretations of the ISO standards or of the original standards from which they were developed. Many have additional ranges added for better coverage of the languages they serve.

  • Windows 936 code page is GB 2312-80 (based on ISO 646) with the Hanzi corrections. (CHS)

  • Windows 932 code page is JIS X 0208-1990 in the Shift-JIS (SJIS) encoding, plus the Microsoft extensions. (JPN)

  • Windows 950 is the Big Five set plus row 89 of the ETen extension. (CHT/Taiwanese)

  • Windows 949 is KS C 5601 plus extensions. (Korean)

  • ISCII is a newly developed code page for Indic scripts.

  • GB 18030 is the newest revision of the CHS code page and includes 4-byte characters.
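
Because these code pages are multibyte, the same character can take a different number of bytes depending on the code page, and some characters exist only in the newer ones. The sketch below measures that with Python's built-in codecs ("cp932", "cp936", "cp949", "cp950", "gb18030"), which I assume are adequate stand-ins for the Windows implementations. The sample characters and the helper name encoded_length are my own choices.

```python
# Hypothetical sample set: one single-byte, one double-byte, one 4-byte candidate.
SAMPLES = {
    "ASCII letter": "A",
    "common ideograph": "\u4e2d",
    "supplementary-plane ideograph": "\U00020000",
}

ENCODINGS = ("cp932", "cp936", "cp949", "cp950", "gb18030")


def encoded_length(ch, encoding):
    """Return the byte length of ch in the encoding, or None if it is absent."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for label, ch in SAMPLES.items():
    lengths = {enc: encoded_length(ch, enc) for enc in ENCODINGS}
    print(f"U+{ord(ch):04X} ({label}): {lengths}")
```

Characters that need four bytes in GB 18030 and are absent from 936 are useful for testing buffer sizes and length limits that were calculated on the assumption of at most two bytes per character.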



ISBN 0-471-43021-8
578 Pages
May, 2003

Wiley Technology Publishing
Timely. Practical. Reliable.

 