Code Pages

Version: 2019.3
Last modified: October 09, 2019

A Code Page (also referred to as Character Set or Encoding) is a table of values where each character has been assigned a numerical representation. A code page enables a computer to identify characters and display text correctly.
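The "table of values" idea can be illustrated outside Alteryx. The following is a minimal Python sketch, not Alteryx code, which assumes that Python's `cp1252` and `cp437` codecs correspond to code page identifiers 1252 and 437. It looks up one byte value in two different code pages:

```python
# One byte value, looked up in two different code pages.
raw = bytes([0xE9])

as_cp1252 = raw.decode("cp1252")  # ANSI Latin I (identifier 1252)
as_cp437 = raw.decode("cp437")    # OEM United States (identifier 437)

print(as_cp1252)  # é
print(as_cp437)   # Θ

# The reverse lookup: the same character is assigned a different
# number in each table.
e_in_1252 = "é".encode("cp1252")
e_in_437 = "é".encode("cp437")
print(e_in_1252)  # b'\xe9'
print(e_in_437)   # b'\x82'
```

The byte is identical in both cases. Only the table used to interpret it changes, which is why the wrong code page produces the wrong characters.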

Alteryx supports many code pages, which can be selected when reading and writing data files with the Input Data tool and Output Data tool, or when converting data types with the Blob Convert tool. Additionally, the ConvertFromCodepage and ConvertToCodepage functions, available within tools that have an expression editor, can use code page identifiers to convert strings between code pages and Unicode® encoding, the universal character-encoding standard for all written characters as created by the Unicode Consortium.
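As a rough analogy for converting between a code page and Unicode, here is a Python sketch. The helper names `convert_from_codepage`, `convert_to_codepage`, and `codec_name` are hypothetical and are not the Alteryx functions. The mapping from numeric identifiers to Python codec names is an assumption that holds for common identifiers only:

```python
# Hypothetical helpers: a Python analogy, not the Alteryx implementation.
# Assumption: most Windows identifiers map to a Python codec named
# "cp<identifier>"; a few need an explicit name.
SPECIAL_NAMES = {
    37: "cp037",
    20127: "ascii",
    28591: "iso-8859-1",
    65001: "utf-8",
}


def codec_name(codepage: int) -> str:
    """Return the Python codec name assumed for a code page identifier."""
    return SPECIAL_NAMES.get(codepage, f"cp{codepage}")


def convert_from_codepage(data: bytes, codepage: int) -> str:
    """Interpret bytes stored in the given code page as Unicode text."""
    return data.decode(codec_name(codepage))


def convert_to_codepage(text: str, codepage: int) -> bytes:
    """Encode Unicode text into the given code page."""
    return text.encode(codec_name(codepage))


# Round trip through Japanese Shift-JIS (identifier 932).
original = "日本語"
encoded = convert_to_codepage(original, 932)
decoded = convert_from_codepage(encoded, 932)
print(decoded == original)
```

Python raises `LookupError` for an identifier it has no codec for, so this sketch does not cover every identifier listed on this page.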

Alteryx assumes that a wide string is a Unicode® string and a narrow string is a Latin 1 string. As a result, a string that you convert to a code page will not display correctly. Therefore, code pages should only be used to override text encoding issues within a file. Code pages can be different on different computers, or can be changed for a single computer, which can lead to data corruption. For the most consistent results, use a Unicode® encoding, such as UTF-8 or UTF-16, instead of a specific code page. Unicode allows different languages to be encoded in the same data stream.
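The display problem described here can be reproduced in general terms with a short Python sketch. This is not Alteryx code. It shows bytes written in one encoding being read as Latin 1, and a Unicode encoding holding several scripts that a single code page cannot:

```python
text = "café"

# Bytes written as UTF-8 but read back as Latin 1 display incorrectly.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("latin-1")
print(misread)  # cafÃ©

# Decoding the same bytes with the encoding they were written in fixes it.
reread = utf8_bytes.decode("utf-8")
print(reread)  # café

# A Unicode encoding can carry several languages in one data stream.
mixed = "café Ελλάδα 日本"
round_trip = mixed.encode("utf-8").decode("utf-8")

# A single code page cannot represent all of these characters.
try:
    mixed.encode("cp1252")
    fits_in_1252 = True
except UnicodeEncodeError:
    fits_in_1252 = False
print(fits_in_1252)  # False
```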

UTF-8 is the most portable encoding and the one used most often, and it is usually the most compact way to store text. Both UTF-8 and UTF-16 are variable-width encodings, but UTF-8 is compatible with ASCII, and UTF-8 files tend to be smaller than UTF-16 files.
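The size and compatibility claims can be checked with a small Python sketch. The helper name `sizes` is hypothetical, and the sample strings are chosen only for illustration:

```python
def sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


for sample in ["hello", "é", "€", "😀"]:
    u8, u16 = sizes(sample)
    print(f"{sample!r}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes")

# ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = "hello".encode("utf-8") == "hello".encode("ascii")
print(ascii_compatible)  # True
```

ASCII text takes half the space in UTF-8, while some characters, such as the euro sign, take three bytes in UTF-8 and two in UTF-16. Both encodings use a variable number of bytes per character.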

For more information on code pages, see the MSDN Library.

Code page identifiers

The following code page identifiers can be used with the ConvertFromCodepage and ConvertToCodepage functions (see Functions):

37    (IBM EBCDIC - U.S./Canada)

437   (OEM - United States)

500   (IBM EBCDIC - International)

708   (Arabic - ASMO)

720   (Arabic - Transparent ASMO)

737   (OEM - Greek 437G)

775   (OEM - Baltic)

850   (OEM - Multilingual Latin I)

852   (OEM - Latin II)

855   (OEM - Cyrillic)

857   (OEM - Turkish)

858   (OEM - Multilingual Latin I + Euro)

860   (OEM - Portuguese)

861   (OEM - Icelandic)

862   (OEM - Hebrew)

863   (OEM - Canadian French)

864   (OEM - Arabic)

865   (OEM - Nordic)

866   (OEM - Russian)

869   (OEM - Modern Greek)

870   (IBM EBCDIC - Multilingual/ROECE (Latin-2))

874   (ANSI/OEM - Thai)

875   (IBM EBCDIC - Modern Greek)

932   (ANSI/OEM - Japanese Shift-JIS)

936   (ANSI/OEM - Simplified Chinese GBK)

949   (ANSI/OEM - Korean)

950   (ANSI/OEM - Traditional Chinese Big5)

1026  (IBM EBCDIC - Turkish (Latin-5))

1047  (IBM EBCDIC - Latin-1/Open System)

1140  (IBM EBCDIC - U.S./Canada (37 + Euro))

1141  (IBM EBCDIC - Germany (20273 + Euro))

1142  (IBM EBCDIC - Denmark/Norway (20277 + Euro))

1143  (IBM EBCDIC - Finland/Sweden (20278 + Euro))

1144  (IBM EBCDIC - Italy (20280 + Euro))

1145  (IBM EBCDIC - Latin America/Spain (20284 + Euro))

1146  (IBM EBCDIC - United Kingdom (20285 + Euro))

1148  (IBM EBCDIC - International (500 + Euro))

1149  (IBM EBCDIC - Icelandic (20871 + Euro))

1250  (ANSI - Central Europe)

1251  (ANSI - Cyrillic)

1252  (ANSI - Latin I)

1253  (ANSI - Greek)

1254  (ANSI - Turkish)

1255  (ANSI - Hebrew)

1256  (ANSI - Arabic)

1257  (ANSI - Baltic)

1258  (ANSI/OEM - Viet Nam)

1361  (Korean - Johab)

10000 (MAC - Roman)

10001 (MAC - Japanese)

10002 (MAC - Traditional Chinese Big5)

10003 (MAC - Korean)

10004 (MAC - Arabic)

10005 (MAC - Hebrew)

10006 (MAC - Greek I)

10007 (MAC - Cyrillic)

10008 (MAC - Simplified Chinese GB 2312)

10010 (MAC - Romania)

10017 (MAC - Ukraine)

10021 (MAC - Thai)

10029 (MAC - Latin II)

10079 (MAC - Icelandic)

10081 (MAC - Turkish)

10082 (MAC - Croatia)

20000 (CNS - Taiwan)

20001 (TCA - Taiwan)

20002 (Eten - Taiwan)

20003 (IBM5550 - Taiwan)

20004 (TeleText - Taiwan)

20005 (Wang - Taiwan)

20105 (IA5 IRV International Alphabet No.5)

20106 (IA5 German)

20107 (IA5 Swedish)

20108 (IA5 Norwegian)

20127 (US-ASCII)

20261 (T.61)

20269 (ISO 6937 Non-Spacing Accent)

20273 (IBM EBCDIC - Germany)

20277 (IBM EBCDIC - Denmark/Norway)

20278 (IBM EBCDIC - Finland/Sweden)

20280 (IBM EBCDIC - Italy)

20284 (IBM EBCDIC - Latin America/Spain)

20285 (IBM EBCDIC - United Kingdom)

20290 (IBM EBCDIC - Japanese Katakana Extended)

20297 (IBM EBCDIC - France)

20420 (IBM EBCDIC - Arabic)

20423 (IBM EBCDIC - Greek)

20424 (IBM EBCDIC - Hebrew)

20833 (IBM EBCDIC - Korean Extended)

20838 (IBM EBCDIC - Thai)

20866 (Russian - KOI8)

20871 (IBM EBCDIC - Icelandic)

20880 (IBM EBCDIC - Cyrillic (Russian))

20905 (IBM EBCDIC - Turkish)

20924 (IBM EBCDIC - Latin-1/Open System (1047 + Euro))

20932 (EUC-JP Japanese (JIS 0208-1990 and 0212-1990))

20936 (Simplified Chinese GB2312)

21025 (IBM EBCDIC - Cyrillic (Serbian, Bulgarian))

21027 (Ext Alpha Lowercase)

21866 (Ukrainian - KOI8-U)

28591 (ISO 8859-1 Latin I)

28592 (ISO 8859-2 Central Europe)

28593 (ISO 8859-3 Latin 3)

28594 (ISO 8859-4 Baltic)

28595 (ISO 8859-5 Cyrillic)

28596 (ISO 8859-6 Arabic)

28597 (ISO 8859-7 Greek)

28598 (ISO 8859-8 Hebrew: Visual Ordering)

28599 (ISO 8859-9 Latin 5)

28603 (ISO 8859-13 Latin 7)

28605 (ISO 8859-15 Latin 9)

38598 (ISO 8859-8 Hebrew: Logical Ordering)

50220 (ISO-2022 Japanese with no halfwidth Katakana)

50221 (ISO-2022 Japanese with halfwidth Katakana)

50222 (ISO-2022 Japanese JIS X 0201-1989)

50225 (ISO-2022 Korean)

50227 (ISO-2022 Simplified Chinese)

50229 (ISO-2022 Traditional Chinese)

51949 (EUC-Korean)

52936 (HZ-GB2312 Simplified Chinese)

54936 (GB18030 Simplified Chinese)

57002 (ISCII - Devanagari)

57003 (ISCII - Bengali)

57004 (ISCII - Tamil)

57005 (ISCII - Telugu)

57006 (ISCII - Assamese)

57007 (ISCII - Oriya)

57008 (ISCII - Kannada)

57009 (ISCII - Malayalam)

57010 (ISCII - Gujarati)

57011 (ISCII - Punjabi (Gurmukhi))

65000 (UTF-7)

65001 (UTF-8)
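To show how much the identifiers in this list can differ from one another, the Python sketch below compares ASCII with two of the EBCDIC code pages. It is not Alteryx code, and it assumes that Python's `cp037` and `cp1140` codecs correspond to identifiers 37 and 1140:

```python
# The letter "A" has a different byte value in ASCII and in EBCDIC.
a_ascii = "A".encode("ascii")   # identifier 20127
a_ebcdic = "A".encode("cp037")  # identifier 37
print(a_ascii.hex(), a_ebcdic.hex())  # 41 c1

# Identifier 1140 is listed as "37 + Euro": it can encode the euro sign,
# while identifier 37 cannot.
euro_1140 = "€".encode("cp1140")
try:
    "€".encode("cp037")
    euro_in_37 = True
except UnicodeEncodeError:
    euro_in_37 = False
print(euro_1140.hex(), euro_in_37)

# The byte that 1140 uses for the euro sign means something else in 37.
same_byte_in_37 = euro_1140.decode("cp037")
print(same_byte_in_37)
```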
