Code Pages

Last modified: September 09, 2021

Docs are available before the release of Designer Cloud so you can get a sneak peek. This content might change between now and the official release.

A Code Page (also referred to as Character Set or Encoding) is a table of values where each character has been assigned a numerical representation. A code page lets a computer identify characters and display text correctly.
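As a rough illustration of why the choice of code page matters, the Python sketch below decodes the same byte under four of the code pages listed later on this page. The Python codec names (cp1252 and so on) are stand-ins for the numeric identifiers. That correspondence is an assumption of this example, and the snippet is not a description of how Alteryx itself performs the lookup.

```python
import unicodedata

# One byte value, interpreted under four different code pages.
RAW = bytes([0xE9])

# Python codec names assumed to correspond to IDs 1252, 1251, 1253, and 37.
CODEC_NAMES = ["cp1252", "cp1251", "cp1253", "cp037"]


def decode_under_each(raw, names):
    """Return {codec name: decoded text} for the same raw bytes."""
    return {name: raw.decode(name) for name in names}


decoded = decode_under_each(RAW, CODEC_NAMES)
for name, text in decoded.items():
    # Print the code point and its Unicode name so the output is ASCII-safe.
    print(f"{name}: U+{ord(text):04X} {unicodedata.name(text)}")
```

The same byte comes out as a different character under each code page, so bytes on their own are ambiguous until the code page is known.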

Alteryx supports many code pages that you can select when you input and output data files via the Input Data tool and Output Data tool. Additionally, the ConvertFromCodepage and ConvertToCodepage functions (available within tools that have an expression editor) can use code page identifiers to convert strings between code pages and Unicode®, the universal character-encoding standard for all written characters, created by the Unicode Consortium.

Alteryx assumes that a wide string is a Unicode® string and a narrow string is a Latin 1 string. If you convert a string to a code page, it will not display correctly, because the converted string is still interpreted under these assumptions. Therefore, use code pages only to override text encoding issues within a file. Code pages can differ between computers, or can be changed on a single computer, which can lead to data corruption. For the most consistent results, use a Unicode® encoding such as UTF-8 or UTF-16 instead of a specific code page. Unicode® allows different languages to be encoded in the same data stream.
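The display problem described above can be sketched in Python. The helpers to_codepage and from_codepage are hypothetical names, loosely modeled on the idea behind ConvertToCodepage and ConvertFromCodepage. They are not the Alteryx implementation, and the use of Python's cp1251 codec for ID 1251 is an assumption of this example.

```python
def to_codepage(text, codec):
    """Encode text to the code page, then view each byte as a Latin-1 character.

    This imitates a narrow string whose bytes are read as Latin 1.
    """
    return text.encode(codec).decode("latin-1")


def from_codepage(narrow, codec):
    """Recover the raw bytes from the Latin-1 view and decode them properly."""
    return narrow.encode("latin-1").decode(codec)


# Russian "Privet" written with escapes so the source stays ASCII-only.
original = "\u041f\u0440\u0438\u0432\u0435\u0442"

narrow = to_codepage(original, "cp1251")    # garbled when shown as Latin 1
restored = from_codepage(narrow, "cp1251")  # readable again

print(ascii(narrow))
print(restored == original)
```

The intermediate value holds the right bytes but shows the wrong characters, which is why a code-page string should be treated as a transport format rather than as display text.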

UTF-8 is the most widely used encoding and is generally the most portable and compact way to store text. Both UTF-8 and UTF-16 are variable-width encodings, but UTF-8 is compatible with ASCII, and UTF-8 files tend to be smaller than UTF-16 files.
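To make the size comparison concrete, the following Python sketch counts the bytes needed for a few sample strings in each encoding. The sample strings are arbitrary choices for illustration, and UTF-16 is measured without a byte order mark.

```python
# Sample strings written with escapes so the source stays ASCII-only.
SAMPLES = {
    "ascii": "hello",
    "accented": "caf\u00e9",
    "cjk": "\u65e5\u672c\u8a9e",
    "emoji": "\U0001F600",
}


def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count without a BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


sizes = {label: encoded_sizes(text) for label, text in SAMPLES.items()}
for label, (utf8_len, utf16_len) in sizes.items():
    print(f"{label:9} utf-8={utf8_len} bytes  utf-16={utf16_len} bytes")

# ASCII compatibility: pure ASCII text has identical bytes in UTF-8.
ascii_compatible = "hello".encode("utf-8") == "hello".encode("ascii")
print("ASCII-compatible:", ascii_compatible)
```

ASCII and accented Latin text is smaller in UTF-8, while the CJK sample is smaller in UTF-16, so the size advantage is a tendency that depends on the text rather than a guarantee.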

For more information on code pages, visit the MSDN Library.

To support the same functionality on Linux, Alteryx employs the ICU library. Alteryx uses the same IDs as on Windows and maps them to ICU converter names. ICU does not support the full list of Windows encodings, and results can differ when data is converted from one code page to another.

Code Page Identifiers

These code page identifiers are supported with the ConvertFromCodepage and ConvertToCodepage functions. Support is through AMP only.

ID     Description
37     IBM EBCDIC - U.S./Canada
500    IBM EBCDIC - International
932    ANSI/OEM - Japanese Shift-JIS
949    ANSI/OEM - Korean EUC-KR
1250   ANSI - Central Europe
1251   ANSI - Cyrillic
1252   ANSI - Latin I
1253   ANSI - Greek
1254   ANSI - Turkish
1255   ANSI - Hebrew
1256   ANSI - Arabic
1257   ANSI - Baltic
1258   ANSI/OEM - Vietnamese
10000  MAC - Roman
28591  ISO 8859-1 Latin I
28592  ISO 8859-2 Central Europe
28593  ISO 8859-3 Latin 3
28594  ISO 8859-4 Baltic
28595  ISO 8859-5 Cyrillic
28596  ISO 8859-6 Arabic
28597  ISO 8859-7 Greek
28599  ISO 8859-9 Latin 5
28605  ISO 8859-15 Latin 9
54936  Simplified Chinese GB18030
65001  Unicode UTF-8
UTF16  Unicode UTF-16
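
For experimenting with these identifiers outside Alteryx, the Python sketch below maps each ID in the table above to a Python codec name and rejects IDs that are not listed. The mapping is an assumption made for this example: Python's codecs can differ in detail from the Windows and ICU converters, and codec_for and decode_bytes are hypothetical helper names rather than part of any Alteryx API.

```python
import codecs

# Assumed correspondence between the identifiers above and Python codec names.
CODEC_BY_ID = {
    "37": "cp037", "500": "cp500", "932": "cp932", "949": "cp949",
    "1250": "cp1250", "1251": "cp1251", "1252": "cp1252", "1253": "cp1253",
    "1254": "cp1254", "1255": "cp1255", "1256": "cp1256", "1257": "cp1257",
    "1258": "cp1258", "10000": "mac_roman",
    "28591": "iso8859_1", "28592": "iso8859_2", "28593": "iso8859_3",
    "28594": "iso8859_4", "28595": "iso8859_5", "28596": "iso8859_6",
    "28597": "iso8859_7", "28599": "iso8859_9", "28605": "iso8859_15",
    "54936": "gb18030", "65001": "utf_8", "UTF16": "utf_16",
}


def codec_for(code_page_id):
    """Return the Python codec name for an ID, or raise LookupError."""
    key = str(code_page_id).strip().upper()
    if key not in CODEC_BY_ID:
        raise LookupError(f"unsupported code page identifier: {code_page_id!r}")
    # codecs.lookup confirms that this Python build actually ships the codec.
    return codecs.lookup(CODEC_BY_ID[key]).name


def decode_bytes(raw, code_page_id):
    """Decode raw bytes using the codec mapped to the given identifier."""
    return raw.decode(codec_for(code_page_id))


# The EBCDIC bytes for "Hello" decode correctly only with the right ID.
print(decode_bytes(b"\xc8\x85\x93\x93\x96", 37))
```

Failing loudly on an unknown ID, rather than falling back to a default encoding, makes an unsupported code page visible before it can silently corrupt data.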