Code Pages

Version:
Current
Last modified: November 18, 2020

A Code Page (also referred to as Character Set or Encoding) is a table of values where each character has been assigned a numerical representation. A code page enables a computer to identify characters and display text correctly.
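To make the idea of a lookup table concrete, the following minimal sketch decodes a single byte value under three of the code pages listed later in this article. It uses Python and Python's codec names (cp1252, cp1251, cp1253) purely for illustration; neither is part of Alteryx.

```python
# One byte value, interpreted through three different code pages.
# The codec names are Python's; they correspond to Windows code pages
# 1252 (Latin I), 1251 (Cyrillic), and 1253 (Greek).
raw = bytes([0xE9])

decoded = {name: raw.decode(name) for name in ("cp1252", "cp1251", "cp1253")}

for name, char in decoded.items():
    # Print the character and its Unicode code point for each code page.
    print(f"{name}: byte 0xE9 -> {char!r} (U+{ord(char):04X})")
```

The byte is the same in every case. Only the table used to interpret it changes, which is why text read with the wrong code page displays as the wrong characters.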

Alteryx supports many code pages that can be selected when inputting and outputting data files via the Input Data tool and Output Data tool, or when converting data types using the Blob Convert tool. Additionally, the ConvertFromCodepage and ConvertToCodepage functions, available in tools that have an expression editor, use code page identifiers to convert strings between code pages and Unicode®, the universal character-encoding standard for all written characters, maintained by the Unicode Consortium.

Alteryx assumes that a wide string is a Unicode® string and a narrow string is a Latin 1 string. As a result, a string that you convert to a code page will not display correctly. Therefore, use code pages only to override text encoding issues within a file. Code pages can differ between computers, or can be changed on a single computer, which can lead to data corruption. For the most consistent results, use a Unicode® encoding, such as UTF-8 or UTF-16, instead of a specific code page. Unicode® allows different languages to be encoded in the same data stream.
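The display problem described above can be sketched in Python. The helper names to_code_page and from_code_page are hypothetical, and the sketch assumes that the conversion behaves like encoding text to bytes and then reading each byte as one Latin 1 character. That is an illustrative model, not documented Alteryx behavior.

```python
def to_code_page(text: str, codec: str) -> str:
    """Hypothetical helper: encode text in the given code page and return
    the bytes as a narrow (Latin 1) string, one character per byte."""
    return text.encode(codec).decode("latin-1")


def from_code_page(narrow: str, codec: str) -> str:
    """Hypothetical helper: treat each Latin 1 character as one byte and
    decode those bytes with the given code page."""
    return narrow.encode("latin-1").decode(codec)


original = "Привет"

# The bytes are valid Cyrillic (code page 1251), but a viewer that
# assumes Latin 1 shows unrelated accented letters instead.
narrow = to_code_page(original, "cp1251")
print(narrow)

# Decoding with the same code page recovers the original text.
restored = from_code_page(narrow, "cp1251")
print(restored == original)
```

Under this model the data is not lost, only mislabeled: the text is recoverable as long as the code page used for the original conversion is known.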

UTF-8 is the most portable and compact way to store any character and is the most widely used encoding. Both UTF-8 and UTF-16 are variable-width encodings, but UTF-8 is compatible with ASCII, and UTF-8 files tend to be smaller than UTF-16 files.
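As a rough illustration of the variable-width and ASCII-compatibility points, this Python sketch compares encoded sizes for a few sample characters and for a short ASCII string. The sample values are arbitrary and chosen only for the demonstration.

```python
# Characters that need 1, 2, 3, and 4 bytes in UTF-8.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # little-endian, no byte order mark
    print(f"U+{ord(ch):04X}: UTF-8 {len(utf8)} bytes, UTF-16 {len(utf16)} bytes")

# For plain ASCII text, UTF-8 output is byte-for-byte identical to ASCII,
# while UTF-16 uses two bytes for every character.
ascii_text = "code page 1252"
utf8_size = len(ascii_text.encode("utf-8"))
utf16_size = len(ascii_text.encode("utf-16-le"))
is_ascii_compatible = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

print(utf8_size, utf16_size, is_ascii_compatible)
```

Both encodings use more bytes for some characters than for others. The size advantage of UTF-8 is largest for text that is mostly ASCII.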

For more information on code pages, see the MSDN Library.

To support the same functionality on Linux, Alteryx uses the ICU library. Alteryx uses the same IDs as on Windows and converts them to ICU converter names. ICU does not support the full list of Windows encodings, and results can differ when converting data from one code page to another.

 

Code Page Identifiers

The following code page identifiers are supported by the ConvertFromCodepage and ConvertToCodepage functions. See Functions.

 

ID     Description                         Support
37     IBM EBCDIC - U.S./Canada            Original engine and AMP.
500    IBM EBCDIC - International          Original engine and AMP.
932    ANSI/OEM - Japanese Shift-JIS       Original engine and AMP.
949    ANSI/OEM - Korean EUC-KR            Original engine and AMP. Not supported for the Download and Blob Convert tools.
1250   ANSI - Central Europe               Original engine and AMP.
1251   ANSI - Cyrillic                     Original engine and AMP.
1252   ANSI - Latin I                      Original engine and AMP.
1253   ANSI - Greek                        Original engine and AMP.
1254   ANSI - Turkish                      Original engine and AMP.
1255   ANSI - Hebrew                       Original engine and AMP.
1256   ANSI - Arabic                       Original engine and AMP.
1257   ANSI - Baltic                       Original engine and AMP.
1258   ANSI/OEM - Vietnamese               Original engine and AMP.
10000  MAC - Roman                         Original engine and AMP.
28591  ISO 8859-1 Latin I                  Original engine and AMP.
28592  ISO 8859-2 Central Europe           Original engine and AMP.
28593  ISO 8859-3 Latin 3                  Original engine and AMP.
28594  ISO 8859-4 Baltic                   Original engine and AMP.
28595  ISO 8859-5 Cyrillic                 Original engine and AMP.
28596  ISO 8859-6 Arabic                   Original engine and AMP.
28597  ISO 8859-7 Greek                    Original engine and AMP.
28598  ISO 8859-8 Hebrew: Visual Ordering  Original engine.
28599  ISO 8859-9 Latin 5                  Original engine and AMP.
28605  ISO 8859-15 Latin 9                 Original engine and AMP.
54936  Simplified Chinese GB18030          Original engine and AMP. Not supported for the Download and Blob Convert tools.
65001  Unicode UTF-8                       Original engine and AMP.
UTF16  Unicode UTF-16                      Original engine and AMP.
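
When checking a file's encoding outside Alteryx, it can help to relate the identifiers above to another tool's encoding names. The following Python sketch is one possible mapping to Python's built-in codec names. The dictionary CODE_PAGE_TO_PYTHON_CODEC and the helper decode_with_code_page are hypothetical names introduced for this example, and the mapping is an assumption: it is not the mapping that Alteryx or ICU uses, and a Python codec can differ from the Windows code page in edge cases.

```python
import codecs

# Hypothetical mapping from the identifiers listed above to Python codec
# names. Illustration only; not the mapping used by Alteryx or ICU.
CODE_PAGE_TO_PYTHON_CODEC = {
    "37": "cp037",
    "500": "cp500",
    "932": "cp932",
    "949": "cp949",
    "1250": "cp1250",
    "1251": "cp1251",
    "1252": "cp1252",
    "1253": "cp1253",
    "1254": "cp1254",
    "1255": "cp1255",
    "1256": "cp1256",
    "1257": "cp1257",
    "1258": "cp1258",
    "10000": "mac_roman",
    "28591": "iso8859_1",
    "28592": "iso8859_2",
    "28593": "iso8859_3",
    "28594": "iso8859_4",
    "28595": "iso8859_5",
    "28596": "iso8859_6",
    "28597": "iso8859_7",
    "28598": "iso8859_8",
    "28599": "iso8859_9",
    "28605": "iso8859_15",
    "54936": "gb18030",
    "65001": "utf-8",
    "UTF16": "utf-16",
}


def decode_with_code_page(data: bytes, code_page_id) -> str:
    """Decode raw bytes using the Python codec mapped to a code page ID.

    Raises KeyError for an ID that is not in the mapping above.
    """
    codec = CODE_PAGE_TO_PYTHON_CODEC[str(code_page_id)]
    return data.decode(codec)


# Confirm that every mapped name is a codec this Python build provides.
for page_id, codec_name in CODE_PAGE_TO_PYTHON_CODEC.items():
    codecs.lookup(codec_name)

# The EBCDIC bytes C1 C2 C3 are the letters A, B, C in code page 37.
print(decode_with_code_page(b"\xc1\xc2\xc3", 37))
```

Keeping the mapping in an explicit dictionary, rather than building codec names from the ID, makes unsupported identifiers fail immediately with a KeyError instead of silently choosing a different encoding.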