| | Character encoding - Wikipedia, the free encyclopedia |
 | | Conventionally, the terms character set and character encoding were considered synonymous, because the same standard would specify both which characters were available and how they were to be encoded into a stream of code units (usually with a single character per code unit). |
 | | With Unicode, a simple character encoding scheme is used in most cases; it specifies only whether the bytes of each code unit should be in big-endian or little-endian order (even this isn't needed with UTF-8, whose code units are single bytes). |
 | | However, there are also compound character encoding schemes, which use escape sequences to switch between several simple schemes (such as ISO 2022), and compressing schemes, which try to minimise the number of bytes used per code unit (such as SCSU, BOCU, and Punycode). |
| en.wikipedia.org /wiki/Text_encoding (1160 words) |
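The scheme types described in the excerpt can be observed with Python's built-in codecs. The following is a minimal sketch, not taken from the article: the sample strings and variable names are illustrative assumptions, while the codec names (`utf-16-be`, `utf-16-le`, `utf-8`, `iso2022_jp`, `punycode`) are standard-library codecs. It shows byte order in a simple scheme, an escape sequence in a compound scheme (ISO-2022-JP, one member of the ISO 2022 family), and Punycode.

```python
# Illustrative sample text (an assumption, not from the article):
# "A" is U+0041 and the euro sign is U+20AC.
text = "A\u20ac"

# Simple encoding schemes: the same 16-bit code units, serialized in
# big-endian or little-endian byte order.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

# UTF-8 uses 8-bit code units, so there is no byte order to specify.
utf8 = text.encode("utf-8")

print(utf16_be.hex(" "))  # 00 41 20 ac
print(utf16_le.hex(" "))  # 41 00 ac 20
print(utf8.hex(" "))      # 41 e2 82 ac

# Compound scheme: ISO-2022-JP switches between character sets with
# escape sequences. ESC $ B selects JIS X 0208; ESC ( B returns to ASCII.
iso2022 = "a\u3042".encode("iso2022_jp")  # "a" followed by hiragana "a"
print(iso2022)

# Punycode: represents non-ASCII text using only ASCII bytes.
puny = "b\u00fccher".encode("punycode")
print(puny)  # b'bcher-kva'
```

Decoding with the matching codec name reverses each step, for example `utf16_le.decode("utf-16-le")` returns the original string.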