UTR#17: Character Encoding Model
Coded character sets are the basic object that both ISO and vendor character encoding committees produce.
EUC (Extended Unix Code) is similar to the DBCS shift encodings, but it applies different numeric shift rules and introduces the single-shift bytes 0x8E and 0x8F, which may start 3-byte and 4-byte sequences. Examples are EUC-JP and EUC-TW on UNIX.
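The lead-byte rules in the EUC snippet can be illustrated with a small Java sketch. It assumes the EUC-JP layout: bytes below 0x80 stand alone, 0xA1 to 0xFE start a 2-byte JIS X 0208 character, 0x8E (SS2) plus one byte gives half-width katakana, and 0x8F (SS3) plus two bytes gives a 3-byte JIS X 0212 character. In EUC-TW, by contrast, 0x8E starts a 4-byte sequence, which this sketch does not handle. The class and method names are hypothetical, and trail bytes are not validated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits EUC-JP bytes into per-character sequences
// using only the lead-byte rules (trail bytes are not validated).
public class EucJpSplitter {

    // Length of the sequence that begins with this lead byte (0..255).
    static int sequenceLength(int lead) {
        if (lead < 0x80) return 1;                    // ASCII range, single byte
        if (lead == 0x8E) return 2;                   // SS2 + 1 byte: half-width katakana
        if (lead == 0x8F) return 3;                   // SS3 + 2 bytes: JIS X 0212
        if (lead >= 0xA1 && lead <= 0xFE) return 2;   // JIS X 0208, 2 bytes
        throw new IllegalArgumentException(
            String.format("invalid lead byte 0x%02X", lead));
    }

    // Walks the buffer and records the length of each character sequence.
    static List<Integer> sequenceLengths(byte[] data) {
        List<Integer> lengths = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int len = sequenceLength(data[i] & 0xFF);  // treat byte as unsigned
            if (i + len > data.length) {
                throw new IllegalArgumentException("truncated sequence at offset " + i);
            }
            lengths.add(len);
            i += len;
        }
        return lengths;
    }

    public static void main(String[] args) {
        byte[] sample = {
            0x41,                                    // 'A'
            (byte) 0xA4, (byte) 0xA2,                // hiragana 'あ' (JIS X 0208)
            (byte) 0x8E, (byte) 0xB1,                // half-width katakana 'ｱ' via SS2
            (byte) 0x8F, (byte) 0xB0, (byte) 0xA1    // a JIS X 0212 character via SS3
        };
        System.out.println(sequenceLengths(sample)); // [1, 2, 2, 3]
    }
}
```

Because every length is decided by the lead byte alone, a decoder can step through EUC-JP text without tracking shift state, which is the practical difference from stateful ISO 2022 encodings.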
In Java or C#, 16-bit code units are by definition UTF-16 code units, whereas in C and C++ the binding to a specific character set is again left to the implementation.
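The Java side of that statement can be observed directly: a `char` is one UTF-16 code unit, so a character outside the Basic Multilingual Plane takes two `char`s (a surrogate pair). The following minimal sketch uses only standard `java.lang.String` and `Character` methods; the class name and the `SAMPLE` string are illustrative choices.

```java
// Illustrates that Java's char is a UTF-16 code unit: a supplementary
// character (here U+1F600) is stored as a high/low surrogate pair.
public class Utf16CodeUnits {

    // 'A', 'é' (U+00E9), and U+1F600 written as its surrogate pair.
    static final String SAMPLE = "A\u00E9\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                           // 4 code units
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length())); // 3 code points
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(2))); // true
        System.out.printf("U+%X%n", SAMPLE.codePointAt(2));            // U+1F600
    }
}
```

`length()` counts code units rather than characters, so code that must count or index characters should use the code-point methods instead.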
www.unicode.org/reports/tr17 (6354 words)