Factbites
 Where results make sense
About us   |   Why use us?   |   Reviews   |   PR   |   Contact us  

Topic: ISO 10646


  
  Universal Character Set - Wikipedia, the free encyclopedia
ISO set out to compose the universal character set in 1989, and published the draft of ISO 10646 in 1990.
The ISO standardisers realised they could not continue to support the standard in its current state and negotiated the unification of their standard with Unicode.
ISO 10646, a general, informal citation for the ISO/IEC 10646 family of standards, is acceptable in most prose.
en.wikipedia.org /wiki/ISO_10646   (1287 words)

  
 ISO-10646 Concept Dictionary
ISO 10646 was accepted in June 1992, and is to be published in 1993.
It would be very difficult to use 10646 as a file code because of ISO C requirements combined with the de facto standard size of a byte.
The 10646 nomenclature refers to coded characters as multiples of octets and assumes that octets are serialized, while the Unicode nomenclature refers to coded characters as indivisible 16-bit entities.
www.cit.gu.edu.au /~davidt/cit3611/C_UNIX/ISO-10646.htm   (2011 words)

  
 RFC 1815 (rfc1815) - Character Sets ISO-10646 and ISO-10646-J-1
For the practical use of ISO 10646, a lot of external profiling such as restriction of characters, restriction of combination of characters and addition of language information is necessary.
As the profiling of ISO 10646 largely affects which character or combination of characters could be properly displayed, changes of profiling of ISO 10646 are as significant as additions of new character sets of ISO 2022.
The other problem of ISO 10646 for Han characters is that, to display them in quality required for daily plain text processing in China/Japan/Korea, it is necessary to add profiling information on which one of Chinese/Japanese/Korean the text is using.
www.faqs.org /rfcs/rfc1815.html   (1280 words)

  
 Zvon - RFC 1815 [Character Sets ISO-10646 and ISO-10646-J-1] - Introduction
As ISO 10646 specifies too little about how text is visualized, to practically use ISO 10646, it is necessary to restrict the standard minimally and then add some amount of profiling information.
For ISO 2022 [ISO2022] based national standards, sufficient profiling information is provided by national standardization bodies, but, for ISO 10646, such a profiling is not yet provided.
That is, it's impractical to support the entirety of ISO 10646 (new restriction or profiling can always be added), so a client needs to know whether some restriction or profiling is being used before it can decide whether to display the body part.
www.zvon.org /tmRFC/RFC1815/Output/chapter1.html   (627 words)

  
 The fight about Unicode in IETF   (Site not responding. Last check: 2007-10-08)
Proponents of ISO 10646 claim that it is a universal character set, capable of representing most graphic characters in the world, "and those we have forgotten will be in the next version".
The main complaint about ISO 10646 is the so-called "Han unification", the decision to make the Japanese, Chinese and Korean character sets into one character set where the characters of the same "meaning" and general shape were joined together.
ISO plans to allocate characters outside the 16-bit range in the next study period; it remains to be seen how and when the UNICODE consortium will follow.
www.alvestrand.no /ietf/unicode.html   (504 words)

  
 A tutorial on character code issues
For example, in the ISO 10646 character code the numeric codes for "a", "!", "ä", and "‰" (per mille sign) are 97, 33, 228, and 8240.
In one possible encoding for ISO 10646, the string a!ä‰ is presented as the following sequence of octets (using two octets for each character): 0, 97, 0, 33, 0, 228, 32, 48.
The ISO 8859-1 standard (which is part of the ISO 8859 family of standards) defines a character repertoire identified as "Latin alphabet No. 1", commonly called "ISO Latin 1", as well as a character code for it.
www.cs.tut.fi /~jkorpela/chars.html   (13641 words)
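
The numbers quoted in the tutorial snippet above can be checked with a short Python sketch. It assumes that the "one possible encoding" with two octets per character corresponds to what Python calls `utf-16-be` (most significant octet first); the variable names are illustrative only.

```python
# The four characters from the snippet: "a", "!", "ä", and "‰" (per mille sign).
text = "a!\u00e4\u2030"

# The numeric code assigned to each character.
code_numbers = [ord(ch) for ch in text]

# Two octets per character, most significant octet first.
# Assumption: this matches the two-octet encoding the snippet describes.
octets = list(text.encode("utf-16-be"))

print(code_numbers)  # [97, 33, 228, 8240]
print(octets)        # [0, 97, 0, 33, 0, 228, 32, 48]
```

The last character shows why two octets are needed: 8240 is 32 × 256 + 48, which yields the octet pair 32, 48.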

  
 RFC 2781 (rfc2781) - UTF-16, an encoding of ISO 10646   (Site not responding. Last check: 2007-10-08)
In ISO 10646, each character is assigned a number, which Unicode calls the Unicode scalar value.
Here the CCS is Unicode/ISO 10646 and the CES is the same in all three cases, except for the serialization order of the octets in each character, and the external determination of which serialization is used.
Security Considerations: UTF-16 is based on the ISO 10646 character set, which is frequently being added to, as described in Section 6 and Appendix A of this document.
www.faqs.org /rfcs/rfc2781.html   (3415 words)
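
As a rough illustration of the points in the RFC 2781 snippets above, the following Python sketch turns one scalar value into UTF-16 code units and then serializes them in both octet orders. The function names `utf16_code_units` and `serialize` are hypothetical helpers written for this example, not part of any library.

```python
def utf16_code_units(scalar):
    """Return the 16-bit UTF-16 code units for one Unicode scalar value."""
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar < 0x10000:
        # Values in the 16-bit range are represented by a single code unit.
        return [scalar]
    # Larger values are split into a high and a low surrogate.
    offset = scalar - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def serialize(units, byteorder):
    """Serialize 16-bit code units as octets; byteorder is 'big' or 'little'."""
    return b"".join(unit.to_bytes(2, byteorder) for unit in units)


units = utf16_code_units(0x1D11E)        # U+1D11E, outside the 16-bit range
big_endian = serialize(units, "big")      # "network byte order"
little_endian = serialize(units, "little")

print([hex(u) for u in units])  # ['0xd834', '0xdd1e']
print(big_endian.hex(), little_endian.hex())
```

The code units are the same in both cases; only the order of the two octets inside each unit differs, which is why the receiver has to know which serialization was used.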

  
 RFC 2781 - UTF-16, an encoding of ISO 10646. P. Hoffman, F. Yergeau.
The term "network byte order" has been used in many RFCs to indicate big-endian serialization, although that term has yet to be formally defined in a standards-track document.
that interpret text entities (such as looking for embedded programming code), must be careful not to execute the code without first alerting the recipient.
In practice, then, a version-independent label is warranted, provided the label is understood to refer to all versions after Amendment 5, and provided no incompatible change actually occurs.
rfc.sunsite.dk /rfc/rfc2781.html   (3669 words)

  
 Universal character set -- Facts, Info, and Encyclopedia article   (Site not responding. Last check: 2007-10-08)
There are several character encoding forms defined by ISO 10646 for the Universal Character Set.
ISO set out to compose the universal character set in 1989, and the draft of ISO 10646 was published in 1990.
Related ISO standards from the List of ISO standards are: ISO 2022, ISO 6429, ISO 14651
www.absoluteastronomy.com /encyclopedia/u/un/universal_character_set.htm   (1000 words)

  
 FAQ - Unicode and ISO 10646
A: In 1991, the ISO Working Group responsible for ISO/IEC 10646 (JTC 1/SC 2/WG 2) and the Unicode Consortium decided to create one universal standard for coding multilingual text.
Since then, the ISO 10646 Working Group (SC 2/WG 2) and the Unicode Consortium have worked together very closely to extend the standard and to keep their respective versions synchronized.
ISO is planning to issue a consolidated standard, to be known simply as ISO/IEC 10646.
www.unicode.org /faq/unicode_iso.html   (407 words)

  
 Short overview of ISO/IEC 10646 and Unicode
I also had the pleasure to take part in the big merger of Unicode and ISO/IEC 10646 that was accomplished at three meetings during 1991 in San Francisco, Geneva and Paris, representing Sweden on the ISO side.
ISO/IEC 10646 is a relatively new character set standard, published in 1993 by the International Organization for Standardization (ISO).
In short, Unicode can be characterized as the (restricted) 2-octet form of UCS on (the most general) implementation level 3, with addition of a more precise specification of the bi-directional behavior of characters, when used in the Arabic and Hebrew scripts.
www.nada.kth.se /i18n/ucs/unicode-iso10646-oview.html   (3204 words)

  
 ISO 8859 Alphabet Soup
The ISO 8859 charsets are not even remotely as complete as the truly great Unicode but they have been around and usable for quite a while (first
ISO 10646) will make this whole chaos of mutually incompatible charsets superfluous because it unifies a superset of all established charsets and is out to cover all the world's languages.
ISO 639 language codes for some 150 of the world's several thousand known languages.
czyborra.com /charsets/iso8859.html   (1564 words)

  
 ISO/IEC 15445:1998 HTML (Work in progress)
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization.
National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity.
ISO/IEC 10646 specifies a large number of facilities from which different selections may be made to suit individual applications.
www.y12.doe.gov /sgml/wg8/document/1995.htm   (3892 words)

  
 SC4 Activities - National Information Standards Organization (NISO)
The U.S. objections centered primarily on changes made to character names to align them with ISO 10646 naming conventions, and new fonts having been used for the character tables.
Although the U.S. did not oppose, in principle, the idea of including mappings to ISO/IEC 10646, they were concerned that these mappings had not been reviewed by groups such as JTC 1/SC 2 or the Unicode Technical Committee (UTC).
The U.S. recommended, in its disapproval of these registrations, that ISO 2375 (the standard that specifies how character sets are registered), be revised to include a review process to validate aspects such as proposed mappings to other ISO standards.
www.niso.org /international/SC4/wg1-0599.html   (505 words)

  
 UTF-8 and Unicode FAQ
The ISO 10646-1 standard was first published in 1993 and defines the architecture of the character set and the content of the BMP.
Unicode 1.1 corresponded to ISO 10646-1:1993 and Unicode 3.0 corresponds to ISO 10646-1:2000.
ISO 14755 hexadecimal entry of arbitrary characters to input entry support for Hangul and Han characters.
ijstokes.paunix.org /unicode/unicode.html   (8483 words)

  
 Unicode fonts and tools for X11
Perl script, which converts ISO 10646-1 fonts into any other encoding for which there is a Unicode mapping table available.
Unicode and ISO 10646 merged CJK ideograph repertoires from several groups of national source standards.
Misc-Fixed ISO 10646-1 Outline Font Project is to develop Type1 versions of the BDF font family provided here.
www.cl.cam.ac.uk /~mgk25/ucs-fonts.html   (1636 words)

  
 Production First Software Encyclopedia of Typography and Electronic Communication : I
ISO Latin-1 to ISO Latin-4 and ISO Latin-5 to ISO Latin-10 Character sets for various groups of languages of the Latin script using the Roman-based alphabet as defined by the ISO (ISO 8859 parts 1 to 4 and parts 9 to 16).
ISO Latin-1 to ISO Latin-4 and ISO Latin-5 to ISO Latin-10 are character subsets of the more extensive Language Group character subsets defined for Production First Software fonts.
ISO 10646 (ISO/IEC/10646.x) A 4-byte, 32-bit, font character set and encoding standard divided into 32,768 « planes, » each of which permits 65,536 characters (for a total of 2,147,483,648 characters).
ourworld.compuserve.com /homepages/profirst/i.htm   (4668 words)
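
The arithmetic in the encyclopedia entry above can be verified in a few lines of Python. The sketch only uses the figures quoted in the snippet; the variable names are illustrative.

```python
planes = 32_768               # number of planes quoted in the entry
positions_per_plane = 65_536  # characters per plane quoted in the entry

total_positions = planes * positions_per_plane

# Number of bits needed to address the highest position.
bits_needed = (total_positions - 1).bit_length()

print(total_positions)  # 2147483648
print(bits_needed)      # 31
```

The total is 2 to the power 31, so although each character occupies four bytes, only 31 of the 32 bits are needed, which agrees with the other entries on this page that describe ISO 10646 as a 31-bit character set.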

  
 RFC 2279 - UTF-8, a transformation format of ISO 10646
Versions of the standards: ISO/IEC 10646 is updated from time to time by published amendments; similarly, different versions of the Unicode standard exist: 1.0, 1.1 and 2.0 as of this writing.
This is intentional, the rationale being as follows: A MIME charset label is designed to give just the information needed to interpret a sequence of bytes received on the wire into a sequence of characters, nothing more (see RFC 2045, section 2.2, in [MIME]).
In practice, then, a version-independent label is warranted, provided the label is understood to refer to all versions after Amendment 5, and provided no incompatible change actually occurs.
www.packetizer.com /rfc/rfc.cgi?num=2279   (2521 words)
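
The role of a charset label described in the RFC 2279 snippet above (just enough information to turn received bytes into characters) might be sketched in Python as follows. `decode_body` is a hypothetical helper for this example; it relies on Python's standard `codecs` registry and does not parse real MIME headers.

```python
import codecs


def decode_body(raw, charset_label):
    """Interpret a sequence of bytes as characters using a charset label."""
    # codecs.lookup raises LookupError for a label Python does not know.
    codec = codecs.lookup(charset_label)
    return raw.decode(codec.name)


# Octets as they might arrive on the wire, labelled "UTF-8".
received = b"a!\xc3\xa4\xe2\x80\xb0"
text = decode_body(received, "UTF-8")

print(text)  # a!ä‰
```

The label carries no version of ISO 10646 or Unicode, which is the design choice the snippet defends: the same label keeps working as characters are added.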

  
 Character Sets: UCS/Unicode Environment (Library of Congress)   (Site not responding. Last check: 2007-10-08)
The restrictions in these specifications are intended to optimize the interchange of data encoded using the MARC-8 character sets and UCS/Unicode during the period of transition from a largely 8-bit environment to the 16-bit UCS/Unicode environment.
Since MARC-8 and ISO 2022 (Character code structure and extension techniques) allow character sets to be designated as either G0 or G1 sets, the configuration of field 066 subfields and the script codes these subfields contain will depend on the desired assignment of MARC-8 sets to a particular graphic character value range.
Subfield $6 (Linkage) is used in MARC 21 records to link alternate graphic representations of the same data, to identify the presence of specific scripts in a field, and to flag fields in which the display/print directionality of data is right-to-left (e.g., for Arabic script).
www.loc.gov /marc/specifications/speccharucs.html   (1105 words)

  
 Mapping Sinhala between ISO 10646 and SLS 1134
Since Sinhala will be encoded according to the Brahmic harmonization in ISO/IEC 10646, it is important that mappings between UCS Sinhala and 7- and 8-bit Sinhala be made, so that roundtrip integrity in text transfer for data encoded in the two standards can be achieved.
This paper is a contribution toward a mapping for data encoded in ISO/IEC 10646 and data encoded in SLS 1134.
It is true that binary sorting is easy to achieve in 7-bit and 8-bit code tables by following the hexadecimal values of the characters and so their arrangement is of relevance in such codings (it is also true that correct sorting can be achieved in other ways in 7-bits and 8-bits).
www.evertype.com /standards/si/iso10646-to-sls1134.html   (1206 words)

  
 RFC 2044 - UTF-8, a transformation format of Unicode and ISO 10646
ISO 10646 further defines a 31-bit character set, UCS-4, with currently no assignments outside of the region corresponding to UCS-2 (the Basic Multilingual Plane, BMP).
UTF-8 encodes UCS-2 or UCS-4 characters as a varying number of octets, where the number of octets, and the value of each, depend on the integer value assigned to the character in ISO 10646.
This string would label media types containing text consisting of characters from the repertoire of ISO 10646-1 encoded to a sequence of octets using the encoding scheme outlined above.
www.packetizer.com /rfc/rfc.cgi?num=2044   (1448 words)
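
To make the "varying number of octets" in the RFC 2044 snippet above concrete, here is a minimal Python sketch of the original UTF-8 scheme, which covers the full 31-bit UCS-4 range with one to six octets. `utf8_octets` is a hypothetical function written for this page; later specifications restrict UTF-8 to at most four octets.

```python
def utf8_octets(value):
    """Encode one UCS-4 value (0 to 0x7FFFFFFF) as a list of UTF-8 octets."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if value < 0x80:
        return [value]  # values below 128 are a single, unchanged octet
    # (exclusive upper limit, octet count, marker bits of the first octet)
    table = [
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ]
    for limit, count, lead in table:
        if value < limit:
            break
    octets = []
    # Fill the trailing octets with six bits each, lowest bits first.
    for _ in range(count - 1):
        octets.append(0x80 | (value & 0x3F))
        value >>= 6
    # The remaining high bits go into the first octet.
    octets.append(lead | value)
    return octets[::-1]


for code in (0x61, 0xE4, 0x2030, 0x1D11E):
    print(hex(code), [hex(o) for o in utf8_octets(code)])
```

For values that Python itself can encode, the result agrees with the built-in `str.encode("utf-8")`; the five- and six-octet forms exist only in this older definition.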

  
 UTF-8 and Unicode FAQ
The ISO 10646 standard on the other hand is not much more than a simple character set table, comparable to the old ISO 8859 standards.
ISO 10646 was from the beginning designed as a 31-bit character set (with possible code positions ranging from U-00000000 to U-7FFFFFFF), however it took until 2001 for the first characters to be assigned beyond the Basic Multilingual Plane (BMP), that is beyond the first 2
The ISO 10646 working group has agreed to modify their standard to exclude code positions beyond U-0010FFFF, in order to turn the new UCS-4 and UTF-32 into practically the same thing.
www.cl.cam.ac.uk /~mgk25/unicode.html   (14421 words)
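
The effect of restricting the code space to U-0010FFFF, as described in the FAQ snippet above, can be sketched in Python. The helper names are hypothetical; the only library behaviour relied on is that Python's own `chr` and `utf-32-be` codec follow the same restricted range.

```python
LAST_CODE_POSITION = 0x10FFFF  # upper limit quoted in the snippet above


def in_restricted_range(code_position):
    """True if the position survives the restriction to U-0010FFFF."""
    return 0 <= code_position <= LAST_CODE_POSITION


def utf32_be(code_position):
    """Four octets, most significant first, for one code position."""
    if not in_restricted_range(code_position):
        raise ValueError("beyond U-0010FFFF")
    return code_position.to_bytes(4, "big")


encoded = utf32_be(0x2030)
matches_builtin = encoded == "\u2030".encode("utf-32-be")

# Python strings cannot hold positions beyond the restricted range.
try:
    chr(0x110000)
    python_accepts = True
except ValueError:
    python_accepts = False

print(encoded.hex(), matches_builtin, python_accepts)
```

Within the restricted range, the four-octet UCS-4 form and UTF-32 produce the same octets, which is the sense in which the snippet calls them practically the same thing.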

  
 i18n/l10n: HTML - base character set   (Site not responding. Last check: 2007-10-08)
The UCS is a Coded Character Set that assigns unique numbers to (currently) about 50,000 of the world's characters.
Unicode and ISO/IEC 10646 are codepoint by codepoint identical and developed in close synchronization.
The difference between ISO/IEC 10646 and Unicode is that Unicode adds some rules about how the characters are to be used.
www.w3.org /International/O-unicode.html   (139 words)

  
 Transliteration of Indic scripts: How to use ISO 15919   (Site not responding. Last check: 2007-10-08)
Opening a printed copy of the international standard ISO 15919 Transliteration of Devanagari and related Indic scripts into Latin characters, available from National Standards Bodies, it might seem to be rather complicated because of the amount of information, the choices to be made for a transliteration, and the number of tables.
This site also provides further information on the transliteration of Indic scripts, some of which is not given in ISO 15919 (such as the relation to Unicode).
Please note that the transliteration system of ISO 15919 is defined by means of options and recommendations, tables and (on this site) Notes, and not just by tables.
homepage.ntlworld.com /stone-catend/trind.htm   (289 words)

  
 ISO 646 (Good old ASCII)
And the inevitable success seems to be there: RFC 2070 internationalized the Internet's hypertext markup language HTML and declared ISO 10646 its new base charset.
RFC 2277 recommends the use of ISO 10646 to all new Internet protocols.
ISO-8859-1 but goes beyond the 8bit barrier and encodes all the world's characters in a 16bit space and a 20bit extension zone for everything that did not fit into the 16bit space.
czyborra.com /charsets/iso646.html   (505 words)
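
The "16bit space and a 20bit extension zone" in the snippet above adds up as shown in this small Python sketch. It assumes that the extension zone means 2 to the power 20 further code positions on top of the 16-bit space; the variable names are illustrative.

```python
bmp_positions = 2 ** 16        # the 16-bit space
extension_positions = 2 ** 20  # assumption: the "20bit extension zone"

total_positions = bmp_positions + extension_positions
highest_position = total_positions - 1

print(total_positions)        # 1114112
print(hex(highest_position))  # 0x10ffff
```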

  
 Misc-Fixed ISO 10646-1 Outline Font Project   (Site not responding. Last check: 2007-10-08)
This project is aimed at producing a free software outline version of the classic bitmapped misc-fixed terminal fonts, with the same coverage as Markus G. Kuhn's extended ISO 10646-1 version of the screen fonts.
For PostScript the CID-keyed font format seems appropriate to handle large character sets such as ISO 10646-1 or Unicode, but the general support is limited to Level 3 devices, and is not yet implemented in this project.
The planned coverage is about 3300 glyphs, the same as the bitmapped ISO 10646-1 version.
www.etek.chalmers.se /~e4jordan/font   (533 words)


Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.