Factbites
 Where results make sense
About us   |   Why use us?   |   Reviews   |   PR   |   Contact us  

Topic: UCS 2


In the News (Sat 28 Nov 09)

  
  Universal Character Set - Wikipedia, the free encyclopedia
After the publication of Unicode 3.0 in February 2000, corresponding new and updated characters entered the UCS via ISO/IEC 10646-1:2000.
The UCS has over 1.1 million code points, but only the first 65,536 (the Basic Multilingual Plane, or BMP) had entered into common use before 2000.
This situation began changing when the People's Republic of China (PRC) mandated in 2000 that computer systems sold in its territory must support GB18030, which required that computer systems intended for sale in the PRC must move beyond the BMP.
en.wikipedia.org /wiki/Universal_character_set   (1287 words)

  
 Short overview of ISO/IEC 10646 and Unicode
UCS is the first officially standardized coded character set whose purpose is eventually to include all characters used in all the written languages of the world (and, in addition, all mathematical and other symbols).
UCS is intended to be usable both for internal data representation in computer systems and in data communication.
The character repertoire of the first version of UCS is based on an amalgamation of all internationally standardized coded character sets and the most important company-defined de facto standards for coded character sets that existed in 1991.
www.nada.kth.se /i18n/ucs/unicode-iso10646-oview.html   (3204 words)

  
 [No title]
Multi-octet characters, however, are not compatible with many current applications and protocols, and this has led to the development of a few so-called UCS transformation formats (UTF), each with different characteristics.
2) Prepare the high-order bits of the octets as per the second column of the table.
2) Determine which bits encode the character value from the number of octets in the sequence and the second column of the table above (the bits marked x).
www.ietf.org /rfc/rfc2279.txt   (2481 words)
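The encoding steps quoted in the RFC 2279 snippet above (pick the octet count, prepare the high-order bits, fill in the bits marked x) can be sketched roughly as follows. This is a minimal illustration, not the RFC's text: the function name `utf8_encode` is hypothetical, and the sketch assumes the range is capped at U+10FFFF (four octets), whereas RFC 2279 itself allowed longer sequences.

```python
def utf8_encode(code_point):
    """Encode one code point by hand: choose the octet count, set the
    high-order marker bits, then fill the payload bits six at a time."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range for this sketch")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded")
    if code_point < 0x80:
        return bytes([code_point])          # single octet, high bit clear
    if code_point < 0x800:
        count, lead = 2, 0xC0               # lead octet 110xxxxx
    elif code_point < 0x10000:
        count, lead = 3, 0xE0               # lead octet 1110xxxx
    else:
        count, lead = 4, 0xF0               # lead octet 11110xxx
    octets = []
    for _ in range(count - 1):
        # Each continuation octet is 10xxxxxx and carries the low six bits.
        octets.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    octets.append(lead | code_point)        # remaining bits go in the lead octet
    return bytes(reversed(octets))
```

Calling `utf8_encode(ord(ch))` for a character `ch` should agree with Python's built-in `ch.encode("utf-8")`, which is the easier choice in real code.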

  
 Character Sets: UCS/Unicode Environment (Library of Congress)
This subset is made up of the UCS characters that correspond to the over 16,000 characters defined in the separate MARC-8 character sets for MARC 21.
This encoding has the advantage of allowing the Basic Latin (ASCII) subset of the MARC 21 repertoire to be encoded with the same 8-bit encodings as in MARC-8 (with only one octet per character), thus preserving the basic structural elements of the MARC 21 record, while enabling record content to be multiscript.
It represents characters in a systematic way as 1, 2, or 3 octets, using the left-most bits of each octet to indicate how the octet is to be interpreted.
www.loc.gov /marc/specifications/speccharucs.html   (1105 words)

  
 Problems and Solutions for Unicode and User/Vendor Defined Characters   (Site not responding. Last check: 2007-10-20)
UCS does include all of the characters that exist in existing standards such as JIS, but it may not necessarily contain characters such as vendor defined characters which did not exist in standards.
In UCS there are several undefined reserved areas, and it is possible to use user defined characters in excess of 6400 characters if user defined characters are allotted to these areas.
BMP, are expressed in 2 bytes and characters from Planes 0x01 to 0x10 are expressed in 4 bytes, and as one character is no longer a 2 byte fixed length, it is possible that character processing may become complex.
www.opengroup.or.jp /jvc/cde/ucs-conv-e.html   (5136 words)
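The point in the snippet above, that BMP characters take 2 bytes while characters from planes 0x01 to 0x10 take 4, corresponds to the UTF-16 surrogate mechanism. A minimal sketch, assuming UTF-16 is the encoding meant (the snippet does not name it) and using the hypothetical function name `utf16_units`:

```python
def utf16_units(code_point):
    """Return the 16-bit code units for one code point:
    one unit inside the BMP, a surrogate pair beyond it."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        return [code_point]                  # 2 bytes
    offset = code_point - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return [high, low]                       # 4 bytes
```

Because the unit count varies, code that indexes text by 16-bit unit can land in the middle of a pair, which is the processing complexity the snippet warns about.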

  
 ISO-10646 Concept Dictionary
The goal in creating ISO 10646 was to include all characters from all significant languages; to be a UCS.
Plane-octet - byte 2 in a UCS-4 encoded character which designates a plane of characters within a group.
If one tries to imagine that either UCS encoding may be used as a multibyte encoding, several problems occur.
www.cit.gu.edu.au /~davidt/cit3611/C_UNIX/ISO-10646.htm   (2011 words)

  
 Info: (recode) UCS-2   (Site not responding. Last check: 2007-10-20)
Universal Character Set, 2 bytes. One surface of `UCS' is usable for the subset defined by its first sixty thousand characters (in fact, 31 * 2^11 codes), and uses exactly two bytes per character.
It is a mere dump of the internal memory representation which is *natural* for this subset and as such, conveys with it endianness problems.
The value `0xFFFE' is not an `UCS' character, so if this value is seen at the beginning of a file, `recode' reacts by swapping all pairs of bytes.
www.cims.nyu.edu /cgi-comment/info2html?(recode)UCS-2   (370 words)
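The byte-swapping reaction described in the snippet above can be sketched as below. This is only an illustration of the idea, not `recode`'s actual code: the function name `read_ucs2` is hypothetical, and the sketch assumes the reader first interprets every pair as big-endian and strips a leading 0xFEFF mark once the order is settled.

```python
def read_ucs2(data):
    """Decode a UCS-2 byte string into a list of 16-bit values,
    swapping every byte pair if the stream starts with 0xFFFE."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must contain whole byte pairs")
    # Assumption: read each pair as big-endian to begin with.
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    if units and units[0] == 0xFFFE:
        # 0xFFFE is not a character, so the pairs must be reversed.
        units = [((u & 0xFF) << 8) | (u >> 8) for u in units]
    if units and units[0] == 0xFEFF:
        units = units[1:]                    # drop the byte order mark
    return units
```

For example, `read_ucs2(b"\xff\xfeH\x00i\x00")` and `read_ucs2(b"\xfe\xff\x00H\x00i")` both yield the values for "Hi".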

  
 The Old Joel on Software Forum - Unicode
Multi-byte is a different particular way of representing characters using 1 or 2 bytes, and does not cover all of Unicode.
In the early days, 2 bytes was enough precision to encode all the code points which is why Windows uses it.
However, when it was determined that 2 bytes did not provide enough values for all possible code points, UTF-16 and UTF-8 were created (along with UCS-4 which uses 4 bytes per code point).
discuss.fogcreek.com /joelonsoftware?cmd=show&ixPost=168543   (931 words)

  
 [No title]
The UCS actually has several versions of this depending on whether it's mounted on a building or a unit and whether the hard point is light or heavy.
This is the weapon of choice for the UCS when assaulting enemy bases and encampments, though even with full ammo upgrades this weapon doesn't quite match the ED's mobile artillery weapons.
This means that each shot actually uses up 2 rockets rather than 1. However, the Rocket Launcher upg.1 comes with 20 more missiles than the previous version and, with the higher damage inflicted, this is more than worth it.
www.cheatcc.com /pc/sg/moon_project.txt   (15540 words)

  
 Universal character set -[ruv.net : Information Portal]-
UCS is kept synchronized character by character with Unicode.
It has over a million code points, but only the first 65536 (the Basic Multilingual Plane, or BMP) are commonly used, the remainder being reserved for such purposes as representing ancient Egyptian hieroglyphics or rare Chinese characters.
Another encoding is UCS-4, which uses a single code value between 0 and, theoretically, hexadecimal FFFFFFFF for each character (although the UCS stops at 10FFFF), and allowing that value to be represented as exactly four bytes (one 32-bit word).
www.artpolitic.org /infopedia/uc/UCS.html   (520 words)
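The UCS-4 description above (one 32-bit word per character, with the UCS stopping at 10FFFF) can be illustrated with a short sketch. The names `ucs4_encode`, `PLANE_SIZE` and `PLANE_COUNT` are hypothetical, and the 17-plane figure is taken from the other entries on this page rather than from this snippet.

```python
import struct

PLANE_SIZE = 0x10000        # 65,536 code points per plane (the BMP is plane 0)
PLANE_COUNT = 17            # planes 0 through 16
MAX_CODE_POINT = PLANE_SIZE * PLANE_COUNT - 1   # works out to 0x10FFFF

def ucs4_encode(text, big_endian=True):
    """Write each character as exactly four bytes (one 32-bit word)."""
    fmt = ">I" if big_endian else "<I"
    return b"".join(struct.pack(fmt, ord(ch)) for ch in text)
```

The arithmetic also accounts for the "over 1.1 million code points" figure quoted elsewhere in these results: 17 planes of 65,536 code points give 1,114,112.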

  
 UTF-8 and Unicode FAQ
UCS and Unicode are first of all just code tables that assign integer numbers to characters.
At this level, UCS support is very comparable to ISO 8859 support and the only significant difference is that we have now thousands of different characters available, that characters can be represented by multibyte sequences, and that ideographic Chinese/Japanese/Korean characters require two terminal character positions (double-width).
GB 18030 a new encoding of UCS for use in Chinese government systems that is backwards-compatible with the widely used GB 2312 and GBK encodings for Chinese.
www.cl.cam.ac.uk /~mgk25/unicode.html   (14421 words)

  
 Chap2
On this account, if the CS is presented before a UCS, then the CS center will create a pathway to the UCS center, and activation will flow from the CS to the UCS (I will adopt informal terminology here to simplify the writing: activation doesn't flow between stimuli, but between their centers).
Thus, the UCS is being associated with the memory of the CS.
A less-intense shock UCS that lasts for a relatively long time may be equivalent to a more intense shock UCS that lasts for a relatively short time.
www.ucs.louisiana.edu /~cgc2646/LRN/Chap2.html   (18121 words)

  
 UTF-8 Computer Encyclopedia Enterprise Resource Directory Complete Guide to Internet   (Site not responding. Last check: 2007-10-20)
(UCS transformation format 8) An ASCII-compatible multibyte Unicode and UCS encoding, used by Java and Plan 9.
For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc. The ISO 10646 Universal Character Set (UCS), a superset of Unicode, occupies a 31-bit code space and the obvious UCS-4 encoding for it (a sequence of 32-bit words) has the same problems.
The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file and all non-ASCII characters are guaranteed to have the most significant bit set (bit 0x80).
www.jaysir.com /computer-encyclopedia/u/utf-8-computer-terms.htm   (235 words)
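The two guarantees stated in the snippet above (ASCII text is unchanged, and every octet of a non-ASCII character has bit 0x80 set) can be checked with a small sketch. The function name `is_ascii_transparent` is hypothetical; the check relies only on Python's built-in UTF-8 codec.

```python
def is_ascii_transparent(text):
    """Return True if every ASCII character encodes to its own single
    octet and every octet of every non-ASCII character has bit 0x80 set."""
    for ch in text:
        octets = ch.encode("utf-8")
        if ord(ch) < 0x80:
            if octets != bytes([ord(ch)]):
                return False
        elif any((b & 0x80) == 0 for b in octets):
            return False
    return True
```

A consequence is that byte-oriented tools searching for an ASCII delimiter such as "/" or a NUL terminator never match inside a multibyte character.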

  
 Universal character set -- Facts, Info, and Encyclopedia article   (Site not responding. Last check: 2007-10-20)
The UCS has over 1.1 million code points, but only the first 65536 (the Basic Multilingual Plane, or BMP) were commonly used before 2000.
In Unicode terminology these characters are called high surrogates and low surrogates respectively and UTF-16 is the Unicode terminology for UCS-2.
However, any normative references to the UCS as a publication should cite a particular part and version, using the form ISO/IEC 10646-<part number>:<publication year>; for example: ISO/IEC 10646-1:1993.
www.absoluteastronomy.com /encyclopedia/u/un/universal_character_set.htm   (1000 words)

  
 www.collocations.de: Software
The UCS toolkit is a collection of libraries and scripts for the statistical analysis of cooccurrence data.
NB: Future releases of the UCS toolkit are expected to require Perl version 5.8.0 or newer (for Unicode support) and may also require R version 1.9.0 or newer.
Therefore, the UCS system is not intended as a number cruncher that extracts and processes cooccurrences from several hundred million words of text in a few minutes.
www.collocations.de /software.html   (313 words)

  
 Production First Software Encyclopedia of Typography and Electronic Communication : U
An encoding transformation form which conforms to Unicode character semantics, extended with surrogate code points, so as to be able to reference the first group of 17 planes (planes 0 through 16) of ISO/IEC/10646.
The latter form uses a byte order mark as the first character in the data stream to determine the byte polarity of the data.
An encoding transformation form which conforms to Unicode character semantics, able to reference the first group of 17 planes (planes 0 through 16) of ISO/IEC/10646 directly using 32-bit code points instead of surrogate code points.
ourworld.compuserve.com /homepages/profirst/u.htm   (2351 words)

  
 Joliet Specification   (Site not responding. Last check: 2007-10-20)
The UCS-2 Level 1, UCS-2 Level 2, and UCS-2 Level 3 escape sequences are considered to be registered according to ISO 2375 for purposes of setting bit 0 of the Volume Flags field of the SVD.
Otherwise, the definitions of SEPARATOR 1 and SEPARATOR 2 shall be recorded according to section 7.4.3 of ISO 9660:1988.
Mode 2 Form 2 sectors and CD-Digital Audio tracks may be recorded on the same media as a Joliet volume.
bmrc.berkeley.edu /people/chaffee/jolspec.html   (3055 words)

  
 The skew.org XML Tutorial
The basic idea of Unicode and the UCS is that a set of abstract objects called characters can be represented by at least one descriptive name and also by at least one unique number.
Since UCS characters are intangible, decoding, to a computer, really means conversion to some other encoding form, most likely UTF-16, UCS-2 or UCS-4.
The allowable UCS character sequences in a decoded document fall into two main categories: markup and character data.
skew.org /xml/tutorial   (8463 words)

  
 Mandragor & Apinc - Free Documentation Base
16-bit characters, however, are not compatible with many current applications and protocols, and this has led to the development of a few so-called UCS transformation formats (UTF), each with different characteristics.
UTF-8, the object of this memo, has the characteristic of preserving the full US-ASCII range: US-ASCII characters are encoded in one octet having the usual US-ASCII value, and any octet with such a value can only be an US-ASCII character.
A description can also be found in Unicode Technical Report #4 [UNICODE].
docs.mandragor.org /files/RFCs/20xx/2044   (1404 words)

  
 ADA Lexical
This document is based on the definition found in the official ADA Reference Manual chapter 2.
This lexer will understand the replacement characters for two reasons: (1) for completeness and (2) because it doesn't prevent any of the regular tokens to be used and defined properly.
Note that the base is by default limited to 2 to 16.
ada.m2osw.com /ADA_Lexical.html   (2040 words)

  
 UCS-2 Encoding Form   (Site not responding. Last check: 2007-10-20)
In a UCS-2 Unicode system, one cannot legally interpret individual bytes that constitute only a portion of a Unicode character; rather, the entire 16-bit integral value must be tested.
Consequently, the following code may return 0 or 1 depending on whether the machine is big-endian or little-endian, respectively.
In neither case would the correct answer (2) be returned.
www.uazone.org /multiling/unicode/ucs2.html   (497 words)
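The code referred to in the snippet above is not reproduced on this page. The following sketch only emulates the effect it describes, assuming the original applied a byte-wise string-length routine to a two-character UCS-2 string; `byte_strlen` and `ucs2_strlen` are hypothetical names, and the test string "ab" is an assumption.

```python
def byte_strlen(buffer):
    """Count bytes up to the first zero byte, as a byte-oriented
    string-length routine would."""
    count = 0
    for b in buffer:
        if b == 0:
            break
        count += 1
    return count

def ucs2_strlen(buffer, byteorder):
    """Count whole 16-bit units up to the first zero unit."""
    count = 0
    for i in range(0, len(buffer) - 1, 2):
        if int.from_bytes(buffer[i:i + 2], byteorder) == 0:
            break
        count += 1
    return count

# "ab" plus a terminating zero unit, in both byte orders.
big = "ab\0".encode("utf-16-be")       # 00 61 00 62 00 00
little = "ab\0".encode("utf-16-le")    # 61 00 62 00 00 00
```

Here `byte_strlen(big)` stops at the very first byte and `byte_strlen(little)` stops after one, while `ucs2_strlen` tests the entire 16-bit value and finds both characters in either byte order.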

  
 Funny, It Worked Last Time : Encodings in Strings are Evil Things (Part 2)
And someone who is reading a string from a file, or from memory, needs to use the exact same encoding scheme, or we're off in la-la land.
As we said, early versions of Unicode specified UCS-2 as a standard, back when nothing existed in the UCS tables beyond the BMP.
This adds a brand new level of complexity to string handling, because now a single codepoint could be either 2 or 4 bytes.
blogs.msdn.com /ryanmy/archive/2004/10/19/244865.aspx   (2081 words)

  
 Compaq Tru64 UNIX Technical Reference for Using Korean Features
UCS is a standard character encoding for the universal character set specified in the Unicode and ISO/IEC 10646 standards.
UCS has two forms; UCS-2 (16-bit, or 2 octet units) and UCS-4 (32-bit, or 4 octet units).
UTF-8, the standard method for transforming UCS-4 or UCS-2 data into a sequence of 8-bit bytes and ensuring interchange transparency for characters from the ASCII character set (code positions 0 through 127).
www.helsinki.fi /atk/unix/dec_manuals/DOC_51A/HTML/SUPPDOCS/KOREADOC/KOREACH2.HTM   (870 words)

  
 [No title]
Proposed changes: Page 61, 8.1.1, Computer's coded character set, fourth paragraph, NOTE 2, first sentence, change "UCS-2" to "UTF-16".
Page 63, 8.1.2.1, COBOL character repertoire, general rule 1, NOTE 2, first sentence, change "UCS-2" to "UTF-16".
Page 76, 8.3.1.2.1.2, Alphanumeric literals, syntax rule 2, NOTE, change "UCS-2" to "UTF16" (2 occurrences) Page 79, 8.3.1.2.4.2, National literals, syntax rule 2, NOTE, third sentence, change "UCS-2" to "UTF-16".
www.ncits.org /tc_home/j4htm/m232/01-0448.doc   (471 words)

  
 Unicode(5)   (Site not responding. Last check: 2007-10-20)
The Unicode Standard specifies a universal character set (UCS) that contains definitions in Version 2.1 for 38,887 characters and also includes a Private Use Area for vendor- or user-defined characters.
The 16-bit character values in Unicode are zero-extended through a second 16-bit unit in the larger encoding format.
Universal character encoding that an implementation parses in 16-bit units (2 octets) is known as UCS-2.
www.uwm.edu /cgi-bin/IMT/wwwman?topic=Unicode(5)&msection=   (1395 words)

Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.