Factbites
 Where results make sense

Topic: JIS encoding


  
  [Ping] Japanese text encoding
JIS Roman runs from 0 to $7f and is identical to ASCII except for a few minor differences (notably, the backslash at 92 is instead a yen symbol, and the tilde at 126 is replaced by an overbar).
The JIS values get all rearranged in order to reserve the range $a0 to $df for a set of 64 half-width katakana; to accomplish this, the characters are squashed into half as many columns (values for the first byte) but twice as many rows (values for the second byte).
The figure shows the encoding ranges for Shift-JIS: the first byte will land either from $81 to $9f or from $e0 to $ef, and the second byte will land either from $40 to $7e or from $80 to $fc.
lfw.org /text/jp.html   (978 words)
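
The byte ranges quoted in this entry are enough to split a Shift-JIS byte string into characters. The following is a minimal Python sketch under that assumption; the function name `classify_sjis` and the token labels are hypothetical, and the ranges are taken from the snippet as written rather than from the standard itself.

```python
def classify_sjis(data: bytes) -> list[tuple[str, bytes]]:
    """Split bytes into (kind, raw) tokens using the ranges quoted in the snippet."""
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            # Lead byte of a double-byte character: a trail byte must follow.
            if i + 1 >= len(data):
                raise ValueError(f"truncated double-byte character at offset {i}")
            t = data[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                raise ValueError(f"invalid trail byte {t:#04x} at offset {i + 1}")
            tokens.append(("double", data[i:i + 2]))
            i += 2
        elif 0xA0 <= b <= 0xDF:
            # Range reserved for half-width katakana (as given in the snippet).
            tokens.append(("katakana", data[i:i + 1]))
            i += 1
        else:
            # Everything else is treated as a single-byte JIS Roman character.
            tokens.append(("single", data[i:i + 1]))
            i += 1
    return tokens


print(classify_sjis(b"A\x82\xa0\xb1"))
```

Note that a trail byte can fall in the ASCII range ($40 to $7e), so a scanner must always start from a known character boundary.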

  
 HTML Document Representation
The "charset" parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters.
These tools may employ any convenient encoding that covers most of the characters contained in the document, provided the encoding is correctly labeled.
A given character encoding may not be able to express all characters of the document character set.
www.w3.org /TR/REC-html40/charset.html   (2143 words)

  
  JWPce Manual - Files
JIS: In the JIS system (which has a number of varieties), escape sequences are used to change from ASCII mode to Japanese mode (double-byte mode).
JIS encoding has the advantage that the character coding does not use the extended ASCII character space (characters with the high-bit set).
EUC encoding has the advantage that you can extract parts of a character sequence easily, because you do not have to deal with whether you are in double-byte mode or not, as you do in JIS.
home.physics.ucla.edu /~grosenth/m_files.html   (3410 words)
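
The contrast drawn in this entry can be seen with Python's built-in `iso2022_jp` and `euc_jp` codecs. This is an illustrative sketch only; the helper name `encode_both` is hypothetical, and Python's `iso2022_jp` codec is assumed here as a stand-in for the "JIS" encoding the manual describes.

```python
def encode_both(text: str) -> tuple[bytes, bytes]:
    """Return (ISO-2022-JP bytes, EUC-JP bytes) for the same text."""
    return text.encode("iso2022_jp"), text.encode("euc_jp")


jis, euc = encode_both("日本語")

# ISO-2022-JP: ESC $ B switches to double-byte mode, ESC ( B returns to ASCII.
# Every byte stays below 0x80, so the high bit is never set.
print(jis)
print(all(b < 0x80 for b in jis))

# EUC-JP: every byte of a kanji has the high bit set and there is no mode
# to track, so a slice on a character boundary decodes on its own.
print(euc[2:4].decode("euc_jp"))

# The same slice of the JIS payload has lost its escape sequence and is
# indistinguishable from plain ASCII.
print(jis[5:7].decode("ascii"))
```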

  
  JIS encoding - Definition, explanation
In computing, JIS encoding refers to several Japanese Industrial Standards for encoding the Japanese language.
JIS X 0202 (the Japanese counterpart of ISO 2022, on which ISO-2022-JP is based), a set of encoding mechanisms for sending JIS data over transmission media that only support 7-bit data.
The main alternative to JIS encoding is Unicode, in the form of UTF-8.
www.calsky.com /lexikon/en/txt/j/ji/jis_encoding.php   (152 words)

  
 UTR#17: Character Encoding Model
When an encoding form specifies that the integers that are being encoded are to be serialized as sequences of bytes, there are often constraints placed on the particular values that those bytes may have.
A character encoding form is a mapping from the set of integers used in a CCS to the set of sequences of code units.
Character encoding schemes are relevant to the issue of cross-platform persistent data involving code units wider than a byte, where byte-swapping may be required to put data into the byte polarity canonical for a particular platform.
www.unicode.org /reports/tr17   (6354 words)

  
 dating JIS_encoding - dating-report.com
There is also the Shift_JIS encoding, which adds the kanji, full-width hiragana and full-width katakana from JIS X 0208 in a compatible way to JIS X 0201.
Shift_JIS is perhaps the most widely used encoding in Japan, as the compatibility with the single-byte JIS X 0201 character set made it possible for electronic equipment manufacturers (such as cash register manufacturers) to offer an upgrade from older cheaper equipment that was not capable of displaying kanji to newer equipment while retaining character-set compatibility.
The main alternatives to JIS encoding are EUC (used on UNIX systems where the JIS encodings are incompatible with POSIX standards) and more recently Unicode, particularly in the form of UTF-8.
www.dating-report.com /JIS_encoding   (409 words)

  
 Character Encodings for Localizing Alphabets
Encoding schemes that use shift state are not very efficient for internal storage or processing.
Encodings containing shift sequences are used primarily as an external code, which allows information interchange between a program and the outside world.
JIS is the primary encoding method used for electronic transmission such as email because it uses only 7 bits of each byte.
incubator.apache.org /stdcxx/doc/stdlibug/23-3.html   (1665 words)

  
 Character encodings, JIS (OMFFJIS) - OmniMark Library
is an external output function that accepts UTF-8 encoded data and writes that data to a value string sink, its first argument, converted from a UTF-8 encoding to a JIS encoding.
JIS X 0202) based escape sequences to shift between the encodings defined by the three standards.
In this case, the converted value used is DEL (0x7F) in the JIS encoding, and NOT-A-CHARACTER (0xFFFD) in the Unicode (UTF-8) encoding.
developers.omnimark.com /documentation/library/89.htm   (346 words)

  
 Japanese Locale Concept Dictionary
JIS X0201 - a single byte codeset consisting of 7bit characters corresponding to ISO 646, 7bit characters for katakana, and 8bit characters for both Roman and katakana characters.
JIS X 0212-1990 - Late in 1990 a supplemental Japanese character standard called JIS X 0212-1990 was published by JIS which specified an additional 5,801 kanji, 21 symbols/diacritical marks, and 245 Latin-based characters with diacritical marks.
In the case of New-JIS, the simplified form is in the JIS Level 1 column, and the unsimplified form is in the JIS Level 2 column.
www.cit.gu.edu.au /~davidt/cit3611/C_UNIX/japanese.htm   (2367 words)

  
 SP - Character sets
This conversion is determined by the encoding associated with the storage object.
An encoding may be specified using the name of a mapping from sequences of characters to sequences of bytes.
A bit combination with the 0x8000 and 0x80 bits set is encoded by the sequence of bytes with which the SJIS encoding encodes the character whose number in JIS X 0208 added to 0x8080 is equal to the bit combination.
jclark.com /sp/charset.htm   (1176 words)

  
 QNX Developer Support
Since all members of the basic C character set have byte values in the range [0x00, 0x7F] in ASCII, EUC meets the requirements for a multibyte encoding in Standard C. Such a sequence is not in the initial conversion state immediately after a byte value in the interval [0xA1, 0xFE].
How you interpret a byte in such an encoding depends on a conversion state that involves both a parse state, as before, and a shift state, determined by bytes earlier in the sequence of characters.
JIS also meets the requirements for a multibyte encoding in Standard C. Such a sequence is not in the initial conversion state when partway through a three-byte shift sequence or when in two-byte mode.
www.qnx.com /developers/docs/momentics621_docs/dinkum_en/abridged/charset.html   (1526 words)
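
The shift state described in this entry can be tracked with a small scanner. The sketch below assumes the four three-byte escape sequences commonly used by ISO-2022-JP (ESC ( B, ESC ( J, ESC $ @, ESC $ B); the names `ESCAPES` and `final_shift_state` and the state labels are hypothetical, and other JIS variants may use additional sequences that this sketch rejects.

```python
# Three-byte escape sequences and the shift state each one selects.
ESCAPES = {
    b"\x1b(B": "ascii",
    b"\x1b(J": "jis-roman",
    b"\x1b$@": "two-byte",  # JIS X 0208-1978
    b"\x1b$B": "two-byte",  # JIS X 0208-1983
}


def final_shift_state(data: bytes) -> str:
    """Return the shift state in effect after consuming all of data."""
    state = "ascii"  # initial conversion state
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            seq = data[i:i + 3]
            if seq not in ESCAPES:
                raise ValueError(f"unknown or truncated escape at offset {i}")
            state = ESCAPES[seq]
            i += 3
        elif state == "two-byte":
            i += 2  # one character occupies two bytes in this mode
        else:
            i += 1
    return state


print(final_shift_state("日本語".encode("iso2022_jp")))
print(final_shift_state(b"\x1b$BF|"))
```

A stream that ends in any state other than the initial one has not been returned to ASCII, which is the condition the entry describes as "not in the initial conversion state".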

  
 Shift-JIS
Shift_JIS (SJIS) is a character encoding for the Japanese language developed by Microsoft.
As the name implies, it is based on the ISO-2022-JP (JIS) encoding, but with most byte values shifted to accommodate an additional 64 katakana characters in the range 0xA0 to 0xDF.
For a double-byte JIS sequence, the transformation to the corresponding Shift_JIS bytes is:
www.xasa.com /wiki/en/wikipedia/s/sh/shift_jis.html   (130 words)
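
The snippet above is cut off before the transformation itself. The following Python sketch shows the commonly documented arithmetic for converting a double-byte JIS X 0208 code to Shift_JIS; it is not recovered from the truncated source, and the function name `jis_to_sjis` is hypothetical.

```python
def jis_to_sjis(j1: int, j2: int) -> tuple[int, int]:
    """Convert a JIS X 0208 byte pair (each 0x21..0x7E) to Shift_JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS bytes must be in the range 0x21..0x7E")
    # Two JIS rows share one Shift_JIS lead byte; the lead-byte range is
    # split in two so that 0xA0..0xDF stays free for half-width katakana.
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    if j1 % 2:
        # Odd row: trail bytes start at 0x40 and skip over 0x7F.
        s2 = j2 + 31 + j2 // 96
    else:
        # Even row: trail bytes start at 0x9F.
        s2 = j2 + 126
    return s1, s2


s1, s2 = jis_to_sjis(0x24, 0x22)  # hiragana "a" in JIS X 0208
print(hex(s1), hex(s2))
print(bytes([s1, s2]).decode("shift_jis"))
```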

  
 FIX: JIS Japanese Character Encoding Throws Exception
When you use "JIS" Japanese character encoding on a 3200-series build of the Microsoft virtual machine (Microsoft VM), you might encounter an UnsupportedEncodingException when you use certain aliases.
To work around this problem, for any JIS encoding aliases that fail, specify "JIS" explicitly or use an alias other than those listed in the "More Information" section of this article.
To prove that the encoding is supported, change the second parameter back to "JIS", and then recompile and run the code.
support.microsoft.com /kb/260818   (437 words)

  
 Developer Resources
Converts a JIS encoded textchar to its equivalent Shift-JIS encoding.
Converts a Kuten code encoded textchar to its equivalent JIS encoding.
Converts a Shift-JIS encoded PlatformChar to its equivalent JIS encoding.
partners.adobe.com /public/developer/indesign/sdk/explodedSDK/win/Documentation/WebDocs/ie/IEncodingUtils.html   (1129 words)

  
 abiword-dev.archive.0105: Re: Patch: Multi-encoding Text import
In fact, the native encoding of RTF (as specified by \ansicpg) is ignored by
I don't understand why a new field in the existing document class is needed.
"..(encoded)" version,dialog pops up and asks what encoding to use).
www.abisource.com /mailinglists/abiword-dev/01/May/0664.html   (747 words)

  
 Tru64 UNIX Technical Reference for Using Japanese Features
To allow the characters defined in these standards to be encoded in a single codeset, the first byte of each JIS X 0208 character is encoded in the ranges 81-9F and E0-FC, while the second byte is between 40 and FC, as shown in Table 2-4.
Table 2-5 illustrates the mapping from the encoding of the first byte to the corresponding character sets in the Shift JIS encoding.
If there are JIS X 0208 characters on a line, there must be a switch to ASCII or to the left-hand part of JIS X 0201 (Roman letters) before the end of the line (in other words, before the CRLF, or carriage return and line feed).
h30097.www3.hp.com /docs/base_doc/DOCUMENTATION/V40G_HTML/SUPPDOCS/JAPANDOC/JAPANCH2.HTM   (2660 words)

  
 UTF-8 and Unicode FAQ
It also announces the encoding of the file to the parser only after the parser has already started to read the file, so it is clearly the less elegant approach.
It is important to understand that the primary purpose of these tables was to demonstrate that Unicode is a superset of the mapped legacy encodings, and to document the motivation and origin behind those Unicode characters that were included into the standard primarily for round-trip compatibility reasons with older character sets.
GB 18030 a new encoding of UCS for use in Chinese government systems that is backwards-compatible with the widely used GB 2312 and GBK encodings for Chinese.
www.cl.cam.ac.uk /~mgk25/unicode.html   (14457 words)

  
 Spartanburg SC | GoUpstate.com | Spartanburg Herald-Journal
The Jōyō kanji 常用漢字 are 1,945 characters consisting of all the kyōiku kanji, plus an additional 939 kanji taught in junior high and high school.
This standard is rarely used, mainly because the common Shift JIS encoding system could not use it.
JIS X 0221:1995, the Japanese version of the ISO 10646/Unicode standard.
www.goupstate.com /apps/pbcs.dll/section?category=NEWS&template=wiki&text=Kanji   (4014 words)

  
 IUC27: Are We Counting Bytes Yet?
For most SBCS encodings, the mapping of bytes to Unicode characters is also 1:1 (and nearly all of these encodings map entirely to the Basic Multilingual Plane of Unicode, which means that each character consumes just one UTF-16 code unit).
This allows encoding tables to be added to the classpath independently of the provider's code, allowing customers to create their own encodings without having access to the code in the charset provider or any of the related classes.
In a stateful encoding a shift sequence may be required to write the next arbitrary character, so the length must include the length of the longest shift sequence in the encoding.
www.inter-locale.com /whitepaper/IUC27-a303.html   (8649 words)
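
The byte-counting concern raised in this entry can be illustrated with Python's built-in codecs. This is a sketch under the assumption that Python's `iso2022_jp` codec represents the stateful case; the helper name `encoded_lengths` and the default list of encodings are hypothetical.

```python
def encoded_lengths(text: str,
                    encodings=("utf-8", "shift_jis", "euc_jp", "iso2022_jp")) -> dict[str, int]:
    """Return the encoded byte length of text in each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


# The same three kanji occupy a different number of bytes in each encoding.
print(encoded_lengths("日本語"))

# In the stateful encoding a single character also pays for the shift
# sequences that enter and leave double-byte mode.
print(encoded_lengths("日"))
```

A buffer sized from the character count alone would therefore be too small for the stateful encoding, which is why the entry says the length must include the longest shift sequence.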

  
 SP - Character sets
The default encoding is used for file input and output, and, except under Windows 95 and Windows NT, for all other interfaces with the operating system including filenames, environment variable names, environment variable values and command line arguments.
www.jclark.com /sp/charset.htm   (1176 words)

  
 Shift-JIS - Definition, explanation
Shift_JIS (SJIS) is a character encoding for the Japanese language developed by a Japanese company called ASCII and adopted by, amongst others, Microsoft.
It is based on character sets defined within JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double byte characters).
The single-byte characters 0x00 to 0x7F match the ASCII encoding (unrelated to the ASCII company mentioned earlier!), except for a Yen sign at 0x5C and an overline at 0x7E in place of the ASCII character set's backslash and tilde respectively.
www.calsky.com /lexikon/en/txt/s/sh/shift_jis.php   (430 words)

  
 [Ping] Japanese encoding schemes and the WWW
Let's combine the maps for all three encoding systems, together with ASCII (which we'll represent as a "first byte"), and see what we get.
The ASCII region shown on the map corresponds to the subset of ASCII understood by HTML: all of the printable characters from 33 to 127, tab (9), linefeed (10), carriage-return (13), and space (32).
When text is encoded with EUC, it does indeed stand out from ASCII -- but it then becomes indistinguishable from any other locale-specific EUC encoding, such as EUC-encoded Chinese or Korean, for instance.
www.lfw.org /text/jp-www.html   (637 words)

  
 Introduction to i18n - Characters in Each Country
JIS (Japan Industrial Standards) is an organization responsible for coded character sets (CCS) and encodings used in Japan.
Though JIS X 0201 is included in SHIFT-JIS encoding (explained later) and widely used for Windows/Macintosh, usage of this is not encouraged in UNIX.
JIS X 0212 is not widely used, probably because it cannot be included in SHIFT-JIS, the standard encoding for Japanese version of Windows and Macintosh.
www.debian.org /doc/manuals/intro-i18n/ch-languages.en.html   (5041 words)

  
 Dr. Dobb's | Internationalization: A Primer, Part 2 | April 15, 2003
The representation of a multibyte character is determined by its encoding scheme, and more than one encoding scheme may be available for a given multibyte character set.
Encoding schemes that use shift sequences are not very efficient for internal storage or processing.
Encodings containing shift sequences are used primarily as an external code, one that allows information interchange between a program and the outside world.
www.ddj.com /184403076;jsessionid=MMMHFFDI1VL02QSNDLQSKH0CJUNN2JVN?_requestid=626265   (2677 words)

  
 Japanese language and computers . Japanese language . JIS encoding . Unicode . Internet . Kunrei-shiki
Despite efforts, none of the encoding schemes has become the de facto standard, and multiple encoding standards are still in use today.
Not all required characters may be included in a character set standard such as JIS X 0208, so gaiji (外字, "external characters") are sometimes used to supplement the character set.
Strictly speaking, the term means either: a set of standard character sets for Japanese, notably JIS X 0201, the Japanese version of ISO 646 (ASCII), and JIS X 0208, the most common kanji character set containing...
www.uk.kunsimuna.net /Japanese_language_and_computers_UK_862688_ai   (676 words)

  
 [No title]
JIS (of which ISO-2022-JP, the encoding method used for Japanese email, is a subset) is a 7-bit encoding that encompasses several Japanese character sets, as well as ASCII.
As JIS text does not necessarily start in any specific character set, the object's state is initialized to the non-JIS state.
It is aware of JIS encoding rules, and as such will not escape these characters when they are inside Japanese-encoded segments of the text.
people.debian.org /~che/mailman/JisEscape.py   (465 words)

Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.