Factbites
 Where results make sense
About us   |   Why use us?   |   Reviews   |   PR   |   Contact us  

Topic: Text encoding


In the News (Thu 24 Dec 09)

  
  MT-NW Manual: International Language Support
Converting text to and from a form in which it can be safely transmitted over the internet is a complex task, when you take into account the number of different ways that text is represented in the different writing systems of the world.
Because the processes of text encoding conversion and character set mapping are often done at the same time, and by the same software modules, this document often considers both at the same time.
The first process of text encoding is necessary because the transport systems in use are mostly incapable, for historical reasons, of dealing with text that uses more than one byte per character (so-called multi-byte systems, used by countries whose languages have more than 256 individual characters, for example Japanese, or Chinese).
www.smfr.org /mtnw/docs/TextEncoding.html   (3279 words)

  
 NINCH Guide to Good Practice   (Site not responding. Last check: 2007-11-03)
Text encoding thus makes it possible to bridge the gap between local research and insight and the discourse of the larger community, and to articulate interpretative statements in a way that is broadly intelligible.
Encoding languages like TEI and EAD use tag names which are expressive of their function, and because they represent the ideas people actually have about documents, they quickly become intelligible even to the untrained reader.
Among the projects surveyed, the use of TEI DTDs in encoding texts is one of the clearest cases of the adoption of standards for a particular type of material.
www.nyu.edu /its/humanities/ninchguide/V   (7242 words)

  
 [Ping] Japanese text encoding
The text itself between the escape sequences consists of pairs of plain 7-bit bytes in the printable range from $21 to $7e, simply formed by splitting apart the JIS value into two bytes, also known as "raw JIS".
The figure shows the encoding ranges for Shift-JIS: the first byte will land either from $81 to $9f or from $e0 to $ef, and the second byte will land either from $40 to $7e or from $80 to $fc.
You might notice that the encoding range excludes $9f to $fc for the second byte when the first byte is $ef.
lfw.org /text/jp.html   (978 words)
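The two layouts described in this entry can be sketched in Python. This is only an illustration under stated assumptions: the function name `in_two_byte_range` and the sample string are made up here, the byte ranges are the ones quoted in the snippet (they match what Python calls the `shift_jis` codec), and `iso2022_jp` is assumed to be the escape-sequence form the snippet refers to.

```python
def in_two_byte_range(first: int, second: int) -> bool:
    """Check a byte pair against the ranges quoted in the snippet."""
    first_ok = 0x81 <= first <= 0x9F or 0xE0 <= first <= 0xEF
    second_ok = 0x40 <= second <= 0x7E or 0x80 <= second <= 0xFC
    return first_ok and second_ok


sample = "日本"  # two kanji, chosen only for illustration

# Escape-sequence form: ESC $ B switches to two-byte JIS, ESC ( B back to ASCII.
escaped = sample.encode("iso2022_jp")
payload = escaped[3:-3]  # strip the two 3-byte escape sequences

# Shifted form: each kanji becomes one (first, second) byte pair.
shifted = sample.encode("shift_jis")
pairs = list(zip(shifted[0::2], shifted[1::2]))
```

Every byte in `payload` should fall in the printable range $21 to $7e, while every pair in `pairs` should satisfy `in_two_byte_range`.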

  
 Text Encoding Initiative - Wikipedia, the free encyclopedia
The Text Encoding Initiative (TEI) is a consortium of institutions and research projects which collectively maintains and develops a standard for the representation of texts in digital form.
Its major deliverable is a set of Guidelines, which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.
the Electronic Text Center and the Institute for Advanced Technology in the Humanities at the University of Virginia.
en.wikipedia.org /wiki/Text_Encoding_Initiative   (402 words)

  
 Refining Our Notion of What Text Really Is.
We examine the claim that 'text is an ordered hierarchy of content objects'; this thesis was affirmed by the authors, and others, in the late 1980s and has been associated with certain approaches to text processing and the encoding of literary texts.
The early positive arguments for text being a hierarchy of content objects were advanced largely to promote a particular approach to text processing and text encoding and to discourage the competing alternatives.
The theory of text and text encoding methodology is still in a rudimentary state; we hope the concepts discussed here contributed to the groundwork for further discussion.
www.stg.brown.edu /resources/stg/monographs/ohco.html   (6748 words)

  
 Encoding Class (System.Text)
Encoding is the process of transforming a set of Unicode characters into a sequence of bytes.
Although numerous non-Unicode encodings are supported for compatibility with legacy applications, the Unicode encodings (UTF8Encoding, UnicodeEncoding, and UTF32Encoding) are recommended when there is a choice.
Optionally, the Encoding provides a preamble which is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process.
msdn2.microsoft.com /en-us/library/system.text.encoding.aspx   (1451 words)
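As a rough analogue in Python, not the .NET API itself, the sketch below turns Unicode characters into bytes and isolates the optional preamble (byte order mark). The sample text is an assumption for illustration; `utf-8-sig` is Python's name for UTF-8 with a preamble.

```python
import codecs

text = "héllo"  # illustrative sample

plain = text.encode("utf-8")              # encoded bytes, no preamble
with_preamble = text.encode("utf-8-sig")  # preamble followed by the same bytes

# The preamble is a fixed byte prefix placed in front of the encoded bytes.
preamble = with_preamble[: len(with_preamble) - len(plain)]
```

Here `preamble` should equal `codecs.BOM_UTF8`, and decoding `with_preamble` with `utf-8-sig` strips it again.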

  
 Theoretical Issues in Text Encoding: A Critical Review   (Site not responding. Last check: 2007-11-03)
Text encoding has long had an important role in the humanities computing community.
Some encoding theorists claim, for instance, that the markup that predominates in the TEI is inappropriately interpretative, compromising the relevance and value of that encoding system for humanists.
Questions are raised about possible misunderstandings of the fundamental nature of markup, and some writers have argued that we limit the value of text encoding when we neglect to exploit certain sorts of markup.
eprg.isrl.uiuc.edu /markuptheory/abstract/poster.html   (1189 words)

  
 Cover Pages: Text Encoding Initiative (TEI)
Text Encoding Initiative Consortium Releases P4 Draft Guidelines in XML and SGML.
The Text Encoding Initiative uses XML in the markup of literary and linguistic texts.
Levels 1-4 allow the conversion and encoding of texts to be performed without the assistance of content experts and can be enriched with more markup at any time.
xml.coverpages.org /tei.html   (7415 words)

  
 Binary to text encoding - Wikipedia, the free encyclopedia
A binary-to-text encoding is an encoding of binary data as plain text.
These encodings are necessary for transmission of data when the channel or the protocol only allows ASCII printable characters, such as e-mail or usenet.
This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text.
en.wikipedia.org /wiki/Binary_to_text_encoding   (725 words)
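Two common binary-to-text encodings are available in the Python standard library. The sketch below is illustrative only: the input bytes are arbitrary, and quoted-printable is assumed here to be a fair example of the "almost readable" kind of conversion mentioned in the snippet.

```python
import base64
import quopri

raw = bytes([0x00, 0xFF, 0x10])  # arbitrary binary data

# Base64 maps every 3 input bytes to 4 printable ASCII characters.
b64 = base64.b64encode(raw)

# Quoted-printable leaves ASCII letters and digits as they are and
# escapes other bytes as "=XX", so the result stays almost readable.
latin1_bytes = "café".encode("latin-1")
qp = quopri.encodestring(latin1_bytes)

# Both encodings are reversible.
raw_again = base64.b64decode(b64)
latin1_again = quopri.decodestring(qp)
```

Both outputs contain only 7-bit ASCII bytes, which is what lets them pass through channels such as e-mail.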

  
 Converting Non-Unicode Text (The Java™ Tutorials > Internationalization > Working with Text)
The text editor we used to write this section's code examples supports only ASCII characters, which are limited to 7 bits.
Data in text files is automatically converted to Unicode when its encoding matches the default file encoding of the Java Virtual Machine.
If the default file encoding differs from the encoding of the text data you want to process, then you must perform the conversion yourself.
java.sun.com /docs/books/tutorial/i18n/text/convertintro.html   (387 words)

  
 TEI Text Encoding in Libraries
Encoding is performed automatically based on artifacts of the OCR or other document creation process (page breaks, for example) and metadata collected during the imaging or preparation process.
Level 1 texts are not intended to be adequate for textual analysis; they are more likely to be suited to the goals of a preservation unit or mass digitization initiative.
Texts encoded at Level 4 are able to stand alone as part of a library collection, and do not require images in order for them to be read by students, scholars and general readers.
www.indiana.edu /~letrs/tei   (2948 words)

  
 The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No ...
All that stuff about "plain text = ascii = characters are 8 bits" is not only wrong, it's hopelessly wrong, and if you're still programming that way, you're not much better than a medical doctor who doesn't believe in germs.
Some popular encodings of English text are Windows-1252 (the Windows 9x standard for Western European languages) and ISO-8859-1, aka Latin-1 (also useful for any Western European language).
Encoding menu and tries a bunch of different encodings (there are at least a dozen for Eastern European languages) until the picture comes in clearer.
www.joelonsoftware.com /articles/Unicode.html   (3701 words)
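A short Python sketch of why the choice of encoding matters, even between the two similar encodings named in this entry. The sample bytes are chosen for illustration; `cp1252` and `latin-1` are Python's codec names for Windows-1252 and ISO-8859-1.

```python
euro_byte = b"\x80"

as_cp1252 = euro_byte.decode("cp1252")   # the euro sign
as_latin1 = euro_byte.decode("latin-1")  # an invisible C1 control character

# Decoding UTF-8 bytes with the wrong single-byte encoding garbles the text.
utf8_bytes = "é".encode("utf-8")
garbled = utf8_bytes.decode("cp1252")
correct = utf8_bytes.decode("utf-8")
```

The same byte yields two different characters, and `garbled` comes out as two characters instead of one.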

  
 Character encoding - Wikipedia, the free encyclopedia
Conventionally character set and character encoding were considered synonymous, as the same standard would specify both what characters were available and how they were to be encoded into a stream of code units (usually with a single character per code unit).
With Unicode, a simple character encoding scheme is used in most cases, simply specifying if the bytes for each integer should be in big-endian or little-endian order (even this isn't needed with UTF-8).
However, there are also compound character encoding schemes, which use escape sequences to switch between several simple schemes (such as ISO 2022), and compressing schemes, which try to minimise the number of bytes used per code unit (such as SCSU, BOCU, and Punycode).
en.wikipedia.org /wiki/Text_encoding   (1160 words)
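The byte-order point and one of the compressing schemes named in this entry can be tried in Python. This is a sketch with arbitrarily chosen sample characters, using the standard `utf-16-be`, `utf-16-le`, `utf-8` and `punycode` codecs.

```python
ch = "€"  # U+20AC, chosen for illustration

big = ch.encode("utf-16-be")     # most significant byte first
little = ch.encode("utf-16-le")  # least significant byte first
utf8 = ch.encode("utf-8")        # byte-oriented, so there is no byte order to choose

# Punycode squeezes non-ASCII text into ASCII letters, digits and hyphens.
puny = "bücher".encode("punycode")
restored = puny.decode("punycode")
```

`big` and `little` hold the same two bytes in opposite order, while `utf8` has a single fixed form.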

  
 Python 3000, Files and Text Encodings
For text this is what you want (and very useful), but for binary data it corrupts it.
This means that when you open a file in text mode, a new I/O layer will determine the encoding and return a unicode object [5].
In summary, if you don't know the encoding of a text file it is a lucky dip as to whether you can correctly decode.
www.voidspace.org.uk /python/articles/guessing_encoding.shtml   (1390 words)
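The difference between binary mode, text mode, and a wrong encoding guess can be sketched with Python 3's built-in `open`. This illustrates the general behaviour only, not necessarily the I/O layer design the article describes; the file content is a made-up example.

```python
import os
import tempfile

# Write Latin-1 bytes to a scratch file (the content is illustrative).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("café\n".encode("latin-1"))
    path = tmp.name

try:
    # Binary mode returns the raw bytes untouched.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode decodes the bytes using the encoding we name.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()

    # Guessing the wrong encoding can fail outright.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        wrong_guess_failed = False
    except UnicodeDecodeError:
        wrong_guess_failed = True
finally:
    os.remove(path)
```

A wrong guess does not always raise an error; with some encodings it silently produces the wrong characters, which is the "lucky dip" the snippet mentions.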

  
 The Nature of Linguistic Data: Using Text Encoding
Text encoding refers specifically to the way in which the structural (and even interpretative) information in text is encoded.
The discipline of using markup codes in a text to describe the function or purpose of the elements in the text, rather than their formatting.
SGML-based guidelines for the encoding of texts and the analysis of texts.
www.sil.org /computing/routledge/simons/text.html   (726 words)

  
 Corpus Encoding Standard
This document is the first version of the Corpus Encoding Standard (CES), which is a part of the EAGLES Guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES).
The CES is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language) compliant with the specifications of the TEI Guidelines for Electronic Text Encoding and Interchange of the Text Encoding Initiative.
The CES specifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive representation (marking of structural and typographic information) as well as general architecture (so as to be maximally suited for use in a text database).
www.cs.vassar.edu /CES   (283 words)

  
 Text Encoding (codepages)
Characters (such as "a" or "1" or "and") are represented by computers as numbers, and there is more than one way to do this.
A text encoding method is a way to encode the characters into the numbers (bytes) of which a file is comprised.
This is an old encoding which only supports English letters, numbers, and punctuation.
winmerge.org /2.2/manual/textencoding.html   (354 words)
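To make "more than one way to do this" concrete, the Python sketch below encodes one character under several code pages. The choice of character and code pages is an assumption for illustration; `cp1252` and `cp437` are Python's names for two widely used code pages.

```python
letter = "é"

code_point = ord(letter)              # the character as a number
in_cp1252 = letter.encode("cp1252")   # Windows Western European code page
in_cp437 = letter.encode("cp437")     # original IBM PC code page
in_utf8 = letter.encode("utf-8")      # two bytes instead of one

try:
    letter.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # ASCII has no slot for accented letters
```

The same character ends up as a different byte value in each code page, so a file is only readable if the reader applies the encoding the writer used.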

  
 XML Matters: TEI -- the Text Encoding Initiative
In this installment, David looks at Text Encoding Initiative, an XML schema devoted to the markup of literary and linguistic texts.
The Text Encoding Initiative (TEI) is a decade older than XML itself, and older than other common documentation encoding XML schemas like DocBook.
In addition, with the text so marked, you might decide, for example, to underline rather than italicize titles in a later edition.
www-128.ibm.com /developerworks/library/x-matters30.html   (1951 words)

  
 Adobe - Flash TechNote : URL Encoding: Reading special characters from a text file   (Site not responding. Last check: 2007-11-03)
One common technique used to load variables in Macromedia Flash is reading the data from a text file on the server or CD.
URL encoding replaces each non-alphanumeric character with the hexadecimal combination which represents that character.
More information on URL encoding can be found at www.blooberry.com/indexdot/html/topics/urlencoding.htm.
www.adobe.com /cfusion/knowledgebase/index.cfm?id=tn_14143   (283 words)
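URL encoding can be tried with Python's standard `urllib.parse` module. This is a sketch only: the sample value is hypothetical, and it does not reproduce how Flash itself loads variables.

```python
from urllib.parse import quote, unquote

value = "Tom & Jerry=100%"  # hypothetical variable value

# safe="" asks quote() to escape every reserved character, including "/".
encoded = quote(value, safe="")
decoded = unquote(encoded)
```

Letters and digits pass through unchanged, while the space, `&`, `=` and `%` are each replaced by a percent sign and two hexadecimal digits.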

  
 TEI Guidelines for Electronic Text Encoding -- Electronic Text Center
Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (P4)
Made available by the Electronic Text Center at the University of Virginia
The Electronic Text Center Introduction to TEI and Guide to Document Preparation.
etext.virginia.edu /TEI.html   (81 words)

  
 TEI: Yesterday's information tomorrow
The Text Encoding Initiative (TEI) Guidelines are an international and interdisciplinary standard that enables libraries, museums, publishers, and individual scholars to represent a variety of literary and linguistic texts for online research, teaching, and preservation.
Electronic Text Editing: Electronic Textual Editing is a volume of essays jointly sponsored by the Modern Language Association and the TEI Consortium, and scheduled for publication in paper form in late 2005 by the MLA.
Web site redesign: This is a new look for the TEI web site, with XML pages dynamically
www.tei-c.org   (386 words)

  
 SourceForge.net: Text Encoding Initiative
The TEI is an international and interdisciplinary standard used by libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.
Topic : Education, Indexing/Search, Other/Nonlisted Topic, Documentation, Text Processing
View list of RSS feeds available for this project
sourceforge.net /projects/tei   (114 words)


Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.