Factbites
 Where results make sense

Topic: Brown Corpus



In the News (Mon 6 Jul 09)

  
  ICAME CORPUS COLLECTION
The jury further said in term-end
A01 0040 presentments that the City Executive Committee, which had over-all
A01 0050 charge of the election, "deserves the praise and thanks of the
A01 0060 City of Atlanta" for the manner in which the election was conducted.
Brown Corpus, untagged text format II This version is identical to text format I, but typographical information is reduced and the line division is new.
A01 0030 5 The jury further said in term-end presentments that
A01 0040 3 the City Executive Committee, which had over-all charge
A01 0050 2 of the election, "deserves the praise and thanks of
A01 0050 11 the City of Atlanta" for the manner in which the election
A01 0060 11 was conducted.
icame.uib.no /browneks.html   (651 words)

  
 Brown Corpus - Wikipedia, the free encyclopedia
Thus "the" constitutes nearly 7% of the Brown Corpus, and "of" more than another 3%; while about half the total vocabulary of about 50,000 words are hapax legomena: words that occur only once in the corpus.
The tagged Brown Corpus used a selection of about 80 parts of speech, as well as special indicators for compound forms, contractions, foreign words and a few other phenomena, and formed the basis for many later corpora such as the Lancaster-Oslo/Bergen Corpus.
Although the Brown Corpus pioneered the field of corpus linguistics, by now typical corpora (such as the British National Corpus) tend to be much larger, on the order of 100 million words.
en.wikipedia.org /wiki/Brown_Corpus   (676 words)
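
The frequency figures quoted above (the share of "the", the proportion of hapax legomena) come from simple token counts. The following is a minimal sketch of that kind of count in Python; the tokenizer, the function name frequency_profile, and the sample sentence are assumptions chosen for illustration, not the procedure used to compile the published Brown Corpus statistics.

```python
from collections import Counter
import re


def frequency_profile(text):
    """Return (counts, total token count, sorted hapax legomena).

    Tokenization here is a simplifying assumption: lower-cased runs of
    letters, with an optional internal apostrophe.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    # Hapax legomena are the word types that occur exactly once.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    return counts, len(tokens), hapaxes


# Illustrative sample only, not a line copied from the corpus files.
sample = ("The jury further said that the City Executive Committee "
          "deserves the praise and thanks of the City of Atlanta")

counts, total, hapaxes = frequency_profile(sample)
share_of_the = counts["the"] / total  # relative frequency of "the"
print(total, counts["the"], len(hapaxes))
```

On a full corpus the same function would be applied to the concatenated text of all samples; the relative frequencies depend on the tokenization rules chosen.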

  
 A Crash Course in Corpus Linguistics
Corpus linguistics methods are ideal for research on registers and register differences, because in order to establish similarities and/or differences between registers huge amounts of text are needed.
The Brown corpus is approximately 1.2 million words, containing texts from at least 15 written registers within the Humanities, such as belles lettres, reports, fiction, biography, popular culture, etc. It exists, and can be accessed, as a text file, and can thus be used for lexicographic research.
The TIMIT corpus is a corpus of recorded speech, containing 6,300 sentences, recorded from male and female speakers of eight dialects of American English.
www.ling.unt.edu /corpus.html   (3390 words)

  
 Attorney General - Press Release
On February 9, 1990, Brown was retried and convicted for the murder of Brenda Watson in the Superior Court of Gwinnett County, and on February 10, 1990, Brown was once again sentenced to death.
Brown, represented by counsel, filed a petition for a writ of habeas corpus in the Superior Court of Butts County on May 6, 1993.
Brown filed a petition for a writ of habeas corpus in the United States District Court for the Northern District of Georgia on April 23, 1997.
www.state.ga.us /ago/press/press.cgi?prfile=PR.20031023.01   (1396 words)

  
 The Corpus of Spoken Israeli Hebrew (CoSIH)
A corpus — as we see it — is a preliminary desideratum for much larger projects that cannot otherwise be achieved, be it a grammar of modern Hebrew, a comprehensive dictionary, or any other theoretical or applied inquiry.
Obtaining a representative corpus in demographic terms is a known and commonly-used procedure in sampling populations.
For example, the ten-million word spoken corpus of the BNC includes two equally sized parts: a demographic part, containing transcriptions of spontaneous natural conversations made by members of the public, and a context-governed part, containing transcriptions of recordings made at specific types of meetings and events.
www.tau.ac.il /humanities/semitic/cosih.html   (7704 words)

  
 About the British National Corpus
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written.
The British National Corpus (BNC) Consortium was formed in 1990, and started work in 1991 on the three-year task of producing a hundred-million word corpus of modern British English for use in commercial and academic research.
The Brown Corpus of Standard American English was created at Brown University by W. Francis and H. Kucera.
www.natcorp.ox.ac.uk /corpus/index.xml?style=printable   (1975 words)

  
 English or Russian, results of football matches, etc.
A plebiscite is an enactment of the commonalty, such as was made on the motion of one of their own magistrates, as a tribune.
Corpus linguistics specifies corpus design in respect to research interests, provides computational methods of extracting linguistic knowledge, and conceives tools to validate the accuracy of linguistic description.
The composition, annotation, encoding and availability of the corpus are meant to facilitate developments of language technology and studies in bilingual terminology extraction, primarily for the Slovene language.
snappy75.sitesled.com   (3511 words)

  
 UCREL Corpus Holdings
The BNC is a 100,000,000 word corpus of written and spoken British English from the early 1990s.
The Leeds-Lancaster Treebank and Lancaster Parsed Corpus are analyzed subsamples of the LOB corpus.
The ET10-63 corpus is a bilingual parallel corpus of English and French, containing EC official documents on telecommunications.
www.comp.lancs.ac.uk /computing/research/ucrel/corpora.html   (752 words)

  
 Writings by Ralf Brown
Using the knowledge implicit in the corpus, it generates a bilingual word-for-word dictionary for alignment during translation.
The final results of the corpus pre-processing are a segmented/bracketed aligned bilingual corpus and a statistical dictionary.
These systems were one of our systems from the 1999 TDT evaluation, retuned for the new corpus, which had the third-best cost measure; and a new system that adds clustering and dynamically-generated stopwording, which had the best cost measure among all submissions for the default evaluation condition.
www-2.cs.cmu.edu /~ralf/papers.html   (3807 words)

  
 CorpusQueryProcessor - SDSU Comp-Ling Lab   (Site not responding. Last check: 2007-10-22)
Brown was put together at Brown University in the 60s.
structure-name expands the intervals of a corpus to the boundaries of the structure structure-name.
SC is the name of the subcorpus with the results of the search "helpful"; we are now creating a new subcorpus named SCS that stores the results of the subcorpus SC when the context is expanded to the whole sentence ("s") in which the word is contained.
bulba.sdsu.edu /docwiki/CorpusQueryProcessor   (2171 words)

  
 Text corpus - Wikipedia, the free encyclopedia
In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (now usually electronically stored and processed).
An example of annotating a corpus is part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
When the language of the corpus is not a working language of the researchers who use it, interlinear glossing is used to make the annotation bilingual.
en.wikipedia.org /wiki/Text_corpus   (337 words)

  
 Corpus Linguistics
Monitor corpus – attempts to be a representative cross-section of the spoken and/or written language to be studied (e.g.
Sample corpus – does not pretend to be representative of the whole spoken and/or written forms of the language to be investigated.
Corpus linguistics is simply the study of language through corpus-based research, but it differs from traditional linguistics in its insistence on the systematic study of authentic examples of language in use.
www.engl.polyu.edu.hk /corpuslinguist/corpus.htm   (1438 words)

  
 Brown Corpus Manual
Two complete proofreadings of the Corpus have resulted in corrections of two kinds: errors in the preparation of the original tape, which have been silently corrected in recently issued copies, and further typographical errors and anomalies in the underlying text, which have been recorded in the descriptions of individual samples on pages 33-176.
The corpus may further prove to be standard in setting the pattern for the preparation and presentation of further bodies of data in English or in other languages.
Since the purpose of the tagged corpus is to facilitate automatic or semi-automatic syntactic analysis, the rationale of the tagging system is basically syntactic, though some morphological distinctions with little or no syntactic significance have also been recognized.
khnt.hit.uib.no /icame/manuals/brown/INDEX.HTM   (6690 words)

  
 DAVID JUNIOR BROWN'S LEGAL DOCUMENTATION:
Brown was not wearing his distinctive silver ring.
Brown a writ of habeas corpus might serve as the prosecutor's appro-
Brown's petition for a writ of habeas corpus.
www.ccadp.org /davidjuniorbrown-legaldoc.htm   (3312 words)

  
 Creating a Parallel Corpus from the ``Book of 2000 Tongues''
The corpus provides a representative sample of language styles in the source texts, including narrative, poetry, and correspondence.
However, this is on the order of some monolingual corpora widely used for corpus-based research, such as the Brown Corpus of American English [Kucera and Francis 1967], and the breadth across multiple languages offers an opportunity for research not generally available with the larger corpora in use today.
Since the corpus is being created primarily for use in corpus-based computational linguistics research, the restrictions imposed by the CES, in comparison to the full generality of the TEI, are suited to the task (CES Sec.
www.stg.brown.edu /conferences/tei10/tei10.papers/resnik.html   (4278 words)

  
 Flobman: Basic Information About the Corpus
The ultimate aim was to compile parallel one-million-word corpora of the early 1990s that matched the original LOB and Brown corpora as closely as possible, and that would thus provide linguists with an empirical basis to study language change in progress.
The press sections of Brown and LOB are therefore not representative samples in a strict statistical sense.
In order to ensure that the corpus text would be as ‘readable’ as possible, the use of mark-up symbols was kept to a minimum.
icame.uib.no /flob/flobinfo.htm   (1321 words)

  
 About the GSL
To determine the frequency of a word, we used the frequency numbers from the Brown Corpus (Francis and Kucera, 1982).
In the lemmatized Brown Corpus, while parts of speech are differentiated, non-semantically related homographs of the same part of speech are given a single, composite frequency number.
The frequency number represents the number of occurrences of that word and its related forms in the 1,000,000 words of the Brown corpus.
jbauman.com /aboutgsl.html   (1160 words)
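
The composite frequency described above (one number per headword and part of speech, summed over related forms) can be sketched in Python as follows. The rows, the counts, and the form-to-headword map are all invented for illustration; they are not taken from the Francis and Kucera frequency list.

```python
from collections import defaultdict

# Hypothetical (form, part of speech, count) rows; the counts are invented.
form_counts = [
    ("run", "verb", 10),
    ("runs", "verb", 4),
    ("ran", "verb", 6),
    ("run", "noun", 3),
    ("runs", "noun", 2),
]

# Hypothetical map from an inflected form to its headword (lemma).
lemma_of = {"run": "run", "runs": "run", "ran": "run"}


def lemma_frequencies(rows, lemma_map):
    """Sum counts per (headword, part of speech).

    Parts of speech stay separate, but all forms of one headword within a
    part of speech share a single composite number.
    """
    totals = defaultdict(int)
    for form, pos, count in rows:
        headword = lemma_map.get(form, form)  # unknown forms map to themselves
        totals[(headword, pos)] += count
    return dict(totals)


freqs = lemma_frequencies(form_counts, lemma_of)
print(freqs)
```

Because the key is only (headword, part of speech), two unrelated homographs with the same part of speech would be merged into one number, which is the limitation the snippet above points out.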

  
 4. Categorizing and Tagging Words   (Site not responding. Last check: 2007-10-22)
One of the notable features of the Brown corpus is that all the words have been tagged for their part-of-speech.
Several large corpora, such as the Brown Corpus and portions of the Wall Street Journal, have already been tagged, and we will be able to process this tagged data.
More details about the Brown corpus tag set can be found in the Appendix.
nltk.sourceforge.net /lite/doc/en/tag.html   (3916 words)
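
Part-of-speech tagged text is commonly stored as word/tag tokens. The sketch below, in plain Python rather than NLTK, shows one way to split such tokens and count the tags. The sample sentence, its tags, and the function name parse_tagged are assumptions made for illustration; consult the tag-set documentation for the authoritative tag inventory.

```python
from collections import Counter


def parse_tagged(sentence):
    """Split whitespace-separated 'word/tag' tokens into (word, tag) pairs.

    The split is made at the LAST slash, so a slash inside the word itself
    is kept as part of the word.
    """
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Illustrative tagged sentence, not copied from the corpus files.
tagged = "The/at jury/nn said/vbd the/at election/nn was/bedz conducted/vbn ./."

pairs = parse_tagged(tagged)
tag_counts = Counter(tag for _, tag in pairs)
print(pairs[0], tag_counts.most_common(2))
```

A token without any slash would come back with an empty word and the whole token as its tag, so real input should be validated before it is counted.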

  
 ICAME CORPUS MANUALS   (Site not responding. Last check: 2007-10-22)
The Lancaster/IBM SEC Corpus, The Machine-Readable Corpus of Spoken English
The Wellington Corpus of Spoken New Zealand English (WSC)
The Helsinki Corpus of Older Scots, bibliography biblio.htm or biblio.doc
khnt.hit.uib.no /icame/manuals/index.htm   (144 words)

  
 Linguistics 290A/1: Corpora on corpus.linguistics.berkeley.edu
If you have found a freely available corpus that you would like to see installed on corpus.linguistics.berkeley.edu, again please contact Emily Bender with information on acquiring it.
Sentences from the Brown corpus and the Wall Street Journal, with 192,800 occurrences of the 191 most frequent ambiguous words tagged with WordNet senses
Portion of the Brown corpus, parsed according to the SUSANNE scheme
corpus.linguistics.berkeley.edu /corpora_on_corpus.html   (412 words)

  
 Linguistics 290A/1: Labs/answer key
Use less on some of the corpus files as well as on README files to see how these corpora are structured.
Kennedy reports that the Brown corpus is actually 1,014,312 words long.
Sampson finds that the mean length of an NP in his corpus is 2.32 immediate constituents (ICs), with a standard deviation of 0.94 ICs.
corpus.linguistics.berkeley.edu /answers/answers.html   (5837 words)

  
 CORPUS LINGUISTICS   (Site not responding. Last check: 2007-10-22)
I compiled a 1-million-word corpus of written German during my last summer in Hamburg called the Hamburg corpus.
It mainly uses the same genres as the Brown, LOB and SUC corpora and the distribution also is approximately the same, so if you're doing some comparative corpus work, let me know and I'll let you have a look at it.
The whole corpus consists of 664 text files (8.25 MB).
www.ruf.rice.edu /~hilpert/corpus.htm   (159 words)

  
 School of Linguistics and Applied Language Studies | Victoria University of Wellington
The WWC has the same basic categories as the Brown Corpus of written American English (1961) and the Lancaster-Oslo-Bergen corpus (LOB) of written British English (1961).
The corpus consists of 2,000 word extracts (where possible) and comprises different proportions of formal, semi-formal and informal speech.
Seventy-five percent of the corpus is informal dialogue.
www.vuw.ac.nz /lals/corpora/index.aspx   (524 words)

  
 Geoffrey Sampson: SUSANNE Scheme
The SUSANNE Corpus is freely available without formalities for use by researchers anywhere (and has been heavily used since the first release was published in 1992).
The SUSANNE Corpus contains written English only; but a later project, the CHRISTINE Project, has produced a counterpart of the SUSANNE Corpus based on samples of the spoken language, drawn from spontaneous speech by speakers chosen to represent a cross-section of the present-day British population.
The SUSANNE Corpus was produced as an adjunct to the development of detailed analytic standards; consequently it could only be as big as was compatible with individual attention (often, attention by several individuals) to almost every difficult analytic decision posed by its language.
www.grsampson.net /RSue.html   (1935 words)

  
 LINGUIST List 2.821: Brown and LOB Corpora
This concerns the query re the Brown and LOB corpora: The Brown corpus (American English) is available to non-profit organizations (such as universities), essentially in two formats: text only (so called "untagged" version) on tape or diskettes from our friends at the Norwegian Centre for Humanistic Research, P.O. Box 54, University of Bergen, Bergen, Norway.
The "tagged" version of the corpus (which includes an annotation of every word by an expanded grammatical class, 82 classes in all) is available from Text Research, 196 Bowen Street, Providence, RI 02906.
The tagged LOB Corpus, along with several other widely used corpora can be obtained by writing to ICAME (International Computer Archive of Modern English) at this address: Knut Hofland, ICAME Norwegian Computing Centre for the Humanities Harald Harfagresgt.
www.ling.ed.ac.uk /linguist/issues/2/2-821.html   (604 words)

  
 The state of the art in corpus linguistics
as in the post-Bloomfieldian paradigm, be induced from the corpus
Brown Corpus, the LOB Corpus and the Spoken English Corpus
For instance, the Brown Corpus is often assumed to be representative of
angli02.kgw.tu-berlin.de /corpus/art.htm   (6176 words)

  
 Word Frequency Lists   (Site not responding. Last check: 2007-10-22)
These are the Most Frequent Word Lists built from the Brown Corpus with Concapp for Windows.
The lists are based solely on word counts using the Unique Words Profiler which lists the instances for each word (the Brown Corpus comprises 1,015,945 words with 47,218 unique words).
They are more sophisticated than the lists created with the Brown corpus, as they contain not only the actual high frequency words themselves but also derivative words which may in fact not be used so frequently.
www.edict.com.hk /textanalyser/wordlists.htm   (411 words)

  
 [No title]   (Site not responding. Last check: 2007-10-22)
/* Use the following to standardize names for data sources: Convention When a corpus is also a data source (like Brown), I've added 'Data' to the string.
%% This is the data source for, say Brown.
not sure if format is yyyy/mm/dd or yyyy/dd/mm, shortened to 2002 to avoid wrong guess and to provide uniform update format*/ /*assumption made that 'membership year(s)' field on ldc webpage includes years in which modifications/re-releases of a corpus are made*/ subcorpus_datemodified('Brown':'Whole Corpus', '1961').
www-rohan.sdsu.edu /~gawron/ling682/new_corpus_db.pl   (396 words)

  
 The Brown Corpus Tag-set   (Site not responding. Last check: 2007-10-22)
The examples are taken directly from the Brown corpus.
This is done on the e-mail server by appending "notoken" to the start of the subject line of the e-mail message.
Further information on the Brown corpus can be found at the International Computer Archive of Modern English (ICAME) corpus collection.
www.comp.leeds.ac.uk /amalgam/tagsets/brown.html   (2196 words)

  
 Subject: [Corpora-List] Brown Corpus
I don't think that it is a good idea to aim to take samples of the same
Such an approach compromises the integrity of the texts in the corpus.
(c) in order to interpret something you find in a corpus, it is often
www.uib.no /mailman/public/corpora/2005-June/001264.html   (915 words)


Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.