Topic: Corpus linguistics

  Corpus Linguistics
A collection of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic description or as a means of verifying hypotheses about a language.
Corpus linguistics is simply the study of language through corpus-based research, but it differs from traditional linguistics in its insistence on the systematic study of authentic examples of language in use.
Linguistics: to study linguistic competence or performance as revealed in naturally occurring data.
www.engl.polyu.edu.hk /corpuslinguist/corpus.htm

  UCREL home page, Lancaster UK.
*** ELRA-W0039 The Lancaster Corpus of Mandarin Chinese (LCMC) ***
The conference, Corpus Linguistics 2005, is run jointly by the universities of Birmingham and Lancaster, and is the third biennial conference in the series on Corpus Linguistics.
New project: the Leverhulme Corpus Project plans to build a corpus which matches as closely as possible the LOB and FLOB corpora of written British English, except that the year of data collection is 1931, or near to that date (+/- 3 years).
www.comp.lancs.ac.uk /ucrel

  Corpus Linguistics - Introduction
The linguistic corpus as described by Kennedy (1998: 1) is “a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description.” The object of corpus linguistics is therefore the design, description and analysis of linguistic corpora (as well as the design and development of relevant software applications).
Corpus linguistics is thus not a theoretical framework or a specific school of linguistics (as for example generative grammar) but a methodology applied by corpus linguists for the description of language, based on corpus data.
In corpus linguistics the corpus is the source of linguistic data (as opposed to the intuition of the linguist or questionnaire data) and serves as a bank of authentic text samples that are to be described and analyzed.
www.uni-giessen.de /anglistik/ling/ALC/cl.html

  Kids.Net.Au - Encyclopedia > Corpus linguistics
Corpus Linguistics is the study of language as expressed in samples (corpora) or "real world" text.
In some areas there is an overlap with computational linguistics, as the latter moves towards language processing applications.
The COBUILD dictionaries, designed for users learning English as a foreign language, are based on corpus linguistics; definitions are based on how words are used rather than on historical definitions of their meaning.
www.kids.net.au /encyclopedia-wiki/co/Corpus_linguistics

 A Crash Course in Corpus Linguistics
Corpus linguistics methods are ideal for research on registers and register differences, because in order to establish similarities and/or differences between registers, huge numbers of texts are needed.
Without corpus linguistics, research on language acquisition has been limited to the study of the language of very young children, the study of only one or two learners, the study of only a few linguistic features, and has been restricted to only one register.
Corpus linguistics allows for the possibility of studying certain linguistic features across a large number of speakers, and thus it provides a basis for generalizations across language learners.
www.ling.unt.edu /corpus.html

 Corpus Linguistics, Books about Collocation, Corpus Linguistics Books
Concluding chapters discuss the implications of corpus analysis for linguistic theory, especially lexico-grammar and theories of competence and performance.
Corpus Linguistics has quickly established itself as the leading undergraduate course book in the subject.
The author surveys the emergence of corpora for use in linguistic research, and focuses in particular on the exponential growth of computer corpora in the electronic age.
www.englishjobmaze.com /bookstore/b-colcorpus.htm

 English Module 3.4
Corpus collection continued and diversified after the diary studies period: large sample studies covered the period roughly from 1927 to 1957 - analysis was gathered from a large number of children with the express aim of establishing norms of development.
Interest in the computer for the corpus linguist comes from the ability of the computer to carry out various processes which, when required of humans, ensured that they could only be described as pseudo-techniques. The type of analysis that Kading waited years for can now be achieved in a few moments on a desktop computer.
Corpus linguistics, proper, should be seen as a subset of the activity within an empirical approach to linguistics.
www.ict4lt.org /en/en_mod3-4.htm

 Corpus Linguistics
Corpus linguistics has - in the past 10 years - had a huge impact on what aspects of language are taught in the classroom.
Corpus linguists from all over the world have contributed to this volume.
A unique book which is intended to provide linguists, students of linguistics and modern languages, and ELT professionals with a highly accessible and comprehensive introduction to the rapidly-expanding field of corpus-based research into learner language.
englishmaze.com /bookstore/b-fsbt-corpling.htm

 LG3204 Corpus Linguistics
Corpus linguistics isn’t an area of English Language study like grammar or phonetics; it’s the name given to a methodological approach which involves looking at language en masse to identify recurring patterns and trends.
A corpus is simply a large collection of data, written or spoken, current or historic.
A brief historical overview of corpus linguistics will be supplied and this methodological approach will be contextualised in relation to other trends in linguistics, including Chomsky’s theory of Transformational Grammar.
www.uclan.ac.uk /facs/class/humanities/modules/lg3204.htm

 IE meets Corpus Linguistics
Corpus linguistics relies currently on at most shallow syntactic analysis to carry out automatic annotation of corpora, although there is growing interest in attempting to automate annotation at higher linguistic levels.
Corpus linguistics has developed a battery of sophisticated statistical techniques that could contribute to IE tasks, based on e.g.
Corpus linguists have already gone down the standardisation road a long way, thus have much to offer the IE community in terms of experience.
www.lrec-conf.org /lrec2000/www.ccl.umist.ac.uk/events/iemcorp.html

 Corpus linguistics
Corpus linguistics is the study of language as expressed in samples (corpora) or "real world" text. The approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors, thus requiring careful analysis of small speech samples obtained in a highly controlled laboratory setting.
Corpus Linguistics does away with Chomsky's competence/performance split, viewing that we can only ever reliably analyse language if the researcher does not interfere.
The British publisher Collins' COBUILD dictionaries, designed for users learning English as a foreign language, were also compiled using corpus linguistics.
www.therfcc.org /corpus-linguistics-4365.html

 Corpus Linguistics
It is true that no corpus will ever cover every possible utterance in a given language, so corpora are not sufficient for a complete vision of the human language capacity, but a good 100-million-word corpus will have a lot of utterances like any one you could come up with.
Corpus linguistics using concordance software and scripting languages is fast, efficient, and in principle boundless.
The texts in a corpus must be collected in a systematic way, under controlled conditions, and in such a way that the corpus reflects the true distribution of the language/dialect/variety under study.
www2.hawaii.edu /~bergen/corpus/lec1.htm

 Linguist List - Book Information
Corpus Linguistics seeks to provide a comprehensive sampling of real-life usage in a given language, and to use these empirical data to test language hypotheses.
Because corpus linguistics has grown fast from small beginnings, newcomers to the field often find it hard to get their bearings.
This volume reprints forty-two articles on corpus linguistics by an international selection of authors, which comprehensively illustrate the directions in which the subject is developing.
linguistlist.org /pubs/books/get-book.cfm?BookID=18914

 Corpus Linguistics - What is Corpus Linguistics ?
Corpus Linguistics is the study of language as revealed in a corpus or large, scientifically selected collection of texts.
There are a small number of Corpus Linguistics MAs but most MA Applied Linguistics and, increasingly, MA linguistics will cover something about the subject.
Corpus Linguistics, which is based on empirical evidence in how language is actually used, is extremely important in lexicography and dictionary design, grammar and, increasingly, English language teaching and second language acquisition.
www.elgazette.com /corpus_lingusitics.cfm

 Corpora and Corpus Linguistics
Chapter 1, Corpus and Text: Basic Principles John Sinclair (Tuscan Word Centre).
Chapter 4 Character Encoding in Corpus Construction Anthony McEnery and Richard Xiao (Lancaster University).
This site was originally a Corpus Linguistics site at Rice University and consisted of a long list of links.
www.athel.com /corpus.html

 MPhil(B) in Corpus Linguistics
A professional or academic working with corpora needs to have an understanding of the theories and assumptions that lie behind corpus building and corpus analysis, the implications of corpora for theories of language, and the range of applications of corpora.
The English Department of the University of Birmingham is one of the leading centres for the study of Corpus Linguistics in Britain.
No specific expertise in computers or linguistics is required.
www.english.bham.ac.uk /PG/CorpusLinguistics.htm

 Krieger - Corpus Linguistics: What It Is and How It Can Be Applied to Teaching (I-TESL-J)
The main focus of corpus linguistics is to discover patterns of authentic language use through analysis of actual usage.
The aim of a corpus-based analysis is not to generate theories of what is possible in the language, such as Chomsky's phrase structure grammar, which can generate an infinite number of sentences but which does not account for the probable choices that speakers actually make.
Corpus linguistics’ only concern is the usage patterns of the empirical data and what that reveals to us about language behavior.
iteslj.org /Articles/Krieger-Corpus.html

 UCSB Linguistics Research: Corpus Linguistics
Corpus research at UCSB is unique in the extent to which it is both theory-driven and pervasive across the subfields of the department.
The corpus may be large or small, written or spoken, automatically assembled from pre-existing online texts, or meticulously transcribed by the researcher in the hands-on process of documenting a previously unwritten language.
In line with theoretical goals of seeking functional explanations for language, more and more linguists are demanding that explanatory generalizations about language be built on firm empirical foundations, and have come to see corpus data and research methods as critical tools for serious research.
www.linguistics.ucsb.edu /research/corpus.html

 corpus linguistics
Introduction to corpus linguistics at the University of Essex.
Corpus linguistics: module 3.4 of ICT4LT, also written by McEnery and Wilson.
There is a large French/English bilingual corpus at the Laboratoire de Recherche Appliquée en Linguistique Informatique at the University of Montreal.
www.llsh.univ-savoie.fr /anglais/corpus.htm

 iLoveLanguages - Your Guide to Languages on the Web
Extensive collection of links to data files (and commercial sources of data) of use in corpus-based linguistics study (generally large text documents in one or more languages, including parallel corpora).
Linguistic research being performed by the Stewardship Project.
A list of "linguistic zones" in the world, and a few essays on language and linguistics.
www.ilovelanguages.com /index.php?category=Languages|Linguistics

 Linguistics in SIL
Linguistics in SIL focuses on researching undocumented minority languages, training field linguists, and providing resources to assist in linguistic data collection and analysis.
The linguistics courses offered are both theoretical and applied, but with a focus on applied.
SIL produces resources to help fieldworkers and researchers carry out their linguistic analyses; these include textbooks, reference material, software and fonts.
www.sil.org /linguistics

 John Benjamins: Details of International Journal of Corpus Linguistics
The International Journal of Corpus Linguistics (IJCL) seeks to publish research that views language as a social phenomenon that can be investigated empirically on the basis of authentic spoken texts.
Corpus linguistics specifies corpus design in respect to research interests, provides computational methods of extracting linguistic knowledge, and conceives tools to validate the accuracy of linguistic description.
It is the linguistic knowledge extracted from corpora that determines the performance of any NLP application.
www.benjamins.com /jbp/journals/Ijcl_info.html

 Corpus linguistics Details, Meaning Corpus linguistics Article and Explanation Guide
The field was established in 1967 when Henry Kucera and Nelson Francis published their classic work Computational Analysis of Present-Day American English, on the basis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources.
Shortly thereafter Boston publisher Houghton-Mifflin approached Kucera to supply a million word, three-line citation base for its new American Heritage Dictionary, the first dictionary to be compiled using corpus linguistics.
The British publisher Collins' COBUILD dictionaries, designed for users learning English as a foreign language, were also compiled using corpus linguistics.
www.e-paranoids.com /c/co/corpus_linguistics.html

 English Corpus Linguistics - Cambridge University Press
It begins with a discussion of the role that corpus linguistics plays in linguistic theory, demonstrating that corpora have proven to be very useful resources for linguists who believe that their theories and descriptions of English should be based on real rather than contrived data.
Charles F. Meyer goes on to describe how to plan the creation of a corpus, how to collect and computerize data for inclusion in a corpus, how to annotate the data that are collected, and how to conduct a corpus analysis of a completed corpus.
The book concludes with an overview of the future challenges that corpus linguists face to make both the creation and analysis of corpora much easier undertakings than they currently are.
www.cambridge.org /uk/catalogue/catalogue.asp?isbn=0521808790

 LINGUISTICS - THE LINKS
Arguably (to some) the most important contributor to the field of linguistics in this century, Noam Chomsky (Professor at MIT) is equally renowned for his political activism.
For the historical (or was it hysterical?) linguist.
Corpus Linguistics References to corpora in many languages, and info about many aspects of corpus linguistics.
www.newpaltz.edu /linguistics/links.htm

 The Corpus of Spoken Israeli Hebrew (CoSIH)
A corpus — as we see it — is a preliminary desideratum for much larger projects that cannot otherwise be achieved, be it a grammar of modern Hebrew, a comprehensive dictionary, or any other theoretical or applied inquiry.
A spoken corpus of five million words seems to be of a size just large enough to convey both the overall structure and specific features of most linguistic varieties represented within it.
For example, the ten-million word spoken corpus of the BNC includes two equally sized parts: a demographic part, containing transcriptions of spontaneous natural conversations made by members of the public, and a context-governed part, containing transcriptions of recordings made at specific types of meetings and events.
www.tau.ac.il /humanities/semitic/cosih.html

 Linguistics at Rice University
The Rice Linguistics Department is the home of an active community of scholars with a wide range of interests.
Linguistics faculty have been present in various departments at Rice since the early 1960's, and the B.A. program in Linguistics was established in 1968.
The Department of Linguistics is located on the second floor of Herring Hall, on the east side of the building.
www.ruf.rice.edu /~ling

 Blackwell Online - Perspectives in Lexicology and Corpus Linguistics
This textbook is a readable introduction to lexicology and corpus linguistics.
This section expands the study of language and shows how corpus linguistics can advance our study of words and meaning, the benefits of studying the corpora, and how meaning can best be conceptualised.
In so doing it explains the roots of corpus linguistics in easy to understand terms, and shows how lexicology can be advanced into other modes of linguistics.
bookshop.blackwell.co.uk /jsp/display_product_info.jsp?isbn=0826448615

