In general, a collection of electronic text, usually compiled on a principled or systematic basis for the purposes of linguistic and other research. In computational linguistics, a body of linguistic data, either text or speech, intended to support the study of linguistic phenomena. This data may be annotated in some way to enhance its usefulness. Examples of corpora include the Penn Treebank and the ATIS corpus.
In this context, a body of messages, usually a training database. (The plural is corpora.)
A collection of texts that have been put together for a particular purpose. Nowadays, corpora are usually stored in an electronic format, which can be studied with the assistance of a computer. A corpus can be very large (hundreds of millions of words), as is commonly the case with corpora that have been developed for dictionary making, or it can be highly specialized and quite small (tens of thousands of words).
Latin = body, plural - corpora.
a collection of language that provides the raw material for making a dictionary. Special software allows compilers to look at all examples of a particular word to find out information such as how common it is, what meanings it has, how it works grammatically, what words it commonly combines with, etc. This information is then shaped into a dictionary entry. If examples are included in the entry, they are most commonly based on sentences found in a corpus rather than copied from it (corpus sentences are usually too long or obscure to lift directly). The largest corpus that we use is the British National Corpus, which contains thousands of novels, newspapers, non-fiction books, magazines, conversations, etc., totalling 100 million words. OUP’s ELT dictionaries are largely based on this, supplemented by other, smaller corpora which contain spoken English, American English, business English, newspaper English, learners’ English, etc. Similar software also enables the Internet to be used as a corpus for up-to-the-minute language information.
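The kind of lookup described above (how common a word is, which sentences it appears in, what it combines with) can be sketched in a few lines of Python. This is only an illustration, not the software any dictionary publisher actually uses: the three-sentence `CORPUS` and the function names `frequency`, `concordance` and `collocates` are hypothetical, and a real corpus would hold millions of words.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real one would hold millions of words.
CORPUS = [
    "She made a strong cup of tea.",
    "He drinks strong tea every morning.",
    "A strong wind blew the tea leaves away.",
]

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency(word, corpus):
    """Count how often `word` occurs across all texts."""
    return sum(tokenize(text).count(word) for text in corpus)

def concordance(word, corpus):
    """Return every text that contains `word` as a whole token."""
    return [text for text in corpus if word in tokenize(text)]

def collocates(word, corpus, window=1):
    """Count the words found within `window` tokens of `word`."""
    counts = Counter()
    for text in corpus:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != word)
    return counts
```

With this toy data, `frequency("tea", CORPUS)` is 3, and `collocates("strong", CORPUS)` counts "a" twice and "tea" once, which is the sort of evidence a compiler would weigh before recording a common combination.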
collection of linguistic data, either written texts or a transcription of recorded speech. Typically, corpora have to be quite large to be of any linguistic use (upwards of 100,000 tokens).
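A corpus size quoted in tokens counts every running word, repeats included, whereas the number of types counts distinct words only. A minimal sketch of the difference, assuming a simple regex tokenizer (real corpus tools use more careful tokenization, and the function name here is hypothetical):

```python
import re

def count_tokens_and_types(text):
    """Return (number of tokens, number of distinct types) in `text`."""
    # Assumption: a token is any run of word characters, case-folded.
    tokens = re.findall(r"\w+", text.lower())
    return len(tokens), len(set(tokens))
```

For "The cat saw the other cat" this gives 6 tokens but only 4 types.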
a collection of writings; "he edited the Hemingway corpus"
a body of text used by language researchers and spell-checker builders
a collection of language material, made in some principled way (not haphazardly), either on tape or written in hard copy (e
a collection of linguistic data, either compiled as written texts or as a transcription of recorded speech
a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research
a collection of video-based interviews with English native speakers
a computer database of language
a database of written and/or spoken language that can be searched to discover how language is actually used
a large collection of samples of a language held on a computer
a large sample of how people have used language
a remarkable thing, not so much because it is a collection of language text, but because of the properties that it acquires if it is well designed and carefully constructed
a standard set of texts used for study and comment by linguistic scholars
Large collection of spam and non-spam mail.
"Corpora" in plural. A collection of writings or recorded remarks used for linguistic analysis. Corpora often include extra information, such as a tag for each word, indicating its part of the speech.
The collection of documents and HTML pages indexed by Index Server.
A collection of documents. Typically, information retrieval is done relative to a given corpus. An open-ended corpus is one that is growing or expanding during a sensemaking activity.
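Retrieval "relative to a given corpus" usually means that queries are answered from an index built over that corpus. A minimal sketch, assuming a simple inverted index with AND semantics; the `Corpus` class and its methods are hypothetical names for illustration. Because documents can be added at any time, it also shows how an open-ended corpus keeps growing while remaining searchable.

```python
import re
from collections import defaultdict

class Corpus:
    """A growing document collection with a simple inverted index."""

    def __init__(self):
        self.documents = {}            # doc_id -> original text
        self.index = defaultdict(set)  # term -> ids of documents containing it

    def add_document(self, doc_id, text):
        """Store a document and index each of its terms."""
        self.documents[doc_id] = text
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term in `query`."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        # Copy the first posting set so the index itself is not modified.
        result = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.index.get(term, set())
        return result
```

A search only ever sees what has been added so far, so the same query can return more results later in the activity.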
A sculpture of the body of Christ by Gian Lorenzo Bernini. Bernini sculpted it from bronze in 1650 and held onto it in his private collection for 25 years.