{"id":7,"date":"2024-04-04T07:24:52","date_gmt":"2024-04-04T04:24:52","guid":{"rendered":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/key-concepts-corpus-analysis\/"},"modified":"2024-04-04T07:25:14","modified_gmt":"2024-04-04T04:25:14","slug":"key-concepts-corpus-analysis","status":"publish","type":"page","link":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/key-concepts-corpus-analysis\/","title":{"rendered":"Corpus analysis: references &amp; key concepts"},"content":{"rendered":"<p>\n\tList of key concepts for understanding the basics of doing corpus linguistics in English studies.\u00a0\n<\/p>\n<p>\n\t<strong>Corpus linguistics<\/strong>\n<\/p>\n<table class=\"table table-hover\" align=\"left\" border=\"0\" cellpadding=\"5\" cellspacing=\"5\">\n<thead>\n<tr>\n<th scope=\"col\">\n\t\t\t\tConcept\n\t\t\t<\/th>\n<th scope=\"col\">\n\t\t\t\tExplanation\n\t\t\t<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n\t\t\t\tabsolute frequency\n\t\t\t<\/td>\n<td>\n\t\t\t\tthe number of times a particular piece of data or a particular value appears during a study; a simple count of the number of times a value is observed\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tcollocation\n\t\t\t<\/td>\n<td>\n\t\t\t\ta co-occurrence relationship between two words; words are said to collocate with one another if one is more likely to occur in the presence of the other than elsewhere\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tconcordance\n\t\t\t<\/td>\n<td>\n\t\t\t\ta display of every instance of a specified word or other search term in a corpus, together with a given amount of preceding and following context for each result or \u201chit\u201d\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tcorpus\n\t\t\t<\/td>\n<td>\n\t\t\t\ta collection of texts stored on a computer\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tdata-driven learning\n\t\t\t<\/td>\n<td>\n\t\t\t\ta way of using corpora in language teaching that involves the learners being given direct access to the corpus and a tool for searching it, 
the intention being that their exploration of the corpus helps their learning of the language\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tfrequency distribution\n\t\t\t<\/td>\n<td>\n\t\t\t\tinformation about the frequency of use of a term across texts, speakers, etc.\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tencoding\n\t\t\t<\/td>\n<td>\n\t\t\t\tthe process of representing a text as a sequence of characters in computer memory (e.g. Unicode, UTF-8, ANSI)\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tfrequency list\n\t\t\t<\/td>\n<td>\n\t\t\t\ta list of all the items of a given type in a corpus (e.g. all words, all POS-tags) together with a count of how often each one occurs\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tKWIC\n\t\t\t<\/td>\n<td>\n\t\t\t\tkey word in context; a format for displaying a concordance where the search result is lined up in a central column, and the columns on either side contain a short chunk of the context preceding and following each result in the corpus; the standard abbreviation is KWIC; \u201ckey word\u201d here means the search term\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tlemma\n\t\t\t<\/td>\n<td>\n\t\t\t\ta group of wordforms that are related by being inflectional forms of the same base word; e.g. in English destroy, destroys, destroying, destroyed are all part of the verb lemma destroy; the notion of a headword (as found in a dictionary) is generally equivalent to that of lemma\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tn-gram\n\t\t\t<\/td>\n<td>\n\t\t\t\ta sequence of n elements (usually words) that occur directly one after another in a corpus, where n is two or more; studying n-grams (also called clusters or lexical bundles) is one way to operationalise the analysis of collocation\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tnormalized frequency\n\t\t\t<\/td>\n<td>\n\t\t\t\tsame as relative frequency; a frequency expressed relative to some other value, as a proportion of the whole \u2013 e.g. 
frequency of a word relative to the total number of words in the corpus; normalized frequencies can be compared even if they arise from datasets of different sizes\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\traw frequency\n\t\t\t<\/td>\n<td>\n\t\t\t\tsame as absolute frequency; the number of times a particular piece of data or a particular value appears during a study; a simple count of the number of times a value is observed\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\tregister\n\t\t\t<\/td>\n<td>\n\t\t\t\ta way of classifying texts according to non-linguistic criteria, such as the purpose for which a text was produced, the intended audience, the level of formality, whether its purpose is narration or description and so on\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\trelative frequency\n\t\t\t<\/td>\n<td>\n\t\t\t\tsame as normalized frequency; a frequency expressed relative to some other value, as a proportion of the whole \u2013 e.g. frequency of a word relative to the total number of words in the corpus; relative frequencies can be compared even if they arise from datasets of different sizes\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\ttoken\n\t\t\t<\/td>\n<td>\n\t\t\t\tany single, particular instance of an individual word in a text or corpus\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\ttype\n\t\t\t<\/td>\n<td>\n\t\t\t\ta single particular wordform; any difference of form (e.g. 
spelling) makes a word into a different type; one type may occur many times in a text or corpus\n\t\t\t<\/td>\n<\/tr>\n<tr>\n<td>\n\t\t\t\ttype-token ratio\n\t\t\t<\/td>\n<td>\n\t\t\t\ta measure of vocabulary diversity in a corpus, equal to the number of types divided by the total number of tokens; the closer the ratio is to 1 (or 100%), the more varied the vocabulary\n\t\t\t<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>List of key concepts for understanding the basics of doing corpus linguistics in English studies.\u00a0 Corpus linguistics Concept Explanation absolute frequency the number of times a particular piece of data or a particular value appears during a study; a simple count of 
&#8230;<\/p>\n","protected":false},"author":132,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"class_list":["post-7","page","type-page","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/pages\/7","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/users\/132"}],"replies":[{"embeddable":true,"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/comments?post=7"}],"version-history":[{"count":1,"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/pages\/7\/revisions"}],"predecessor-version":[{"id":18,"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/pages\/7\/revisions\/18"}],"wp:attachment":[{"href":"https:\/\/sisu.ut.ee\/digitalenglishstudies\/wp-json\/wp\/v2\/media?parent=7"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}