Developing Linguistic Corpora

Author :
Publisher : Oxbow Books Limited
Total Pages : 100
Release :
ISBN-10 : UVA:X004991162
ISBN-13 :
Rating : 4/5 (62 Downloads)

Synopsis Developing Linguistic Corpora by : Martin Wynne

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.

Developing Linguistic Corpora

Author :
Publisher :
Total Pages :
Release :
ISBN-10 : 1842170376
ISBN-13 : 9781842170373
Rating : 4/5 (76 Downloads)

Synopsis Developing Linguistic Corpora by : Oxbow Books, Limited

The Arts and Humanities Data Service (AHDS), funded by the UK government, has produced this series of Guides to Good Practice to provide the arts and humanities research and teaching communities with practical instruction in applying recognized standards and good practice to the creation, preservation and use of digital resources. Some of the Guides focus on methods and applications relevant to arts and humanities disciplines such as archaeology, history, linguistics, text studies and performing arts. Others address areas that cross disciplinary boundaries. All Guides identify and explore key issues and provide comprehensive pointers for those who need more specific information. As such they are essential reference material for anyone interested in computer-assisted research and teaching in the arts and humanities.

Corpora in Language Acquisition Research

Author :
Publisher : John Benjamins Publishing
Total Pages : 280
Release :
ISBN-10 : 9027234760
ISBN-13 : 9789027234766
Rating : 4/5 (60 Downloads)

Synopsis Corpora in Language Acquisition Research by : Heike Behrens

Corpus research forms the backbone of research on children's language development. Leading researchers in the field present a survey of the history of data collection, different types of data, and the treatment of methodological problems. Morphologically and syntactically parsed corpora allow for the concise exploration of formal phenomena, the quick retrieval of errors, and reliability checks. New probabilistic and connectionist computations investigate how children integrate the multiple sources of information available in the input, and new statistical methods compute rates of acquisition as well as error rates dependent on sample size. Sample analyses show how multi-modal corpora are used to investigate the interaction of discourse and linguistic structure, how cross-linguistic generalizations for acquisition can be formulated and tested, and how individual variation can be explored. Finally, ways in which corpus research interacts with computational linguistics and experimental research are presented.

The Development of Corpus Linguistics to Its Present-day Concept

Author :
Publisher : GRIN Verlag
Total Pages : 29
Release :
ISBN-10 : 3638762289
ISBN-13 : 9783638762281
Rating : 4/5 (81 Downloads)

Synopsis The Development of Corpus Linguistics to Its Present-day Concept by : Bernadette Wonner

Seminar paper from the year 2005 in the subject English Language and Literature Studies - Linguistics, grade: 1, LMU Munich (Institut für Englische Philologie), course: Corpus linguistics and teaching, 10 entries in the bibliography, language: English, abstract: [...] This paper will provide an overview of the different stages that corpus linguistics (CL) has gone through. Early Corpus Linguistics will be presented first, a term that describes all corpus-based work up to the end of the 1950s. That is when Noam Chomsky prompts the early researchers to reflect on their work in ways that partly neutralize what had been done up to that point. As a result, corpus research faces a certain discontinuity. Nevertheless, corpus-based work does not cease entirely, and improvements in computer technology open up completely new possibilities for corpus research. Over the decades a considerable number of machine-readable corpora are created for an increasing range of purposes, and they give rise to many kinds of analysis. After the presentation of the chronological development of CL, the penultimate chapter of the paper will deal with the concept of modern corpus linguistics and will give a definition of a corpus, which is not yet a settled matter. There is still a lot of work going on to improve corpus linguistic methodology. The last chapter will give an overview of future prospects.

Language Corpora Annotation and Processing

Author :
Publisher : Springer Nature
Total Pages :
Release :
ISBN-10 : 9811629609
ISBN-13 : 9789811629600
Rating : 4/5 (00 Downloads)

Synopsis Language Corpora Annotation and Processing by : Niladri Sekhar Dash

This book addresses the research, analysis, and description of the methods and processes that are used in the annotation and processing of language corpora in advanced, semi-advanced, and non-advanced languages. It provides the background information and empirical data needed to understand the nature and depth of problems related to corpus annotation and text processing and shows readers how the linguistic elements found in texts are analyzed and applied to develop language technology systems and devices. As such, it offers valuable insights for researchers, educators, and students of linguistics and language technology.

History, Features, and Typology of Language Corpora

Author :
Publisher : Springer
Total Pages : 311
Release :
ISBN-10 : 9811074585
ISBN-13 : 9789811074585
Rating : 4/5 (85 Downloads)

Synopsis History, Features, and Typology of Language Corpora by : Niladri Sekhar Dash

This book discusses key issues of corpus linguistics like the definition of the corpus, primary features of a corpus, and utilization and limitations of corpora. It presents a unique classification scheme of language corpora to show how they can be studied from the perspective of genre, nature, text type, purpose, and application. A reference to parallel translation corpus is mandatory in the discussion of corpus generation, which the authors thoroughly address here, with a focus on Indian language corpora and English. Web-text corpus, a new development in corpus linguistics, is also discussed with elaborate reference to Indian web text corpora. The book also presents a short history of corpus generation and provides scenarios before and after the advent of computer-generated digital corpora. This book has several important features: it discusses many technical issues of the field in a lucid manner; contains extensive new diagrams and charts for easy comprehension; and presents discussions in simplified English to cater to the needs of non-native English readers. This is an important resource authored by academics who have many years of experience teaching and researching corpus linguistics. Its focus on Indian languages and on English corpora makes it applicable to students of graduate and postgraduate courses in applied linguistics, computational linguistics and language processing in South Asia and across countries where English is spoken as a first or second language.

Advances in Corpus Linguistics

Author :
Publisher : Rodopi
Total Pages : 430
Release :
ISBN-10 : 9042017414
ISBN-13 : 9789042017412
Rating : 4/5 (14 Downloads)

Synopsis Advances in Corpus Linguistics by : Karin Aijmer

This book provides an up-to-date survey of current issues and approaches in corpus linguistics in the form of twenty-two recent research articles. The articles cover a wide range of topics illustrating the diversity of research that is characteristic of corpus linguistics today. Central themes are the relationship between theory, intuition and corpus data and the role of corpora in linguistic research. The majority of the articles are empirical studies of specific aspects of English, ranging from lexis and grammar to discourse and pragmatics. Other areas explored are language variation, language change and development, language learning, cross-linguistic comparisons of English and other languages, and the development of linguistic software tools. The contributors to the volume include some of the leading figures in the field such as M.A.K. Halliday, John Sinclair, Geoffrey Leech and Michael Hoey. The theoretical and methodological issues addressed in the volume demonstrate clearly the steady advance of an expanding discipline inspired by an empirical, usage-based approach to the study of language. The volume is essential reading for researchers and students interested in the use of computer corpora in linguistic research.

Corpus Linguistics and Statistics with R

Author :
Publisher : Springer
Total Pages : 359
Release :
ISBN-10 : 3319645722
ISBN-13 : 9783319645728
Rating : 4/5 (28 Downloads)

Synopsis Corpus Linguistics and Statistics with R by : Guillaume Desagulier

This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Case studies illustrate each chapter with accompanying data sets, R code, and exercises for use by readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.

An Introduction to Corpus Linguistics

Author :
Publisher : Routledge
Total Pages : 334
Release :
ISBN-10 : 1317892577
ISBN-13 : 9781317892571
Rating : 4/5 (71 Downloads)

Synopsis An Introduction to Corpus Linguistics by : Graeme Kennedy

The use of large, computerized bodies of text for linguistic analysis and description has emerged in recent years as one of the most significant and rapidly-developing fields of activity in the study of language. This book provides a comprehensive introduction and guide to Corpus Linguistics. All aspects of the field are explored, from the various types of electronic corpora that are available to instructions on how to design and compile a corpus. Graeme Kennedy surveys the development of corpora for use in linguistic research, looking back to the pre-electronic age as well as to the massive growth of computer corpora in the electronic age.

Building a National Corpus

Author :
Publisher : Springer Nature
Total Pages : 192
Release :
ISBN-10 : 3030818586
ISBN-13 : 9783030818586
Rating : 4/5 (86 Downloads)

Synopsis Building a National Corpus by : Dawn Knight

This book aims to provide a micro-level, working model of a methodological approach and practical guidelines for building a corpus, informed by the work on the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes - the National Corpus of Contemporary Welsh). It focuses specifically on the development of detailed design frames for corpora across communicative modes (spoken, written and e-language), and the practical processes involved in the planning, collection, transcription, collation and (re)presentation of language data. The book is designed to be of significant value and relevance to those interested in critically engaging with corpus methodology. Although Welsh is the language under discussion, the processes and approaches discussed in the building of CorCenCC can be applied to a lesser or greater extent to other language contexts. This book provides a working model and an account of how to build a corpus dataset, from which step-by-step guidelines for creating other linguistic corpora in any language can be easily extrapolated. It will be of value to students and scholars of minority languages and corpus linguistics.