Using Comparable Corpora for Under-Resourced Areas of Machine Translation

Author : Inguna Skadiņa
Publisher : Springer
Total Pages : 326
ISBN-10 : 3319990047
ISBN-13 : 9783319990040

Synopsis Using Comparable Corpora for Under-Resourced Areas of Machine Translation by : Inguna Skadiņa

This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Building and Using Comparable Corpora

Author : Serge Sharoff
Publisher : Springer Science & Business Media
Total Pages : 333
ISBN-10 : 3642201288
ISBN-13 : 9783642201288

Synopsis Building and Using Comparable Corpora by : Serge Sharoff

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Neural Machine Translation

Author : Philipp Koehn
Publisher : Cambridge University Press
Total Pages : 409
ISBN-10 : 1108497322
ISBN-13 : 9781108497329

Synopsis Neural Machine Translation by : Philipp Koehn

Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Parallel Corpora for Contrastive and Translation Studies

Author : Irene Doval
Publisher : John Benjamins Publishing Company
Total Pages : 313
ISBN-10 : 9027262845
ISBN-13 : 9789027262844

Synopsis Parallel Corpora for Contrastive and Translation Studies by : Irene Doval

This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances both in recent developments of parallel corpora (with some particular references to comparable corpora as well) and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.

Human Language Technologies

Author : Inguna Skadina
Publisher : IOS Press
Total Pages : 264
ISBN-10 : 1607506408
ISBN-13 : 9781607506409

Synopsis Human Language Technologies by : Inguna Skadina

This book contains papers from the Fourth International Conference on Human Language Technologies - the Baltic Perspective (Baltic HLT 2010), held in Riga in October 2010. This conference is the latest in a series which provides a forum for sharing recent advances in human language processing, and promotes cooperation between the computer science and linguistics communities of the Baltic countries and the rest of the world. Bringing together scientists, developers, providers and users, the conference is an opportunity to exchange information, discuss problems, find new synergies, and promote i.

Corpus Use in Cross-linguistic Research

Author : Marlén Izquierdo
Publisher : John Benjamins Publishing Company
Total Pages : 245
ISBN-10 : 9027249318
ISBN-13 : 9789027249319

Synopsis Corpus Use in Cross-linguistic Research by : Marlén Izquierdo

Cross-linguistic research is a fruitful field of language inquiry that has benefited enormously from the use of corpora. As sources of linguistic data of various kinds and as tools for language processing, corpora have shaped the development of cross-linguistic research, enabling both language description and practical applications. This volume contains twelve studies that emphasize the usefulness and usability of parallel corpora in accurately exploring the structure and use of seven under-researched languages and language varieties. The first part emphasizes the role of corpus-based descriptive analyses at the lexicogrammatical and discursive levels, as a first step on the way towards concrete applications like translation or language teaching. The second part focuses on the role of parallel-corpus-based language processing techniques and applications that facilitate professional communication. This book will be of interest to scholars in contrastive linguistics, translation studies, discourse analysis, language teaching, and natural language processing.

Corpora in Translation and Contrastive Research in the Digital Age

Author : Julia Lavid-López
Publisher : John Benjamins Publishing Company
Total Pages : 353
ISBN-10 : 9027259682
ISBN-13 : 9789027259684

Synopsis Corpora in Translation and Contrastive Research in the Digital Age by : Julia Lavid-López

Corpus-based contrastive and translation research are areas that keep evolving in the digital age, as the range of new corpus resources and tools expands, opening up to different approaches and application contexts. The current book contains a selection of papers which focus on corpora and translation research in the digital age, outlining some recent advances and explorations. After an introductory chapter which outlines language technologies applied to translation and interpreting with a view to identifying challenges and research opportunities, the first part of the book is devoted to current advances in the creation of new parallel corpora for under-researched areas, the development of tools to manage parallel corpora or as an alternative to parallel corpora, and new methodologies to improve existing translation memory systems. The contributions in the second part of the book address a number of cutting-edge linguistic issues in the area of contrastive discourse studies and translation analysis on the basis of comparable and parallel corpora in several languages such as English, German, Swedish, French, Italian, Spanish, Portuguese and Turkish, thus showcasing the richness of the linguistic diversity carried out in these recent investigations. Given the multiplicity of topics, methodologies and languages studied in the different chapters, the book will be of interest to a wide audience working in the fields of translation studies, contrastive linguistics and the automatic processing of language.

Machine Learning in Translation Corpora Processing

Author : Krzysztof Wolk
Publisher : CRC Press
Total Pages : 205
ISBN-10 : 0429588836
ISBN-13 : 9780429588839

Synopsis Machine Learning in Translation Corpora Processing by : Krzysztof Wolk

This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well suited to the Polish language and therefore require adaptation, and language resources are lacking in parallel and monolingual data. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system to meet specific translation requirements and to develop bilingual textual resources by mining comparable corpora.

Translation-Driven Corpora

Author : Federico Zanettin
Publisher : Routledge
Total Pages : 244
ISBN-10 : 1317639855
ISBN-13 : 9781317639855

Synopsis Translation-Driven Corpora by : Federico Zanettin

Electronic texts and text analysis tools have opened up a wealth of opportunities to higher education and language service providers, but learning to use these resources continues to pose challenges to scholars and professionals alike. Translation-Driven Corpora aims to introduce readers to corpus tools and methods which may be used in translation research and practice. Each chapter focuses on specific aspects of corpus creation and use. An introduction to corpora and an overview of applications of corpus linguistics methodologies to translation studies is followed by a discussion of corpus design and acquisition. Different stages and tools involved in corpus compilation and use are outlined, from corpus encoding and annotation to indexing and data retrieval, and the various methods and techniques that allow end users to make sense of corpus data are described. The volume also offers detailed guidelines for the construction and analysis of multilingual corpora. Corpus creation and use are illustrated through practical examples and case studies, with each chapter outlining a set of tasks aimed at guiding researchers, students and translators to practice some of the methods and use some of the resources discussed. These tasks are meant as hands-on activities to be carried out using the materials and links available in an accompanying DVD. Suggested further readings at the end of each chapter are complemented by an extensive bibliography at the end of the volume. Translation-Driven Corpora is designed for use by teachers and students in the classroom or by researchers and professionals for self-learning. It is an invaluable resource for anyone interested in this fast-growing area of scholarly and professional activity.

Human Language Technologies

Author : Arvi Tavast
Publisher : IOS Press
Total Pages : 312
ISBN-10 : 1614991324
ISBN-13 : 9781614991328

Synopsis Human Language Technologies by : Arvi Tavast

Human language technologies continue to play an important part in the modern information society. This book contains papers presented at the fifth international conference 'Human Language Technologies - The Baltic Perspective (Baltic HLT 2012)', held in Tartu, Estonia, in October 2012. Baltic HLT provides a special venue for new and ongoing work in computational linguistics and related disciplines, both in the Baltic states and in a broader geographical perspective. It brings together scientists, developers, providers and users of HLT, and is a forum for the sharing of new ideas and recent advances in human language processing, promoting cooperation between the research communities of computer science and linguistics from the Baltic countries and the rest of the world. Twenty long papers, as well as the posters or demos accepted for presentation at the conference, are published here. They cover a wide range of topics: morphological disambiguation, dependency syntax and valency, computational semantics, named entities, dialogue modeling, terminology extraction and management, machine translation, corpus and parallel corpus compiling, speech modeling and multimodal communication. Some of the papers also give a general overview of the state of the art of human language technology and language resources in the Baltic states. This book will be of interest to all those whose work involves the use and application of computational linguistics and related disciplines.