Comparable Corpora and Computer-assisted Translation

Author : Estelle Maryline Delpech
Publisher : John Wiley & Sons
Total Pages : 221
Release :
ISBN-13 : 9781119002703
ISBN-10 : 1119002702
Rating : 4/5 (03 Downloads)

Synopsis Comparable Corpora and Computer-assisted Translation by : Estelle Maryline Delpech

Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into leveraging comparable corpora, i.e. sets of texts in two or more languages that deal with the same topic but are not translations of one another. This work has two primary objectives. The first is to assess the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second is to identify the bilingual-lexicon-extraction methods which best match translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations. The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability: methodological choices are guided by the needs of the final users. The book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation. The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).

Building and Using Comparable Corpora

Author : Serge Sharoff
Publisher : Springer Science & Business Media
Total Pages : 333
Release :
ISBN-13 : 9783642201288
ISBN-10 : 3642201288
Rating : 4/5 (88 Downloads)

Synopsis Building and Using Comparable Corpora by : Serge Sharoff

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Using Comparable Corpora for Under-Resourced Areas of Machine Translation

Author : Inguna Skadiņa
Publisher : Springer
Total Pages : 326
Release :
ISBN-13 : 9783319990040
ISBN-10 : 3319990047
Rating : 4/5 (40 Downloads)

Synopsis Using Comparable Corpora for Under-Resourced Areas of Machine Translation by : Inguna Skadiņa

This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Building and Using Comparable Corpora for Multilingual Natural Language Processing

Author : Serge Sharoff
Publisher : Springer Nature
Total Pages : 138
Release :
ISBN-13 : 9783031313844
ISBN-10 : 3031313844
Rating : 4/5 (44 Downloads)

Synopsis Building and Using Comparable Corpora for Multilingual Natural Language Processing by : Serge Sharoff

This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models, such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

Corpus Use and Translating

Author : Allison Beeby
Publisher : John Benjamins Publishing
Total Pages : 164
Release :
ISBN-13 : 9789027224262
ISBN-10 : 9027224269
Rating : 4/5 (62 Downloads)

Synopsis Corpus Use and Translating by : Allison Beeby

Professional translators are increasingly dependent on electronic resources, and trainee translators need to develop skills that allow them to make the best use of these resources. The aim of this book is to show how CULT (Corpus Use for Learning to Translate) methodologies can be used to prepare learning materials, and how novice translators can become autonomous users of corpora. Readers interested in translation studies, translator training and corpus linguistics will find the book particularly useful. Not only does it include practical, technical advice for using and learning to use corpora, but it also addresses important issues such as the balance between training and education and how CULT methodologies reinforce student autonomy and responsibility. Not only is this a good introduction to CULT, but it also incorporates the latest developments in this field, showing the advantages of using these methodologies in competence-based learning.

Corpora in Translation and Contrastive Research in the Digital Age

Author : Julia Lavid-López
Publisher : John Benjamins Publishing Company
Total Pages : 353
Release :
ISBN-13 : 9789027259684
ISBN-10 : 9027259682
Rating : 4/5 (84 Downloads)

Synopsis Corpora in Translation and Contrastive Research in the Digital Age by : Julia Lavid-López

Corpus-based contrastive and translation research are areas that keep evolving in the digital age, as the range of new corpus resources and tools expands, opening up to different approaches and application contexts. This book contains a selection of papers which focus on corpora and translation research in the digital age, outlining some recent advances and explorations. After an introductory chapter which outlines language technologies applied to translation and interpreting with a view to identifying challenges and research opportunities, the first part of the book is devoted to current advances in the creation of new parallel corpora for under-researched areas, the development of tools to manage parallel corpora or as an alternative to parallel corpora, and new methodologies to improve existing translation memory systems. The contributions in the second part of the book address a number of cutting-edge linguistic issues in the area of contrastive discourse studies and translation analysis on the basis of comparable and parallel corpora in several languages such as English, German, Swedish, French, Italian, Spanish, Portuguese and Turkish, thus showcasing the linguistic diversity covered by these recent investigations. Given the multiplicity of topics, methodologies and languages studied in the different chapters, the book will be of interest to a wide audience working in the fields of translation studies, contrastive linguistics and the automatic processing of language.

Parallel Corpora for Contrastive and Translation Studies

Author : Irene Doval
Publisher : John Benjamins Publishing Company
Total Pages : 313
Release :
ISBN-13 : 9789027262844
ISBN-10 : 9027262845
Rating : 4/5 (44 Downloads)

Synopsis Parallel Corpora for Contrastive and Translation Studies by : Irene Doval

This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances both in recent developments of parallel corpora (with some particular references to comparable corpora as well) and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.

Introducing Corpus-based Translation Studies

Author : Kaibao Hu
Publisher : Springer
Total Pages : 258
Release :
ISBN-13 : 9783662482186
ISBN-10 : 3662482185
Rating : 4/5 (86 Downloads)

Synopsis Introducing Corpus-based Translation Studies by : Kaibao Hu

The book addresses different areas of corpus-based translation studies, including corpus-based study of translation features, translator’s style, norms of translation, translation practice, translator training and interpreting. It begins by tracing the development of corpus-based translation studies and introducing the compilation of different types of corpora for translation research. The use of corpora in different research areas is then discussed in detail, and the implications and limitations of corpus-based translation studies are addressed. Featuring the use of figures, tables, illustrations and case studies, as well as discussion of methodological issues, the book offers a practical guide to corpus-based translation. It will be of interest to postgraduate students and professionals who are interested in translation studies, interpreting studies or computer-aided translation.

Multiword Units in Machine Translation and Translation Technology

Author : Ruslan Mitkov
Publisher : John Benjamins Publishing Company
Total Pages : 271
Release :
ISBN-13 : 9789027264206
ISBN-10 : 9027264201
Rating : 4/5 (06 Downloads)

Synopsis Multiword Units in Machine Translation and Translation Technology by : Ruslan Mitkov

The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention but there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. This volume provides a general overview of the field with particular reference to Machine Translation and Translation Technology and focuses on languages such as English, Basque, French, Romanian, German, Dutch and Croatian, among others. The chapters of the volume illustrate a variety of topics that address this challenge, such as the use of rule-based approaches, compound splitting techniques, MWU identification methodologies in multilingual applications, and MWU alignment issues.

Topics in Language Resources for Translation and Localisation

Author : Elia Yuste Rodrigo
Publisher : John Benjamins Publishing
Total Pages : 237
Release :
ISBN-13 : 9789027291097
ISBN-10 : 9027291098
Rating : 4/5 (97 Downloads)

Synopsis Topics in Language Resources for Translation and Localisation by : Elia Yuste Rodrigo

Language Resources (LRs) are sets of language data and descriptions in machine-readable form, such as written and spoken language corpora, terminological databases, computational lexica and dictionaries, and linguistic software tools. Over the past few decades, mainly within research environments, LRs have been specifically used to create, optimise or evaluate natural language processing (NLP) and human language technologies (HLT) applications, including translation-related technologies. Gradually the infrastructures and exploitation tools of LRs are being perceived as core resources in the language services industries and in localisation production settings. However, further efforts are still needed to raise awareness of LRs in general, and of LRs for translation and localisation in particular, among a wider audience in all corners of the world. Topics in Language Resources for Translation and Localisation sets out to establish the state of the art of this ever-expanding field and underscores the usefulness that LRs can potentially have in the process of creating, adapting, managing, standardising and leveraging content for more than one language and culture from various perspectives.