The Web as a Parallel Corpus

Philip Resnik, Noah A. Smith

doi:10.1162/089120103322711578

Philip Resnik

Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.
Noah A. Smith

Department of Computer Science and Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218.

Description

Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.

Journal

Computational Linguistics

Computational Linguistics 29 (3), 349-380, 2003-09

MIT Press - Journals

Details

CRID

1361418519472291456
DOI

10.1162/089120103322711578
ISSN

1530-9312

0891-2017
Web Site

https://www.mitpressjournals.org/doi/pdf/10.1162/089120103322711578
Data Source
- Crossref