Universiteit van Amsterdam, Faculteit der Geesteswetenschappen, Computerlinguïstiek

4 Projects, page 1 of 1
  • Funder: Netherlands Organisation for Scientific Research (NWO) Project Code: 612.001.122

    Natural language translation often seems to proceed by the invocation of fixed constructions such as conventional phrases, platitudes and idioms. Indeed, the idiomatic approach is currently the workhorse of state-of-the-art translation models: an input sentence is translated by reciting (with statistics) arbitrary-size phrase pairs from the training data. However, the training data is far from exhaustive, and novel constructions are frequent. For novel input, the idiomatic model degrades to composing bilingual bits and pieces independently of one another. If idiomaticity is the rule and training data is never exhaustive, how should novel, previously unseen constructions be translated? A compositional approach to translation might actually provide a reasonable approximation for many translation cases. However, earlier experience shows that categorical composition leads to brittle systems with an inflated risk of wrong translation choices. The present proposal tackles the challenge of a unified, effective model of the whole range of translation phenomena between the apparently compositional and the unquestionably idiomatic. We propose a novel statistical model for machine translation that captures idiomaticity and effective composition as graded outcomes of the same stochastic process. The project concerns developing this model, exploring the variety of representations involved in its rendering, both theoretically and empirically, and contrasting it with state-of-the-art models on standard benchmark data. Should we succeed in learning compositional and idiomatic translation simultaneously, we will make inroads into an outstanding scientific problem and lay the cornerstone for far better translation systems.
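The tension the abstract describes — idiomatic phrase pairs versus compositional fallback — can be illustrated with a toy sketch. Everything below (the phrase table, the English→Dutch entries, and the greedy longest-match segmentation) is invented for illustration; real phrase-based systems search over many segmentations with weighted feature scores rather than matching greedily.

```python
# Toy phrase table: source phrases of arbitrary size map to target phrases.
# All entries are invented English -> Dutch examples.
PHRASE_TABLE = {
    ("kicked", "the", "bucket"): ("het", "loodje", "gelegd"),  # idiomatic unit
    ("kicked",): ("schopte",),
    ("the",): ("de",),
    ("bucket",): ("emmer",),
    ("he",): ("hij",),
}

def translate(words):
    """Greedily segment the input into the longest known phrase pairs.

    When the input is covered by a long (idiomatic) phrase pair, it is
    translated as a unit; for novel input the model degrades to composing
    short phrase pairs independently of one another, as the abstract notes.
    """
    out, i = [], 0
    while i < len(words):
        # Prefer the longest source phrase starting at position i.
        for length in range(len(words) - i, 0, -1):
            src = tuple(words[i:i + length])
            if src in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[src])
                i += length
                break
        else:
            out.append(words[i])  # unknown word: pass through unchanged
            i += 1
    return out

print(translate(["he", "kicked", "the", "bucket"]))
# idiom matched as a unit -> ['hij', 'het', 'loodje', 'gelegd']
print(translate(["kicked", "the", "door"]))
# no idiom: independent word-by-word composition -> ['schopte', 'de', 'door']
```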

  • Funder: Netherlands Organisation for Scientific Research (NWO) Project Code: 277-89-002

    If machine translation (MT) systems are to provide human-quality translation, they must possess some form of language understanding. But how do we build MT systems with language-understanding capabilities? A major difficulty in this computational linguistic challenge is to define understanding as a measurable, observable entity. We draw inspiration from school comprehension tests that require students to show evidence of understanding. Rather than asking students to "write down the meaning" of a text, we ask them to "write it in their own words" (paraphrase it). Subsequently, we quantify how well their paraphrase preserves the meaning of the original text. This programme focuses on the computational modeling and technological application of meaning preservation, an unexplored, defining property of language understanding. Because correct paraphrasing and translation must preserve meaning, we expect that a meaning-preserving MT model must fulfill at least the following desideratum: the model must translate every sentence as correctly as it translates paraphrases of that sentence, and must rank alternative target paraphrases in its own output closely. This VICI programme aims to develop a computational model for MT and paraphrasing that fulfills this desideratum. The foundational hypothesis underlying the model is that distributions over meaning-equivalent phrase categories can be induced statistically from a translation corpus if, in addition to aligning phrases in the corpus with their translations, monolingual paraphrases across the corpus are also aligned explicitly. Our programme aims at developing this model, building MT and paraphrasing systems based on it, and exploring their impact as novel tools for the translation industry. Success will be measured against benchmark data using evaluation measures tuned specifically to approximate quantification of meaning preservation. If successful, our model will not only result in far better quality language technology applications, but could also break the status quo in the long-standing linguistic debate regarding the possibility of inducing language understanding components from raw data.
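The desideratum stated above — a sentence and its paraphrases should be translated roughly equally well, or the model is not preserving meaning — can be sketched as a simple check. The scoring function, the toy model, and the example sentences below are all invented; a real evaluation would use a trained MT system and learned quality scores.

```python
def meaning_preservation_gap(score, sentence, paraphrases):
    """Largest difference in translation quality between a sentence
    and its paraphrases, under some scoring function `score`.

    A meaning-preserving model keeps this gap small: every paraphrase
    should be translated about as correctly as the original sentence.
    """
    base = score(sentence)
    return max(abs(base - score(p)) for p in paraphrases)

# Toy stand-in for a translation-quality score: sentence length in words.
# (Purely illustrative; it is obviously not a real quality measure.)
toy_score = lambda s: len(s.split())

gap = meaning_preservation_gap(toy_score, "the cat sat", ["a cat was sitting"])
print(gap)  # 1
```

A model could then be penalized in proportion to this gap, directly operationalizing "rank alternative paraphrases closely" as a measurable quantity.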

  • Funder: Netherlands Organisation for Scientific Research (NWO) Project Code: 022.006.003
  • Funder: Netherlands Organisation for Scientific Research (NWO) Project Code: SH-343-15

    The task of translating from one language into another by computer, called machine translation (MT), has been one of the central challenges for the natural language processing community for decades. Recently, neural models for MT have received much attention. This interest is partially fueled by the successes of neural and other representation learning methods in other domains (e.g., image and speech processing, reinforcement learning), but it is also motivated by recognized limitations of traditional MT systems (e.g., these systems do not directly model paraphrasing or semantic similarity). The aim of this (sub-)project is to exploit fast parallel GPU computation to train large neural networks that learn meaningful representations of input sentences, informed by the hierarchical structure of those sentences, so that better translation quality can be achieved.
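The idea of a sentence representation informed by hierarchical structure can be sketched as a bottom-up composition over a parse tree, in the spirit of recursive neural models. The tree, word vectors, and composition rule below are all invented for illustration; real models learn the composition function from data on GPUs.

```python
def compose(left, right):
    """Toy composition: elementwise average (stand-in for a learned layer)."""
    return [(a + b) / 2 for a, b in zip(left, right)]

def encode(tree, embeddings):
    """Recursively encode a binary parse tree.

    Leaves are words looked up in an embedding table; internal nodes
    combine their children's vectors, so the final sentence vector
    reflects the tree's hierarchical structure.
    """
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))

# Invented 2-dimensional word vectors.
emb = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sleeps": [1.0, 1.0]}

# Parse tree for "((the cat) sleeps)".
vec = encode((("the", "cat"), "sleeps"), emb)
print(vec)  # [0.75, 0.75]
```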
