
Universiteit van Amsterdam, Faculteit der Geesteswetenschappen, Computerlinguïstiek
4 Projects

Project 2012 - 2017
Partners: Universiteit van Amsterdam; Universiteit van Amsterdam, Faculteit der Geesteswetenschappen, Computerlinguïstiek; Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science), Institute for Logic, Language and Computation (ILLC)
Funder: Netherlands Organisation for Scientific Research (NWO)
Project Code: 612.001.122

Natural language translation often seems to proceed by the invocation of fixed constructions like conventional phrases, platitudes and idioms. Indeed, the idiomatic approach is currently the workhorse of state-of-the-art translation models. An input sentence is translated by reciting (with statistics) arbitrary-size phrase pairs from the training data. However, the training data is far from exhaustive and novel constructions are frequent. For novel input, the idiomatic model degrades to composing bilingual bits and pieces independently from one another. If idiomaticity is the rule and training data is never exhaustive, how should novel, previously unseen constructions be translated? It would seem that a compositional approach to translation might actually provide a reasonable approximation for many translation cases. However, earlier experience shows that categorical composition leads to brittle systems with inflated risk of wrong translation choices. The present proposal tackles the challenge of a unified, effective modeling of the whole range of translation phenomena between the apparently compositional and the unquestionably idiomatic.
We propose a novel statistical model for machine translation that captures idiomaticity and effective composition as the graded outcomes of the same stochastic process. The project concerns developing this model, exploring the variety of representations involved in its rendering, both theoretically and empirically, and contrasting it with state-of-the-art models on standard benchmark data. Should we succeed in learning compositional and idiomatic translation simultaneously, we will make inroads into an outstanding scientific problem and lay the cornerstone for building far better translation systems.

Project 2013 - 2020
Partners: Universiteit van Amsterdam; Universiteit van Amsterdam, Faculteit der Geesteswetenschappen, Computerlinguïstiek; Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science), Institute for Logic, Language and Computation (ILLC), Applied Logic Laboratory; Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science), Institute for Logic, Language and Computation (ILLC)
Funder: Netherlands Organisation for Scientific Research (NWO)
Project Code: 277-89-002

If machine translation (MT) systems are to provide human-quality translation, they must possess some form of language understanding. But how can MT systems with language understanding capabilities be built? A major difficulty in this computational linguistic challenge is to define understanding as a measurable, observable entity. We draw inspiration from school comprehension tests that require students to show evidence of understanding. Rather than asking students to "write down the meaning" of a text, we ask them to "write it in their own words" (paraphrase it). Subsequently, we quantify how well their paraphrase preserves the meaning of the original text. This programme is focused on the computational modeling and technological application of meaning preservation, an unexplored, defining property of language understanding.
Because correct paraphrasing and translation must preserve meaning, we expect that a meaning-preserving MT model must fulfill at least the following desideratum: the model must translate every sentence as correctly as it translates paraphrases of that sentence, and must rank alternative target paraphrases in its own output closely. This VICI programme aims at developing a computational model for MT and paraphrasing that fulfills this desideratum. The foundational hypothesis underlying this model states that distributions over meaning-equivalent phrase categories can be induced statistically from a translation corpus if, alongside aligning phrases in the corpus with their translations, monolingual paraphrases across the corpus are also aligned with one another explicitly. Our programme aims at developing this model, building MT and paraphrasing systems based on it, and exploring their impact as novel help-tools for the translation industry. Success will be measured against benchmark data using evaluation measures tuned specifically to quantifying meaning preservation approximately. If successful, our model will result in far better language technology applications, and could also break the status quo in the long-standing linguistic debate regarding the possibility of inducing language understanding components from raw data.

Project 2016 - 2022
Partners: Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science), Institute for Logic, Language and Computation (ILLC); Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science), Instituut voor Informatica (IVI); Universiteit van Amsterdam; Universiteit van Amsterdam, Faculteit der Geesteswetenschappen, Computerlinguïstiek
Funder: Netherlands Organisation for Scientific Research (NWO)
Project Code: 022.006.003

Project 2015 - 2016
Partners: Universiteit van Amsterdam; Universiteit van Amsterdam, Faculteit der Geesteswetenschappen, Computerlinguïstiek; Erasmus MC; Erasmus MC, Thoraxcentrum, Cardiologie, RG Gebouw; Technische Universiteit Delft, Faculteit Elektrotechniek, Wiskunde en Informatica, Microelectronics, Elektronische Instrumentatie; Erasmus MC, Thoraxcentrum, Biomedical Engineering; Technische Universiteit Delft, Faculteit Technische Natuurwetenschappen, Laboratory of Acoustical Imaging & Sound Control; Erasmus MC, Thoraxcentrum, Biomedische Technologie; Technische Universiteit Delft; Technische Universiteit Delft, Faculteit Technische Natuurwetenschappen, Department of Imaging Physics, Medical Imaging (MI); Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science), Institute for Logic, Language and Computation (ILLC)
Funder: Netherlands Organisation for Scientific Research (NWO)
Project Code: SH-343-15

The task of translating from one language into another by computer, called Machine Translation (MT), has been one of the central challenges for the natural language processing community for decades. Recently, neural models for MT have received much attention. This interest is partially fueled by the successes of neural and other representation learning methods in some domains (e.g., image and speech processing, reinforcement learning), but it is also motivated by recognized limitations of traditional MT systems (e.g., these systems do not directly model paraphrasing or semantic similarity). The aim of this (sub-)project is to exploit fast parallel GPU computation to train large neural networks that learn meaningful representations of input sentences, informed by the hierarchical structure of those sentences, so that better translation quality can be achieved.