Speaker: Daniele Pighin
NLPRG, TALP
Technical University of Catalonia, UPC
Date: May 31, 2011
Time: 11:30
Where: Computer Science Faculty, Room 3.2
Title
Automatic Projection of Semantic Structures:
an Application to Pairwise Translation Ranking
Abstract
The ability to automatically assess the quality of translation
hypotheses is a key requirement towards the development of accurate and
dependable translation models. While it is largely agreed that proper
transfer of predicate-argument structures from source to target is a
very strong indicator of translation quality, especially in relation to
adequacy, the incorporation of this kind of information in the
Statistical Machine Translation (SMT) evaluation pipeline is still
limited to few and isolated cases.
We present a model for the inclusion of semantic role annotations in the
framework of confidence estimation for machine translation. The model
has several interesting properties:
1) it only requires a linguistic processor on the (generally
well-formed) source side of the translation;
2) it does not directly rely on properties of the translation model
(hence, it can be applied beyond phrase-based systems);
3) it is inherently extendable to cope with different kinds of
sequential annotations, e.g., POS tags.
These features make it potentially appealing for system ranking,
translation re-ranking and user feedback evaluation. Preliminary
experiments in pairwise hypothesis ranking on five confidence estimation
benchmarks show that the model has the potential to capture salient
aspects of translation quality.
The Ixa Group, in collaboration with the TALP Centre at the Technical University of Catalonia, is organizing a one-day workshop on
Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011).
The workshop will be held in Barcelona on Friday, November 18, 2011.
The paper submission deadline is September 9, 2011. See: http://ixa2.si.ehu.es/lihmt2011/
This is part of the dissemination effort of our OpenMT-2 project.

Speaker: Lluís Màrquez
NLPRG, TALP
Technical University of Catalonia, UPC
Date: May 10, 2011 
Time: 15:30
Where: Computer Science Faculty, Room 3.2
Automatic evaluation in Machine Translation:
Towards combined linguistically-motivated measures
Automatic evaluation plays a very important role in the development and comparison of machine translation systems. In this talk we will give an overview of the current trend of using linguistically guided evaluation measures based on several linguistic layers and their combination. We will also talk about confidence estimation measures, a particular subset of measures that assess output quality without the need for reference translations. Finally, we will review the role of evaluation measures within the FAUST European project (Feedback Analysis for User Adaptive Statistical Translation; http://www.faust-fp7.eu/), focusing on the use of user feedback to guide the combination of measures.
Speakers: Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
Dipartimento di Informatica
Università di Bari
Date: May 6, 2011
Time: 16:00
Where: Computer Science Faculty, Room 3.2
"Information Retrieval and Information Filtering:
Two battlefields for NLP techniques"
Part 1: Introduction to basic concepts on:
- Information Retrieval Models: Boolean, Vector space
- Information Filtering techniques
- Recommender Systems
- Problems with classical information seeking strategies
Speaker: Giovanni Semeraro
Expected duration: 75 min.
Part 2: Intelligent Information Access:
- Semantic Indexing using external knowledge sources: WordNet, Wikipedia
- Semantic Indexing for multilingual access
Speaker: Pasquale Lops
Expected duration: 45 min.
- Knowledge Infusion (KI): creating a knowledge base from open knowledge sources
- KI at work: solving a challenging language game
- KI applications for recommender systems
Speaker: Marco de Gemmis
Expected duration: 45 min.

Three pieces of news related to the OPENMT-2 project (2010-2012):
Gorka Labaka’s PhD thesis
In his PhD thesis (“EUSMT: Incorporating Linguistic Information to Statistical Machine Translation for Basque”), Labaka studied how Statistical Machine Translation (SMT) can handle translation from Spanish into Basque, a morphologically rich and less-resourced language. He found two ways to enhance translation quality by using linguistic tools:
- The use of morphological tools allowed him to perform translation at the word-segment level, thus avoiding sparseness problems in corpora.
- In addition, syntactic tools enabled the Spanish word segments to be rearranged into their corresponding order in Basque. This reordering helped the SMT decoder to find correct translations.
Recent research tends to focus on statistical systems and to ignore rule-based approaches. However, according to Gorka Labaka’s evaluation, the RBMT system and the state-of-the-art basic SMT system perform at a similar level of quality when translating into Basque. His improved SMT system, based on segmentation and reordering, outperforms both the RBMT system and the basic SMT system by more than 10% on the HTER metric. In addition, he calculated that a hypothetical oracle system would yield a result a further 10% better; this oracle would select the improved SMT output for 55% of the sentences, the RBMT output for another 41%, and the EBMT output for 4%. He therefore concluded that, at least for morphologically rich languages with few resources, and hence few parallel corpora, the SMT approach is limited and the RBMT approach should not be ignored. We are currently experimenting with hybrid architectures that combine the Matxin (rule-based) and EUSMT (statistical) translation engines.
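The oracle idea mentioned above can be illustrated with a minimal sketch. It assumes that a per-sentence error score (lower is better, in the spirit of HTER) is already available for each system's output; the function names and the toy scores below are hypothetical and are not taken from the thesis. Averaging per-sentence scores is also a simplification of how a corpus-level metric would normally be computed.

```python
from collections import Counter


def oracle_select(scores):
    """Pick, for one sentence, the system whose output has the lowest error score."""
    return min(scores, key=scores.get)


def oracle_summary(per_sentence_scores):
    """Return the oracle's average score and the share of sentences taken from each system."""
    picks = [oracle_select(scores) for scores in per_sentence_scores]
    total = sum(scores[pick] for scores, pick in zip(per_sentence_scores, picks))
    n = len(per_sentence_scores)
    shares = {system: count / n for system, count in Counter(picks).items()}
    return total / n, shares


# Hypothetical HTER-like scores for four sentences (illustration only, not real data).
toy_scores = [
    {"SMT": 0.30, "RBMT": 0.45, "EBMT": 0.60},
    {"SMT": 0.50, "RBMT": 0.35, "EBMT": 0.55},
    {"SMT": 0.20, "RBMT": 0.40, "EBMT": 0.70},
    {"SMT": 0.60, "RBMT": 0.65, "EBMT": 0.40},
]

average, shares = oracle_summary(toy_scores)
print(average, shares)
```

An oracle of this kind is only an upper bound: it needs the error scores, which are not known at translation time, so a real hybrid system has to approximate the selection in some other way.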
Visiting researcher Lluís Màrquez (NLPRG, Technical University of Catalonia, UPC)
With the aim of collaborating in this research line, Lluís Màrquez, the principal investigator of the UPC team within the OPENMT-2 project, will be in Donostia visiting the Ixa group until the summer. He is an expert in integrating Machine Learning techniques into Language Technology. The first experiments on combining MT engines, carried out by Gorka Labaka, confirmed that there is room for improvement. We now want to find the most suitable ways to achieve it.
Collaboration on Post-Editing with Basque Wikipedia (eu.wikipedia)
Within this project, a set of 60 long articles from the Spanish Wikipedia (adding up to more than 100,000 words) has been selected and translated into Basque using Matxin-Opentrad, our open-source rule-based machine translation system. Soon, in spring 2011, a group of Basque Wikipedia users will review them using a special interface that we have adapted from OmegaT. They will correct the errors they find, a process known as post-editing, and the changes they make will be logged. The corrected articles will be included in the Basque Wikipedia. In addition, the resulting post-editing logs will be used to enhance the machine translation process, either by manually improving the different modules of the MT system or by implementing an automated statistical post-editing process that is expected to improve translation accuracy. (paper in Wikimania 2010)
Last Monday our colleague Maite Oronoz won the II. Koldo Mitxelena Award for PhD Theses, organized by Euskaltzaindia (the Academy of the Basque Language) and the University of the Basque Country.
CONGRATULATIONS Maite!
Besides, our colleague Larraitz Uria’s PhD thesis was also nominated for this award.
Both theses address language error detection. Maite’s thesis deals with it from a computational point of view, while Larraitz’ work approaches it from a linguistic perspective.

Title of Maite’s thesis: Euskarazko errore sintaktikoak detektatzeko eta zuzentzeko baliabideen garapena: datak, postposizio-lokuzioak eta komunztadura.
(Saroi, a system to detect and correct syntactic mistakes: dates, complex postpositions, and agreement.)
Maite’s supervisors: Arantza Diaz de Ilarraza and Koldo Gojenola
Title of Larraitz’ thesis: Euskarazko erroreen eta desbideratzeen analisirako lan-ingurunea. Determinatzaile-erroreen azterketa eta prozesamendua.
(A framework for the analysis of errors and deviations in Basque texts. Analysis and processing of errors on the use of determiners.)
Larraitz’ supervisors: Igone Zabala and Montse Maritxalar
Publications:
- Maite Oronoz, Arantza Díaz de Ilarraza, Koldo Gojenola 2010
Design and evaluation of an agreement error detection system: testing the effect of ambiguity, parser and corpus type
7th International Conference on Natural Language Processing, IceTAL 2010, H. Loftsson, E. Rögnvaldsson, S. Helgadóttir (Eds.): IceTAL 2010, LNAI 6233, pp. 281–292, 2010. Springer-Verlag Berlin Heidelberg 2010, August 16-18, 2010, Reykjavik, Iceland
- Díaz de Ilarraza A., Gojenola K., Oronoz M. 2009
Evaluating the Impact of Morphosyntactic Ambiguity in Grammatical Error Detection
Recent Advances in Natural Language Processing, ISSN 1313-8502. Pages: 155-160
The IXA group has been collaborating with CLA for 10 years. One of the fruits of this collaboration is the third edition of the Diccionario Básico Escolar (DBE). This dictionary is encoded in XML and has been implemented using leXkit, an application developed by the Ixa Group for dictionary management.
Basque version of this news item
Speaker: Roser Morante
Senior researcher on the BIOGRAPH project led by Walter Daelemans.
CLiPS-Computational Linguistics research group
University of Antwerp,
Date: February 23, 2010
Time: 16:00
Where: Computer Science Faculty, Meeting room (batzar aretoa).
Modality and negation in natural language processing:
current trends and future directions
Summary:
Research on modality and negation focuses on finding subjective,
uncertain and counterfactual information in texts, be it in scientific
papers, product reviews, or opinions in blogs. This type of research is
concerned with processing texts at the information level and aims at
deep text understanding. Modality and negation are phenomena relevant
for all applications that are concerned with some form of text
understanding, including text mining, sentiment analysis, recognizing
textual entailment, information extraction, text summarization, and
question answering. Hence, the adequate modeling of these phenomena is
of crucial importance to the natural language processing (NLP) community
as a whole.
Whereas from a theoretical perspective the study of modality has a long
tradition, only in recent years have these topics attracted the
attention of NLP researchers. The development of sentiment
analysis techniques and the growing need to mine biomedical texts have
been the main causes of the interest in these semantic aspects of language.
In this talk I will define modality and negation from an NLP
perspective, I will motivate the need for processing these phenomena,
and I will summarize existing research on processing modality and
negation, touching on diverse aspects ranging from task modelling to
feature visualization. Finally, I will speculate about future
developments in this research area.

The IXA Group is participating with five other partners in a new European project: PATHS (2010-2012).
The PATHS project (Personalised Access To cultural Heritage Spaces) primarily addresses objective ICT-2009.4.1: Digital Libraries and Digital Preservation. It relates to target outcome (d), adaptive cultural experiences, by creating personalised views of various forms of cultural expression, adapting these views to the background and cognitive context of the user and offering meaningful guidance about the interpretation of cultural works. PATHS will make important progress in this direction.

Europeana: Significant amounts of cultural heritage material are now available through online digital library portals. However, this vast amount of cultural heritage material can also be overwhelming for many users who are provided with little or no guidance on how to find and interpret this information.
The PATHS project will create a system that acts as an interactive personalised tour guide through existing digital library collections. The system will offer suggestions about items to look at and assist in their interpretation. Navigation will be based around the metaphor of a path through the collection. A path can be based around any theme, for example artist and media (“paintings by Picasso”), historic periods (“the Cold War”), places (“Venice”) and famous people (“Muhammad Ali”). Users will be able to construct their own paths or follow pre-defined ones.
The PATHS project will provide users with innovative ways to access and utilise the contents of digital libraries that enrich their experiences of these resources. This will be achieved by extending the state-of-the-art in user-driven information access and by applying language technologies to analyse and enrich online content. The project will take a user-centred approach to development to accommodate the needs, interests and preferences of different types of users.

These goals shall be realised through the following objectives:
- Analysis of users’ requirements for access to Cultural Heritage collections
- Organisation and enrichment of Cultural Heritage content for use within a navigation system
- Implementation of a system for navigating Cultural Heritage resources
- Techniques for providing personalised access to Cultural Heritage content
- Porting the navigation system for use on mobile devices and Facebook
- Evaluation with user groups and in field trials
Therefore, the project will conduct research in the following areas:
- Information Access: The project will develop user-driven navigation through collections of information, gathering users’ requirements and modeling them.
- Educational Informatics: Adapting to individual learners, balancing guided direction with the freedom to explore autonomously.
- Content interpretation and enrichment: Representation and sharing of information about items, and identifying background information related to the items in cultural heritage collections.
The IXA Group will work mainly on content processing and enrichment. This means that content from Cultural Heritage sources will be processed into a multi-layered network and augmented with additional information that will enrich the user’s experience. The additional information will include links between items in the collection and links to external sources such as Wikipedia or other relevant collections. The resulting multi-layered network will form the basis for the paths used to navigate the collection.
The PATHS consortium contains six partners.
- Two academic institutions:
- Two SMEs:
- Two cultural heritage enterprises