Loading
Despite advances during the last decade, the recognition of deformable objects seen from different viewpoints remains largely an open problem for computer vision. Aspects of the recognition problem include deciding whether an object category, such as a face or a car, is contained in an image, identifying its location, outlining the image region it occupies and estimating its pose and the pose of its parts. There are few methods that can address all of these tasks, while the problem of performing them simultaneously for hundreds, or thousands of categories,.as humans do, has not been addressed in the literature so far. Our goal in this project is to introduce a hierarchical and probabilistic approach to high-level vision by developing appropriate object and image representations. Our objectives are versatility, namely building models able to deal with several visual tasks and scalability, that is introducing an approach that is extensible to large-scale recognition. We propose to use Hierarchical Compositional Representations (HCRs) that can account for the hierarchical nature of visual objects by modeling them in a recursive manner: structures at each level of the hierarchy are obtained by a probabilistic composition of structures at the level below, until, at the lowest-level of the hierarchy, modeling the image. A successful development of HCRs can address both versatility and scalability: Due to their hierarchical nature HCRs are able to cope with a broad range of vision problems, as their lower levels reach out to the image information -and can thus perform segmentation- while their higher levels can be constructed at a level of abstraction that allows the modeling of whole object categories, instead of simple object instances -so as to perform recognition. At the same time, as HCRs model objects recursively, parts at a certain level can be shared among multiple objects. Part sharing can then result in detection algorithms whose complexity is sub-linear in the number of objects used. We thereby aspire to develop an approach analogous to the one that led to the development of practical, large-scale systems in the problem of speech recognition: extracting a generic low-level signal representation, finding a small set of common mid-level parts and learning to combine them together into high-level structures in a probabilistic manner. We will address all aspects of the problem, including the development of appropriate mid- level representations, learning hierarchical models and detecting objects in images. We will focus on techniques that guarantee the efficiency and scalability of our system, most notably string-based mid-level representations that will be exploited both during inference and learning, efficient inference algorithms relying on combinatorial optimization, and machine learning techniques that can deal with hierarchical representations. We thereby aspire to develop a system that will efficiently recognize multiple object categories simultaneously, while requiring a small number of training images.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=anr_________::af2bef21ff9f3c38fbaf48b8a39b1682&type=result"></script>');
-->
</script>