Pôle SIS (Signal, Images et Systèmes)

Image Coding and Indexing
The main research interests of this group include the development of coding algorithms (images, video, surface meshes and 3D animations), and image and video segmentation, content-based indexing and retrieval for various applications like MPEG 4 and MPEG 7.


The members, most of them from the CReATIVe project, are involved in several French national projects (ANR/RIAM, ANR/Projet blanc, ANR/ARA «Masse de données»), industrial contracts (Alcatel, CNES, France Telecom…) and academic collaborations with partners such as TELECOM ParisTech, INRIA Bretagne, Laboratoire J.A. Dieudonné (Nice) and INRIA Sophia, as well as schools and universities in Brazil, Tunisia, Italy and the US (Stanford, Boston).

Among the key contributions of the members of this group, we can highlight:
  • Image and video coding
  • Multiresolution coding of 3D animations
  • Neural code and spike-based coding
  • Image restoration
  • Segmentation and tracking
  • Content-based retrieval and learning
Main results
  • Publications in renowned journals and leading conferences in the domain;
  • Several international patents on coding, on-the-fly 3D transform and lattice vector quantization processes;
  • US Patent (in final review) on content-based image retrieval;
  • Development of a prototype and industrial technology transfer of the proposed lattice quantization method;
  • Integration of a proposed tracking method by an international-class industrial partner into an in-house cinematographic post-production software suite.

Image and video coding.

In the context of still image coding, we introduced a distortion measure based on the conditional differential entropy of the input signal given its quantized value. Mean squared error has been widely used as a distortion criterion, but it tends to favor high-energy coefficients. Although this behavior is relevant at high bit rates, it does not always lead to better visual quality in the general case. We investigated the intrinsic properties of the proposed distortion measure and integrated it into optimal scalar and vector quantizers. We also proposed a fast bit allocation algorithm based on this distortion measure, which substantially improves the visual quality of highly compressed images while preserving JPEG2000 compatibility [IC-15 T. André, M. Antonini, M. Barlaud, and R. Gray. Entropy-based distortion measure for image coding. In IEEE International Conference on Image Processing (ICIP), pages 1157–1160, Atlanta, GA, USA, 2006].
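For reference, the quantity named above can be written with the textbook definition of conditional differential entropy. The notation is an assumption made here for illustration (source X with density f, quantized value \hat{X} taking values \hat{x}_i on cells C_i with probabilities p_i); the exact form used in the cited paper may differ.

```latex
h(X \mid \hat{X})
  = -\sum_{i} p_i \int_{C_i} f(x \mid \hat{X} = \hat{x}_i)\,
      \log f(x \mid \hat{X} = \hat{x}_i)\, dx,
\qquad p_i = \Pr(X \in C_i).

% Special case: if X is uniformly distributed inside each cell C_i of width \Delta_i,
% then f(x \mid \hat{X} = \hat{x}_i) = 1/\Delta_i on C_i and
h(X \mid \hat{X}) = \sum_{i} p_i \log \Delta_i .
```

In this uniform special case the measure depends only on the cell widths and their probabilities, not on the energy of the coefficients, which gives some intuition for why it behaves differently from mean squared error.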

In the context of video coding, we proposed a scalable motion-compensated wavelet-based video coder. Wavelet transforms bring more flexibility and offer natural support for scalability, which can therefore be implemented with very limited performance loss [IC-1 M. Agostini, T. André, M. Antonini, and M. Barlaud. Modeling the motion coding error for mcwt video coders. In IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Toulouse, France, 2006. 4 pages]. Our main contributions concern motion-compensated temporal filtering, optimal motion vector estimation, model-based bit allocation, minimal-cost scalability, occlusion management and lossy quantization of the motion vectors [IC-1, IC-77 M. Cagnazzo, M. Agostini, G. Laroche, J. Jung, and M. Antonini. Motion vector quantization for efficient low bit-rate video coding. In Proc. of SPIE Electronic Imaging, Visual Communications and Image Processing conference (VCIP), pages 1–8, San Jose, USA, 2009]. We also developed a robust video coder for transmission over noisy channels using source/channel coding (multiple description coding) [IC-6 M. Agostini, M. Antonini, and M. Kieffer. MAP estimation of multiple description encoded video transmitted over noisy channels. In IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 2009. 4 pages, IC-21 A. Arrifano, M. Pereira, M. Antonini, and M. Freire. Multiple-description video coding based on JPEG 2000 MQ-coder registers. In IEEE International Symposium on Circuits and Systems, pages 1–4, Paris, France, 2010].

Multiresolution coding of 3D animations.

It is now well known that wavelet-based coders for semi-regular meshes outperform the coders for irregular meshes [IJ-90 F. Payan and M. Antonini. Mean square error approximation for wavelet-based semiregular mesh compression. IEEE Transactions on Visualization and Computer Graphics (TVCG), 12(4):649–657, 2006. http://hal.archives-ouvertes.fr/hal-00264508/PDF/Payan_TVCG_2006.pdf]. One reason is that a semi-regular mesh has a multiresolution structure that is particularly well suited to wavelet-based spatial filtering, levels of detail, view-dependent processing [IC-260 F. Payan, F. Meriaux, and M. Antonini. View-dependent coding of 3D scenes. In IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 2009. 4 pages], and so on.

Recently, we introduced wavelets that take into account the temporal regularity of 3D animations (defined as sequences of triangular meshes sharing the same connectivity in every frame) [IJ-91 F. Payan and M. Antonini. Temporal wavelet-based geometry coder for 3D animations. Elsevier Computer & Graphics, 31(1):77–88, 2007. http://hal.archives-ouvertes.fr/hal-00264503/PDF/Payan_CAG_2007_Preprint.pdf]. Motion-compensated temporal wavelet transforms have also been proposed, which permit a better temporal decorrelation of the data [IC-67 Y. Boulfani-Cuisinaud and M. Antonini. Motion-based geometry compensation for dwt compression of 3D mesh sequences. In IEEE International Conference on Image Processing (ICIP), pages I-217–I-220, San Antonio, Texas, USA, 2007, IC-68 Y. Boulfani-Cuisinaud, M. Antonini, and F. Payan. Motion-based mesh clustering for MCDWT compression of 3D animated meshes. In EUSIPCO, XV European Signal Processing Conference, Poznan, Poland, 2007. 5 pages].
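To illustrate why a shared connectivity makes temporal wavelets natural, here is a minimal sketch of one level of a Haar temporal transform written in lifting form. It is an illustration under stated assumptions, not the coder of the cited papers: the Haar filter, the `(T, V, 3)` array layout and the function names are all choices made here, and no motion compensation is included.

```python
import numpy as np

def temporal_haar_forward(frames):
    """One level of a temporal Haar transform (lifting form).

    frames: array of shape (T, V, 3) holding V vertex positions for T frames.
    The same vertex index refers to the same vertex in every frame, which is
    what a fixed connectivity provides. T must be even.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.shape[0] % 2 != 0:
        raise ValueError("this sketch requires an even number of frames")
    even, odd = frames[0::2], frames[1::2]
    high = odd - even          # predict step: temporal detail per vertex
    low = even + 0.5 * high    # update step: average of each frame pair
    return low, high

def temporal_haar_inverse(low, high):
    """Undo the lifting steps in reverse order."""
    even = low - 0.5 * high
    odd = high + even
    frames = np.empty((2 * low.shape[0],) + low.shape[1:])
    frames[0::2], frames[1::2] = even, odd
    return frames
```

For a slowly moving animation the `high` band is close to zero, so most of the information is concentrated in the `low` band, which is what a subsequent quantization stage would exploit.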

On the other hand, wavelet-based spatial filtering has rarely been addressed in this domain because the irregular sampling of the animations makes spatial filtering complex. We therefore proposed an original framework for coding 3D animations using both temporal and spatial semi-regular wavelets. The main idea is to first obtain a semi-regular representation of the input animation by using a remeshing technique [IC-259 F. Payan, A. Kammoun, and M. Antonini. Remeshing and spatio-temporal wavelet filtering for 3D animations. In IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Las Vegas, USA, 2008. 4 pages]. In a second phase, we proposed a full spatio-temporal wavelet-based compression scheme for the resulting animated semi-regular meshes [IC-187 A. Kammoun, F. Payan, and M. Antonini. Bit allocation for spatio-temporal wavelet coding of animated semi-regular meshes. In 15th International MultiMedia Modeling Conference (MMM), Sophia Antipolis, France, 2009. 12 pages]. This scheme includes a model-based bit allocation that optimizes the quantization of both spatial and temporal wavelet coefficients. Experimental results show that our coder significantly improves compression performance compared with previous similar approaches.

The neural code and spike-based coding.

The human visual system conveys information as a set of electrical impulses called spikes. Spikes appear very early in the processing chain of the human visual system. At the retina level, after a succession of internal filtering mechanisms, ganglion cells convert an analog signal into a non-deterministic series of spikes, called spike trains, which form the neural code. All spikes have the same shape and amplitude, which yields a non-deterministic, binary-like neural code.

In collaboration with the Mathematical and Computational Neuroscience project team of INRIA Sophia Antipolis, we explored the behavior of the mammalian retina considered as an analog-to-digital converter for the incoming light stimuli. We based our study on a biologically realistic model that reproduces the neural code as generated by the retina. The neural code consists of non-deterministic temporal sequences of uniformly shaped electrical impulses (spikes). Starting from this spike-based code, we described a dynamic quantization scheme that relies on the so-called rate coding hypothesis, proposed a decoding procedure and gave an interpretation of the non-determinism observed in the spike timings. To do this, we implemented a three-stage processing system that maps the anatomical architecture of the retina. When the retinal noise is modelled as a dither signal, the retina’s behavior can be described as a non-subtractive dithered quantizer [IC-231 K. Masmoudi, M. Antonini, and P. Kornprobst. Another look at retina as an image dithered scalar quantizer. In International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2010), pages 1–4, Desenzano del Garda, Italy, 2010, CwP-8 K. Masmoudi, M. Antonini, and P. Kornprobst. Encoding and decoding stimuli using a biologically realistic model: The non-determinism in spike timings seen as a dither signal. In Research in Encoding And Decoding of Neural Ensembles (AREADNE 2010), pages 1–1, Santorini, Greece, 2010]. This yields an original coding/decoding system that evolves dynamically from coarse to fine and from uniform to non-uniform, offering several interesting features such as time scalability as well as whitening of the reconstruction error and its de-correlation from the input stimuli.
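The notion of a non-subtractive dithered quantizer can be illustrated with a generic textbook sketch. This is not the retina model of the cited work: the uniform scalar quantizer, the uniform dither distribution and the function names below are assumptions chosen for simplicity.

```python
import numpy as np

def dithered_quantize(x, step, rng):
    """Non-subtractive dithered uniform quantizer.

    A random dither is added before quantization and is NOT subtracted
    afterwards, so the decoder never needs to know the dither values.
    """
    x = np.asarray(x, dtype=float)
    # Uniform dither spanning exactly one quantization step (assumption).
    dither = rng.uniform(-step / 2.0, step / 2.0, size=x.shape)
    return step * np.round((x + dither) / step)

def decode_by_averaging(observations):
    """Estimate the input by averaging repeated quantized observations."""
    return np.mean(observations, axis=0)

# Usage: one coarse observation is a poor estimate, but the average of many
# observations of the same stimulus approaches the true value.
rng = np.random.default_rng(0)
stimulus = np.full(20000, 0.3)
observations = dithered_quantize(stimulus, step=1.0, rng=rng)
estimate = decode_by_averaging(observations)
```

With this dither the quantization error is unbiased, so accumulating observations over time refines the estimate progressively, which is one way to picture a coding scheme that evolves from coarse to fine.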

Image restoration.

Denoising took a significant leap forward with nonlocal, patch-based methods, even compared with wavelet-based denoising and variational approaches that call upon sophisticated regularization. Starting from distinct points of view, UINTA (a variational approach using an information theory-based energy) and NL-means (filter design) pioneered this field. Given an image patch degraded by some noise, the main idea is that very similar patches, degraded by other realizations of the noise, are highly likely to exist elsewhere in the image (the notion of self-similarity). These patches can be used to denoise the given patch.

We re-interpreted the information theory aspect of denoising to provide an objective justification of UINTA-like methods [IC-18 C. Angelino, E. Debreuve, and M. Barlaud. Image restoration using a kNN-variant of the mean-shift. In IEEE International Conference on Image Processing (ICIP), San Diego, CA, USA, 2008. http://hal.archives-ouvertes.fr/hal-00379329/en/, 4 pages], and we provided a variational interpretation of NL-means [HDR-1 E. Debreuve. Mesures de similarité statistiques et estimateurs par k plus proches voisins: une association pour gérer des descripteurs de haute dimension en traitement d’images et de vidéos. HDR, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00457710/en/]. However, although they rely on the self-similarity of patches, UINTA and NL-means actually perform the denoising in a pixelwise manner. Therefore, in the same vein as the Block-Matching and 3D filtering (BM3D) algorithm, we proposed a fully patch-based denoising method involving a patch aggregation step that relies on a denoising confidence (recently accepted for publication in “International Conference on Image Processing (ICIP)”, Sep. 2010).
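The pixelwise nature of NL-means mentioned above can be seen in a minimal sketch of the classical filter: patches are used only to compute weights, while the average itself is taken over single pixel values. This is a plain, unoptimized version of standard NL-means, not the patch-aggregation method proposed by the group; the parameter names and default values are assumptions.

```python
import numpy as np

def nl_means(image, patch_radius=1, search_radius=3, h=0.1):
    """Basic NL-means for a 2-D grayscale image (slow reference version)."""
    img = np.asarray(image, dtype=float)
    p = patch_radius
    padded = np.pad(img, p, mode="reflect")
    rows, cols = img.shape
    size = 2 * p + 1
    out = np.zeros_like(img)
    for i in range(rows):
        for j in range(cols):
            # Patch centered on pixel (i, j) of the original image.
            ref = padded[i:i + size, j:j + size]
            acc, weight_sum = 0.0, 0.0
            for k in range(max(0, i - search_radius), min(rows, i + search_radius + 1)):
                for l in range(max(0, j - search_radius), min(cols, j + search_radius + 1)):
                    cand = padded[k:k + size, l:l + size]
                    # Patch similarity only determines the weight...
                    d2 = np.mean((ref - cand) ** 2)
                    w = np.exp(-d2 / (h * h))
                    # ...while the average uses the central pixel value alone.
                    acc += w * img[k, l]
                    weight_sum += w
            out[i, j] = acc / weight_sum
    return out
```

A fully patch-based method would instead denoise whole patches and then aggregate the overlapping estimates at each pixel.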

Segmentation and tracking.

A statistical approach to object motion estimation in a video has the advantage of introducing some “softness” into the tracking constraints, which is necessary to cope with complex (though limited) object changes (e.g., due to 3-D motion) that have inevitably been left out of the model [IC-56 S. Boltz, E. Wolsztynski, E. Debreuve, E. Thierry, M. Barlaud, and L. Pronzato. A minimum-entropy procedure for robust motion estimation. In IEEE International Conference on Image Processing (ICIP), Atlanta, GA, USA, 2006. http://hal.archives-ouvertes.fr/hal-00389257/en/, 4 pages, IC-157 A. Herbulot, S. Boltz, E. Debreuve, and M. Barlaud. Robust motion-based segmentation in video sequences using entropy estimator. In IEEE International Conference on Image Processing (ICIP), Atlanta, GA, USA, 2006. http://hal.archives-ouvertes.fr/hal-00389247/en/, 4 pages, IJ-13 S. Boltz, A. Herbulot, E. Debreuve, M. Barlaud, and G. Aubert. Motion and appearance nonparametric joint entropy for video segmentation. International Journal of Computer Vision, 80(2):242–259, 2008. http://hal.archives-ouvertes.fr/hal-00329748/en/]. In particular, we developed a tracking method relying on the Kullback-Leibler divergence between the features+geometry distributions in the user-defined object region and in a candidate region of a subsequent frame [IC-52 S. Boltz, E. Debreuve, and M. Barlaud. High-dimensional statistical distance for region-of-interest tracking: Application to combining a soft geometric constraint with radiometry. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 2007. http://hal.archives-ouvertes.fr/hal-00389281/en/, 12 pages, IJ-12 S. Boltz, E. Debreuve, and M. Barlaud. High-dimensional statistical measure for region-of-interest tracking. IEEE Transactions on Image Processing, 18(6):1266–1283, 2009]. The statistical nature of this region comparison makes it possible to include soft geometrical constraints, which represent a trade-off between the strict constraints of residual-based approaches and the absence of constraints in pure feature-based comparisons.
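As an illustration of comparing regions through the Kullback-Leibler divergence, the sketch below estimates the divergence between two sets of scalar feature samples and selects the closest candidate region. It deliberately swaps in a simple histogram estimate on 1-D features, whereas the cited work deals with high-dimensional features+geometry distributions; the function names, bin count and smoothing constant are assumptions.

```python
import numpy as np

def kl_divergence_hist(samples_p, samples_q, bins=16, value_range=(0.0, 1.0), eps=1e-9):
    """Histogram estimate of KL(P || Q) from two sets of scalar samples."""
    p, _ = np.histogram(samples_p, bins=bins, range=value_range)
    q, _ = np.histogram(samples_q, bins=bins, range=value_range)
    # Small constant avoids log(0) and division by zero in empty bins.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def best_candidate(reference_samples, candidate_sample_sets):
    """Index of the candidate region whose feature distribution is closest
    (in the KL sense) to that of the reference region."""
    scores = [kl_divergence_hist(reference_samples, c) for c in candidate_sample_sets]
    return int(np.argmin(scores)), scores
```

Because whole distributions are compared rather than pixel-to-pixel residuals, a candidate region can match the reference even when individual pixels have moved within it.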

In robotics, the camera is the main sensor for aerial applications and is increasingly used in underwater applications. Segmentation and object tracking in underwater videos are challenging tasks because of the particular acquisition conditions (diffusion, natural light, artificial light…). We proposed a fully automatic method to detect and segment artificial objects in underwater video frames under a strong requirement of low processing time [IJ-6 C. Barat and R. Phlypo. A fully automated method to detect and segment a manufactured object in an underwater color image. EURASIP Journal on Advances in Signal Processing, 2010:1–10, 2010, IC-27 C. Barat and M.-J. Rendas. A robust visual attention system for detecting manufactured objects in underwater video. In Proceedings of the IEEE OCEANS’06, pages 1–6, Boston, USA, 2006]. For aerial robots, automatic video-based landing demands methods with real-time response and robust results. Under these constraints, we developed a method to track the runway based on active contours [IC-213 F. Le Bras, T. Hamel, C. Barat, and R. Mahony. Nonlinear image-based visual servo controller for automatic landing guidance of a fixed-wing aircraft. In European Control Conference 2009 - ECC’09, Budapest, Hungary, 2009. 6 pages, Inv-32 F. Le Bras, T. Hamel, and C. Barat. Image-based visual servo controller for automatic landing guidance of a fixed-wing aircraft. IEEE-MED-Workshop on UAS civilian applications: Fire, Forest, Protection, Emergency response, Thessaloniki, Greece, 2009, IC-26 C. Barat and B. Lagadec. A corner tracker snake approach to segment irregular object shape in video image. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2008, pages 717–720, Las Vegas, USA, 2008. http://dx.doi.org/10.1109/ICASSP.2008.4517710].

Content-based retrieval and learning.

Several scenarios of content-based image and video indexing, retrieval, and classification are possible: the (dis)similarity measure between two images or videos is defined a priori, the (dis)similarity measure is learned, or the classification rule is learned.

In the learning-free context, we developed a statistical, multiscale content-based dissimilarity measure between two images or videos [IC-19 S. Anthoine, E. Debreuve, P. Piro, and M. Barlaud. Using Neighborhood Distributions of Wavelet Coefficients for On-the-Fly, Multiscale-Based Image Retrieval. In International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pages 28–31, Klagenfurt, Austria, 2008. http://hal.archives-ouvertes.fr/hal-00382777/en/, IC-281 P. Piro, S. Anthoine, E. Debreuve, and M. Barlaud. Sparse Multiscale Patches (SMP) for Image Categorization. In International Multimedia Modeling Conference on Advances in Multimedia Modeling (MMM), pages 227–238, Sophia Antipolis, France, 2008. http://hal.archives-ouvertes.fr/hal-00382771/en/, IJ-103 P. Piro, S. Anthoine, E. Debreuve, and M. Barlaud. Combining spatial and temporal patches for scalable video indexing. Journal of Multimedia Tools and Applications, 48(1):89–104, 2010. http://hal.archives-ouvertes.fr/hal-00420850/en/] (US patent 12/385.378 pending). Given a query image or video, the images or videos of a database can be sorted in increasing order of their dissimilarity to the query and presented to the user as the retrieval results. The multiscale aspect of the proposed dissimilarity measure provides a way to deal automatically with the different scales (i.e., sizes) of the objects in the images or video frames. It accounts for intra- and inter-scale correlations. The statistical aspect brings the flexibility in the definition of (dis)similar images that content-based retrieval requires (the user does not request identical data but rather similar data).
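The retrieval step itself, sorting a database by increasing dissimilarity to the query, can be sketched as follows. The dissimilarity function is passed in as a parameter; the Euclidean distance between descriptor vectors used in the usage lines is only a hypothetical placeholder, not the multiscale statistical measure described above.

```python
import numpy as np

def rank_by_dissimilarity(query, database, dissimilarity):
    """Return database indices sorted from most to least similar to the query,
    together with the corresponding dissimilarity values."""
    scores = np.array([dissimilarity(query, item) for item in database])
    order = np.argsort(scores, kind="stable")  # increasing dissimilarity
    return order, scores[order]

def euclidean(a, b):
    """Placeholder dissimilarity between two descriptor vectors (assumption)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Usage with toy 2-D descriptors.
order, sorted_scores = rank_by_dissimilarity(
    [0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], euclidean
)
```

Any measure that returns smaller values for more similar content can be plugged in without changing the ranking code.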

In the rule learning context, we proposed an AdaBoost boosting technique to learn the optimal classification rule by majority vote among the k nearest neighbors, or kNN (in the image description space) [IC-284 P. Piro, M. Barlaud, R. Nock, and F. Nielsen. k-NN Boosting Prototype Learning for Object Classification. In International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Desenzano del Garda, Italy, 2010. http://hal.archives-ouvertes.fr/hal-00481725/en/, 4 pages, Re-5 P. Piro, R. Nock, F. Nielsen, and M. Barlaud. Boosting k-NN for categorization of natural scenes, arXiv 1001.1221. http://hal.archives-ouvertes.fr/hal-00481712/en/, 15 pages, 2010]. The purpose was to use the learning samples not only as labeled samples for the learning process, but also as weak classifiers that are combined to build the classification rule. This amounts to leveraging the kNN classification rule by assigning confidence weights to the samples using boosting.
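To make the idea of confidence-weighted samples concrete, the sketch below shows only the final decision rule: a kNN vote in which each neighbor contributes its label scaled by a per-sample weight. The weights are supplied by hand here; learning them with boosting, as in the cited work, is not shown. The two-class setting with labels in {-1, +1} and the function name are assumptions.

```python
import numpy as np

def weighted_knn_predict(x, samples, labels, weights, k=3):
    """Classify x by a weighted vote among its k nearest training samples.

    labels:  array of -1/+1 class labels.
    weights: per-sample confidence weights (assumed given, e.g. by boosting).
    """
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    distances = np.linalg.norm(samples - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    score = np.sum(weights[nearest] * labels[nearest])
    return 1 if score >= 0 else -1
```

With uniform weights this reduces to the ordinary majority vote, while a single high-confidence neighbor can outweigh several low-confidence ones.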


Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis
I3S - UMR7271 - UNS CNRS
2000, route des Lucioles - Les Algorithmes - bât. Euclide B - BP 121 - 06903 Sophia Antipolis Cedex - France
Tel. +33 4 92 94 27 01 - Fax: +33 4 92 94 28 98 - www.i3s.unice.fr