Data structures for fast Bregman k-NN queries

Abstract

k-nearest neighbor (k-NN) search is a crucial tool in many challenging applications of computer vision, ranging from image and information retrieval, to classification and data mining. E.g., k-NN can be used as the basis for non-parametric kernel density classification. Although many efforts have been made to speed up k-NN retrieval, this computational task still remains unsatisfactorily expensive and critical in applications.
Namely, we have extended two well-known data structures for fast indexing and k-nearest neighbor retrieval to the information-theoretic class of Bregman divergences (symmetrized or not): metric ball tree and vantage point tree.
On the one hand, we propose an effective Bregman ball tree (BB-tree) construction algorithm that adapts locally its “arity” degree to the inherent geometric characteristics of the datasets. As attested by extensive experiments, this allows one to meet more often pruning conditions whenever traversing the tree for exact or approximate NN queries without yet penalizing the construction time. Then, since symmetric measures are usually preferred for applications in content-based information retrieval and categorization, we furthermore extend the BB-tree to the case of symmetrized Bregman divergences, such as the Jensen-Shannon divergence (i.e., the symmetrical Kullback-Leibler divergence). Exact and approximate 1-NN search experiments using high-dimensional data arising from real-world images (SIFT signatures, color histograms) illustrate that our method improves over the state of the art significantly (up to an order of magnitude).
On the other hand, we generalize the vantage point tree (vp-tree) to the same class of distorsion measures. This data structure was originally introduced for information retrieval in metric spaces, and has recently shown very good performances for the retrieval of image patches with respect to the L2 metric.

Related publications

Frank Nielsen, Paolo Piro, Michel Barlaud. "Bregman Vantage Point Trees for Efficient Nearest Neighbor Queries". In ICME '09, New York City, US, July 2009.
 
Frank Nielsen, Paolo Piro, Michel Barlaud. "Tailored Bregman Ball Trees for Effective Nearest Neighbors". In EuroCG '09, Bruxelles, Belgium, March 2009.
 

Code

BBtree++

C source code of our improved Bregman Ball Tree.
Our implementation is based on code provided by L. Cayton and was tested under Linux-32 platform.
 

BVPtree

C source code of our implementation of Bregman Vantage Point Tree.
Tested for compilation under Linux-32 platform.