Linearization Approach for Efficient KNN Search of High-Dimensional Data
説明
K-Nearest Neighbors (KNN) search in high-dimensional feature spaces is an important paradigm for a variety of applications. Despite the continuous efforts in the past years, algorithms of data partitioning methods (DPMs), such as R*-tree, SR-tree and self-organizing maps (SOM), to find the exact KNN answer set at high dimensions are outperformed by a linear scan method. In this paper, we present a ”plug&search” method to greatly speed up the exact KNN search of existing DPMs. The idea is to linearize the data partitions produced by a DPM, rather than the points themselves, based on the distances between the partitions and a selected reference point, thus resulting in a one-dimensional array-index, that is simple, compact and yet fast index structure. Our KNN search algorithm sets conditions to skip irrelevant partitions and early stop the search as soon as the exact KNN answer points are retrieved.