Efficient Distributed Algorithms for the K-Nearest Neighbors Problem

05/15/2020
by Reza Fathi, et al.

The K-nearest neighbors problem is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n labeled data points and a query point p, we want to assign a label to p based on the labels of the K points nearest to the query. We study this problem in the k-machine model <cit.>, a model for distributed large-scale data. In this model, we assume that the n points are distributed (in a balanced fashion) among the k machines, and the goal is to quickly compute the answer once a query point is given to one of the machines. Our main result is a simple randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high probability of success (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is also small, requiring only O(k log K) messages. Our bounds are essentially the best possible for comparison-based algorithms, due to the Ω(log n) lower bound on communication rounds for finding the median of 2n elements distributed evenly among two processors, shown by Rodeh <cit.>. We also implemented our algorithm and show that it performs well compared to an algorithm used in practice, in which each machine sends its K nearest points to a single machine that then computes the answer.
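The practical baseline mentioned above can be sketched in a few lines. Below is a minimal single-process simulation, assuming 1-D points stored as (value, label) pairs and absolute-difference distance; all function names here are illustrative, not taken from the paper.

```python
import heapq
from collections import Counter

def local_candidates(points, query, K):
    """On one machine: return the K local points nearest to the query.

    `points` is a list of (value, label) pairs; distance is |value - query|.
    """
    return heapq.nsmallest(K, points, key=lambda p: abs(p[0] - query))

def baseline_knn(machines, query, K):
    """Coordinator: each machine sends its K nearest points (k*K messages total);
    the coordinator merges them and classifies by majority label among the K nearest."""
    candidates = [c for m in machines for c in local_candidates(m, query, K)]
    nearest = heapq.nsmallest(K, candidates, key=lambda p: abs(p[0] - query))
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Example: 3 machines, each holding a few labeled 1-D points.
machines = [
    [(0.1, "a"), (0.9, "b"), (2.5, "b")],
    [(0.2, "a"), (1.8, "b")],
    [(0.15, "a"), (3.0, "b")],
]
print(baseline_knn(machines, query=0.0, K=3))  # the 3 nearest points all carry label "a"
```

Note the cost of this baseline: each of the k machines ships K points, so the coordinator receives Θ(kK) data, which the paper's O(k log K)-message algorithm improves on.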
