Tractability from overparametrization: The example of the negative perceptron

10/28/2021
by Andrea Montanari et al.

In the negative perceptron problem we are given n data points (x_i, y_i), where x_i is a d-dimensional vector and y_i∈{+1,-1} is a binary label. The data are not linearly separable and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit norm vector θ that maximizes min_{i≤n} y_i⟨θ, x_i⟩. This is a non-convex optimization problem (it is equivalent to finding a maximum norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which n,d→∞ with n/d→δ, and prove upper and lower bounds on the maximum margin κ_s(δ) or – equivalently – on its inverse function δ_s(κ). In other words, δ_s(κ) is the overparametrization threshold: for n/d≤δ_s(κ)-ε a classifier achieving vanishing training error exists with high probability, while for n/d≥δ_s(κ)+ε it does not. Our bounds on δ_s(κ) match to the leading order as κ→ -∞. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold δ_lin(κ). We observe a gap between the interpolation threshold δ_s(κ) and the linear programming threshold δ_lin(κ), raising the question of the behavior of other algorithms.
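The abstract's objective (maximize min_{i≤n} y_i⟨θ, x_i⟩ over unit-norm θ) can be probed numerically with a simple linear-programming proxy. The sketch below is an illustrative assumption, not the paper's algorithm for δ_lin(κ): it maximizes the margin κ subject to the linear constraints y_i⟨x_i, θ⟩ ≥ κ, replacing the non-convex unit ℓ2-norm constraint with a box constraint ‖θ‖_∞ ≤ 1, then rescales θ to the unit sphere and reports the achieved (typically negative) margin. The Gaussian data model, sample sizes, and the choice of box constraint are all assumptions made for illustration.

```python
# Minimal sketch: estimate the margin reached by an LP proxy on random data.
# NOT the paper's exact algorithm; data model and constraints are assumptions.
import numpy as np
from scipy.optimize import linprog

def lp_margin(n, d, rng):
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # isotropic covariates
    y = rng.choice([-1.0, 1.0], size=n)            # random binary labels
    # Decision variables z = (theta_1, ..., theta_d, kappa); minimize -kappa.
    c = np.zeros(d + 1)
    c[-1] = -1.0
    # Constraints: -y_i <x_i, theta> + kappa <= 0 for every sample i.
    A_ub = np.hstack([-y[:, None] * X, np.ones((n, 1))])
    b_ub = np.zeros(n)
    bounds = [(-1.0, 1.0)] * d + [(None, None)]    # box on theta, free kappa
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    theta = res.x[:d]
    theta /= np.linalg.norm(theta)                 # project back to the unit sphere
    return np.min(y * (X @ theta))                 # achieved margin (can be negative)

rng = np.random.default_rng(0)
d = 200
for delta in (2.0, 5.0, 10.0):                     # overparametrization ratio n/d
    print(f"delta = {delta:4.1f}  margin ~ {lp_margin(int(delta * d), d, rng):+.3f}")
```

Sweeping the ratio δ = n/d in this way gives a rough empirical picture of how the attainable margin decreases (becomes more negative) as the problem becomes less overparametrized, which is the regime the paper's thresholds δ_s(κ) and δ_lin(κ) quantify.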
