Linearly Separable Data and the Perceptron Learning Algorithm

Consider a binary classification problem, such as determining whether an image of a handwritten character is the number 2 or a different character.

If one can represent each image in a space (e.g., a feature space $F$) such that there exists a line, plane, or hyperplane that perfectly separates the two classes (2's and non-2's), then the data is linearly separable. One of the most effective methods for solving this classification problem is the Perceptron Learning Algorithm (PLA).
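Formally, with labels $y_i\in\{-1,+1\}$, the data is linearly separable when there exists a weight vector $W$ (with the bias term absorbed into $W$, as we will do below) such that $$ y_i\,(W\cdot X_i) > 0 \quad \text{for every point } i,$$ where $X_i$ denotes the point $x_i$ augmented with a leading 1.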

Let's consider some linearly separable data.

In [1]:
from numpy import *
%pylab inline

d = 2 #2 dimensions
n = 100 #data points
x = random.rand(d,n) #randomly place points in 2D
#print(x)
subplot(111,aspect=1)
plot(x[0],x[1],'o')
Populating the interactive namespace from numpy and matplotlib
Out[1]:
[<matplotlib.lines.Line2D at 0x11831e668>]

Let's draw a line and split the points into two groups: above vs. below the line.

In [2]:
# lets use the line y = 1/3 + 2/3 x, or 0=-1-2x+3y

y = sign(dot(array([-2,+3.]),x) - 1)  # labels: +1 for points above the line, -1 below
#print(y)

xp = x.T[y>0].T 
xm = x.T[y<0].T 
subplot(111,aspect=1)
plot(xp[0],xp[1],'ro')
plot(xm[0],xm[1],'bo');

Let's modify things so that classification requires only an inner product: augment each point with a leading 1, so the bias term is absorbed into $W$.

In [3]:
# Augment the data: each column of X is the 3-vector [1, x_1, x_2]
X = empty((d+1,n))
X[0,:] = 1
X[1:,:] = x

W_true = array([-1,-2,3.])
y = sign(dot(W_true,X))


def draw_points(X):
    subplot(111,aspect=1)
    xp = X[1:,y>0]
    xm = X[1:,y<0] 
    plot(xp[0],xp[1],'ro')
    plot(xm[0],xm[1],'bo');
    
draw_points(X)

Let's open up a little space between the classes.

In [4]:
# let's open up a little space between the classes
halfgap = 0.05
X[-1,y>0] += halfgap
X[-1,y<0] -= halfgap
draw_points(X)

Now, forget that we already know a separating $W$; all we keep are the points and their labels $\{y_i\}$.

Our task is to identify a separating hyperplane, $W$, using the known labels $y_i$, that correctly classifies all of the points. This is a binary supervised linear classification problem. We will use the PLA.

Note that we seek to learn a map $g$ such that $$ g:\mathbb{R}^2\to\{-1,1\}$$
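Concretely, once each point is augmented with a leading 1 as above, the learned map has the form $$ g(x) = \mathrm{sign}\big(W\cdot[1,\,x_1,\,x_2]\big),$$ so learning $g$ amounts to finding a suitable weight vector $W$.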

Perceptron learning algorithm (PLA)

See this blog and this wiki for more information
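In brief, the PLA repeats one simple step: while any point is misclassified, pick a misclassified point $i$ and update $$ W \leftarrow W + y_i X_i,$$ where $X_i=[1,\,x_{1,i},\,x_{2,i}]$ is the augmented point. For linearly separable data, this procedure is guaranteed to terminate with a separating $W$ after finitely many updates.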

In [5]:
# We are given X and y ONLY!!

W = array([-1,-1,5],dtype=float)  # starting guess at separating W

while(True):
    misclassified = sign(dot(W,X)) != y
    if not any( misclassified  ): break
    i = random.choice( arange(n)[misclassified] )  # random misclassified point 
    W += y[i]*X[:,i]  # PLA step
print(W)       
[-3.         -4.07672751  7.16996845]

Let's do it again, but with graphics.

First, let's create a function that draws the guess at W

In [6]:
def drawline(W,color,alpha):
    # draw the line where dot(W,[1,x1,x2]) = 0, for x1 in [0,1]
    plot([0,1],[-W[0]/W[2],-(W[0]+W[1])/W[2]],color=color,alpha=alpha)

Define PLA

This time, we will create a function.

In [7]:
def PLA(X,W0,max_counter,visualize):
    W = W0.copy()  # copy so the caller's starting guess is not modified in place
    counter = 0
    while(True):
        counter += 1
        misclassified = sign(dot(W,X)) != y
        if not any( misclassified ) or counter > max_counter:
            break
        i = random.choice( arange(n)[misclassified] )  # random misclassified point 
        W += y[i]*X[:,i]  # PLA step
        
        if visualize==1: 
            drawline(W,'k',min(.05*counter,1))
            #print(sum(misclassified))

    if visualize==1: 
        drawline(W,'k',1)       # final learned W
        drawline(W_true,'c',1)  # actual separating W

    error = sum(misclassified)  # number of points still misclassified
        
    return W,error

Run the PLA

In [8]:
visualize = 1 #should we visualize?
if visualize==1:
    draw_points(X)# draw points
    xlim(0,1)
    ylim(0,1)

max_counter = 100

W0 = array([-1,-1,5],dtype=float)# starting guess at separating plane

W,error = PLA(X,W0,max_counter,visualize)
print('W = '+str(W),'\n', 'Error = '+str(error/n*100)  +'%')    
W = [-2.         -2.59290334  4.81310161] 
 Error = 0.0%

Cross Validation

So far, we have trained the classifier on all data points and then tested it on those same data points. For this reason, and because the data is designed to be linearly separable, the algorithm eventually converges and the classification error reaches 0. In other words, we obtain perfect classification of our data.

However, we only know that it has reached perfect classification because we know all of the classification labels. This is unrealistic: in the real world, we will know the labels for some data points and not for others. How, then, can we tell whether our classification algorithm is doing well? Cross validation.

  1. For the moment, let's restrict our attention to data for which we know the labels.
  2. Cross validation involves splitting these data points into 2 sets: the training data and the test data (a minimal split is sketched below).
  3. Then, one creates a classifier using only the training data. For example, one would run the PLA to obtain $W$ using only some fraction, say $s$, of the data.
  4. After the classifier has converged (or has at least been allowed enough iterations), it is tested on the remaining test data points.
  5. The performance on the test data provides an estimate of how well the algorithm predicts labels for data whose labels are actually unknown.
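As a minimal sketch of the split in step 2 (the names X_train, X_test, y_train, y_test and the convention that the first $s$-fraction of points form the training set are only illustrative choices, not part of the required interface):

n_train = int(s*n)                               # number of training points, for a chosen fraction s
X_train, y_train = X[:,:n_train], y[:n_train]    # first s-fraction of the columns: training data
X_test,  y_test  = X[:,n_train:], y[n_train:]    # remaining columns: test data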

Exercise 1: Implement cross-validation for the PLA

Below is some code and an outline to get you started

Step 1: Create and visualize data

Define 2 functions to create points and draw points

In [9]:
def create_data(n,d,W,halfgap):
    x = random.rand(d,n) #randomly place points in 2D
    X = empty((d+1,n))
    X[0,:] = 1
    X[1:,:] = x    
    y = sign(dot(W,X))    
    X[-1,y>0] += halfgap
    X[-1,y<0] -= halfgap
    return X,y

# function that draws points. The first s-fraction of the data are training, and the rest are testing data
def draw_points2(X,y,s,n):
    subplot(111,aspect=1)
    train = arange(n) < int(s*n)  # boolean mask: first s-fraction of points are training data
    plot(X[1,(y>0)&train], X[2,(y>0)&train], 'ro')   # train data, +1 class
    plot(X[1,(y<0)&train], X[2,(y<0)&train], 'bo')   # train data, -1 class
    plot(X[1,(y>0)&~train], X[2,(y>0)&~train], 'rx') # test data, +1 class
    plot(X[1,(y<0)&~train], X[2,(y<0)&~train], 'bx') # test data, -1 class

Choose parameters and create/visualize data

Circles are training data and x's are testing data

In [10]:
d = 2 #2 dimensions
n = 200 #data points

s = 0.1 # fraction training data

halfgap = 0.05
X,y = create_data(n,d,W_true,halfgap)  # label the new points using the true separating W
draw_points2(X,y,s,n)

Step 2: Train the PLA on ONLY the training data

Hint: I'd split the data X and the labels y into 2 parts: X_train/X_test and y_train/y_test. Then you can build a classifier using only the training data and labels.

Hint: Modularize your code and define functions so that you can package and re-use code.

Step 3: Use your learned hyperplane W to classify the test data and report the error percentage for a few choices of parameters. Make a table showing the parameter choices (n,s) and error.
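For instance, with a split as sketched above and a hyperplane W learned from the training data only, the test error percentage could be computed along these lines (names are illustrative):

test_error = 100.0*mean( sign(dot(W,X_test)) != y_test )  # percent of test points misclassified
print('Test error = ' + str(test_error) + '%')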

Step 4: Modularize/package all your code into a function that takes (n,s) as input and returns the test error percentage

HW 4

  1. For fixed $n=200$, make a plot of the error rate as a function of the training fraction $s$.
  2. Make a plot of the error rate as a function of the training fraction $s$, but this time show 3 curves for 3 values of $n$: $n\in\{200,500,1000\}$.
  3. Repeat 1 and 2, but this time implement 25 trials for each value of $s$ and $n$. That is, repeat each experiment 25 times and plot the average across these trials.