Multi-class Classification and Neural Networks

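This is the third programming assignment of the course: recognizing handwritten digits with logistic regression and with a neural network. It is a very interesting assignment.

So far the course has covered only the neural network model, not how to train its parameters, so this assignment provides the network's parameters in advance.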

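A sample of the training set is shown in the figure:

Each image is a handwritten digit of 20 * 20 pixels, and every pixel is used as one feature for training the model.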
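To make the pixel-to-feature mapping concrete, here is a minimal sketch of how a 20 * 20 image becomes one 400-element feature row and back again. The image values are made up for illustration; the column-by-column order is an assumption that matches the reshape call used in displayData.

```octave
% Hypothetical 20x20 image; the values are placeholders, not real pixel data
img = reshape(1:400, 20, 20);

% Flatten column by column into a single 1x400 feature row
x = img(:)';

% Restore the 20x20 image from the feature row
restored = reshape(x, 20, 20);

disp(size(x));
disp(isequal(img, restored));
```

Stacking one such row per image gives the matrix X used throughout the rest of the assignment.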

Data visualization

function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
%   [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
%   stored in X in a nice grid. It returns the figure handle h and the 
%   displayed array if requested.

% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width) 
	example_width = round(sqrt(size(X, 2)));
end

% Gray Image
colormap(gray);

% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);

% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);

% Between images padding
pad = 1;

% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...
                       pad + display_cols * (example_width + pad));

% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
	for i = 1:display_cols
		if curr_ex > m, 
			break; 
		end
		% Copy the patch
		
		% Get the max value of the patch
		max_val = max(abs(X(curr_ex, :)));
		display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
		              pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
						reshape(X(curr_ex, :), example_height, example_width) / max_val;
		curr_ex = curr_ex + 1;
	end
	if curr_ex > m, 
		break; 
	end
end

% Display Image
h = imagesc(display_array, [-1 1]);

% Do not show axis
axis image off

drawnow;

end

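Multi-class logistic regression

The 400 pixels of each image serve as 400 features for training the logistic regression model. With this many features the implementation has to be vectorized. I had already written a vectorized logistic regression cost function for the previous assignment, so I reuse it here directly.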

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters. 

% Initialize some useful values
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

h = sigmoid(X * theta); % predicted probabilities, one per training example
resultSet = -y .* log(h) - (1 - y) .* log(1 - h);
penaltyTerm = (lambda / (2 * m)) * sum(theta(2:end) .^ 2); % the bias term theta(1) is not regularized
J = (1 / m) * sum(resultSet) + penaltyTerm;
penaltyTerm2 = (lambda / m) * [0; theta(2:end)];
grad = (1 / m) * (X' * (h - y)) + penaltyTerm2;
% =============================================================
grad = grad(:);
end
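For reference, the quantities computed by lrCostFunction can be written out as follows. This is a transcription of the code above into standard notation, with \theta_0 standing for theta(1) in the code and h_\theta(x) = g(\theta^T x) the sigmoid output; the symbols are my own choice, not taken from the assignment text.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \Big[ -y^{(i)} \log h_\theta(x^{(i)})
            - (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
            + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m}
            \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_0^{(i)}

\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m}
            \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}
            + \frac{\lambda}{m} \theta_j \qquad (j \ge 1)
```

The sum over j starts at 1 in the penalty, which is why the code skips the first element of theta when regularizing.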

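Training

Digit recognition has ten classes, corresponding to 1, 2, 3, ..., 9, 0 (the digit 0 is stored as label 10). One classifier is trained for each digit in turn. The oneVsAll function returns the parameter matrix that holds all of the classifiers.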

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds 
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
options = optimset('GradObj', 'on', 'MaxIter', 200);
initial_theta = zeros(n + 1, 1);
for i = 1:num_labels % train one classifier per label; (y == i) is 1 where the label equals i and 0 elsewhere
    theta = fmincg(@(t)(lrCostFunction(t, X, (y == i), lambda)), ...
        initial_theta, options);
    all_theta(i, :) = theta';
end
% =========================================================================

end
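The expression (y == i) is what turns the ten-class problem into ten binary ones. A small sketch of that relabeling, using a made-up label vector and assuming that the digit 0 is stored as label 10:

```octave
% Hypothetical labels for five examples; 10 stands for the digit 0 (assumption)
y = [1; 2; 10; 2; 3];
num_labels = 10;

% Column c of Y holds the 0/1 targets seen by the classifier for label c
Y = zeros(numel(y), num_labels);
for c = 1:num_labels
    Y(:, c) = (y == c);
end

% Targets for the "is this a 2?" classifier
disp(Y(:, 2)');
```

Each row of Y contains exactly one 1, so every example is a positive for one classifier and a negative for the other nine.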

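Prediction

The argument X of predictOneVsAll is a matrix in which each row represents one image. Every image is evaluated by all of the classifiers, and the class with the highest probability is returned.

Vectorization makes it possible to compute the result for all examples in X at once.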

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples) 

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];
% ====================== YOUR CODE HERE ======================     
test = sigmoid(X * all_theta'); % vectorized: one row of class probabilities per example
[~, p] = max(test, [], 2); % p is the index of the highest probability in each row
% =========================================================================
end
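The key step is the second output of max, which gives the column index of the largest entry in each row. A minimal sketch with made-up scores for three examples and four classes; the numbers and the local sigmoid helper are for illustration only:

```octave
% Local sigmoid helper so the snippet runs on its own
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical values of X * all_theta' for 3 examples and 4 classes
scores = [ 2.0 -1.0  0.5 -3.0;
          -2.0  0.1  3.0 -0.5;
          -1.0 -2.0 -0.3  1.5];

probs = sigmoid(scores);
[~, p] = max(probs, [], 2);       % winning class per row, from probabilities
[~, p_raw] = max(scores, [], 2);  % winning class per row, from raw scores

disp(p');
disp(isequal(p, p_raw));
```

Because the sigmoid is monotonically increasing, the winner is the same with or without it; applying it only matters when the probabilities themselves are needed.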

The prediction accuracy is as follows:

Training Set Accuracy: 96.460000
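An accuracy figure like this is presumably the percentage of training examples whose predicted label equals the true label. A minimal sketch of that calculation, with made-up prediction and label vectors:

```octave
% Hypothetical predictions and true labels for five examples
p = [1; 3; 10; 2; 5];
y = [1; 3; 10; 4; 5];

% Fraction of matching labels, expressed as a percentage
accuracy = mean(double(p == y)) * 100;
fprintf('Training Set Accuracy: %f\n', accuracy);
```

Note that this measures accuracy on the same data the classifiers were trained on, so it says little about performance on unseen digits.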

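Neural network prediction function

Handwritten digit recognition can also be done with a neural network. This assignment uses a 3-layer neural network whose parameters are already provided. Only the predict function has to be written, so it is a direct application of the forward-propagation formulas.

The implementation is as follows: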

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

p = zeros(size(X, 1), 1);


X = [ones(size(X, 1), 1), X];% append bias to X
a2 = sigmoid(Theta1 * X');
a2 = [ones(1, size(X, 1)); a2];% append bias to a2
a3 = sigmoid(Theta2 * a2);
[t, result] = max(a3, [], 1);
p = result';
% =========================================================================

end
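The predict function above keeps one example per column, which is why X is transposed and max runs along dimension 1. The sketch below repeats that pass on a tiny made-up network (2 inputs, 3 hidden units, 2 classes, all weights hypothetical) and compares it with an equivalent row-oriented pass:

```octave
% Local sigmoid helper so the snippet runs on its own
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical weights; the first column of each matrix multiplies the bias unit
Theta1 = [0.1 -0.2 0.3; 0.4 0.5 -0.6; -0.7 0.8 0.9];   % 3 x (2 + 1)
Theta2 = [0.2 -0.1 0.4 0.3; -0.3 0.6 -0.5 0.1];        % 2 x (3 + 1)
X = [1 2; -1 0; 0.5 -0.5; 3 1];                        % 4 examples, 2 features
m = size(X, 1);

% Column-oriented pass: one example per column, as in predict
a1 = [ones(1, m); X'];                     % 3 x 4
a2 = [ones(1, m); sigmoid(Theta1 * a1)];   % 4 x 4
a3 = sigmoid(Theta2 * a2);                 % 2 x 4
[~, p_col] = max(a3, [], 1);

% Row-oriented pass: one example per row
A2 = [ones(m, 1) sigmoid([ones(m, 1) X] * Theta1')];   % 4 x 4
A3 = sigmoid(A2 * Theta2');                            % 4 x 2
[~, p_row] = max(A3, [], 2);

disp(isequal(p_col', p_row));
```

Both orientations give the same labels; the row-oriented form avoids the final transpose and matches the layout used in predictOneVsAll.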