Finished homework05.

remotes/mayun/master
SJ2050 4 years ago
parent 810a213a0f
commit 072f435620

@@ -0,0 +1,119 @@
# Homework 5 Report
## 1. Fully Connected Neural Network
> A fully connected neural network needs to store the values at every layer as well as the weight and bias terms between layers. The type of activation function used at each layer and the cost function of the final output layer can be specified separately.
### 1.1 Function Version
When implementing the fully connected network with plain functions, a dictionary is defined to store the network's information, structured as follows:
```python
NN = {'nodes_size': [ ],
      'layers': [ ],
      'w': [ ],
      'b': [ ]}
```
Here `nodes_size` records the number of nodes in each layer, `layers` stores the values of each layer obtained by forward propagation, `w` stores the weight matrices connecting the nodes of adjacent layers, and `b` stores the corresponding bias terms.
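For concreteness, here is a minimal sketch of how this dictionary could be initialized (mirroring the `initialize` function in the attached function-version script: weights drawn uniformly from $[-1, 1)$, biases set to zero):

```python
import numpy as np

def initialize(nodes_size):
    # Build the dictionary described above: one (w, b) pair per pair of
    # adjacent layers; weights uniform in [-1, 1), biases zero.
    NN = {'nodes_size': nodes_size,
          'layers': [np.array([]) for _ in nodes_size],
          'w': [],
          'b': []}
    for n_in, n_out in zip(nodes_size[:-1], nodes_size[1:]):
        NN['w'].append(np.random.random((n_in, n_out)) * 2 - 1)
        NN['b'].append(np.zeros(n_out))
    return NN

NN = initialize([2, 8, 2])   # e.g. the [2, 8, 2] layout used in Section 1.3
```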
The function version only handles the case where the activation function is the `sigmoid` function, and the output cost function is the mean-square (squared-error) loss $E_d = \frac{1}{2}\sum\limits_{i\in outputs}(t_i-y_i)^2$. Forward propagation is computed as:
$$
\vec{a} = f(W \cdot \vec{x} + \vec{b})
$$
where $f$ is the activation function, $W$ is the weight matrix of a layer, $\vec{b}$ is its bias vector, $\vec{x}$ is the layer's input vector, and $\vec{a}$ is its output vector.
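A minimal sketch of this forward pass (equivalent to the `forward_propagation` function in the attached script; samples are stored as rows, so the product is written $xW + b$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(NN, x):
    # x holds the input samples as rows; apply a = f(x W + b) layer by layer
    NN['layers'][0] = x
    for i, (W, b) in enumerate(zip(NN['w'], NN['b'])):
        x = sigmoid(x @ W + b)
        NN['layers'][i + 1] = x
    return x   # output of the last layer
```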
Backpropagation updates the weight matrices and bias terms by gradient descent; the error terms used in the update are computed as follows:
- For an output-layer node $i$:
$$
\delta_i = y_i(1-y_i)(t_i-y_i)
$$
where $\delta_i$ is the error term of node $i$, $y_i$ is the output value of node $i$, and $t_i$ is the target value of the sample for node $i$.
- For a hidden-layer node $i$:
$$
\delta_i = a_i(1-a_i)\sum_{k\in outputs} w_{ki}\delta_k
$$
where $a_i$ is the output of node $i$ and $w_{ki}$ is the weight of the connection from node $i$ to node $k$ of the next layer.

The weight matrices and bias terms are then updated (with learning rate $\eta$; $x_{ji}$ denotes the input that node $i$ passes to node $j$) as follows, with a vectorized sketch of the same update given right after the formulas:
$$
\begin{aligned}
w_{ji} &= w_{ji} + \eta\delta_j x_{ji} \\
b_{j} &= b_{j} + \eta\delta_j
\end{aligned}
$$
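As a hedged, minimal sketch of one such update for a network with a single sigmoid hidden layer (the vectorized form used in the attached code: samples as rows, gradients averaged over the batch; all names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, T, W1, b1, W2, b2, eta=0.1):
    # forward pass
    A = sigmoid(X @ W1 + b1)              # hidden-layer outputs a_i
    Y = sigmoid(A @ W2 + b2)              # output-layer outputs y_i
    # error terms from the formulas above
    d_out = Y * (1 - Y) * (T - Y)         # output layer: y(1-y)(t-y)
    d_hid = A * (1 - A) * (d_out @ W2.T)  # hidden layer: a(1-a) * sum_k w_ki delta_k
    # update weights and biases, averaging over the batch
    n = X.shape[0]
    W2 += eta * A.T @ d_out / n
    b2 += eta * d_out.sum(axis=0) / n
    W1 += eta * X.T @ d_hid / n
    b1 += eta * d_hid.sum(axis=0) / n
```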
### 1.2 Class Version
The class implementation of the fully connected network is similar to the function version, except that the network's data structures are stored as class attributes. In addition, the class version allows the output-layer activation function to be either $sigmoid$ or $softmax$. When the output activation is $sigmoid$, the cost function is the mean-square loss, and the weight and bias update formulas are the same as in the function version. When the output activation is $softmax$, the cost function is the cross entropy; the hidden layers still use the $sigmoid$ activation, so their weight and bias update formulas are unchanged, while the error term used for the output layer's weights and biases changes to:
- For an output-layer node $i$:
$$
\delta_i = t_i-y_i
$$
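This simpler error term follows from differentiating the cross-entropy cost through the softmax (for one-hot targets with $\sum_k t_k = 1$); a brief sketch of the standard derivation:
$$
E = -\sum_{k\in outputs} t_k \ln y_k,\qquad
y_i = \frac{e^{z_i}}{\sum_{k} e^{z_k}}
\quad\Longrightarrow\quad
\frac{\partial E}{\partial z_i} = y_i - t_i,
\qquad
\delta_i = -\frac{\partial E}{\partial z_i} = t_i - y_i
$$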
All other formulas remain unchanged.
### 1.3 Binary Classification Results
The `make_moons` dataset is used to test the hand-written binary classification network. 400 samples are generated, of which 200 are used to train the network and the remaining 200 to test its prediction accuracy. The numbers of nodes from the input layer to the output layer are [2, 8, 2]. The prediction results are as follows:
![](images/2classification_compare.png)
<center>Figure 1. Binary classification (left: ground truth, right: prediction)</center>
On the training set the cost is $Loss=0.0218$ and the accuracy is $0.975$; on the test set the cost is $Loss=0.0357$ and the accuracy is $0.95$. Both the plots and the computed cost and accuracy values show that the hand-written network predicts well, supporting the reliability and correctness of the implementation.
### 1.4 Multi-class Classification Results
The digits dataset is used for multi-class classification. The network is trained and evaluated twice: once with `sigmoid` as the output activation and the mean-square loss as the cost function, and once with `softmax` as the output activation and cross entropy as the cost function. The last 500 samples of the digits dataset are used as the test set and the remaining samples as the training set. The numbers of nodes per layer are [64, 100, 10]. With the `sigmoid` output activation, the confusion matrix is:
![](images/multi_classification_sigmoid.png)
<center>Figure 2. Confusion matrix with the sigmoid output activation</center>
The accuracy is $0.984$ on the training set and $0.914$ on the test set.
Keeping the same learning rate and number of iterations, changing the output activation to `softmax` and the cost function to cross entropy and rerunning the program gives the following confusion matrix:
![](images/multi_classification_softmax.png)
<center>Figure 3. Confusion matrix with the softmax output activation</center>
The accuracy is $1.0$ on the training set and $0.932$ on the test set. The network with the softmax output activation and cross-entropy cost trains faster than the one with the sigmoid activation and mean-square cost, and reaches higher accuracy under the same learning rate and number of iterations.
In addition, using softmax as the output activation also gives the probability of each class. For example, for a sample labeled 0, the predicted class probabilities are:
```json
{
    '0': 9.95483984e-01,
    '1': 9.19425256e-07,
    '2': 4.72022568e-08,
    '3': 2.89148627e-09,
    '4': 1.12535172e-04,
    '5': 1.93048146e-05,
    '6': 4.29473472e-03,
    '7': 4.89099761e-08,
    '8': 8.24803816e-05,
    '9': 5.94287548e-06
}
```
It can be seen that the predicted probability of class `0` is far larger than the others, which agrees with the ground truth.
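Such a probability table can be read directly off the softmax output of the `predict` method of the attached class; a small hedged sketch (`fc_mlp` is the trained `FullConnectionMLP` instance from the attached script, and `x_sample` is a hypothetical single test image):

```python
# x_sample: one digit image, reshaped to a single row of 64 features
probs = fc_mlp.predict(x_sample.reshape(1, -1))[0]   # softmax output, sums to 1
prob_table = {str(digit): float(p) for digit, p in enumerate(probs)}
predicted_label = max(prob_table, key=prob_table.get)
print(predicted_label, prob_table[predicted_label])
```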
### 1.5 Comparison with sklearn
The same fully connected network is also implemented with the sklearn library, using the same layer and node layout as the multi-class network described above. It is likewise tested on the digits dataset with the same training/test split. The resulting confusion matrix is:
![](images/multi_classification_sklearn.png)
<center>Figure 4. Confusion matrix of the neural network implemented with sklearn</center>
The prediction accuracy is $0.934$, essentially the same as that of the hand-written network, but in terms of speed the sklearn implementation runs somewhat faster.

@@ -0,0 +1,191 @@
'''
Author: SJ2050
Date: 2022-01-21 16:57:04
LastEditTime: 2022-01-22 10:58:38
Version: v0.0.1
Description: Full connection multi-layer perceptron class declaration.
Copyright © 2022 SJ2050
'''
import numpy as np
from sklearn.metrics import accuracy_score
class FullConnectionMLP:
    # nodes_size: size of each layer.
    # activation_func: activation function of the hidden layers
    #     (only `sigmoid` is supported for now).
    # output_activation_func: activation function of the output layer.
    #     If the value is `sigmoid`, the mean-square loss function is used;
    #     if the value is `softmax`, the cross-entropy loss function is used.
    def __init__(self, nodes_size,
                 activation_func='sigmoid',
                 output_activation_func='softmax'):
        self.nodes_size = nodes_size
        self.activation_func = activation_func
        self.output_activation_func = output_activation_func
        total_layer_num = len(nodes_size)
        self.layer_num = total_layer_num
        self.layers = [np.array([]) for _ in range(total_layer_num)]
        self.w = []
        self.b = []
        # initialize weights and biases
        for i in range(0, total_layer_num-1):
            self.w.append(np.random.random((nodes_size[i], nodes_size[i+1]))*2-1)
            self.b.append(np.zeros(nodes_size[i+1]))

    def __activate_func(self, activation_name):
        if activation_name == 'sigmoid':
            y_func = lambda x: 1/(1+np.exp(-x))
            dy_func = lambda y: y*(1-y)
        elif activation_name == 'softmax':
            y_func = lambda x: np.exp(x)/np.sum(np.exp(x), axis=1, keepdims=True)
            dy_func = lambda yi, yj, equal: yi*(1-yi) if equal else -yi*yj
        else:
            raise RuntimeError('Unsupported activation name!!')
        return (y_func, dy_func)

    def forward(self, X_input):
        # for input layer
        self.layers[0] = X_input
        # for hidden layers
        X = X_input
        (y_func, _) = self.__activate_func(self.activation_func)
        for i in range(0, self.layer_num - 2):
            w = self.w[i]
            b = self.b[i]
            X = y_func(np.matmul(X, w)+b)
            self.layers[i+1] = X
        # for output layer
        (y_func, _) = self.__activate_func(self.output_activation_func)
        w = self.w[-1]
        b = self.b[-1]
        X = y_func(np.matmul(X, w)+b)
        self.layers[-1] = X

    def backward(self, X_input, label_tags, eps=0.1):
        t = label_tags
        self.forward(X_input)
        D = []
        # delta of output layer
        x = self.layers[-2]
        y = self.layers[-1]
        if (self.output_activation_func == 'sigmoid'):
            # use mean square loss function
            (_, dy_func) = self.__activate_func(self.output_activation_func)
            d = dy_func(y)*(t-y)
        elif (self.output_activation_func == 'softmax'):
            # use cross entropy loss function
            (_, dy_func) = self.__activate_func(self.output_activation_func)
            d = (t-y)
        else:
            raise RuntimeError('Unsupported output_activation name!!')
        D.insert(0, d)
        if (self.activation_func == 'sigmoid'):
            (_, dy_func) = self.__activate_func(self.activation_func)
            for j in range(self.layer_num-2, 0, -1):
                i = j - 1
                y = self.layers[j]
                w = self.w[j]
                d = dy_func(y)*np.matmul(d, w.T)
                D.insert(0, d)
        else:
            raise RuntimeError('activation function of hidden layer only support sigmoid now!')
        # update weights and biases
        for j in range(self.layer_num-1, 0, -1):
            i = j - 1
            x = self.layers[i]
            d = D[i]
            self.w[i] += eps*np.matmul(x.T, d)/x.shape[0]
            self.b[i] += eps*np.sum(d, axis=0)/x.shape[0]

    def evaluate(self, label_tags):
        output_layer = self.layers[-1]
        loss = np.mean(0.5*np.linalg.norm(output_layer - label_tags, axis=1)**2)
        y_pred = np.argmax(output_layer, axis=1)
        y_true = np.argmax(label_tags, axis=1)
        acc = accuracy_score(y_true, y_pred)
        return (loss, acc)

    def train(self, X_input, label_tags, eps=0.1, iter_num=2000, eval_num=100, batch_size=100):
        print('Training...')
        for i in range(iter_num):
            if (batch_size < 0):
                batch_input = X_input
                batch_tags = label_tags
            else:
                index = np.random.randint(0, X_input.shape[0], batch_size)
                batch_input = X_input[index]
                batch_tags = label_tags[index]
            self.backward(batch_input, batch_tags, eps)
            if (i+1) % eval_num == 0:
                self.forward(X_input)
                loss, acc = self.evaluate(label_tags)
                print(f'{i+1}th training of {iter_num}: Loss={loss}, acc={acc}.')
        print('Training finished!')
        return self.layers[-1]

    def predict(self, X_input):
        self.forward(X_input)
        return self.layers[-1]


if __name__ == '__main__':
    from sklearn.datasets import load_digits
    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt

    # load data
    digits = load_digits()
    X = digits.data
    Y = digits.target
    X -= X.min()
    X /= X.max()
    x_train = X[:-500]
    y_train = Y[:-500]
    y_train_tags = np.zeros((x_train.shape[0], 10))
    for i in range(10):
        y_train_tags[np.where(y_train == i), i] = 1

    # initialize NN model
    nodes_size = [64, 100, 10]
    fc_mlp = FullConnectionMLP(nodes_size, activation_func='sigmoid',
                               output_activation_func='softmax')
    pred_train = fc_mlp.train(x_train, y_train_tags, eps=0.1, iter_num=20000,
                              eval_num=100, batch_size=100)
    (loss, acc) = fc_mlp.evaluate(y_train_tags)
    print('--------------------------------')
    print(f'train: Loss={loss}, acc={acc}.')

    # evaluate on the held-out test set
    x_test = X[-500:]
    y_test = Y[-500:]
    y_test_tags = np.zeros((x_test.shape[0], 10))
    for i in range(10):
        y_test_tags[np.where(y_test == i), i] = 1
    y_res = fc_mlp.predict(x_test)
    y_test_pred = np.argmax(y_res, axis=1)
    (loss, acc) = fc_mlp.evaluate(y_test_tags)
    print('--------------------------------')
    print(f'predict: Loss={loss}, acc={acc}.')

    # plot the confusion matrix
    cm = confusion_matrix(y_test, y_test_pred)
    plt.matshow(cm)
    plt.title(u'Confusion Matrix')
    plt.colorbar()
    plt.ylabel(u'Groundtruth')
    plt.xlabel(u'Predict')
    plt.show()

@@ -0,0 +1,155 @@
'''
Author: SJ2050
Date: 2022-01-16 17:16:10
LastEditTime: 2022-01-22 10:09:33
Version: v0.0.1
Description: Forward-Propagation and Back-Propagation algorithms of MLP using function.
Copyright © 2022 SJ2050
'''
import numpy as np
from sklearn.metrics import accuracy_score
# NN = {'nodes_size': [ ],
#       'layers': [ ],
#       'w': [ ],
#       'b': [ ]}


def sigmoid(x):
    y = 1 / (1+np.exp(-x))
    return y


def sigmoid_derivative(y):
    return y * (1-y)


def initialize(nodes_size):
    NN = {}
    NN['nodes_size'] = nodes_size
    total_layer_num = len(nodes_size)
    NN['layers'] = [np.array([]) for _ in range(total_layer_num)]
    NN['w'] = []
    NN['b'] = []
    for i in range(0, total_layer_num-1):
        NN['w'].append(np.random.random((nodes_size[i], nodes_size[i+1]))*2-1)
        NN['b'].append(np.zeros(nodes_size[i+1]))
    return NN


def forward_propagation(NN, input_layer):
    layer_num = len(NN['nodes_size'])
    NN['layers'][0] = input_layer
    x = input_layer
    for i in range(0, layer_num-1):
        w = NN['w'][i]
        b = NN['b'][i]
        x = sigmoid(np.matmul(x, w)+b)
        NN['layers'][i+1] = x


def backward_propagation(NN, input_layer, tags, eps=0.01):
    layer_num = len(NN['nodes_size'])
    t = tags
    forward_propagation(NN, input_layer)
    D = []
    # compute delta
    for j in range(layer_num-1, 0, -1):
        i = j - 1
        x = NN['layers'][i]
        y = NN['layers'][j]
        if j == layer_num - 1:
            d = sigmoid_derivative(y)*(t-y)
        else:
            w = NN['w'][j]
            d = sigmoid_derivative(y)*np.matmul(d, w.T)
        D.insert(0, d)
    # update weights and biases
    for j in range(layer_num-1, 0, -1):
        i = j - 1
        x = NN['layers'][i]
        d = D[i]
        NN['w'][i] += eps*np.matmul(x.T, d) / x.shape[0]
        NN['b'][i] += eps*np.sum(d, axis=0) / x.shape[0]


def train(NN, input_layer, tags, eps, iter_num, eval_num, batch_size):
    print('Training...')
    for i in range(iter_num):
        if (batch_size < 0):
            batch_input = input_layer
            batch_tags = tags
        else:
            index = np.random.randint(0, input_layer.shape[0], batch_size)
            batch_input = input_layer[index]
            batch_tags = tags[index]
        backward_propagation(NN, batch_input, batch_tags, eps)
        if (i+1) % eval_num == 0:
            forward_propagation(NN, input_layer)
            loss, acc = evaluate(NN, tags)
            print(f'{i+1}th training of {iter_num}: Loss={loss}, acc={acc}.')
    print('Training finished!')
    return NN['layers'][-1]


def predict(NN, input_layer):
    forward_propagation(NN, input_layer)
    return NN['layers'][-1]


def evaluate(NN, tags):
    output_layer = NN['layers'][-1]
    loss = np.mean(0.5*np.linalg.norm(output_layer - tags, axis=1)**2)
    y_pred = np.argmax(output_layer, axis=1)
    y_true = np.argmax(tags, axis=1)
    acc = accuracy_score(y_true, y_pred)
    return (loss, acc)


if __name__ == '__main__':
    from sklearn import datasets
    import matplotlib.pyplot as plt

    # generate sample data
    np.random.seed(0)
    X, y_true = datasets.make_moons(400, noise=0.20)
    tags = np.zeros((X.shape[0], 2))
    tags[np.where(y_true == 0), 0] = 1
    tags[np.where(y_true == 1), 1] = 1
    x_train = X[:200]
    y_train_tags = tags[:200]
    x_test = X[200:]
    y_test_tags = tags[200:]

    # initialize NN model
    nodes_size = [2, 8, 2]
    NN = initialize(nodes_size)

    # train
    y_res = train(NN, x_train, y_train_tags,
                  eps=0.1, iter_num=100000,
                  eval_num=1000, batch_size=-1)
    y_train_pred = np.argmax(y_res, axis=1)
    (loss, acc) = evaluate(NN, y_train_tags)
    print('--------------------------------')
    print(f'train: Loss={loss}, acc={acc}.')

    # predict result
    y_res = predict(NN, x_test)
    y_test_pred = np.argmax(y_res, axis=1)

    # plot ground truth vs. prediction on the test set
    plt.scatter(x_test[:, 0], x_test[:, 1], c=y_true[200:], cmap=plt.cm.Spectral)
    plt.title("ground truth")
    plt.show()
    plt.scatter(x_test[:, 0], x_test[:, 1], c=y_test_pred, cmap=plt.cm.Spectral)
    plt.title("predicted")
    plt.show()

    (loss, acc) = evaluate(NN, y_test_tags)
    print('--------------------------------')
    print(f'predict: Loss={loss}, acc={acc}.')

@@ -0,0 +1,41 @@
'''
Author: SJ2050
Date: 2022-01-21 12:01:47
LastEditTime: 2022-01-22 11:04:40
Version: v0.0.1
Description: Full connection multi-layer perceptron using sklearn.
Copyright © 2022 SJ2050
'''
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
# load data
digits = load_digits()
X = digits.data
Y = digits.target
X -= X.min()
X /= X.max()
x_train = X[:-500]
y_train = Y[:-500]
# one hidden layer of 100 nodes, matching the [64, 100, 10] layout above
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=10000)
mlp.fit(x_train, y_train)
x_test = X[-500:]
y_test = Y[-500:]
predictions = mlp.predict(x_test)
acc = accuracy_score(y_test, predictions)
print('--------------------------------')
print(f'predict: acc = {acc}.')
cm = confusion_matrix(y_test, predictions)
plt.matshow(cm)
plt.title(u'Confusion Matrix')
plt.colorbar()
plt.ylabel(u'Groundtruth')
plt.xlabel(u'Predict')
plt.show()
