diff --git a/homework_04_logistic_regression/homework/README.md b/homework_04_logistic_regression/homework/README.md
index bbe04d0..b93b1fd 100644
--- a/homework_04_logistic_regression/homework/README.md
+++ b/homework_04_logistic_regression/homework/README.md
@@ -16,36 +16,40 @@
 g(z_i)=\frac{e^{z_{i}}}{\sum\limits_{j=1}^{n} e^{z_{j}}}
 $$
 where $n$ is the number of outputs (i.e. classes), $z_j$ is the value of the $j$-th output, and $i$ indexes the class currently being computed. From the formula above, every Softmax value falls in $[0,\ 1]$, and the Softmax values of all classes sum to 1.
-Between the input and the output we introduce one layer of mapping, $\mathbf{\Theta}^T\cdot\mathbf{x}=\mathbf{z}$, where $\mathbf{\Theta}=[\theta_1,\ \theta_2,\ ...,\ \theta_n]$ is the weight matrix whose columns $\theta_k$ are per-class weight vectors, $\mathbf{x}$ is the input vector, and $\mathbf{z}$ is the output vector. The Softmax function can then be written as:
+Between the input and the output we introduce one layer of mapping, $\mathbf{\Theta}^T\cdot\mathbf{x}+\mathbf{b}=\mathbf{z}$, where $\mathbf{\Theta}=[\theta_1,\ \theta_2,\ ...,\ \theta_n]$ is the weight matrix whose columns $\theta_k$ are per-class weight vectors, $\mathbf{b}$ is the bias vector, $\mathbf{x}$ is the input vector, and $\mathbf{z}$ is the output vector. The Softmax function can then be written as:
 $$
-g(z_i)=g(\theta_i^T \mathbf{x})=\frac{e^{\theta_i^T\mathbf{x}}}{\sum\limits_{j=1}^{n} e^{\theta_j^T \mathbf{x}}}=h_{\theta_i}(\mathbf{x})
+g(z_i)=g(\theta_i^T \mathbf{x}+b_i)=\frac{e^{\theta_i^T\mathbf{x}+b_i}}{\sum\limits_{j=1}^{n} e^{\theta_j^T \mathbf{x}+b_j}}=h_{\theta_i, b_i}(\mathbf{x})
 $$
 Construct the likelihood function over $m$ training samples, where $y^{i}$ denotes the true class of the $i$-th sample:
 $$
 \begin{aligned}
-L(\Theta)&=p(\mathbf{y}|\mathbf{X};\Theta) \\\\
-& = \prod\limits_{i=1}^{m} p(y^{i}|\mathbf{x}^{i};\Theta) \\\\
-& = \prod_{i=1}^m h_{\theta_i}(\mathbf{x})
+L(\Theta,\mathbf{b})&=p(\mathbf{y}|\mathbf{X};\Theta,\mathbf{b}) \\\\
+& = \prod\limits_{i=1}^{m} p(y^{i}|\mathbf{x}^{i};\Theta,\mathbf{b}) \\\\
+& = \prod\limits_{i=1}^m h_{\theta_{y^i},b_{y^i}}(\mathbf{x}^{i})
 \end{aligned}
 $$
 Taking the logarithm of the likelihood turns it into:
 $$
-l(\Theta)=log(L(\Theta))=\sum\limits_{i=1}^m log(h_{\theta_i}(\mathbf{x}))
+l(\Theta,\mathbf{b})=\log(L(\Theta,\mathbf{b}))=\sum\limits_{i=1}^m \log(h_{\theta_{y^i},b_{y^i}}(\mathbf{x}^{i}))
 $$
-Differentiating $log(h_{\theta_i}(\mathbf{x}))$ gives:
+Differentiating $\log(h_{\theta_i,b_i}(\mathbf{x}))$ with respect to $z_k$, for a single sample whose true class is $i$, gives:
 $$
-\frac{\partial{log(h_{\theta_i}(\mathbf{x}))}}{\partial{z_k}}=\begin{cases}
-1-h_{\theta_k}(\mathbf{x}) & \text{ if } k=i \\\\
--h_{\theta_k}(\mathbf{x}) & else
+\frac{\partial{\log(h_{\theta_i,b_i}(\mathbf{x}))}}{\partial{z_k}}=\begin{cases}
+1-h_{\theta_k,b_k}(\mathbf{x}) & \text{ if } k=i \\\\
+-h_{\theta_k,b_k}(\mathbf{x}) & else
 \end{cases}
 $$
 Taking the partial derivative of the transformed likelihood with respect to $\theta$, using the single-training-sample case as an example:
 $$
 \begin{aligned}
-\frac{\partial}{\partial\theta_k}l(\Theta)&=\frac{\partial l(\Theta)}{\partial{z_k}}\cdot \frac{\partial z_k}{\partial \theta_k} \\\\
-&=(y_k-h_{\theta_k}(\mathbf{x}))\mathbf{x}
+\frac{\partial}{\partial\theta_k}l(\Theta,\mathbf{b})&=\frac{\partial l(\Theta,\mathbf{b})}{\partial{z_k}}\cdot \frac{\partial z_k}{\partial \theta_k} \\\\
+&=(y_k-h_{\theta_k,b_k}(\mathbf{x}))\mathbf{x}
 \end{aligned}
 $$
+The partial derivative with respect to the bias term $b$ is similar, except that $\partial z_k/\partial b_k=1$:
+$$
+\frac{\partial}{\partial b_k}l(\Theta,\mathbf{b})=y_k-h_{\theta_k,b_k}(\mathbf{x})
+$$
 where $y_k$ in the formulas above is given by:
 $$
 y_k=\begin{cases}
@@ -53,8 +57,40 @@
 0 & else
 \end{cases}
 $$
-We can now write down the update direction that maximizes the likelihood; the iteration for $\theta_k$ is:
+We can now write down the update direction that maximizes the likelihood; the iterations for $\theta_k$ and $b_k$ are:
+$$
+\theta_k=\theta_k+\eta(\sum\limits_{i=1}^{m}(y_k-h_{\theta_k,b_k}(\mathbf{x}^i))\cdot \mathbf{x}^i)
 $$
-\theta_k=\theta_k+\eta(\sum\limits_{i=1}^{m}(y_k-h_{\theta_k}(\mathbf{x}^i))\cdot \mathbf{x}^i)
 $$
+b_k = b_k+\eta(\sum\limits_{i=1}^{m}(y_k-h_{\theta_k,b_k}(\mathbf{x}^i)))
+$$
+where $\eta$ is the learning rate. Note that when the output dimension equals 2, i.e. in binary classification, the updates above match the weight-update formula of binary logistic regression.
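+
+To make the update rule concrete, below is a minimal NumPy sketch of a single stochastic gradient-ascent step implementing the two updates above. The names `softmax` and `sgd_step` and the toy data are illustrative only and are not part of the homework code (the full implementation is in `code/softmax_regression.py`); the max-subtraction inside `softmax` is an added numerical-stability step that does not change the result, since Softmax is invariant to shifting all $z_j$ by a constant.
+
+```python
+import numpy as np
+
+def softmax(z):
+    z = z - z.max()          # stability shift; softmax(z) == softmax(z - c)
+    e = np.exp(z)
+    return e / e.sum()
+
+def sgd_step(Theta, b, x, y, eta=0.01):
+    """One ascent step on one sample: Theta is (d, n), b is (n,),
+    x is (d,), y is a one-hot label (n,)."""
+    h = softmax(Theta.T @ x + b)     # h_k = predicted P(class k | x)
+    err = y - h                      # (y_k - h_{theta_k, b_k}(x)) for every k
+    Theta += eta * np.outer(x, err)  # theta_k <- theta_k + eta*(y_k - h_k)*x
+    b += eta * err                   # b_k     <- b_k     + eta*(y_k - h_k)
+    return Theta, b
+
+# toy check: 3 features, 4 classes, true class 2
+rng = np.random.default_rng(0)
+Theta, b = rng.standard_normal((3, 4)), np.zeros(4)
+x, y = rng.standard_normal(3), np.eye(4)[2]
+for _ in range(100):
+    Theta, b = sgd_step(Theta, b, x, y, eta=0.1)
+print(softmax(Theta.T @ x + b))      # probability of class 2 approaches 1
+```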
+
+## 3. Results
+
+The experiments use the digits dataset from sklearn. Each sample consists of 64 pixel values (an 8x8 image), and the output is one of the digits 0-9. Since both input and output are high-dimensional, the final results are visualized with a confusion matrix: the counts on the main diagonal are correctly predicted samples, and the entries elsewhere are misclassified samples.
+
+The sklearn digits dataset contains 1797 samples. We take all but the last 500 (1297 samples) as training data and the last 500 as test data, then train and predict with both our own softmax regression implementation and sklearn's built-in one-vs-rest (OVR) multiclass method.
+
+Confusion matrix of the softmax regression:
+
+![](images/predict_result_softmax.png)
+
+Prediction accuracy of the softmax regression on the training data and on the test data:
+
+![](images/accuracy_softmax.png)
+
+Confusion matrix of sklearn's built-in multiclass method:
+
+![](images/predict_result_sklearn.png)
+
+Prediction accuracy of sklearn's built-in multiclass method on the training data and on the test data:
+
+![](images/accuracy_sklearn.png)
+
+From the results above, both our own softmax implementation and sklearn's built-in multiclass method reach a prediction accuracy above 0.9 on the test data, and the difference between the two is very small.
diff --git a/homework_04_logistic_regression/homework/code/multi_classification.py b/homework_04_logistic_regression/homework/code/multi_classification.py
new file mode 100644
index 0000000..49bd204
--- /dev/null
+++ b/homework_04_logistic_regression/homework/code/multi_classification.py
@@ -0,0 +1,39 @@
+'''
+Author: SJ2050
+Date: 2021-11-21 17:22:02
+LastEditTime: 2021-11-21 22:05:09
+Version: v0.0.1
+Description: Use softmax regression method to solve multiclass classification problems.
+Copyright © 2021 SJ2050
+'''
+
+import matplotlib.pyplot as plt
+from sklearn.datasets import load_digits
+from sklearn.metrics import confusion_matrix
+from sklearn.metrics import accuracy_score
+from softmax_regression import SoftmaxRegression
+
+# load data and train on all but the last 500 samples
+digits = load_digits()
+x_train = digits.data[:-500]
+y_train = digits.target[:-500]
+softmax_reg = SoftmaxRegression()
+softmax_reg.train(x_train, y_train)
+
+# evaluate on the training set and on the held-out 500 test samples
+x_test = digits.data[-500:]
+y_test = digits.target[-500:]
+pred_train = softmax_reg.predict(x_train)
+pred_test = softmax_reg.predict(x_test)
+
+print(f'accuracy train = {accuracy_score(y_train, pred_train)}')
+print(f'accuracy test = {accuracy_score(y_test, pred_test)}')
+
+# plot confusion matrix of the test predictions
+cm = confusion_matrix(y_test, pred_test)
+plt.matshow(cm)
+plt.title('Confusion Matrix')
+plt.colorbar()
+plt.ylabel('Groundtruth')
+plt.xlabel('Predict')
+plt.show()
diff --git a/homework_04_logistic_regression/homework/code/sklearn_regression.py b/homework_04_logistic_regression/homework/code/sklearn_regression.py
new file mode 100644
index 0000000..95263a1
--- /dev/null
+++ b/homework_04_logistic_regression/homework/code/sklearn_regression.py
@@ -0,0 +1,38 @@
+'''
+Author: SJ2050
+Date: 2021-11-21 18:24:41
+LastEditTime: 2021-11-21 18:50:47
+Version: v0.0.1
+Description: Use sklearn to solve logistic regression problems.
+Copyright © 2021 SJ2050
+'''
+import matplotlib.pyplot as plt
+from sklearn.datasets import load_digits
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import confusion_matrix
+from sklearn.metrics import accuracy_score
+
+# load data and train on all but the last 500 samples
+digits = load_digits()
+x_train = digits.data[:-500]
+y_train = digits.target[:-500]
+
+log_reg = LogisticRegression()
+log_reg.fit(x_train, y_train)
+
+# evaluate on the training set and on the held-out 500 test samples
+x_test = digits.data[-500:]
+y_test = digits.target[-500:]
+pred_train = log_reg.predict(x_train)
+pred_test = log_reg.predict(x_test)
+
+print(f'accuracy train = {accuracy_score(y_train, pred_train)}')
+print(f'accuracy test = {accuracy_score(y_test, pred_test)}')
+
+# plot confusion matrix of the test predictions
+cm = confusion_matrix(y_test, pred_test)
+plt.matshow(cm)
+plt.title('Confusion Matrix')
+plt.colorbar()
+plt.ylabel('Groundtruth')
+plt.xlabel('Predict')
+plt.show()
diff --git a/homework_04_logistic_regression/homework/code/softmax_regression.py b/homework_04_logistic_regression/homework/code/softmax_regression.py
new file mode 100644
index 0000000..3e5e125
--- /dev/null
+++ b/homework_04_logistic_regression/homework/code/softmax_regression.py
@@ -0,0 +1,106 @@
+'''
+Author: SJ2050
+Date: 2021-11-21 17:06:31
+LastEditTime: 2021-11-21 22:29:52
+Version: v0.0.1
+Description: Softmax regression.
+Copyright © 2021 SJ2050
+'''
+import numpy as np
+
+def softmax(Z):
+    assert Z.ndim == 2 and Z.shape[1] == 1, 'Z should be a column vector!'
+    # subtract the maximum for numerical stability: softmax(Z) == softmax(Z - c)
+    Z_exp = np.exp(Z - Z.max(0, keepdims=True))
+    return Z_exp/Z_exp.sum(0, keepdims=True)
+
+class SoftmaxRegression():
+    def __init__(self):
+        self.is_trained = False
+
+    def train(self, train_data, train_label, num_iterations=150, alpha=0.01):
+        self.train_data = train_data
+        self.train_label = train_label
+        self.classes = np.unique(self.train_label)
+        self.out_dim = len(self.classes)
+
+        train_data_num, self.inp_dim = np.shape(self.train_data)
+        self.weights = np.random.random((self.inp_dim, self.out_dim))
+        self.b = np.random.random((self.out_dim, 1))
+
+        y = lambda k, cls: 1 if k == cls else 0
+        weights_grad = [None for _ in range(self.out_dim)]
+        for j in range(num_iterations):
+            # visit every training sample once per iteration, in random order
+            data_index = list(range(train_data_num))
+            for i in range(train_data_num):
+                rand_pos = int(np.random.uniform(0, len(data_index)))
+                sample_index = data_index[rand_pos]
+                x_vec = self.train_data[sample_index].reshape(-1, 1)
+                softmax_values = softmax(np.dot(self.weights.T, x_vec)+self.b)[:, 0]
+                label = self.train_label[sample_index]
+                cls = np.argwhere(self.classes == label)[0][0]
+                error = lambda k: y(k, cls)-softmax_values[k]
+
+                # gradient-ascent step: theta_k += alpha*(y_k - h_k)*x, b_k += alpha*(y_k - h_k)
+                for k in range(self.out_dim):
+                    err = error(k)
+                    weights_grad[k] = (alpha*err*x_vec)[:, 0]
+                    self.b[k, 0] += alpha*err
+                self.weights += np.array(weights_grad).T
+
+                del(data_index[rand_pos])
+
+        self.is_trained = True
+
+    def predict(self, predict_data):
+        if self.is_trained:
+            predict_num = len(predict_data)
+            result = np.empty(predict_num)
+            for i in range(predict_num):
+                x_vec = predict_data[i].reshape(-1, 1)
+                result[i] = self.classes[np.argmax(softmax(np.dot(self.weights.T, x_vec)+self.b))]
+
+            return result
+        else:
+            print('Need training before predicting!!')
+
+if __name__ == '__main__':
+    # test binary classification
+    import matplotlib.pyplot as plt
+    import sklearn.datasets
+    from sklearn.metrics import accuracy_score
+
+    def plot_decision_boundary(predict_func, data, label):
+        """Plot the decision boundary.
+
+        Args:
+            predict_func (callable): prediction function
+            data (numpy.ndarray): training data
+            label (numpy.ndarray): training labels
+        """
+        x_min, x_max = data[:, 0].min() - .5, data[:, 0].max() + .5
+        y_min, y_max = data[:, 1].min() - .5, data[:, 1].max() + .5
+        h = 0.01
+
+        xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
+
+        Z = predict_func(np.c_[xx.ravel(), yy.ravel()])
+        Z = Z.reshape(xx.shape)
+
+        plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)  # draw filled contours of the predicted classes
+        plt.scatter(data[:, 0], data[:, 1], c=label, cmap=plt.cm.Spectral)
+        plt.show()
+
+    data, label = sklearn.datasets.make_moons(200, noise=0.30)
+    plt.scatter(data[:, 0], data[:, 1], c=label)
+    plt.title("Original Data")
+
+    softmax_reg = SoftmaxRegression()
+    softmax_reg.train(data, label, 200)
+    plot_decision_boundary(lambda x: softmax_reg.predict(x), data, label)
+    y_train = softmax_reg.predict(data)
+    print(f'accuracy train = {accuracy_score(label, y_train)}')
diff --git a/homework_04_logistic_regression/homework/images/accuracy_sklearn.png b/homework_04_logistic_regression/homework/images/accuracy_sklearn.png
new file mode 100644
index 0000000..4f56c24
Binary files /dev/null and b/homework_04_logistic_regression/homework/images/accuracy_sklearn.png differ
diff --git a/homework_04_logistic_regression/homework/images/accuracy_softmax.png b/homework_04_logistic_regression/homework/images/accuracy_softmax.png
new file mode 100644
index 0000000..5b29fdd
Binary files /dev/null and b/homework_04_logistic_regression/homework/images/accuracy_softmax.png differ
diff --git a/homework_04_logistic_regression/homework/images/predict_result_sklearn.png b/homework_04_logistic_regression/homework/images/predict_result_sklearn.png
new file mode 100644
index 0000000..4e5a8cf
Binary files /dev/null and b/homework_04_logistic_regression/homework/images/predict_result_sklearn.png differ
diff --git a/homework_04_logistic_regression/homework/images/predict_result_softmax.png b/homework_04_logistic_regression/homework/images/predict_result_softmax.png
new file mode 100644
index 0000000..234f32c
Binary files /dev/null and b/homework_04_logistic_regression/homework/images/predict_result_softmax.png differ