Add some references

dev
Shuhui Bu 7 years ago
parent 881a67acf5
commit 2a37c01599

@@ -20,13 +20,20 @@ git pull upstream master
## Homework
1. [Python basics](homework_01_python/README.md)
2. [numpy & matplotlib](homework_02_numpy_matplotlib/README.md)
## Reports
1. [Traffic accident claim approval prediction](report_01_交通事故理赔审核预测/)
3. [Titanic](report_03_Titanic/)
## Help
* Git
  * [How to submit homework via Gitee](help/gitee_homework_usage.md)
  * [Git tutorial (PILAB)](help/Git使用教程_PILAB.pdf)
  * [Git quick start - a first taste of Git](https://my.oschina.net/dxqr/blog/134811)
  * [Basic Git operations with TortoiseGit on Windows 7](https://my.oschina.net/longxuu/blog/141699)
  * [Learning Git systematically - Liao Xuefeng's Git tutorial](https://my.oschina.net/dxqr/blog/134811)
* Markdown
  * [Markdown - a beginner's guide](https://www.jianshu.com/p/1e402922ee32)

@@ -0,0 +1,66 @@
## 1. Numerical computing with numpy
### 1. How do you add a border of zeros around an existing array?
For example, turn the 2-D matrix
```
10, 34, 54, 23
31, 87, 53, 68
98, 49, 25, 11
84, 32, 67, 88
```
into
```
0, 0, 0, 0, 0, 0
0, 10, 34, 54, 23, 0
0, 31, 87, 53, 68, 0
0, 98, 49, 25, 11, 0
0, 84, 32, 67, 88, 0
0, 0, 0, 0, 0, 0
```
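One possible approach (a sketch using `np.pad`; indexing-based solutions work too):
```python
import numpy as np

a = np.array([[10, 34, 54, 23],
              [31, 87, 53, 68],
              [98, 49, 25, 11],
              [84, 32, 67, 88]])

# Add one row/column of zeros on every side
padded = np.pad(a, pad_width=1, mode='constant', constant_values=0)
print(padded)
```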
### 2. Create a 5x5 matrix and place the values 1, 2, 3, 4 just below its diagonal
### 3. Create an 8x8 matrix with a chessboard pattern (use 0 for black and 1 for white)
### 4. Solve a system of linear equations
Given a system of equations, how do you find its solution? There are several methods; analyze the advantages and disadvantages of each (the simplest is Gaussian elimination).
For example:
```
3x + 4y + 2z = 10
5x + 3y + 4z = 14
8x + 2y + 7z = 20
```
Write a program that solves it.
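For reference, a sketch of one approach using numpy's built-in direct solver (elimination by hand or matrix inversion are alternatives whose pros and cons you should compare):
```python
import numpy as np

# Coefficient matrix and right-hand side of the example system
A = np.array([[3, 4, 2],
              [5, 3, 4],
              [8, 2, 7]], dtype=float)
b = np.array([10, 14, 20], dtype=float)

x = np.linalg.solve(A, b)  # LU-factorization-based direct solve
print(x)                   # solution vector [x, y, z]
```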
### 5. Reverse an array (the first element becomes the last)
### 6. Generate a 10x10 array of random numbers and find its maximum and minimum values
## 2. Plotting with Matplotlib
### 1. Plot a quadratic function together with the trapezoids used when integrating it with the trapezoidal rule
For example:
![matplot_ex1](images/matplot_ex1.png)
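A minimal sketch of this kind of figure (the quadratic, interval, and number of trapezoids are arbitrary choices for illustration):
```python
import numpy as np
import matplotlib.pyplot as plt

f = lambda x: x**2 + 1          # example quadratic
a, b, n = 0.0, 4.0, 8           # integration interval and number of trapezoids
nodes = np.linspace(a, b, n + 1)

# Smooth curve of the function
x_fine = np.linspace(a, b, 200)
plt.plot(x_fine, f(x_fine), 'b-', label='f(x)')

# Each trapezoid drawn as an outlined polygon between consecutive nodes
for x0, x1 in zip(nodes[:-1], nodes[1:]):
    plt.fill([x0, x0, x1, x1], [0, f(x0), f(x1), 0],
             edgecolor='r', facecolor='none')

plt.title('Trapezoidal rule')
plt.legend()
plt.show()
```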
### 2. Plot the function $f(x) = \sin^2(x - 2)\, e^{-x^2}$
Include a title and labels for the x and y axes. The range of x is [0, 2].
![matplot_ex2](images/matplot_ex2.png)
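A possible sketch (title and axis labels included, x restricted to [0, 2]):
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 200)
y = np.sin(x - 2)**2 * np.exp(-x**2)

plt.plot(x, y)
plt.title(r'$f(x) = \sin^2(x - 2)\, e^{-x^2}$')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
```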
## Reference
* [100 numpy exercises](https://github.com/rougier/numpy-100)

@@ -0,0 +1,169 @@
# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext_format_version: '1.2'
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# language_info:
# codemirror_mode:
# name: ipython
# version: 3
# file_extension: .py
# mimetype: text/x-python
# name: python
# nbconvert_exporter: python
# pygments_lexer: ipython3
# version: 3.5.2
# ---
# # Exercise - Traffic Accident Claim Approval Prediction
#
#
# Competition link: http://sofasofa.io/competition.php?id=2
#
#
# * Task type: binary classification
#
# * Background: after a minor traffic accident, a claims adjuster visits the scene to inspect it and collect information, and this information largely determines whether the car owner is compensated by the insurance company. The training data contain 36 (already encoded) pieces of information collected on site by the adjuster for each party to an accident, together with whether that party was eventually compensated. Our task is to predict, from these 36 features, the probability that a party is not compensated.
#
# * Data: the training set contains 200,000 samples and the test set contains 80,000 samples.
# ![data_description](images/data_description.png)
#
# * Evaluation metric: Precision-Recall AUC (a quick local computation is sketched at the end of this demo)
#
# ## Demo code
#
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
# %matplotlib inline
# read data
homePath = "data"
trainPath = os.path.join(homePath, "train.csv")
testPath = os.path.join(homePath, "test.csv")
submitPath = os.path.join(homePath, "sample_submit.csv")
trainData = pd.read_csv(trainPath)
testData = pd.read_csv(testPath)
submitData = pd.read_csv(submitPath)
# According to the data description, the CaseId column is just a meaningless identifier, so we drop it here.
#
# `drop()`: `axis` selects the axis to drop along (0 for rows, 1 for columns); `inplace` controls whether the data is modified in place.
#
# Drop the meaningless column
trainData.drop("CaseId", axis=1, inplace=True)
testData.drop("CaseId", axis=1, inplace=True)
# # A quick look at the data
#
# `head()` shows the first 5 rows by default; you can ask for more, e.g. `.head(15)` shows the first 15 rows.
#
trainData.head(15)
# `info()` prints a concise summary of the data: how many non-null values each column has and each column's data type.
#
#
trainData.info()
# `hist()`: plots a histogram for each column; the `figsize` parameter sets the size of the output figure.
#
trainData.hist(figsize=(20, 20))
# To see how the features are correlated, compute the correlation matrix and then sort it by a particular feature.
#
#
corr_matrix = trainData.corr()
corr_matrix["Evaluation"].sort_values(ascending=False)  # ascending=False sorts in descending order
# Separate the label from the training features
y = trainData['Evaluation']
trainData.drop("Evaluation", axis=1, inplace=True)
# Train a model with K-Means
#
# KMeans()
# * `n_clusters`: the number of clusters to predict;
# * `init`: the method for initializing the cluster centers; the default is `k-means++` rather than the random-sampling initialization of classic K-means (you can set it to `random` to use random initialization);
# * `n_jobs`: the number of CPU cores to use; -1 means use all cores.
# +
# do k-means
from sklearn.cluster import KMeans
est = KMeans(n_clusters=2, init="k-means++", n_jobs=-1)
est.fit(trainData, y)
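# Note: KMeans is unsupervised, so the `y` passed to fit() is ignored; the cluster
# indices it returns (0/1) are not guaranteed to line up with the Evaluation labels,
# which is one reason the accuracy computed below can look poor.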
y_train = est.predict(trainData)
y_pred = est.predict(testData)
# Save the predictions
submitData['Evaluation'] = y_pred
submitData.to_csv("submit_data.csv", index=False)
# +
# calculate accuracy
from sklearn.metrics import accuracy_score
acc_train = accuracy_score(y, y_train)
print("acc_train = %f" % (acc_train))
# -
# ## Random forest
#
# The results obtained with K-means may not be that good. On the competition site, the organizers provide two benchmark models, of which the better one is a random forest. The code is below; readers can try it themselves.
#
#
# +
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Read the data
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")
submit = pd.read_csv("data/sample_submit.csv")
# Drop the id column
train.drop('CaseId', axis=1, inplace=True)
test.drop('CaseId', axis=1, inplace=True)
# Extract the training labels
y_train = train.pop('Evaluation')
# Build the random forest model
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(train, y_train)
y_pred = clf.predict_proba(test)[:, 1]
# Write the predictions to my_RF_prediction.csv
submit['Evaluation'] = y_pred
submit.to_csv('my_RF_prediction.csv', index=False)
# +
# Feature importances
print(clf.feature_importances_)
# Train accuracy
from sklearn.metrics import accuracy_score
y_train_pred = clf.predict(train)
print(y_train_pred)
acc_train = accuracy_score(y_train, y_train_pred)
print("acc_train = %f" % (acc_train))
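# -
# The competition scores submissions with Precision-Recall AUC rather than accuracy. As a rough local check (a sketch added here, not part of the official benchmark code), scikit-learn's `average_precision_score` summarizes the precision-recall curve computed from predicted probabilities; note that a training-set score is optimistic compared with the leaderboard score.
#
# +
# PR-AUC on the training set, using the positive-class probabilities
from sklearn.metrics import average_precision_score

y_train_proba = clf.predict_proba(train)[:, 1]
pr_auc_train = average_precision_score(y_train, y_train_proba)
print("PR-AUC (train) = %f" % pr_auc_train)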

@@ -0,0 +1,6 @@
{
"cells": [],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}

@@ -0,0 +1,71 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Titanic\n",
"\n",
"## Competition Description\n",
"The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.\n",
"\n",
"One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.\n",
"\n",
"In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.\n",
"\n",
"## Practice Skills\n",
"* Binary classification\n",
"* Python & SKLearn\n",
"\n",
"## Data\n",
"The data has been split into two groups:\n",
"\n",
"* training set (train.csv)\n",
"* test set (test.csv)\n",
"\n",
"The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the `ground truth`) for each passenger. Your model will be based on `features` like passengers' gender and class. You can also use feature engineering to create new features.\n",
"\n",
"The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.\n",
"\n",
"We also include `gender_submission.csv`, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.\n",
"\n",
"### Data description\n",
"![data description1](images/data_description1.png)\n",
"![data description2](images/data_description2.png)\n",
"\n",
"\n",
"### Variable Notes\n",
"pclass: A proxy for socio-economic status (SES)\n",
"* 1st = Upper\n",
"* 2nd = Middle\n",
"* 3rd = Lower\n",
"\n",
"\n",
"## Links\n",
"* [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"main_language": "python"
},
"nbformat": 4,
"nbformat_minor": 2
}

@@ -0,0 +1,58 @@
# ---
# jupyter:
# jupytext_format_version: '1.2'
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# language_info:
# codemirror_mode:
# name: ipython
# version: 3
# file_extension: .py
# mimetype: text/x-python
# name: python
# nbconvert_exporter: python
# pygments_lexer: ipython3
# version: 3.5.2
# ---
# # Titanic
#
# ## Competition Description
# The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.
#
# One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.
#
# In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.
#
# ## Practice Skills
# * Binary classification
# * Python & SKLearn
#
# ## Data
# The data has been split into two groups:
#
# * training set (train.csv)
# * test set (test.csv)
#
# The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the `ground truth`) for each passenger. Your model will be based on `features` like passengers' gender and class. You can also use feature engineering to create new features.
#
# The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
#
# We also include `gender_submission.csv`, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
#
# ### Data description
# ![data description1](images/data_description1.png)
# ![data description2](images/data_description2.png)
#
#
# ### Variable Notes
# pclass: A proxy for socio-economic status (SES)
# * 1st = Upper
# * 2nd = Middle
# * 3rd = Lower
#
#
# ## Links
# * [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic)
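#
# ## Getting started
#
# As a rough, hedged starting point only (not the report's actual solution; the `data/` folder location and the feature choice are assumptions for illustration), the sketch below loads the data with pandas, fits a simple scikit-learn classifier on two encoded features, and writes a submission file in the expected `PassengerId,Survived` format.

# +
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load the competition data (assumed to sit in a local data/ folder)
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")

# Minimal feature set: passenger class and an integer-encoded Sex column
for df in (train, test):
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

features = ["Pclass", "Sex"]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(train[features], train["Survived"])

# Predict on the test set and write a submission in the required two-column format
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": clf.predict(test[features]),
})
submission.to_csv("my_titanic_prediction.csv", index=False)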
