
CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)


Where does this article come from?
From an idea for a pornography detection project for ciscn 2021. While searching on Google, I came across a CS231n project report titled Combining CNNs for detecting pornography in the absence of labeled training data.

Historical background

A question was raised: are we capable of recognizing every object in the real world? The question was also driven by the observation that most machine learning algorithms overfit quickly when the amount of training data is too small. A project called ImageNet was launched to build the largest possible dataset covering everything in the world: a huge number of images were downloaded and organized by the dictionary called WordNet.

The winning algorithm of the 2012 ImageNet Large-Scale Visual Recognition Challenge was a convolutional neural network model.

In 2015 it got really crazy: the Residual Networks (ResNet) paper from Microsoft reached 152 layers.

Image Classification pipeline

Nearest Neighbor:

Predict the label of the most similar training image

How do we decide which image is "nearest"?

L1 distance (also called Manhattan or city-block distance): $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$, i.e. take the absolute difference at each pixel and sum.

Nearest-neighbor complexity: train O(1), predict O(N).

This is backwards: we want prediction to be fast, and slow training is acceptable.
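A minimal NumPy sketch of this trade-off (the class name and array shapes are my own illustration): training only memorizes the data, while every prediction scans all N training images with the L1 distance.

```python
import numpy as np

class NearestNeighbor:
    """Nearest-neighbor classifier with the L1 (Manhattan) distance."""

    def train(self, X, y):
        # Training just memorizes the data: O(1).
        self.X_train = X  # shape (N, D), one flattened image per row
        self.y_train = y  # shape (N,)

    def predict(self, X):
        # Each prediction compares the query against all N training images: O(N).
        y_pred = np.zeros(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            dists = np.sum(np.abs(self.X_train - X[i]), axis=1)  # L1 distances
            y_pred[i] = self.y_train[np.argmin(dists)]
        return y_pred
```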

K-Nearest Neighbors:

Plain nearest neighbor gives abrupt decision boundaries, because each boundary segment is just the perpendicular bisector between a single pair of points.

Instead of copying the label from the single nearest neighbor, take a majority vote over the K closest points.

This yields a smoother boundary that generalizes better.

L2 (Euclidean) distance: $d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$. Its unit ball is a circle, while the L1 unit ball is a square.

L1 distance is sensitive to rotations of the coordinate axes (it suits features whose individual axes carry a clear meaning).
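A rough sketch of kNN prediction under either distance; `knn_predict`, the `metric` argument, and the shapes are my own naming for illustration, not from the lecture:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5, metric="l1"):
    """Predict one label by majority vote over the k nearest training points."""
    if metric == "l1":
        dists = np.sum(np.abs(X_train - x), axis=1)           # Manhattan distance
    else:
        dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))   # Euclidean distance
    nearest = np.argsort(dists)[:k]                           # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())                # count labels among the neighbors
    return votes.most_common(1)[0][0]
```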

vision.stanford.edu/teaching/cs231n-demos/knn/

Hyperparameters: values that we set ourselves rather than ones the machine learns.

Choosing hyperparameters: 1. tune on the entire dataset (bad); 2. split into train and test sets; 3. split into train, validation, and test sets (better); 4. cross-validation (useful for small datasets, rarely used in deep learning). A sketch of option 3 follows below.
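A small sketch of option 3, reusing the `knn_predict` helper from above; the split fraction and candidate values of k are arbitrary choices for illustration:

```python
import numpy as np

def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 10), val_fraction=0.2, seed=0):
    """Pick k on a held-out validation split; the test set is never touched here."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]

    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = np.array([knn_predict(X[train_idx], y[train_idx], x, k=k)
                          for x in X[val_idx]])
        acc = np.mean(preds == y[val_idx])  # accuracy on the validation split
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```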

kNN is never used on raw images in practice: prediction is slow, distance metrics on pixels are not informative (an image can be shifted, tinted, or occluded and still have the same distance), and the curse of dimensionality means the number of training points needed grows exponentially with the dimension.

Linear classifiers are very commonly used as the building blocks of neural networks (a minimal score-function sketch appears after the demo links below).

visual viewpoint

https://playground.tensorflow.org/

http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/
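A minimal sketch of the linear score function $s = Wx + b$, assuming CIFAR-10-style inputs (a 32x32x3 image flattened to 3072 numbers and mapped to 10 class scores):

```python
import numpy as np

def linear_scores(x, W, b):
    # W: (10, 3072), x: (3072,), b: (10,) -> 10 class scores
    return W.dot(x) + b
```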

Linear classification, loss functions, and gradient descent

the SVM loss has the form:

$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1)$
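A direct NumPy translation of this formula for a single example (`svm_loss_single` and the `delta` argument are my own naming; the formula above fixes the margin at 1):

```python
import numpy as np

def svm_loss_single(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss for one example."""
    margins = np.maximum(0.0, scores - scores[y] + delta)  # max(0, s_j - s_{y_i} + 1)
    margins[y] = 0.0                                        # skip j == y_i
    return np.sum(margins)
```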

Regularization: a penalty term weighted by $\lambda$ that encourages smaller weights, to reduce overfitting; the full objective becomes $L = \frac{1}{N}\sum_i L_i + \lambda R(W)$ (see the sketch after the examples below).

Simple examples:
L2 regularization: $R(W) = \sum_k\sum_lW_{k,l} ^2$
L1 regularization: $R(W) = \sum_k\sum_l|W_{k,l}|$
Elastic net (L1 + L2): $R(W) = \sum_k\sum_l \left(\beta W_{k,l}^2 + |W_{k,l}|\right)$

More complex:
Dropout
Batch normalization
Stochastic depth, fractional pooling, etc.
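Putting the pieces together, one possible sketch of the full objective $L = \frac{1}{N}\sum_i L_i + \lambda R(W)$ with the SVM data loss and L2 penalty, reusing `svm_loss_single` from above (the regularization strength `lam` is an arbitrary illustration):

```python
import numpy as np

def total_loss(W, b, X, y, lam=1e-3):
    """L = (1/N) * sum_i L_i + lambda * R(W), with SVM data loss and L2 regularization."""
    data_loss = np.mean([svm_loss_single(W.dot(x) + b, yi) for x, yi in zip(X, y)])
    reg_loss = lam * np.sum(W ** 2)  # R(W) = sum_{k,l} W_{k,l}^2
    return data_loss + reg_loss
```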

Softmax Classifier (multinomial logistic regression)

Softmax function: $P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$

Exponentiate, then normalize: the scores become probabilities.

This is then fed into the cross-entropy loss $L_i = -\log P(Y = y_i \mid X = x_i)$.
maximum likelihood estimation:
choose weights to maximize the likelihood of the observed data
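A small sketch of softmax followed by cross-entropy for one example; subtracting the max score before exponentiating is the usual numerical-stability trick, not something this note spells out:

```python
import numpy as np

def softmax_cross_entropy(scores, y):
    """Cross-entropy loss L_i = -log P(Y = y_i | X = x_i) of the softmax probabilities."""
    shifted = scores - np.max(scores)                  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # exponentiate, then normalize
    return -np.log(probs[y])
```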

Gradient descent

Numerical vs. analytic gradients: the analytic gradient is faster but easy to get wrong, so in practice it is checked against the numerical one.
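A sketch of how the two relate in practice: a slow centered finite-difference gradient that can be used to check an analytic one, plus the vanilla descent update (function names, step size, and step count are arbitrary illustrations):

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Slow centered finite-difference gradient, useful for checking an analytic gradient."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        old = W[i]
        W[i] = old + h
        f_plus = f(W)
        W[i] = old - h
        f_minus = f(W)
        W[i] = old                              # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

def gradient_descent(f, W, learning_rate=1e-3, steps=100):
    """Vanilla gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        W = W - learning_rate * numerical_gradient(f, W)
    return W
```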

Feed extracted image features instead of raw pixels, e.g. HOG (Histogram of Oriented Gradients), or coordinate transforms (converting Cartesian to polar coordinates can solve the concentric-circles problem).
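For the concentric-circles example, a tiny sketch of the Cartesian-to-polar transform; after it, the radius alone separates the circles with a linear classifier (`to_polar` is my own helper name):

```python
import numpy as np

def to_polar(points):
    """Map (x, y) points to (r, theta); concentric circles then differ only in r."""
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    return np.stack([r, theta], axis=1)
```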


WRITTEN BY
ruokeqx