Support Vector Machine (Classifier)

Suggested readings:

  • Alpaydin: 10.3, 13.1, 13.2
  • Murphy: 14.5.2.2
  • Geron: chapter 5, appendix C

Alpaydin reading notes

Geometry of the Linear Discriminant

  • Start with the simple two-class case, in which a single discriminant function suffices:
  • Discriminant between two classes: $$ \begin{equation}\label{Discriminant_between_two_classes} \begin{split} g(\chi) &= g_1(\chi) - g_2(\chi)\\ &= ((w_1)^T\chi + w_{10}) - ((w_2)^T\chi + w_{20})\\ &= (w_1 - w_2)^T\chi + (w_{10} - w_{20})\\ &= w^T\chi + w_0 \end{split} \end{equation} $$
  • we choose $$ \begin{cases} C_1 & \text{if } g(\chi) > 0\\ C_2 & \text{otherwise} \end{cases} $$ (a minimal numeric sketch follows below)
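As a concrete illustration, here is a minimal NumPy sketch of this discriminant. The values of `w` and `w0` are made up for the example, not taken from Alpaydin or fitted to data:

```python
import numpy as np

# Hypothetical parameters for illustration only (not from the book).
w = np.array([2.0, -1.0])   # weight vector w
w0 = -1.0                   # threshold w_0

def g(x):
    """Linear discriminant g(x) = w^T x + w_0."""
    return w @ x + w0

def classify(x):
    """Choose C1 if g(x) > 0, otherwise C2."""
    return "C1" if g(x) > 0 else "C2"

print(classify(np.array([2.0, 1.0])))   # g = 4 - 1 - 1 = 2 > 0  -> C1
print(classify(np.array([0.0, 2.0])))   # g = 0 - 2 - 1 = -3     -> C2
```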

Weight vector and threshold

  • $g(\chi) = 0$ defines a hyperplane, where $w$ is the weight vector and $w_0$ is the threshold.

Decision rule

  • choose $C_1$ if $w^T\chi > -w_0$, and choose $C_2$ otherwise. The hyperplane divides the input space into two half-spaces: decision region $R_1$ for $C_1$ and $R_2$ for $C_2$. Any $\chi$ in $R_1$ is on the positive side of the hyperplane and any $\chi$ in $R_2$ is on the negative side.
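A quick numeric check of this half-space picture, reusing the made-up $w$, $w_0$ from the sketch above: a point on the hyperplane has $g(\chi) = 0$, and stepping off it along $+w$ lands in $R_1$ while stepping along $-w$ lands in $R_2$:

```python
import numpy as np

# Same made-up parameters as the previous sketch.
w = np.array([2.0, -1.0])
w0 = -1.0
g = lambda x: w @ x + w0

# A point on the boundary: 2*x1 - x2 - 1 = 0 with x1 = 1 gives x2 = 1.
xb = np.array([1.0, 1.0])
print(g(xb))                 # 0.0 -> xb lies on the hyperplane

# w points toward the positive half-space R1.
eps = 0.1
print(g(xb + eps * w) > 0)   # True  -> positive side, region R1 (class C1)
print(g(xb - eps * w) > 0)   # False -> negative side, region R2 (class C2)
```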

Decision boundary equation

  • Let’s take two points on the decision surface; that is, $g(\chi_1) = g(\chi_2) = 0$, then $$ \begin{equation} \begin{split} w^T\chi_1 + w_0 &= w^T\chi_2 + w_0\\ w^T(\chi_1 - \chi_2) &= 0 \end{split} \end{equation} $$ and we see that $w$ is normal to any vector lying on the hyperplane. Let’s rewrite $\chi$ as $$ \begin{equation} \chi = \chi_p + r\frac{w}{\lVert w\rVert} \end{equation} $$
  • $\chi_p$: normal projection of $\chi$ onto the hyperplane
  • $r$: the signed distance from $\chi$ to the hyperplane
    • $<0$ if $\chi$ is on the negative side
    • $>0$ if $\chi$ is on the positive side
    • substituting the decomposition into $g$ and using $g(\chi_p) = 0$ gives $g(\chi) = r\lVert w\rVert$, i.e. $r = \frac{g(\chi)}{\lVert w\rVert}$ (see the sketch after this list)
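A small sketch of this decomposition, again with the same made-up parameters: compute $r = g(\chi)/\lVert w\rVert$, recover $\chi_p$, and check that $\chi_p$ lies on the hyperplane and that the decomposition reconstructs $\chi$:

```python
import numpy as np

# Same made-up parameters as in the earlier sketches.
w = np.array([2.0, -1.0])
w0 = -1.0
g = lambda x: w @ x + w0
norm_w = np.linalg.norm(w)

x = np.array([3.0, 1.0])
r = g(x) / norm_w            # signed distance r = g(x) / ||w||
x_p = x - r * w / norm_w     # normal projection of x onto the hyperplane

print(r > 0)                                  # True: x is on the positive side
print(np.isclose(g(x_p), 0.0))                # True: x_p lies on the hyperplane
print(np.allclose(x, x_p + r * w / norm_w))   # True: x = x_p + r * w/||w||
```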