矩阵运算

矩阵运算

基础

相加

只要矩阵的形状一样,我们可以把两个矩阵相加。两个矩阵相加是指对应位置的元素相加,比如 $C{i, j}=A{i, j}+B_{i, j}$。

转置(Transpose)

转置(transpose)是矩阵的重要操作之一。矩阵的转置是以对角线为轴的镜像,这条从左上角到右下角的对角线被称为主对角线(main diagonal)。

(A)i,j=Aj,i\left(A^{\top}\right)_{i, j}=A_{j, i}

向量可以看作只有一列的矩阵。对应地,向量的转置可以看作是只有一行的矩阵。有时,我们通过将向量元素作为行矩阵写在文本行中,然后使用转置操作将其变为标准的列向量,来定义一个向量,比如:

x=[x1,x2,x3]\boldsymbol{x}=\left[x_{1}, x_{2}, x_{3}\right]^{\top}

标量可以看作是只有一个元素的矩阵。因此,标量的转置等于它本身,$a=a^{\top}$

标准乘积

标量和矩阵相乘,或是和矩阵相加时,我们只需将其与矩阵的每个元素相乘或相加,比如 $D=a \cdot B+c$,其中 $D{i, j}=a \cdot B{i, j}+c$。两个矩阵 $A$ 和 $B$ 的矩阵乘积 (matrix product)是第三个矩阵 $C$。,矩阵 $A$ 的列数必须和矩阵 $B$ 的行数相等。如果矩阵 $A$ 的形状是 $m \times n$ ,矩阵 $B$ 的形状是 $n \times p$,那么矩阵 $C$ 的形状就是 $m \times p$。

C=ABCi,j=kAi,kBk,jC=A B \\ C_{i, j}=\sum_{k} A_{i, k} B_{k, j}

矩阵乘积服从分配律、结合律,以及简单的转置:

A(B+C)=AB+ACA(BC)=(AB)C(AB)=BAA(B+C)=A B+A C \\ A(B C)=(A B) C \\ (A B)^{\top}=B^{\top} A^{\top}

但是不满足与交换律。

Hadamard Product | 逐元素积

AB=[a1,1b1,1a1,2b1,2a1,nb1,na2,1b2,1a2,2b2,2a2,nb2,nam,1bm,1am,2bm,2am,nbm,n]\mathbf{A} \circ \mathbf{B}=\left[\begin{array}{cccc}{a_{1,1} b_{1,1}} & {a_{1,2} b_{1,2}} & {\cdots} & {a_{1, n} b_{1, n}} \\ {a_{2,1} b_{2,1}} & {a_{2,2} b_{2,2}} & {\cdots} & {a_{2, n} b_{2, n}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {a_{m, 1} b_{m, 1}} & {a_{m, 2} b_{m, 2}} & {\cdots} & {a_{m, n} b_{m, n}}\end{array}\right]

Kronnecker Product | 克罗内积

AB=[a1,1Ba1,2Ba1,nBa2,1Ba2,2Ba2,nBam,1Bam,2Bam,nB]\mathbf{A} \otimes \mathbf{B}=\left[\begin{array}{cccc}{a_{1,1} \mathbf{B}} & {a_{1,2} \mathbf{B}} & {\cdots} & {a_{1, n} \mathbf{B}} \\ {a_{2,1} \mathbf{B}} & {a_{2,2} \mathbf{B}} & {\cdots} & {a_{2, n} \mathbf{B}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {a_{m, 1} \mathbf{B}} & {a_{m, 2} \mathbf{B}} & {\cdots} & {a_{m, n} \mathbf{B}}\end{array}\right]

矩阵逆

单位矩阵

范数与迹

F 范数

矩阵 $\mathbf{A}=\left(a{i, j}\right){m \times n}$,那么其 $F$ 范数的定义为:

AF=i,jai,j2\|\mathbf{A}\|_{F}=\sqrt{\sum_{i, j} a_{i, j}^{2}}

$F$ 范数可以看做向量的 $L_2$ 范数的推广。

矩阵 $\mathbf{A}=\left(a{i, j}\right){m \times n}$,那么 $A$ 的迹为:

tr(A)=iai,i\operatorname{tr}(\mathbf{A})=\sum_{i} a_{i, i}

一般来说,迹会满足如下性质:

  • $A$ 的 F 范数等于 $AA^T$ 的迹的平方根:$|\mathbf{A}|_{F}=\sqrt{\operatorname{tr}\left(\mathbf{A} \mathbf{A}^{T}\right)}$

  • $A$ 的迹等于 $A^T$ 的迹,$\operatorname{tr}(\mathbf{A})=\operatorname{tr}\left(\mathbf{A}^{T}\right)$

  • 交换律:假设 $\mathbf{A} \in \mathbb{R}^{m \times n}, \mathbf{B} \in \mathbb{R}^{n \times m}$,则有:$\operatorname{tr}(\mathbf{A} \mathbf{B})=\operatorname{tr}(\mathbf{B} \mathbf{A})$

  • 结合律:$\operatorname{tr}(\mathbf{A} \mathbf{B} \mathbf{C})=\operatorname{tr}(\mathbf{C} \mathbf{A} \mathbf{B})=\operatorname{tr}(\mathbf{B} \mathbf{C} \mathbf{A})$

函数与导数

函数计算

如果 $f$ 是一元函数,则其逐向量函数为:

f(x)=(f(x1),f(x2),,f(xn))Tf(\overrightarrow{\mathbf{x}})=\left(f\left(x_{1}\right), f\left(x_{2}\right), \cdots, f\left(x_{n}\right)\right)^{T}

其逐矩阵函数为:

f(X)=[f(x1,1)f(x1,2)f(x1,n)f(x2,1)f(x2,2)f(x2,n)f(xm,1)f(xm,2)f(xm,n)]f(\mathbf{X})=\left[\begin{array}{cccc}{f\left(x_{1,1}\right)} & {f\left(x_{1,2}\right)} & {\cdots} & {f\left(x_{1, n}\right)} \\ {f\left(x_{2,1}\right)} & {f\left(x_{2,2}\right)} & {\cdots} & {f\left(x_{2, n}\right)} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {f\left(x_{m, 1}\right)} & {f\left(x_{m, 2}\right)} & {\cdots} & {f\left(x_{m, n}\right)}\end{array}\right]

其逐元导数分别为:

f(x)=(f(x1),f(x2),,f(xn))Tf^{\prime}(\overrightarrow{\mathbf{x}})=\left(f^{\prime}(x 1), f^{\prime}(x 2), \cdots, f^{\prime}\left(x_{n}\right)\right)^{T}
f(X)=[f(x1,1)f(x1,2)f(x1,n)f(x2,1)f(x2,2)f(x2,n)f(xm,1)f(xm,2)f(xm,n)]f^{\prime}(\mathbf{X})=\left[\begin{array}{cccc}{f^{\prime}\left(x_{1,1}\right)} & {f^{\prime}\left(x_{1,2}\right)} & {\cdots} & {f^{\prime}\left(x_{1, n}\right)} \\ {f^{\prime}\left(x_{2,1}\right)} & {f^{\prime}\left(x_{2,2}\right)} & {\cdots} & {f^{\prime}\left(x_{2, n}\right)} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {f^{\prime}\left(x_{m, 1}\right)} & {f^{\prime}\left(x_{m, 2}\right)} & {\cdots} & {f^{\prime}\left(x_{m, n}\right)}\end{array}\right]

偏导数

  • 标量对标量的偏导数:$\frac{\partial u}{\partial v}$

  • 标量对向量($n$ 维向量)的偏导数:

uv=(uv1,uv2,,uvn)T\frac{\partial u}{\partial \overrightarrow{\mathbf{v}}}=\left(\frac{\partial u}{\partial v_{1}}, \frac{\partial u}{\partial v_{2}}, \cdots, \frac{\partial u}{\partial v_{n}}\right)^{T}
  • 标量对矩阵 $m \times n$ 的偏导数:

uV=[uV1,1uV1,2uV1,nuV2,1uV2,2uV2,nuVm,1uVm,2uVm,n]\frac{\partial u}{\partial \mathbf{V}}=\left[\begin{array}{cccc}{\frac{\partial u}{\partial V_{1,1}}} & {\frac{\partial u}{\partial V_{1,2}}} & {\cdots} & {\frac{\partial u}{\partial V_{1, n}}} \\ {\frac{\partial u}{\partial V_{2,1}}} & {\frac{\partial u}{\partial V_{2,2}}} & {\cdots} & {\frac{\partial u}{\partial V_{2, n}}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial u}{\partial V_{m, 1}}} & {\frac{\partial u}{\partial V_{m, 2}}} & {\cdots} & {\frac{\partial u}{\partial V_{m, n}}}\end{array}\right]
  • $m$ 维向量对标量的偏导数:

uv=(u1v,u2v,,umv)T\frac{\partial \overrightarrow{\mathbf{u}}}{\partial v}=\left(\frac{\partial u_{1}}{\partial v}, \frac{\partial u_{2}}{\partial v}, \cdots, \frac{\partial u_{m}}{\partial v}\right)^{T}
  • $m$ 维向量对 $n$ 维向量的偏导数(雅可比矩阵,行优先):

uv=[u1v1u1v2u1vnu2v1u2v2u2vnumv1umv2umvn]\frac{\partial \overrightarrow{\mathbf{u}}}{\partial \overrightarrow{\mathbf{v}}}=\left[\begin{array}{cccc}{\frac{\partial u_{1}}{\partial v_{1}}} & {\frac{\partial u_{1}}{\partial v_{2}}} & {\cdots} & {\frac{\partial u_{1}}{\partial v_{n}}} \\ {\frac{\partial u_{2}}{\partial v_{1}}} & {\frac{\partial u_{2}}{\partial v_{2}}} & {\cdots} & {\frac{\partial u_{2}}{\partial v_{n}}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial u_{m}}{\partial v_{1}}} & {\frac{\partial u_{m}}{\partial v_{2}}} & {\cdots} & {\frac{\partial u_{m}}{\partial v_{n}}}\end{array}\right]
  • $m \times n$ 阶矩阵对于标量的偏导数:

Uv=[U1,1vU1,2vU1,nvU2,1vU2,2vU2,nvUm,1vUm,2vUm,nv]\frac{\partial \mathbf{U}}{\partial v}=\left[\begin{array}{cccc}{\frac{\partial U_{1,1}}{\partial v}} & {\frac{\partial U_{1,2}}{\partial v}} & {\cdots} & {\frac{\partial U_{1, n}}{\partial v}} \\ {\frac{\partial U_{2,1}}{\partial v}} & {\frac{\partial U_{2,2}}{\partial v}} & {\cdots} & {\frac{\partial U_{2, n}}{\partial v}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial U_{m, 1}}{\partial v}} & {\frac{\partial U_{m, 2}}{\partial v}} & {\cdots} & {\frac{\partial U_{m, n}}{\partial v}}\end{array}\right]