Derivatives of Vectors and Matrices
In machine learning algorithms you will encounter a great deal of differentiation involving matrices and vectors. Here we introduce some of the common derivative formulas for these quantities.
The quantities involved can be divided into scalars, vectors, and matrices. The common differentiation operations are: a scalar with respect to a scalar, a vector, or a matrix; a vector with respect to a scalar or a vector; and a matrix with respect to a scalar. The remaining combinations produce tensors, which we will not discuss here.
There are two conventions for writing these derivatives: numerator-layout notation and denominator-layout notation. Both are correct, and there is currently no universal standard; the textbook Convex Optimization, for example, uses the numerator-layout notation, which is the convention adopted below.
In numerator layout, the dimensions of the relevant derivatives are shown in the table below:
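Assuming $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{y} \in \mathbb{R}^{m}$, and $X, Y \in \mathbb{R}^{p \times q}$, a typical version of this table is:

$$
\begin{array}{c|ccc}
 & \text{scalar } x & \text{vector } \mathbf{x} \in \mathbb{R}^{n} & \text{matrix } X \in \mathbb{R}^{p \times q} \\
\hline
\text{scalar } y & \dfrac{\partial y}{\partial x}:\ \text{scalar} & \dfrac{\partial y}{\partial \mathbf{x}}:\ 1 \times n & \dfrac{\partial y}{\partial X}:\ q \times p \\[1.5ex]
\text{vector } \mathbf{y} \in \mathbb{R}^{m} & \dfrac{\partial \mathbf{y}}{\partial x}:\ m \times 1 & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}:\ m \times n & \text{tensor} \\[1.5ex]
\text{matrix } Y \in \mathbb{R}^{p \times q} & \dfrac{\partial Y}{\partial x}:\ p \times q & \text{tensor} & \text{tensor}
\end{array}
$$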
If the derivative is taken between a vector and a scalar, or between a matrix and a scalar, the result is simply the vector or matrix obtained by differentiating each element with respect to the scalar. The derivative of a vector with respect to a vector is a matrix, known as the Jacobian matrix.
When differentiating a vector with respect to a scalar:
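With $\mathbf{y} = (y_1, \dots, y_m)^{T}$ and a scalar $x$, the result is the $m \times 1$ column vector of element-wise derivatives:

$$
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x} \\
\vdots \\
\dfrac{\partial y_m}{\partial x}
\end{bmatrix}
$$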
When differentiating a scalar with respect to a vector:
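In numerator layout the result is the $1 \times n$ row vector, i.e. the transpose of the gradient:

$$
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n}
\end{bmatrix}
$$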
Differentiation between a matrix and a scalar works in the same way: a matrix is differentiated with respect to a scalar element by element, and when a scalar is differentiated with respect to a matrix, the result is transposed, just as in the vector case.
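Concretely, for $Y, X \in \mathbb{R}^{p \times q}$ and a scalar $x$ or $y$, the numerator-layout definitions are:

$$
\frac{\partial Y}{\partial x} =
\begin{bmatrix}
\dfrac{\partial Y_{11}}{\partial x} & \cdots & \dfrac{\partial Y_{1q}}{\partial x} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial Y_{p1}}{\partial x} & \cdots & \dfrac{\partial Y_{pq}}{\partial x}
\end{bmatrix},
\qquad
\frac{\partial y}{\partial X} =
\begin{bmatrix}
\dfrac{\partial y}{\partial X_{11}} & \cdots & \dfrac{\partial y}{\partial X_{p1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial X_{1q}} & \cdots & \dfrac{\partial y}{\partial X_{pq}}
\end{bmatrix}
$$

The first result is $p \times q$ (the shape of $Y$), while the second is $q \times p$ (the shape of $X^{T}$).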
When differentiating between two vectors, a Jacobian matrix is obtained:
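With $\mathbf{y} \in \mathbb{R}^{m}$ and $\mathbf{x} \in \mathbb{R}^{n}$, the Jacobian is the $m \times n$ matrix whose $(i, j)$ entry is $\partial y_i / \partial x_j$:

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$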
Some common matrix derivative formulas are as follows:
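In numerator layout, with $\mathbf{a}, \mathbf{x} \in \mathbb{R}^{n}$ and $A$ a matrix of compatible size, some frequently used results (each can be checked element by element) are:

$$
\begin{aligned}
\frac{\partial (A\mathbf{x})}{\partial \mathbf{x}} &= A \\
\frac{\partial (\mathbf{a}^{T}\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial (\mathbf{x}^{T}\mathbf{a})}{\partial \mathbf{x}} &= \mathbf{a}^{T} \\
\frac{\partial (\mathbf{x}^{T}\mathbf{x})}{\partial \mathbf{x}} &= 2\mathbf{x}^{T} \\
\frac{\partial (\mathbf{x}^{T} A \mathbf{x})}{\partial \mathbf{x}} &= \mathbf{x}^{T}\left(A + A^{T}\right) \\
\frac{\partial\, \mathrm{tr}(A X)}{\partial X} &= A
\end{aligned}
$$

In denominator layout each of these results would appear transposed. As a quick numerical check of the quadratic-form rule, here is a minimal sketch using NumPy finite differences; the names `f`, `A`, `x`, and `n` are invented just for this illustration:

```python
import numpy as np

# Numerically verify d(x^T A x)/dx = x^T (A + A^T)  (numerator layout: a 1 x n row vector).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ A @ v            # scalar-valued quadratic form
analytic = x @ (A + A.T)           # claimed derivative, stored as a 1-D array

# Central differences along each coordinate direction.
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```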