Derivatives of Vectors and Matrices
In machine learning algorithms you will encounter a great deal of differentiation involving matrices and vectors. Here we introduce some of the common derivative formulas for these quantities.
The quantities involved can be divided into scalars, vectors, and matrices. The common differentiation operations are: a scalar with respect to a scalar, a vector, or a matrix; a vector with respect to a scalar or a vector; and a matrix with respect to a scalar. The remaining combinations produce tensors, which we will not discuss here.
There are two conventions for writing these derivatives: numerator-layout notation and denominator-layout notation. Both are correct, and there is currently no universal standard; the textbook Convex Optimization, for example, uses the numerator-layout notation, which is the convention adopted below.
In numerator layout, the dimensions of the relevant derivatives are shown in the table below:
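Assuming $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{y} \in \mathbb{R}^{m}$, and $X, Y \in \mathbb{R}^{p \times q}$, a typical version of this table is:

$$
\begin{array}{c|ccc}
 & \text{scalar } x & \text{vector } \mathbf{x} \in \mathbb{R}^{n} & \text{matrix } X \in \mathbb{R}^{p \times q} \\
\hline
\text{scalar } y & \dfrac{\partial y}{\partial x}:\ \text{scalar} & \dfrac{\partial y}{\partial \mathbf{x}}:\ 1 \times n & \dfrac{\partial y}{\partial X}:\ q \times p \\[1.5ex]
\text{vector } \mathbf{y} \in \mathbb{R}^{m} & \dfrac{\partial \mathbf{y}}{\partial x}:\ m \times 1 & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}:\ m \times n & \text{tensor} \\[1.5ex]
\text{matrix } Y \in \mathbb{R}^{p \times q} & \dfrac{\partial Y}{\partial x}:\ p \times q & \text{tensor} & \text{tensor}
\end{array}
$$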
If the derivative is taken between a vector and a scalar, or between a matrix and a scalar, the result is simply the vector or matrix obtained by differentiating each element with respect to the scalar. The derivative of a vector with respect to a vector is a matrix, known as the Jacobian matrix.
When differentiating a vector with respect to a scalar:
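With $\mathbf{y} = (y_1, \dots, y_m)^{T}$ and a scalar $x$, the result is the $m \times 1$ column vector of element-wise derivatives:

$$
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x} \\
\vdots \\
\dfrac{\partial y_m}{\partial x}
\end{bmatrix}
$$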
When differentiating a scalar with respect to a vector:
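In numerator layout the result is the $1 \times n$ row vector, i.e. the transpose of the gradient:

$$
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n}
\end{bmatrix}
$$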
Differentiation between a matrix and a scalar works in the same way: a matrix is differentiated with respect to a scalar element by element, and when a scalar is differentiated with respect to a matrix, the result is transposed, just as in the vector case.
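Concretely, for $Y, X \in \mathbb{R}^{p \times q}$ and a scalar $x$ or $y$, the numerator-layout definitions are:

$$
\frac{\partial Y}{\partial x} =
\begin{bmatrix}
\dfrac{\partial Y_{11}}{\partial x} & \cdots & \dfrac{\partial Y_{1q}}{\partial x} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial Y_{p1}}{\partial x} & \cdots & \dfrac{\partial Y_{pq}}{\partial x}
\end{bmatrix},
\qquad
\frac{\partial y}{\partial X} =
\begin{bmatrix}
\dfrac{\partial y}{\partial X_{11}} & \cdots & \dfrac{\partial y}{\partial X_{p1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial X_{1q}} & \cdots & \dfrac{\partial y}{\partial X_{pq}}
\end{bmatrix}
$$

The first result is $p \times q$ (the shape of $Y$), while the second is $q \times p$ (the shape of $X^{T}$).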
When differentiating between two vectors, a Jacobian matrix is obtained:
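With $\mathbf{y} \in \mathbb{R}^{m}$ and $\mathbf{x} \in \mathbb{R}^{n}$, the Jacobian is the $m \times n$ matrix whose $(i, j)$ entry is $\partial y_i / \partial x_j$:

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$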
Some common matrix derivative formulas are as follows:
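In numerator layout, with $\mathbf{a}, \mathbf{x} \in \mathbb{R}^{n}$ and $A$ a matrix of compatible size, some frequently used results (each can be checked element by element) are:

$$
\begin{aligned}
\frac{\partial (A\mathbf{x})}{\partial \mathbf{x}} &= A \\
\frac{\partial (\mathbf{a}^{T}\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial (\mathbf{x}^{T}\mathbf{a})}{\partial \mathbf{x}} &= \mathbf{a}^{T} \\
\frac{\partial (\mathbf{x}^{T}\mathbf{x})}{\partial \mathbf{x}} &= 2\mathbf{x}^{T} \\
\frac{\partial (\mathbf{x}^{T} A \mathbf{x})}{\partial \mathbf{x}} &= \mathbf{x}^{T}\left(A + A^{T}\right) \\
\frac{\partial\, \mathrm{tr}(A X)}{\partial X} &= A
\end{aligned}
$$

In denominator layout each of these results would appear transposed. As a quick numerical check of the quadratic-form rule, here is a minimal sketch using NumPy finite differences; the names `f`, `A`, `x`, and `n` are invented just for this illustration:

```python
import numpy as np

# Numerically verify d(x^T A x)/dx = x^T (A + A^T)  (numerator layout: a 1 x n row vector).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ A @ v            # scalar-valued quadratic form
analytic = x @ (A + A.T)           # claimed derivative, stored as a 1-D array

# Central differences along each coordinate direction.
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```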