Differentiating Matrix Expressions
DiffMatic
operates on expressions represented in Ricci-notation [1], which enables a precise treatment of tensor derivatives. After computing derivatives in Ricci-notation, the resulting expression can be converted back to standard notation. Derivatives written in standard matrix notation are compact and suitable for use with common libraries for linear algebra. DiffMatic
can also generate code for evaluating derivatives directly.
Matrices and Vectors as Tensors
Matrix expressions written in Julia syntax are automatically converted to Ricci-notation when using the types provided by DiffMatic
. Matrices, vectors and scalars are created using the corresponding macros:
using DiffMatic
@matrix A B C
@vector x y z
@scalar a b c
All of the above variables are represented using the same type. The modes of a tensor can be seen by printing the variable:
A
# output
A¹₂
Matrices have one upper and one lower index. Column vectors have one upper index:
x
# output
x¹
Scalars have no indices.
In Ricci-notation, and also Einstein notation, indices are usually written with letters. We use numbers for convenience.
Creating Tensor Expressions
Many common functions are overloaded for the tensor type. This enables creationg of expressions in Ricci-notation using common standard Julia syntax. Tensor indices are automatically updated as needed.
A * x # standard matrix-vector multiplication
# output
A¹₄x⁴
The same index appearing both as an upper and as a lower index indicates a contraction over that index. Multiplying with a scalar does not result in a contraction:
c * A
# output
cA¹₂
Supported operators and functions when creating expressions:
- Basic operators
+
,-
,'
,*
,^
,abs
,sin
,cos
andlog
- Element-wise operators
sin.
,cos.
,abs.
,.*
,.^
andlog.
- Vector 1-norm and 2-norm can be computed with
LinearAlgebra.norm(..., 1)
andLinearAlgebra.norm(..., 2)
- Sums of vectors can be computed with
sum
. - Matrix traces can be computed with
LinearAlgebra.tr
.
See Creating Expressions for more examples.
Computing Derivatives
Matrix derivatives are computed on expressions in Ricci-notation directly. DiffMatic
provides specialized functions for computing gradients, Jacobians and Hessians respectively.
gradient
and hessian
require a scalar as input:
expr = 2 * x' * A * B * x
g = gradient(expr, x) # The second argument denotes the differentiation variable.
# output
2A₄⁶B₆⁸x⁴ + 2B₆⁷x₇A⁸⁶
H = hessian(expr, x)
# output
2A₉⁶B₆⁸ + 2B₆₉A⁸⁶
jacobian
requires a column vector as input:
J = jacobian(x' * x * A * x, x)
# output
x₃x³A¹₆ + 2A¹₅x⁵x₆
See Derivatives in Standard Notation for more examples.
Converting Tensor Expressions to Standard Notation
An expression in Ricci-notation can be converted back to standard notation using to_std
. DiffMatic
can output a standard expression in string format or as a Julia function.
String Output
An expression can be converted to standard matrix notation and retrieved as a string by passing a StdStr
instance as a keyword argument format
to to_std
:
to_std(J; format = StdStr())
# output
"xᵀxA + 2Axxᵀ"
Special notation used in the output:
- "⊙": Element-wise multiplication.
- "⊘": Element-wise division.
- "diag(x)": Diagonal matrix with "x" on the diagonal.
- "vec(1)": Vector consisting of 1:s.
- "sgn(x)": The signum function applied element-wise.
- "tr(X)": Trace of the matrix "X".
- "abs(x)": Element-wise absolute value of "x".
Julia Function
A Julia function is generated by passing a JuliaFunc
instance as a keyword argument format
to to_std
. The function is returned in the form of an expression:
to_std(g; format = JuliaFunc())
# output
quote
#= ... =#
function (A, B, x)
#= ... =#
#= ... =#
return 2 * ((transpose(B) * transpose(A)) * x) + 2 * (A * (B * x))
end
end
The generated function can be compiled using eval
:
g_fun = eval(to_std(g; format = JuliaFunc()))
An = Float64[1 1 1; 2 2 2; 3 3 3]
Bn = Float64[4 4 4; 5 5 5; 6 6 6]
xn = Float64[1; 1; 1]
g_fun(An, Bn, xn)
# output
3-element Vector{Float64}:
270.0
360.0
450.0
The JuliaFunc
generator may use functions and primitives defined in LinearAlgebra
, such as e.g. diagm
and I
. These functions need to be explicitly imported by the user prior to usage:
g = gradient(log.(A * x)' * x, x)
g_fun = eval(to_std(g; format = JuliaFunc()))
An = exp(3) * ones(3,3)
xn = Float64[1; 0; 0]
g_fun(An, xn)
# output
ERROR: UndefVarError: `diagm` not defined
using LinearAlgebra: diagm
g_fun(An, xn)
# output
3-element Vector{Float64}:
4.0
4.0
4.0
The order of the arguments of the generated function can be specified through JuliaFunc
:
to_std(g; format = JuliaFunc([x, B, A]))
# output
quote
#= ... =#
function (x, B, A)
#= ... =#
#= ... =#
return 2 * ((transpose(B) * transpose(A)) * x) + 2 * (A * (B * x))
end
end
- [1]
- S. Laue, M. Mitterreiter and J. Giesen. Computing Higher Order Derivatives of Matrix and Tensor Expressions. In: Advances in Neural Information Processing Systems, Vol. 31 (Curran Associates, Inc., 2018).