Enzyme
Documentation for Enzyme.jl, the Julia bindings for Enzyme.
Enzyme performs automatic differentiation (AD) of statically analyzable LLVM IR. It is highly efficient, and its ability to perform AD on optimized code allows Enzyme to meet or exceed the performance of state-of-the-art AD tools.
Getting started
Enzyme.jl can be installed in the usual way Julia packages are installed:
] add Enzyme
The Enzyme binary dependencies will be installed automatically via Julia's binary artifact system.
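Once installed, the package is loaded as usual; the REPL examples below assume it is in scope:

```julia
julia> using Enzyme
```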
The Enzyme.jl API revolves around the function `autodiff`. For some common operations, Enzyme additionally wraps `autodiff` in several convenience functions, e.g. `gradient` and `jacobian`.
The tutorial below covers the basic usage of these functions. For a complete overview of Enzyme's functionality, see the API reference documentation. Also see Implementing pullbacks for how to implement back-propagation for functions with non-scalar results.
We will try a few things with the following functions:
julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2
rosenbrock (generic function with 1 method)
julia> rosenbrock_inp(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock_inp (generic function with 1 method)
Reverse mode
The return value of reverse-mode `autodiff` is a tuple whose first element contains the derivatives of the active inputs, optionally followed by the primal return value.
julia> autodiff(Reverse, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0),)
julia> autodiff(ReverseWithPrimal, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0), 100.0)
julia> x = [1.0, 2.0]
2-element Vector{Float64}:
1.0
2.0
julia> dx = [0.0, 0.0]
2-element Vector{Float64}:
0.0
0.0
julia> autodiff(Reverse, rosenbrock_inp, Active, Duplicated(x, dx))
((nothing,),)
julia> dx
2-element Vector{Float64}:
-400.0
200.0
Both the in-place and "normal" variants compute the gradient. The difference is that with `Active` the gradient is part of the return value, while with `Duplicated` the gradient is accumulated in place into the shadow argument `dx`.
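The two annotations can also be mixed in a single call. As a sketch (the helper `f` below is purely illustrative), differentiating a function of a scalar and a vector:

```julia
f(a, v) = a * sum(abs2, v)   # f(a, v) = a * (v[1]^2 + v[2]^2)

v  = [1.0, 2.0]
dv = zeros(2)                # shadow storage for the array gradient

# The derivative w.r.t. the Active scalar `a` appears in the return value;
# the gradient w.r.t. the Duplicated array `v` is accumulated into `dv`.
autodiff(Reverse, f, Active, Active(2.0), Duplicated(v, dv))
```

Here the returned scalar derivative is `sum(abs2, v) = 5.0`, and `dv` afterwards holds `2a .* v = [4.0, 8.0]`.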
Forward mode
The return value of forward mode with a `Duplicated` return is a tuple containing the primal return value as the first element and the derivative as the second. In forward mode, `Duplicated(x, 0.0)` is equivalent to `Const(x)`, except that Enzyme can perform more optimizations for `Const`.
julia> autodiff(Forward, rosenbrock, Duplicated, Const(1.0), Duplicated(3.0, 1.0))
(400.0, 400.0)
julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Const(3.0))
(400.0, -800.0)
Note that when we seed both arguments at once, the returned tangent is the sum of the two partial derivatives.
julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Duplicated(3.0, 1.0))
(400.0, -400.0)
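This is because a forward-mode tangent is the directional derivative of the function along the seed direction; seeding both arguments with `1.0` computes the dot product of the gradient with `(1, 1)`. A small sketch checking this against the single-seed calls above:

```julia
# Tangent w.r.t. x only, then w.r.t. y only (second tuple element is the derivative):
da = autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Const(3.0))[2]
db = autodiff(Forward, rosenbrock, Duplicated, Const(1.0), Duplicated(3.0, 1.0))[2]

da + db   # -800.0 + 400.0 = -400.0, matching the combined-seed call
```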
We can also use forward mode with the in-place variant.
julia> x = [1.0, 3.0]
2-element Vector{Float64}:
1.0
3.0
julia> dx = [1.0, 1.0]
2-element Vector{Float64}:
1.0
1.0
julia> autodiff(Forward, rosenbrock_inp, Duplicated, Duplicated(x, dx))
(400.0, -400.0)
Note the seeding through `dx`.
Vector forward mode
We can also use vector mode to compute both partial derivatives at once.
julia> autodiff(Forward, rosenbrock, BatchDuplicated, BatchDuplicated(1.0, (1.0, 0.0)), BatchDuplicated(3.0, (0.0, 1.0)))
(400.0, (var"1" = -800.0, var"2" = 400.0))
julia> x = [1.0, 3.0]
2-element Vector{Float64}:
1.0
3.0
julia> dx_1 = [1.0, 0.0]; dx_2 = [0.0, 1.0];
julia> autodiff(Forward, rosenbrock_inp, BatchDuplicated, BatchDuplicated(x, (dx_1, dx_2)))
(400.0, (var"1" = -800.0, var"2" = 400.0))
Convenience functions
While the convenience functions discussed below use `autodiff` internally, they are generally more limited in their functionality. Beyond that, these convenience functions may also come with performance penalties; in particular, creating a closure over a multi-argument function instead of calling the appropriate multi-argument `autodiff` method directly can be costly.
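For example, to differentiate `rosenbrock` with respect to its first argument only, the multi-argument `autodiff` call avoids allocating a closure. A sketch of the two forms (both compute the same derivative):

```julia
# Closure form: convenient, but captures `y` and may carry overhead.
y = 2.0
gradient(Reverse, x -> rosenbrock(x[1], y), [1.0])

# Multi-argument form: differentiate w.r.t. x only, holding y constant.
autodiff(Reverse, rosenbrock, Active, Active(1.0), Const(y))
```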
Key convenience functions for common derivative computations are `gradient` (and its in-place variant `gradient!`) and `jacobian`. As with `autodiff`, the mode (forward or reverse) is determined by the first argument.
The functions `gradient` and `gradient!` compute the gradient of a function with vector input and scalar return.
julia> gradient(Reverse, rosenbrock_inp, [1.0, 2.0])
2-element Vector{Float64}:
-400.0
200.0
julia> # inplace variant
dx = [0.0, 0.0];
gradient!(Reverse, dx, rosenbrock_inp, [1.0, 2.0])
2-element Vector{Float64}:
-400.0
200.0
julia> dx
2-element Vector{Float64}:
-400.0
200.0
julia> gradient(Forward, rosenbrock_inp, [1.0, 2.0])
(-400.0, 200.0)
julia> # in forward mode, we can also optionally pass a chunk size
# to specify the number of derivatives computed simultaneously
# using vector forward mode
chunk_size = Val(2)
gradient(Forward, rosenbrock_inp, [1.0, 2.0], chunk_size)
(-400.0, 200.0)
The function `jacobian` computes the Jacobian of a function with vector input and vector return.
julia> foo(x) = [rosenbrock_inp(x), prod(x)];
julia> output_size = Val(2) # here we have to provide the output size of `foo` since it cannot be statically inferred
jacobian(Reverse, foo, [1.0, 2.0], output_size)
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
julia> chunk_size = Val(2) # By specifying the optional chunk size argument, we can use vector reverse mode to propagate derivatives of multiple outputs at once.
jacobian(Reverse, foo, [1.0, 2.0], output_size, chunk_size)
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
julia> jacobian(Forward, foo, [1.0, 2.0])
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
julia> # Again, the optional chunk size argument allows us to use vector forward mode
jacobian(Forward, foo, [1.0, 2.0], chunk_size)
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0