Enzyme

Documentation for Enzyme.jl, the Julia bindings for Enzyme.

Enzyme performs automatic differentiation (AD) of statically analyzable LLVM IR. It is highly efficient, and its ability to perform AD on optimized code allows Enzyme to meet or exceed the performance of state-of-the-art AD tools.

Getting started

Enzyme.jl can be installed in the usual way Julia packages are installed:

] add Enzyme

The Enzyme binary dependencies will be installed automatically via Julia's binary artifact system.

The Enzyme.jl API revolves around the function autodiff. For some common operations, Enzyme additionally wraps autodiff in several convenience functions; e.g., gradient and jacobian.

The tutorial below covers the basic usage of these functions. For a complete overview of Enzyme's functionality, see the API reference documentation. Also see Implementing pullbacks on how to implement back-propagation for functions with non-scalar results.

We will try a few things with the following functions:

julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2
rosenbrock (generic function with 1 method)

julia> rosenbrock_inp(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock_inp (generic function with 1 method)

Reverse mode

The return value of reverse-mode autodiff is a tuple that contains the derivatives of the active inputs as its first element and, optionally (with ReverseWithPrimal), the primal return value as its second element.

julia> autodiff(Reverse, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0),)

julia> autodiff(ReverseWithPrimal, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0), 100.0)

julia> x = [1.0, 2.0]
2-element Vector{Float64}:
 1.0
 2.0

julia> dx = [0.0, 0.0]
2-element Vector{Float64}:
 0.0
 0.0

julia> autodiff(Reverse, rosenbrock_inp, Active, Duplicated(x, dx))
((nothing,),)

julia> dx
2-element Vector{Float64}:
 -400.0
  200.0

Both the in-place and "normal" variants compute the gradient. The difference is that with Active arguments the derivatives are returned directly, while with Duplicated arguments the derivatives are accumulated in place into the shadow (here dx), which is why the returned tuple contains nothing for that argument.
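
Because Enzyme accumulates into the shadow rather than overwriting it, calling autodiff again without resetting dx adds the new gradient on top of the existing values. A minimal sketch, continuing directly from the example above:

julia> autodiff(Reverse, rosenbrock_inp, Active, Duplicated(x, dx))
((nothing,),)

julia> dx # twice the gradient, since the shadow was not reset
2-element Vector{Float64}:
 -800.0
  400.0

When reusing a shadow buffer across calls, reset it first (e.g. dx .= 0).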

Forward mode

The return value of forward mode with a Duplicated return is a tuple containing as the first value the primal return value and as the second value the derivative.

In forward mode, Duplicated(x, 0.0) is mathematically equivalent to Const(x), except that Const allows Enzyme to perform additional optimizations.
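
For instance, seeding an argument with a zero tangent yields the same result as the Const call just below; a minimal sketch using the return convention described above:

julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 0.0), Duplicated(3.0, 1.0))
(400.0, 400.0)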

julia> autodiff(Forward, rosenbrock, Duplicated, Const(1.0), Duplicated(3.0, 1.0))
(400.0, 400.0)

julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Const(3.0))
(400.0, -800.0)

Of note, when we seed both arguments at once, the returned tangent is the sum of both partial derivatives.

julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Duplicated(3.0, 1.0))
(400.0, -400.0)

We can also use forward mode with our inplace method.

julia> x = [1.0, 3.0]
2-element Vector{Float64}:
 1.0
 3.0

julia> dx = [1.0, 1.0]
2-element Vector{Float64}:
 1.0
 1.0

julia> autodiff(Forward, rosenbrock_inp, Duplicated, Duplicated(x, dx))
(400.0, -400.0)

Note the seeding through dx.
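
The values in dx select the direction in which the derivative is taken. For example, seeding only the first component yields just the partial derivative with respect to x[1]; a minimal sketch consistent with the calls above:

julia> autodiff(Forward, rosenbrock_inp, Duplicated, Duplicated(x, [1.0, 0.0]))
(400.0, -800.0)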

Vector forward mode

We can also use vector mode to calculate both derivatives at once.

julia> autodiff(Forward, rosenbrock, BatchDuplicated, BatchDuplicated(1.0, (1.0, 0.0)), BatchDuplicated(3.0, (0.0, 1.0)))
(400.0, (var"1" = -800.0, var"2" = 400.0))

julia> x = [1.0, 3.0]
2-element Vector{Float64}:
 1.0
 3.0

julia> dx_1 = [1.0, 0.0]; dx_2 = [0.0, 1.0];

julia> autodiff(Forward, rosenbrock_inp, BatchDuplicated, BatchDuplicated(x, (dx_1, dx_2)))
(400.0, (var"1" = -800.0, var"2" = 400.0))

Convenience functions

Note

While the convenience functions discussed below use autodiff internally, they are generally more limited in their functionality. Beyond that, these convenience functions may also come with performance penalties, especially if one wraps a multi-argument function in a closure instead of calling the appropriate multi-argument autodiff method directly.
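
As an illustration of that last point, both calls below compute the same gradient of rosenbrock: the first wraps the two arguments in a single-vector closure and differentiates it with the gradient convenience function introduced below, while the second differentiates the two-argument function directly with autodiff. This is only a sketch of the two styles (with the expected outputs), not a benchmark.

julia> gradient(Reverse, z -> rosenbrock(z[1], z[2]), [1.0, 2.0])
2-element Vector{Float64}:
 -400.0
  200.0

julia> autodiff(Reverse, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0),)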

Key convenience functions for common derivative computations are gradient (and its inplace variant gradient!) and jacobian. Like autodiff, the mode (forward or reverse) is determined by the first argument.

The functions gradient and gradient! compute the gradient of a function with vector input and scalar return.

julia> gradient(Reverse, rosenbrock_inp, [1.0, 2.0])
2-element Vector{Float64}:
 -400.0
  200.0

julia> # inplace variant
       dx = [0.0, 0.0];
       gradient!(Reverse, dx, rosenbrock_inp, [1.0, 2.0])
2-element Vector{Float64}:
 -400.0
  200.0

julia> dx
2-element Vector{Float64}:
 -400.0
  200.0

julia> gradient(Forward, rosenbrock_inp, [1.0, 2.0])
(-400.0, 200.0)

julia> # in forward mode, we can also optionally pass a chunk size
       # to specify the number of derivatives computed simultaneously
       # using vector forward mode
       chunk_size = Val(2)
       gradient(Forward, rosenbrock_inp, [1.0, 2.0], chunk_size)
(-400.0, 200.0)

The function jacobian computes the Jacobian of a function with vector input and vector return.

julia> foo(x) = [rosenbrock_inp(x), prod(x)];

julia> output_size = Val(2) # here we have to provide the output size of `foo` since it cannot be statically inferred
       jacobian(Reverse, foo, [1.0, 2.0], output_size) 
2×2 Matrix{Float64}:
 -400.0  200.0
    2.0    1.0

julia> chunk_size = Val(2) # By specifying the optional chunk size argument, we can use vector reverse mode to propagate derivatives of multiple outputs at once.
       jacobian(Reverse, foo, [1.0, 2.0], output_size, chunk_size)
2×2 Matrix{Float64}:
 -400.0  200.0
    2.0    1.0

julia> jacobian(Forward, foo, [1.0, 2.0])
2×2 Matrix{Float64}:
 -400.0  200.0
    2.0    1.0

julia> # Again, the optional chunk size argument allows us to use vector forward mode
       jacobian(Forward, foo, [1.0, 2.0], chunk_size)
2×2 Matrix{Float64}:
 -400.0  200.0
    2.0    1.0