The LogSumExp (LSE) (also called RealSoftMax or multivariable softplus) function is a smooth maximum – a smooth function approximation to the maximum function, mainly used by machine learning algorithms. It is defined as the logarithm of the sum of the exponentials of the arguments:
Properties
The LogSumExp function domain is
, the real coordinate space, and its codomain is
, the
real line.
It is an approximation to the maximum
with the following bounds
The first inequality is strict unless
. The second inequality is strict unless all arguments are equal.
(Proof: Let
. Then
. Applying the logarithm to the inequality gives the result.)
In addition, we can scale the function to make the bounds tighter. Consider the function . Then
(Proof: Replace each with for some in the inequalities above, to give
and, since
finally, dividing by gives the result.)
Also, if we multiply by a negative number instead, we of course find a comparison to the function:
The LogSumExp function is convex function, and is strictly increasing everywhere in its domain. It is not strictly convex, since it is affine function (linear plus a constant) on the diagonal and parallel lines:
Other than this direction, it is strictly convex (the
Hessian matrix has rank ), so for example restricting to a
hyperplane that is transverse to the diagonal results in a strictly convex function. See
, below.
Writing the partial derivatives are:
which means the gradient of LogSumExp is the softmax function.
The convex conjugate of LogSumExp is the negative entropy.
log-sum-exp trick for log-domain calculations
The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in
log probability.
Similar to multiplication operations in linear-scale becoming simple additions in log-scale, an addition operation in
linear-scale becomes the LSE in log-scale:
A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems
when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision
floating point numbers.
Unfortunately, the use of LSE directly in this case can again cause overflow/underflow problems. Therefore, the
following equivalent must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient).
where
Many math libraries such as IT++ provide a default routine of LSE and use this formula internally.
A strictly convex log-sum-exp type function
LSE is convex but not strictly convex.
We can define a strictly convex log-sum-exp type function
by adding an extra argument set to zero:
This function is a proper Bregman generator (strictly convex and differentiable).
It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
In tropical analysis, this is the sum in the log semiring.
See also