Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter $t$ in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about $\sqrt{t}$ have largely been smoothed away in the scale-space level at scale $t$.
The main type of scale space is the linear (Gaussian) scale space, which has wide applicability as well as the attractive property that it can be derived from a small set of scale-space axioms. The corresponding scale-space framework encompasses a theory for Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. This framework also allows visual operations to be made scale invariant, which is necessary for dealing with the size variations that may occur in image data: real-world objects may be of different sizes, and the distance between the object and the camera may be unknown and may vary depending on the circumstances.
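For a given two-dimensional image $f(x, y)$, its linear (Gaussian) scale-space representation is the family of derived signals $L(x, y; t)$ defined by the convolution of $f(x, y)$ with the two-dimensional Gaussian kernel
$$g(x, y; t) = \frac{1}{2 \pi t} e^{-(x^2 + y^2)/(2t)}$$
such that
$$L(\cdot, \cdot; t) = g(\cdot, \cdot; t) * f(\cdot, \cdot),$$
where the semicolon in the argument of $L$ implies that the convolution is performed only over the variables $x, y$, while the scale parameter $t$ after the semicolon just indicates which scale level is being defined. This definition of $L$ works for a continuum of scales $t \geq 0$, but typically only a finite discrete set of levels in the scale-space representation is actually considered.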
The scale parameter $t = \sigma^2$ is the variance of the Gaussian filter, and in the limit $t \to 0$ the filter $g$ becomes an impulse function such that $L(x, y; 0) = f(x, y)$; that is, the scale-space representation at scale level $t = 0$ is the image $f$ itself. As $t$ increases, $L$ is the result of smoothing $f$ with a larger and larger filter, thereby removing more and more of the details that the image contains. Since the standard deviation of the filter is $\sigma = \sqrt{t}$, details that are significantly smaller than this value are to a large extent removed from the image at scale parameter $t$; see the accompanying figures for graphical illustrations.
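The construction above can be sketched for a one-dimensional signal using only the Python standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the names `gaussian_kernel`, `smooth` and `scale_space` are hypothetical, the kernel is a sampled Gaussian truncated at four standard deviations and renormalized, and the signal is extended by edge replication at the boundaries.

```python
import math

def gaussian_kernel(t, truncate=4.0):
    """Sampled 1-D Gaussian with variance t, renormalized to unit sum.

    Assumption: truncation at `truncate` standard deviations (sigma = sqrt(t)).
    """
    if t <= 0:
        return [1.0]  # limit t -> 0: the kernel becomes a discrete impulse
    radius = max(1, int(math.ceil(truncate * math.sqrt(t))))
    weights = [math.exp(-(n * n) / (2.0 * t)) for n in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, t):
    """Convolve `signal` with the Gaussian of variance t (edge replication)."""
    kernel = gaussian_kernel(t)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def scale_space(signal, scales):
    """Return a dict mapping each scale t to the smoothed signal L(.; t)."""
    return {t: smooth(signal, t) for t in scales}
```

Calling `scale_space(signal, [0, 1, 4, 16])` returns the original signal at `t = 0` and progressively smoother versions at coarser scales. For two-dimensional images, the same idea applies separably along each axis.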
The conclusion from several different axiomatic derivations that have been presented is that the Gaussian scale space constitutes the canonical way to generate a linear scale space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale.
Conditions, referred to as scale-space axioms, that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance.
In several works, the uniqueness claimed in the arguments based on scale invariance has been criticized, and alternative self-similar scale-space kernels have been proposed. The Gaussian kernel is, however, a unique choice according to the scale-space axiomatics based on causality or non-enhancement of local extrema.
Equivalently, the scale-space family can be defined as the solution of the diffusion equation (for example in terms of the heat equation),
$$\partial_t L = \frac{1}{2} \nabla^2 L,$$
with initial condition $L(x, y; 0) = f(x, y)$. This formulation of the scale-space representation $L$ means that it is possible to interpret the intensity values of the image $f$ as a "temperature distribution" in the image plane and that the process that generates the scale-space representation as a function of $t$ corresponds to heat diffusion in the image plane over time $t$ (assuming the thermal conductivity of the material equal to the arbitrarily chosen constant 1/2). Although this connection may appear superficial to a reader not familiar with differential equations, it is indeed the case that the main scale-space formulation in terms of non-enhancement of local extrema is expressed in terms of a sign condition on partial derivatives in the 2+1-D volume generated by the scale space, thus within the framework of partial differential equations. Furthermore, a detailed analysis of the discrete case shows that the diffusion equation provides a unifying link between continuous and discrete scale spaces, which also generalizes to nonlinear scale spaces, for example, using anisotropic diffusion. Hence, one may say that the primary way to generate a scale space is by the diffusion equation, and that the Gaussian kernel arises as the Green's function of this specific partial differential equation.
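The link between diffusion and Gaussian smoothing can be checked numerically in one dimension. The sketch below assumes an explicit finite-difference discretization of the diffusion equation with diffusion constant 1/2, unit grid spacing and a time step `dt` of at most 1. The function names `diffuse` and `variance` are hypothetical. Each time step spreads an impulse by variance `dt`, so after evolving to time `t` the result has variance `t`, matching a Gaussian kernel at that scale.

```python
def diffuse(signal, t, dt=0.25):
    """Evolve dL/dt = 1/2 * d^2L/dx^2 from L(.; 0) = signal up to time t.

    Assumptions: explicit Euler steps, unit grid spacing, dt <= 1 for
    stability, and replicated (zero-flux) boundaries.
    """
    steps = int(round(t / dt))
    values = [float(v) for v in signal]
    last = len(values) - 1
    for _ in range(steps):
        values = [
            values[i]
            + 0.5 * dt * (values[max(i - 1, 0)] - 2.0 * values[i] + values[min(i + 1, last)])
            for i in range(len(values))
        ]
    return values

def variance(weights):
    """Spatial variance of a non-negative weight profile over its indices."""
    total = sum(weights)
    mean = sum(i * w for i, w in enumerate(weights)) / total
    return sum((i - mean) ** 2 * w for i, w in enumerate(weights)) / total
```

Starting from a unit impulse, `variance(diffuse(impulse, 4.0))` is close to 4, which illustrates that diffusion time plays the role of the scale parameter.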
Another motivation for the scale-space concept originates from the process of performing a physical measurement on real-world data. In order to extract any information from a measurement process, one has to apply operators of non-infinitesimal size to the data. In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem. Scale-space theory, on the other hand, explicitly incorporates the need for a non-infinitesimal size of the image operators as an integral part of any measurement as well as any other operation that depends on a real-world measurement.
There is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex.
In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments.
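Due to the commutative property between the derivative operator and the Gaussian smoothing operator, such scale-space derivatives can equivalently be computed by convolving the original image with Gaussian derivative operators. For this reason they are often also referred to as Gaussian derivatives:
$$L_{x^m y^n}(\cdot, \cdot; t) = \partial_{x^m y^n} \left( g(\cdot, \cdot; t) * f(\cdot, \cdot) \right) = \left( \partial_{x^m y^n} g(\cdot, \cdot; t) \right) * f(\cdot, \cdot).$$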
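As a hedged illustration of computing Gaussian derivatives by direct convolution with derivative-of-Gaussian kernels, the following one-dimensional sketch samples the Gaussian and its first two derivatives analytically. The names `gaussian_derivative_kernel` and `response_at` are hypothetical, only orders 0 to 2 are covered, and the kernel is truncated at six standard deviations. For the test signal $f(x) = x^2$ the smoothed signal is $x^2 + t$, so the expected responses are $L = x^2 + t$, $L_x = 2x$ and $L_{xx} = 2$.

```python
import math

def gaussian_derivative_kernel(t, order, truncate=6.0):
    """Sampled 1-D Gaussian (variance t) or its first/second derivative."""
    radius = int(math.ceil(truncate * math.sqrt(t)))
    kernel = []
    for n in range(-radius, radius + 1):
        g = math.exp(-(n * n) / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        if order == 0:
            factor = 1.0
        elif order == 1:
            factor = -n / t                  # g_x = -(x / t) g
        elif order == 2:
            factor = (n * n - t) / (t * t)   # g_xx = ((x^2 - t) / t^2) g
        else:
            raise ValueError("only orders 0, 1 and 2 are sketched here")
        kernel.append(factor * g)
    return kernel

def response_at(signal, kernel, index):
    """Convolution sum_n kernel(n) * signal[index - n] at an interior sample."""
    radius = len(kernel) // 2
    return sum(w * signal[index - (k - radius)] for k, w in enumerate(kernel))
```

With `signal = [x * x for x in range(-30, 31)]` and `t = 4.0`, evaluating at the sample for `x = 3` (index 33) gives values close to 13, 6 and 2 for orders 0, 1 and 2 respectively.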
The uniqueness of the Gaussian derivative operators as local operations derived from a scale-space representation can be obtained by axiomatic derivations similar to those used for deriving the uniqueness of the Gaussian kernel for scale-space smoothing.
When Gaussian derivative operators and differential invariants are used in this way as basic feature detectors at multiple scales, the uncommitted first stages of visual processing are often referred to as a visual front-end. This overall framework has been applied to a large variety of problems in computer vision, including feature detection, feature classification, image segmentation, image matching, motion estimation, computation of shape cues and object recognition. The set of Gaussian derivative operators up to a certain order is often referred to as the N-jet and constitutes a basic type of feature within the scale-space framework.
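For differential edge detection, the edge points are those zero-crossings of the second-order differential invariant that satisfy the following sign condition on a third-order differential invariant:
$$L_v^3 L_{vvv} = L_x^3 L_{xxx} + 3 L_x^2 L_y L_{xxy} + 3 L_x L_y^2 L_{xyy} + L_y^3 L_{yyy} < 0.$$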
Similarly, multi-scale blob detection at any given fixed scale can be obtained from local maxima and local minima of either the Laplacian operator (also referred to as the Laplacian of Gaussian)
$$\nabla^2 L = L_{xx} + L_{yy}$$
or the determinant of the Hessian matrix
$$\det H L(x, y; t) = L_{xx} L_{yy} - L_{xy}^2.$$
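The differential invariants used by these detectors are simple polynomial expressions in the Gaussian derivatives at a point. The sketch below assumes the derivatives have already been computed and are passed in a dictionary keyed by strings such as `"x"`, `"xy"` or `"xxy"`. The function names and the dictionary layout are illustrative choices. The helper `blurred_step_edge` supplies closed-form derivatives of an ideal step edge along $x$ smoothed to scale $t$, for which $L_x$ is a one-dimensional Gaussian.

```python
import math

def edge_invariants(d):
    """Return (Lv^2 * Lvv, Lv^3 * Lvvv) from a dict of partial derivatives."""
    lx, ly = d["x"], d["y"]
    second = lx * lx * d["xx"] + 2.0 * lx * ly * d["xy"] + ly * ly * d["yy"]
    third = (lx ** 3 * d["xxx"] + 3.0 * lx * lx * ly * d["xxy"]
             + 3.0 * lx * ly * ly * d["xyy"] + ly ** 3 * d["yyy"])
    return second, third

def laplacian(d):
    """Laplacian L_xx + L_yy."""
    return d["xx"] + d["yy"]

def det_hessian(d):
    """Determinant of the Hessian, L_xx * L_yy - L_xy^2."""
    return d["xx"] * d["yy"] - d["xy"] ** 2

def blurred_step_edge(x, t):
    """Derivatives at position x of a unit step edge along x at scale t."""
    g = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
    d = dict.fromkeys(["x", "y", "xx", "xy", "yy", "xxx", "xxy", "xyy", "yyy"], 0.0)
    d["x"] = g                          # L_x is the 1-D Gaussian
    d["xx"] = -x / t * g                # derivative of the Gaussian
    d["xxx"] = (x * x - t) / (t * t) * g
    return d
```

At the edge position `x = 0` the second-order invariant vanishes and the third-order invariant is negative, so the point qualifies as an edge. Away from the edge the second-order invariant is non-zero.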
Scale-space operations have also been frequently used for expressing coarse-to-fine methods, in particular for tasks such as image matching and for multi-scale image segmentation.
Recent work has shown that more complex operations, such as scale-invariant object recognition, can also be performed in this way, by computing local image descriptors (N-jets or local histograms of gradient directions) at scale-adapted interest points obtained from scale-space extrema of the normalized Laplacian operator (see also scale-invariant feature transform) or the determinant of the Hessian (see also SURF); see also the Scholarpedia article on the scale-invariant feature transform for a more general outlook of object recognition approaches based on receptive field responses in terms of Gaussian derivative operators or approximations thereof.
In a scale-space representation, the existence of a continuous scale parameter makes it possible to track zero crossings over scales leading to so-called deep structure.
For features defined as zero-crossings of differential invariants, the implicit function theorem directly defines trajectories across scales, and at those scales where bifurcations occur, the local behaviour can be modelled by singularity theory.
Extensions of linear scale-space theory concern the formulation of non-linear scale-space concepts more committed to specific purposes. These non-linear scale-spaces often start from the equivalent diffusion formulation of the scale-space concept, which is subsequently extended in a non-linear fashion. A large number of evolution equations have been formulated in this way, motivated by different specific requirements (see the abovementioned book references for further information). However, not all of these non-linear scale-spaces satisfy similar "nice" theoretical requirements as the linear Gaussian scale-space concept. Hence, unexpected artifacts may sometimes occur, and one should be very careful not to use the term "scale-space" for just any type of one-parameter family of images.
A first-order extension of the isotropic Gaussian scale space is provided by the affine (Gaussian) scale space. One motivation for this extension originates from the common need for computing image descriptors for real-world objects that are viewed under a perspective camera model. To handle such non-linear deformations locally, partial invariance (or more correctly covariance) to local affine deformations can be achieved by considering affine Gaussian kernels with their shapes determined by the local image structure; see the article on affine shape adaptation for theory and algorithms. Indeed, this affine scale space can also be expressed from a non-isotropic extension of the linear (isotropic) diffusion equation, while still being within the class of linear partial differential equations.
There exists a more general extension of the Gaussian scale-space model to affine and spatio-temporal scale-spaces. In addition to variabilities over scale, which the original scale-space theory was designed to handle, this generalized scale-space theory also comprises other types of variabilities caused by geometric transformations in the image formation process, including variations in viewing direction approximated by local affine transformations, and relative motions between objects in the world and the observer, approximated by local Galilean transformations. This generalized scale-space theory leads to predictions about receptive field profiles in good qualitative agreement with receptive field profiles measured by cell recordings in biological vision.
There are strong relations between scale-space theory and wavelets, although these two notions of multi-scale representation have been developed from somewhat different premises.
There has also been work on other multi-scale approaches, such as pyramids and a variety of other kernels, that do not exploit or satisfy the same requirements as true scale-space descriptions do.
Regarding biological hearing, there are receptive field profiles in the inferior colliculus and the primary auditory cortex that can be well modelled by spectro-temporal receptive fields, which in turn can be well modelled by Gaussian derivatives over logarithmic frequencies and windowed Fourier transforms over time, with the window functions being temporal scale-space kernels.
An earlier approach to handling temporal scales in a time-causal way performs Gaussian smoothing over a logarithmically transformed temporal axis. Unlike the time-causal limit kernel, however, it has no known memory-efficient time-recursive implementation.
Alternative definition
Motivations
Gaussian derivatives
Visual front end
Detector examples
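Following the idea of expressing visual operations in terms of differential invariants computed from Gaussian derivatives, an edge detector can be expressed from the set of points at which the gradient magnitude $L_v = \sqrt{L_x^2 + L_y^2}$ should assume a local maximum in the gradient direction $\nabla L = (L_x, L_y)^T$.
By working out the differential geometry, it can be shown that this differential edge detector can equivalently be expressed from the zero-crossings of the second-order differential invariant
$$L_v^2 L_{vv} = L_x^2 L_{xx} + 2 L_x L_y L_{xy} + L_y^2 L_{yy} = 0.$$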
In an analogous fashion, corner detectors and ridge and valley detectors can be expressed as local maxima, minima or zero-crossings of multi-scale differential invariants defined from Gaussian derivatives. The algebraic expressions for the corner and ridge detection operators are, however, somewhat more complex and the reader is referred to the articles on corner detection and ridge detection for further details.
Scale selection
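In automatic scale selection, locally appropriate scales are determined from local maxima (or minima) over scales of scale-normalized derivatives
$$L_{\xi^m \eta^n}(x, y; t) = t^{(m+n)\gamma/2} L_{x^m y^n}(x, y; t),$$
where $\gamma \in [0, 1]$ is a parameter that is related to the dimensionality of the image feature. This algebraic expression for scale-normalized Gaussian derivative operators originates from the introduction of $\gamma$-normalized derivatives according to
$$\partial_\xi = t^{\gamma/2} \partial_x \quad \text{and} \quad \partial_\eta = t^{\gamma/2} \partial_y.$$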
It can be theoretically shown that a scale selection module working according to this principle will satisfy the following scale covariance property: if for a certain type of image feature a local maximum is assumed in a certain image at a certain scale $t_0$, then under a rescaling of the image by a scale factor $s$ the local maximum over scales in the rescaled image will be transformed to the scale level $s^2 t_0$.
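The scale covariance property can be illustrated with a case that has a closed-form solution. The sketch below assumes a unit-mass two-dimensional Gaussian blob of variance $t_0$ as the input image and $\gamma = 1$. Smoothing this blob to scale $t$ gives a Gaussian of variance $t_0 + t$, so the magnitude of the scale-normalized Laplacian at the blob centre is $t / (\pi (t_0 + t)^2)$, which is maximal at $t = t_0$. The function names are hypothetical, and the search over a fixed grid of scales is an illustrative simplification.

```python
import math

def normalized_laplacian_at_blob_center(t, blob_variance):
    """|t * (L_xx + L_yy)| at the centre of a unit-mass 2-D Gaussian blob.

    Assumption: gamma = 1 and a Gaussian blob of variance `blob_variance`,
    for which the smoothed blob has variance blob_variance + t.
    """
    total = blob_variance + t
    return t / (math.pi * total * total)

def select_scale(blob_variance, scales):
    """Pick the scale with the strongest scale-normalized Laplacian response."""
    return max(scales, key=lambda t: normalized_laplacian_at_blob_center(t, blob_variance))
```

With `scales = [0.1 * k for k in range(1, 1001)]`, a blob of variance 4 selects a scale close to 4. Rescaling the image by a factor `s = 2` turns the blob variance into 16 and the selected scale moves to about 16, that is, $s^2 t_0$.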
Scale invariant feature detection
Related multi-scale representations
Relations to biological vision and hearing
Deep learning and scale space
Time-causal temporal scale space
Implementation issues