Type inference, sometimes called type reconstruction,
refers to the automatic detection of the type of an expression in a formal language. These include programming languages and mathematical type systems, but also natural languages in some branches of computer science and linguistics.
To exclude unwanted, but materially possible uses, the concept of types is defined and applied in many variations. In mathematics, Russell's paradox sparked early versions of type theory. In programming languages, typical examples are "type errors", e.g. instructing a computer to sum values that are not numbers. While materially possible, the result would no longer be meaningful and could be disastrous for the overall process.
In a typing, an expression is opposed to a type. For example, $4$, $2+2$, and $2 \cdot 2$ are all separate terms with the type $\mathrm{nat}$ for natural numbers. Traditionally, the expression is followed by a colon and its type, such as $2 : \mathrm{nat}$. This means that the value $2$ is of type $\mathrm{nat}$. This form is also used to declare new names, e.g. $n : \mathrm{nat}$, much like introducing a new character to a scene by the words "detective Decker".
Contrary to a story, where the designations slowly unfold, the objects in formal languages often have to be defined with their type from the very beginning. Additionally, if the expressions are ambiguous, types may be needed to make the intended use explicit. For instance, the expression $2$ might have the type $\mathrm{nat}$ but could also be read as a rational or real number, or even as plain text.
As a consequence, programs or proofs can become so encumbered with types that it is desirable to deduce them from the context. This is possible by collecting the uses of untyped expressions (including undefined names). If, for instance, a yet undefined name n is used in an expression $n + 2$, one could conclude that n is at least a number. The process of deducing the type of an expression from its context is type inference.
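The $n + 2$ example above can be observed directly in a language with type inference. The following Haskell sketch (the name addTwo is illustrative, not from the original text) defines such an expression without any annotation and lets the compiler deduce a numeric type from its use:

```haskell
-- No type annotation: the use of (+) and the literal 2 is enough
-- for the compiler to conclude that n is (at least) a number.
addTwo n = n + 2
-- GHC infers: addTwo :: Num a => a -> a

main :: IO ()
main = print (addTwo (40 :: Int))   -- prints 42
```

Because the inferred type is constrained rather than fixed, the same definition also works for other numeric types, such as Double.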
In general, not only objects but also activities have types and may be introduced simply by their use. For a Star Trek story, such an unknown activity could be "beaming", which for the sake of the story's flow is just executed and never formally introduced. Nevertheless, one can deduce its type (transport) from what happens. Additionally, both objects and activities can be constructed from their parts. In such a setting, type inference not only becomes more complex but also more helpful, as it allows a complete description of everything in a composed scene to be collected, while still detecting conflicting or unintended uses.
In this setting, the following questions are of particular interest: whether an expression can be assigned a type at all (typability), whether a given expression has a given type (type checking), and whether any expression of a given type exists (type inhabitation).
For the simply typed lambda calculus, all three questions are decidable. The situation is not as comfortable when more expressive types are allowed.
In some programming languages, all values have a data type explicitly declared at compile time, limiting the values a particular expression can take on at run time. Increasingly, just-in-time compilation blurs the distinction between run time and compile time. However, historically, if the type of a value is known only at run time, these languages are dynamically typed. In other languages, the type of an expression is known only at compile time; these languages are statically typed. In most statically typed languages, the input and output types of functions and local variables ordinarily must be explicitly provided by type annotations. For example, in ANSI C:
int add_one(int x) {
    int result; /* declare integer result */
    result = x + 1;
    return result;
}

The type signature of this function definition, int add_one(int x), declares that add_one is a function that takes one argument, an integer, and returns an integer. int result; declares that the local variable result is an integer. In a hypothetical language supporting type inference, the code might be written like this instead:
add_one(x) {
    var result;  /* inferred-type variable result */
    var result2; /* inferred-type variable result #2 */
    result = x + 1;
    result2 = x + 1.0; /* this line won't work (in the proposed language) */
    return result;
}
In the imaginary language in which the last example is written, the compiler would assume that, in the absence of information to the contrary, + takes two integers and returns one integer. (This is how it works in, for example, OCaml.) From this, the type inferencer can infer that the type of x + 1 is an integer, which means result is an integer and thus the return value of add_one is an integer. Similarly, since + requires both of its arguments to be of the same type, x must be an integer, and thus add_one accepts one integer as an argument.
However, in the subsequent line, result2 is calculated by adding a decimal 1.0 with floating-point arithmetic, causing a conflict in the use of x for both integer and floating-point expressions. The correct type-inference algorithm for such a situation has been known since 1958 and has been known to be correct since 1982. It revisits the prior inferences and uses the most general type from the outset: in this case floating-point. This can however have detrimental implications; for instance, using a floating-point type from the outset can introduce precision issues that would not have been there with an integer type.
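Haskell's inference behaves in this "most general type" way, so the analogous definition does not fail there. In the sketch below (the name f is illustrative), the literal 1.0 adds a Fractional constraint, so x is treated as a fractional type throughout rather than being fixed to an integer at its first use:

```haskell
-- Both uses of x are reconciled by generalizing: x + 1 needs Num,
-- x + 1.0 additionally needs Fractional, so the whole function gets
-- the most general type satisfying both constraints.
f x = (x + 1, x + 1.0)
-- GHC infers: f :: Fractional a => a -> (a, a)

main :: IO ()
main = print (f (2.0 :: Double))   -- prints (3.0,3.0)
```

Note that, as the article observes, this generality has a cost: calling f forces a floating-point (or other fractional) representation even for callers that only ever wanted integers.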
Frequently, however, degenerate type-inference algorithms are used that cannot backtrack and instead generate an error message in such a situation. This behavior may be preferable, as type inference is not always neutral algorithmically, as illustrated by the prior floating-point precision issue.
An algorithm of intermediate generality implicitly declares result2 as a floating-point variable, and the addition implicitly converts x to a floating-point value. This can be correct if the calling contexts never supply a floating-point argument. Such a situation shows the difference between type inference, which does not involve type conversion, and implicit type conversion, which forces data to a different data type, often without restrictions.
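The contrast between inference and conversion can be made concrete in Haskell, which infers types but performs no implicit numeric conversion: crossing from Int to Double must be written out explicitly. A minimal sketch (addOneF is a hypothetical name, not from the original text):

```haskell
-- Haskell infers types, but never converts them silently: where a
-- C-like language would coerce x to floating point implicitly, the
-- conversion here must be spelled out with fromIntegral.
addOneF :: Int -> Double
addOneF x = fromIntegral x + 1.0

main :: IO ()
main = print (addOneF 2)   -- prints 3.0
```

This keeps type inference "neutral": the compiler deduces types, but never changes a value's representation behind the programmer's back.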
Finally, a significant downside of complex type-inference algorithms is that the resulting type-inference resolution is not going to be obvious to humans (notably because of the backtracking), which can be detrimental, as code is primarily intended to be comprehensible to humans.
The recent emergence of just-in-time compilation allows for hybrid approaches, where the types of arguments supplied by the various calling contexts are known at compile time and a large number of compiled versions of the same function can be generated. Each compiled version can then be optimized for a different set of types. For instance, JIT compilation allows there to be at least two compiled versions of add_one: one that accepts an integer input and uses implicit type conversion, and one that accepts a floating-point number as input and uses floating-point arithmetic throughout.
To obtain the information required to infer the type of an expression, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions, or through an implicit understanding of the type of various atomic values (e.g. true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type inferring language is able to compile a program completely without type annotations.
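This reduction to implicitly typed atoms can be illustrated with a small Haskell program that carries no annotations at all (the names flag and greeting are illustrative): every expression bottoms out in atoms whose types the compiler already knows.

```haskell
-- No annotations anywhere: True is known to be a Bool and the string
-- literals are known to be Strings, so the types of flag and greeting
-- follow entirely from the types of their atomic parts.
flag = True
greeting = if flag then "hello" else "goodbye"

main :: IO ()
main = putStrLn greeting   -- prints hello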
In complex forms of higher-order programming and polymorphism, it is not always possible for the compiler to infer as much, and type annotations are occasionally necessary for disambiguation. For instance, type inference with polymorphic recursion is known to be undecidable. Furthermore, explicit type annotations can be used to optimize code by forcing the compiler to use a more specific (faster/smaller) type than it had inferred.
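The use of an annotation to force a more specific type than inference would produce can be sketched in Haskell (sumInts is a hypothetical name):

```haskell
-- Without the signature, GHC would infer the more general type
-- Num a => [a] -> a; the annotation deliberately restricts it to
-- machine integers, which a compiler can represent more efficiently.
sumInts :: [Int] -> Int
sumInts = sum

main :: IO ()
main = print (sumInts [1, 2, 3])   -- prints 6
```

The trade-off is the usual one: the specialized version may be faster, but it can no longer be applied to, say, a list of Doubles.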
Some methods for type inference are based on constraint satisfaction (Talpin, Jean-Pierre, and Pierre Jouvelot. "Polymorphic type, region and effect inference." Journal of Functional Programming 2.3 (1992): 245–271) or satisfiability modulo theories.
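At the core of such constraint-based methods is first-order unification: equations between types are solved by computing a substitution that makes both sides equal. The following Haskell sketch (all names are illustrative; it handles only monotypes over Int, type variables, and functions) shows the essential algorithm, including the occurs check that rejects infinite types:

```haskell
-- A minimal sketch of first-order unification over monotypes, the
-- constraint-solving core of many type-inference algorithms.
data Ty = TVar String | TInt | TFun Ty Ty deriving (Eq, Show)

type Subst = [(String, Ty)]   -- variable-to-type bindings

-- Apply a substitution to a type (one level of lookup per variable).
apply :: Subst -> Ty -> Ty
apply s (TVar v)   = maybe (TVar v) id (lookup v s)
apply s (TFun a b) = TFun (apply s a) (apply s b)
apply _ t          = t

-- Solve the equation t1 = t2, yielding a substitution or failing.
unify :: Ty -> Ty -> Maybe Subst
unify (TVar v) t = bind v t
unify t (TVar v) = bind v t
unify TInt TInt  = Just []
unify (TFun a b) (TFun c d) = do
  s1 <- unify a c
  s2 <- unify (apply s1 b) (apply s1 d)
  Just (s2 ++ s1)
unify _ _ = Nothing

-- Bind a variable, refusing cyclic solutions (the occurs check).
bind :: String -> Ty -> Maybe Subst
bind v t
  | t == TVar v = Just []
  | occurs v t  = Nothing
  | otherwise   = Just [(v, t)]

occurs :: String -> Ty -> Bool
occurs v (TVar w)   = v == w
occurs v (TFun a b) = occurs v a || occurs v b
occurs _ _          = False

main :: IO ()
main = print (unify (TFun (TVar "a") TInt) (TFun TInt (TVar "b")))
```

For example, unifying a → Int with Int → b binds both a and b to Int; unifying Int with Int → Int fails, which is exactly the kind of conflict reported as a type error.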
Consider the following Haskell definition of the map function, which applies a function to each element of a list:

map f [] = []
map f (first:rest) = f first : map f rest

Type inference on map proceeds as follows. map is a function of two arguments, so its type is constrained to be of the form a → b → c. In Haskell, the patterns [] and (first:rest) always match lists, so the second argument must be a list type: b = [d] for some type d. Its first argument f is applied to the argument first, which must have type d, corresponding with the type in the list argument, so f :: d → e (:: means "is of type") for some type e. The return value of map f, finally, is a list of whatever f produces, so [e].
Putting the parts together leads to map :: (d → e) → [d] → [e]. Nothing is special about the type variables, so they can be relabeled as map :: (a → b) → [a] → [b].
It turns out that this is also the most general type, since no further constraints apply. As the inferred type of map is parametrically polymorphic, the type of the arguments and results of f are not inferred, but left as type variables, and so map can be applied to functions and lists of various types, as long as the actual types match in each invocation.
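The inferred most general type can be written down explicitly and checked by the compiler, and the parametric polymorphism means the one definition works at many different types in the same program (myMap is used below to avoid shadowing the Prelude's map):

```haskell
-- The most general type, exactly as inference derives it; GHC accepts
-- this signature for the definition below.
myMap :: (a -> b) -> [a] -> [b]
myMap _ [] = []
myMap f (first : rest) = f first : myMap f rest

main :: IO ()
main = do
  print (myMap (+ 1) [1, 2, 3 :: Int])     -- a = Int,    b = Int
  print (myMap show [1.5, 2.5 :: Double])  -- a = Double, b = String
```

Each call site instantiates the type variables independently, which is precisely what lets map be applied to functions and lists of various types as long as they match within each invocation.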
The origin of this algorithm is the type inference algorithm for the simply typed lambda calculus that was devised by Haskell Curry and Robert Feys in 1958. In 1969 J. Roger Hindley extended this work and proved that their algorithm always inferred the most general type. In 1978 Robin Milner, independently of Hindley's work, provided an equivalent algorithm, Algorithm W. In 1982 Luis Damas finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references.

