In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.
Functions may be defined within Computer program, or separately in libraries that can be used by many programs. In different programming languages, a function may be called a routine, subprogram, subroutine, method, or procedure. Technically, these terms all have different definitions, and the nomenclature varies from language to language. The generic umbrella term callable unit is sometimes used.
A function is often coded so that it can be started several times and from several places during one execution of the program, including from other functions, and then branch back ( Return statement) to the next instruction after the call, once the function's task is done.
The idea of a subroutine was initially conceived by John Mauchly during his work on ENIAC, and recorded in a January 1947 Harvard symposium on "Preparation of Problems for EDVAC-type Machines". Maurice Wilkes, David Wheeler, and Stanley Gill are generally credited with the formal invention of this concept, which they termed a closed sub-routine, contrasted with an open subroutine or macro. However, Alan Turing had discussed subroutines in a paper of 1945 on design proposals for the NPL ACE, going so far as to invent the concept of a Call stack. reprinted in
Functions are a powerful programming tool, and the syntax of many programming languages includes support for writing and using subroutines. Judicious use of functions (for example, through the structured programming approach) will often substantially reduce the cost of developing and maintaining a large program, while increasing its quality and reliability. Functions, often collected into libraries, are an important mechanism for sharing and trading software. The discipline of object-oriented programming is based on objects and methods (which are functions attached to these objects or object classes).
A function may be written so that it expects to obtain one or more data values from the calling program (to replace its parameters or formal parameters). The calling program provides actual values for these parameters, called arguments. Different programming languages may use different conventions for passing arguments:
Default in most Algol-like languages after Algol 60, such as Pascal, Delphi, Simula, CPL, PL/M, Modula, Oberon, Ada, and many others. C, C++, Java (References to objects and arrays are also passed by value) |
Selectable in most Algol-like languages after Algol 60, such as Algol 68, Pascal, Delphi, Simula, CPL, PL/M, Modula, Oberon, Ada, and many others. C++, Fortran, PL/I |
Ada OUT parameters |
Algol, Swift in-out parameters |
Algol, Scala |
PL/I NONASSIGNABLE parameters, Ada IN parameters |
A function can be coded so that it may call itself recursively, at one or more places, to perform its task. This method allows direct implementation of functions defined by mathematical induction and recursive divide and conquer algorithms.
A function whose purpose is to compute one boolean-valued function (that is, to answer a yes/no question) is sometimes called a predicate. In logic programming languages, often all functions are called predicates, since they primarily determine success or failure.
A function that returns no value or returns a null value is sometimes called a procedure. Procedures usually modify their arguments and are a core part of procedural programming.
Some programming languages, such as Pascal, Fortran, Ada and many dialects of BASIC, distinguish between functions or function subprograms, which provide an explicit return value to the calling program, and subroutines or procedures, which do not. In those languages, function calls are normally embedded in expressions (e.g., a sqrt function may be called as y = z + sqrt(x)). Procedure calls either behave syntactically as statements (e.g., a print procedure may be called as if x > 0 then print(x) or are explicitly invoked by a statement such as CALL or GOSUB (e.g., call print(x)). Other languages, such as C and Lisp, do not distinguish between functions and subroutines.
In strictly functional programming languages such as Haskell, subprograms can have no side effects, which means that various internal states of the program will not change. Functions will always return the same result if repeatedly called with the same arguments. Such languages typically only support functions that return a value, since functions that do not return a value have no use unless they can cause a side effect.
In programming languages such as C, C++, and C#, functions that return a value and functions that return no value are both called "functions" (not to be confused with mathematical functions or functional programming, which are different concepts).
A language's compiler will usually translate procedure calls and returns into machine instructions according to a well-defined calling convention, so that functions can be compiled separately from the programs that call them. The instruction sequences corresponding to call and return statements are called the procedure's prologue and epilogue.
A function typically requires standard housekeeping code – both at the entry to, and exit from, the function (function prologue and epilogue – usually saving general purpose registers and return address as a minimum).
Subroutines were implemented in Konrad Zuse's Z4 in 1945.
In 1945, Alan M. Turing used the terms "bury" and "unbury" as a means of calling and returning from subroutines. (NB. Presented on 1946-03-19 before the Executive Committee of the National Physical Laboratory (Great Britain).) (11 pages)
In January 1947 John Mauchly presented general notes at 'A Symposium of Large Scale Digital Calculating Machinery' under the joint sponsorship of Harvard University and the Bureau of Ordnance, United States Navy. Here he discusses serial and parallel operation suggesting
Kay McNulty had worked closely with John Mauchly on the ENIAC team and developed an idea for subroutines for the ENIAC computer she was programming during World War II. She and the other ENIAC programmers used the subroutines to help calculate missile trajectories.
(see pg 163 of the pdf for the relevant page)
Some very early computers and microprocessors, such as the IBM 1620, the Intel 4004 and Intel 8008, and the PIC microcontrollers, have a single-instruction subroutine call that uses a dedicated hardware stack to store return addresses—such hardware supports only a few levels of subroutine nesting, but can support recursive subroutines. Machines before the mid-1960s—such as the UNIVAC I, the PDP-1, and the IBM 1130—typically use a calling convention which saved the instruction counter in the first memory location of the called subroutine. This allows arbitrarily deep levels of subroutine nesting but does not support recursive subroutines. The IBM System/360 had a subroutine call instruction that placed the saved instruction counter value into a general-purpose register; this can be used to support arbitrarily deep subroutine nesting and recursive subroutines. The PDP-11 (1970) is one of the first computers with a stack-pushing subroutine call instruction; this feature also supports both arbitrarily deep subroutine nesting and recursive subroutines. Guy Lewis Steele Jr. AI Memo 443. 'Debunking the "Expensive Procedure Call" Myth; or, Procedure call implementations considered harmful". Section "C. Why Procedure Calls Have a Bad Reputation".
One of the first programming languages to support user-written subroutines and functions was FORTRAN II. The IBM FORTRAN II compiler was released in 1958. ALGOL 58 and other early programming languages also supported procedural programming.
Many early computers loaded the program instructions into memory from a punched tape. Each subroutine could then be provided by a separate piece of tape, loaded or spliced before or after the main program (or "mainline" ); and the same subroutine tape could then be used by many different programs. A similar approach applied in computers that used for their main input. The name subroutine library originally meant a library, in the literal sense, which kept indexed collections of tapes or card-decks for collective use.
On those computers, instead of modifying the function's return jump, the calling program would store the return address in a variable so that when the function completed, it would execute an indirect jump that would direct execution to the location given by the predefined variable.
In the IBM System/360, for example, the branch instructions BAL or BALR, designed for procedure calling, would save the return address in a processor register specified in the instruction, by convention register 14. To return, the subroutine had only to execute an indirect branch instruction (BR) through that register. If the subroutine needed that register for some other purpose (such as calling another subroutine), it would save the register's contents to a private memory location or a register stack.
In systems such as the HP 2100, the JSB instruction would perform a similar task, except that the return address was stored in the memory location that was the target of the branch. Execution of the procedure would actually begin at the next memory location. In the HP 2100 assembly language, one would write, for example
to call a subroutine called MYSUB from the main program. The subroutine would be coded as
The JSB instruction placed the address of the NEXT instruction (namely, BB) into the location specified as its operand (namely, MYSUB), and then branched to the NEXT location after that (namely, AA = MYSUB + 1). The subroutine could then return to the main program by executing the indirect jump JMP MYSUB, I which branched to the location stored at location MYSUB.
Compilers for Fortran and other languages could easily make use of these instructions when available. This approach supported multiple levels of calls; however, since the return address, parameters, and return values of a subroutine were assigned fixed memory locations, it did not allow for recursive calls.
Incidentally, a similar method was used by Lotus 1-2-3, in the early 1980s, to discover the recalculation dependencies in a spreadsheet. Namely, a location was reserved in each cell to store the return address. Since circular references are not allowed for natural recalculation order, this allows a tree walk without reserving space for a stack in memory, which was very limited on small computers such as the IBM PC.
The call sequence can be implemented by a sequence of ordinary instructions (an approach still used in reduced instruction set computing (RISC) and very long instruction word (VLIW) architectures), but many traditional machines designed since the late 1960s have included special instructions for that purpose.
The call stack is usually implemented as a contiguous area of memory. It is an arbitrary design choice whether the bottom of the stack is the lowest or highest address within this area, so that the stack may grow forwards or backwards in memory; however, many architectures chose the latter.
Some designs, notably some Forth implementations, used two separate stacks, one mainly for control information (like return addresses and loop counters) and the other for data. The former was, or worked like, a call stack and was only indirectly accessible to the programmer through other language constructs while the latter was more directly accessible.
When stack-based procedure calls were first introduced, an important motivation was to save precious memory. With this scheme, the compiler does not have to reserve separate space in memory for the private data (parameters, return address, and local variables) of each procedure. At any moment, the stack contains only the private data of the calls that are currently active (namely, which have been called but haven't returned yet). Because of the ways in which programs were usually assembled from libraries, it was (and still is) not uncommon to find programs that include thousands of functions, of which only a handful are active at any given moment. For such programs, the call stack mechanism could save significant amounts of memory. Indeed, the call stack mechanism can be viewed as the earliest and simplest method for automatic memory management.
However, another advantage of the call stack method is that it allows recursive function calls, since each nested call to the same procedure gets a separate instance of its private data.
In a multi-threaded environment, there is generally more than one stack. An environment that fully supports or lazy evaluation may use data structures other than stacks to store their activation records.
This overhead is most obvious and objectionable in leaf procedures or leaf functions, which return without making any procedure calls themselves. To reduce that overhead, many modern compilers try to delay the use of a call stack until it is really needed. For example, the call of a procedure P may store the return address and parameters of the called procedure in certain processor registers, and transfer control to the procedure's body by a simple jump. If the procedure P returns without making any other call, the call stack is not used at all. If P needs to call another procedure Q, it will then use the call stack to save the contents of any registers (such as the return address) that will be needed after Q returns.
The function does not return a value and has to be called as a stand-alone function, e.g., Function1();
return 5;
}
char selection[] = {'S', 'M', 'T', 'W', 'T', 'F', 'S'};
return selection[number];
}
This function converts a number between 0 and 6 into the initial letter of the corresponding day of the week, namely 0 to 'S', 1 to 'M', ..., 6 to 'S'. The result of calling it might be assigned to a variable, e.g., num_day = Function3(number);.
(*pointer_to_var)++;
}
This function does not return a value but modifies the variable whose address is passed as the parameter; it would be called with Function4(&variable_to_increment);.
Sub Example ' Begins the subroutine
TextWindow.WriteLine("This is an example of a subroutine in Microsoft Small Basic.") ' What the subroutine doesEndSub ' Ends the subroutine
In the example above, Example() calls the subroutine. To define the actual subroutine, the Sub keyword must be used, with the subroutine name following Sub. After content has followed, EndSub must be typed.
' Some Code Here
End Function
The function does not return a value and has to be called as a stand-alone function, e.g., Function1
Function2 = 5
End Function
This function returns a result (the number 5), and the call can be part of an expression, e.g., x + Function2()
Dim strArray(6) as String
strArray = Array("M", "T", "W", "T", "F", "S", "S")
Function3 = strArray(intValue)
End Function
This function converts a number between 0 and 6 into the initial letter of the corresponding day of the week, namely 0 to 'M', 1 to 'T', ..., 6 to 'S'. The result of calling it might be assigned to a variable, e.g., num_day = Function3(number).
intValue = intValue + 1
End Function
This function does not return a value but modifies the variable whose address is passed as the parameter; it would be called with "Function4(variable_to_increment)".
change_sign: procedure(array);declare array(*,*) float; array = -array;end change_sign;
This could be called with various arrays as follows:
/* first array bounds from -5 to +10 and 3 to 9 */ declare array1 (-5:10, 3:9)float; /* second array bounds from 1 to 16 and 1 to 16 */ declare array2 (16,16) float; call change_sign(array1); call change_sign(array2);
print('Hello world!') print('Wikipedia')simple_function()
def func(name)
print("welcome "+name)print(func("martin"))
A subprogram may have any number and nature of call sites. If recursion is supported, a subprogram may even call itself, causing its execution to suspend while another nested execution of the same subprogram occurs. Recursion is a useful means to simplify some complex algorithms and break down complex problems. Recursive languages generally provide a new copy of local variables on each call. If the programmer desires the value of local variables to stay the same between calls, they can be declared static in some languages, or global values or common areas can be used. Here is an example of a recursive function in C/C++ to find Fibonacci numbers:
if (n <= 1) {
return n;
}
return Fib(n - 1) + Fib(n - 2);
}
Early languages like Fortran did not initially support recursion because variables were statically allocated, as well as the location for the return address. Early computer instruction sets made storing return addresses and variables on a stack difficult. Machines with index registers or general-purpose registers, e.g., CDC 6000 series, PDP-6, GE 635, System/360, UNIVAC 1100 series, could use one of those registers as a stack pointer.
Modern languages after ALGOL such as PL/I and C almost invariably use a stack, usually supported by most modern computer instruction sets to provide a fresh activation record for every execution of a subprogram. That way, the nested execution is free to modify its local variables without concern for the effect on other suspended executions in progress. As nested calls accumulate, a call stack structure is formed, consisting of one activation record for each suspended subprogram. In fact, this stack structure is virtually ubiquitous, and so activation records are commonly termed .
Some languages such as Pascal, PL/I, and Ada also support , which are functions callable only within the scope of an outer (parent) function. Inner functions have access to the local variables of the outer function that called them. This is accomplished by storing extra context information within the activation record, also termed a display.
If a subprogram can be executed properly even when another execution of the same subprogram is already in progress, that subprogram is said to be reentrant. A recursive subprogram must be reentrant. Reentrant subprograms are also useful in multi-threaded situations since multiple threads can call the same subprogram without fear of interfering with each other. In the IBM CICS transaction processing system, quasi-reentrant was a slightly less restrictive, but similar, requirement for application programs that were shared by many threads.
In object-oriented programming, when a series of functions with the same name can accept different parameter profiles or parameters of different types, each of the functions is said to be overloaded.
Here is an example of function overloading in C++, demonstrating the implementation of two functions with the same name (Area) but different parameters:
double Area(double h, double w) { return h * w; }
/* The first Area function is for finding the area of a rectangle,
double Area(double r) { return r * r * 3.14; }
/* The second Area function is for finding the area of a circle,
int main() {
* so it accepts two numbers as parameters, for the height and width. */
* so it only accepts one number as a parameter, for the radius. */
double rectangle_area = Area(3, 4); // This calls the first Area function, because two parameters are provided.
double circle_area = Area(5); // This calls the second Area function, because only one parameter is provided.
std::cout << "Area of a rectangle is " << rectangle_area << std::endl;
std::cout << "Area of a circle is " << circle_area << std::endl;
}
As another example, a function might construct an object that will accept directions, and trace its path to these points on screen. There are a plethora of parameters that could be passed in to the constructor (colour of the trace, starting x and y co-ordinates, trace speed). If the programmer wanted the constructor to be able to accept only the color parameter, then he could call another constructor that accepts only color, which in turn calls the constructor with all the parameters passing in a set of default values for all the other parameters (X and Y would generally be centered on screen or placed at the origin, and the speed would be set to another value of the coder's choosing).
PL/I has the GENERIC attribute to define a generic name for a set of entry references called with different types of arguments. Example:
DECLARE gen_name GENERIC(Multiple argument definitions may be specified for each entry. A call to "gen_name" will result in a call to "name" when the argument is FIXED BINARY, "flame" when FLOAT", etc. If the argument matches none of the choices "pathname" will be called.name WHEN(FIXED BINARY), flame WHEN(FLOAT), pathname OTHERWISE );
Some programmers suggest that a function should perform only one task, and if a function does perform more than one task, it should be split up into more functions. They argue that functions are key components in code maintenance, and their roles in the program must remain distinct.
Proponents of modular programming (modularizing code) advocate that each function should have minimal dependency on other pieces of code. For example, the use of is generally deemed unwise by advocates for this perspective, because it adds tight coupling between the function and these global variables. If such coupling is not necessary, their advice is to code refactoring functions to accept passed parameters instead. However, increasing the number of parameters passed to functions can affect code readability.
In the IBM System/360, where return code was expected from the subroutine, the return value was often designed to be a multiple of 4—so that it could be used as a direct branch table index into a branch table often located immediately after the call instruction to avoid extra conditional tests, further improving efficiency. In the System/360 assembly language, one would write, for example:
BAL 14, SUBRTN01 go to a subroutine, storing return address in R14
B TABLE(15) use returned value in reg 15 to index the branch table,
TABLE B OK return code =00 GOOD }
B BAD return code =04 Invalid input } Branch table
B ERROR return code =08 Unexpected condition }
There are some seemingly obvious optimizations of procedure calls that cannot be applied if the procedures may have side effects. For example, in the expression (f(x)-1)/(f(x)+1), the function f must be called twice, because the two calls may return different results. Moreover, the value of x must be fetched again before the second call, since the first call may have changed it. Determining whether a subprogram may have a side effect is very difficult (indeed, undecidable by virtue of Rice's theorem). So, while those optimizations are safe in purely functional programming languages, compilers of typical imperative programming usually have to assume the worst.
|
|