printf is a C standard library function that formats text and writes it to standard output. The function accepts a format C-string argument and a variable number of value arguments that the function serializes per the format string. A mismatch between the format specifiers and the count or data types of the values results in undefined behavior, and possibly a program crash or other vulnerability.
The format string is encoded as a template language consisting of verbatim text and format specifiers that each specify how to serialize a value. As the format string is processed left-to-right, a subsequent value is used for each format specifier found. A format specifier starts with a percent sign character and has one or more following characters that specify how to serialize a value.
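The left-to-right pairing of specifiers and values can be sketched as follows. This is a minimal illustration, not a canonical usage: format_report is a hypothetical helper name, and snprintf (a member of the printf family that writes to a buffer) is used instead of printf only so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: formats a short report into buf and returns what
   snprintf returns (the length of the complete formatted text). */
int format_report(char *buf, size_t size, const char *name, int count, double ratio)
{
    /* Specifiers are consumed left to right:
       %s <- name, %d <- count, %.1f <- ratio.
       "%%" is not a placeholder; it emits a literal percent sign. */
    return snprintf(buf, size, "%s: %d items (%.1f%%)", name, count, ratio);
}
```

Calling format_report(buf, sizeof buf, "apples", 3, 12.5) leaves the text apples: 3 items (12.5%) in buf.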
The standard library provides other, similar functions that form a family of printf-like functions. The functions share the same formatting capabilities but provide different behavior such as output to a different destination or safety measures that limit exposure to vulnerabilities. Functions of the printf-family have been implemented in other programming contexts (i.e. languages) with the same or similar syntax and semantics.
The scanf C standard library function complements printf by providing formatted input (a.k.a. lexical analysis, a.k.a. parsing) via a similar format string syntax.
The name, printf, is short for print formatted where print refers to output to a computer printer although the function is not limited to printer output. Today, print refers to output to any text-based environment such as a terminal or a computer file.
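In Fortran, for example, a PRINT statement refers by label to a separate FORMAT statement that describes the output:
      PRINT 601, IA, IB, AREA
 601  FORMAT (4H A= ,I5,5H B= ,I5,8H AREA= ,F10.2, 13H SQUARE UNITS)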
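Here, 601 is the label of the FORMAT statement; 4H introduces a Hollerith (literal text) field of 4 characters; I5 is an integer field of width 5; and F10.2 is a floating-point field of width 10 with 2 digits after the decimal point.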
With the input arguments 100, 200, and 1500.25, the output might look like this:
A= 100 B= 200 AREA= 1500.25 SQUARE UNITS
In 1968, ALGOL 68 had a more function-like API, but still used special syntax (the $ delimiters surround special formatting syntax):
"red", 123456, 89, BIN 255, 3.14, 250));
In contrast to Fortran, using normal function calls and data types simplifies the language and compiler, and allows the implementation of the input/output to be written in the same language.
These advantages were thought to outweigh the disadvantages (such as a complete lack of type safety in many instances) up until the 2000s, and in most newer languages of that era I/O is not part of the syntax.
It has since become clear that this lack of type safety can have serious consequences, ranging from security exploits to hardware failures (e.g., a phone's networking capabilities being permanently disabled after trying to connect to an access point named "%p%s%s%s%s%n"). Modern languages, such as C++20 and later, tend to include format specifications as a part of the language syntax, which restores type safety in formatting to an extent and allows the compiler to detect some invalid combinations of format specifiers and data types at compile time.
The -Wformat option of GCC enables compile-time checks of printf calls, allowing the compiler to detect a subset of invalid calls (and issue either a warning or an error, stopping the compilation altogether, depending on other flags).
Since the compiler is inspecting format specifiers, enabling this effectively extends the C++ syntax by making formatting a part of it.
As the format specification has become a part of the language syntax, a C++ compiler is able to prevent invalid combinations of types and format specifiers in many cases. Unlike the -Wformat option, this is not an optional feature.
The format specification of C++20's std::format is, in itself, an extensible "mini-language" (referred to as such in the specification), an example of a domain-specific language. As such, it completes a historical cycle, bringing the state of the art (as of 2024) back to what it was in the case of Fortran's first implementation in the 1950s.
%[parameter][flags][width][.precision][length]type
Text | Description
n$ | n is the index of the value parameter to serialize using this format specifier
This field allows for using the same value multiple times in a format string instead of having to pass the value multiple times. If a specifier includes this field, then subsequent specifiers must also.
For example, printf("%2$d %2$#x; %1$d %1$#x", 16, 17) outputs 17 0x11; 16 0x10.
This field is particularly useful for localizing messages to different natural languages that use different word orders.
In the Windows API, support for this feature is via a different function, _printf_p.
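The localization use case can be sketched as below. The sketch assumes a C library that implements the POSIX n$ extension (for example glibc); the field is not part of ISO C, and format_date and its month_first parameter are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper. Assumes a POSIX C library: n$ is not ISO C. */
int format_date(char *buf, size_t size, int month_first,
                int day, int month, int year)
{
    /* Both calls pass the values in the same order (day, month, year).
       Only the n$ indices decide which value lands where in the text. */
    if (month_first)
        return snprintf(buf, size, "%2$02d/%1$02d/%3$d", day, month, year);
    return snprintf(buf, size, "%1$02d.%2$02d.%3$d", day, month, year);
}
```

With day 7, month 3 and year 2024, the first template yields 03/07/2024 and the second yields 07.03.2024, without the caller reordering its arguments.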
Flag | Description
- (minus) | Left-align the output of this placeholder; default is to right-align the output
+ (plus) | Prepends a plus sign for a positive value; by default a positive value does not have a prefix
(space) | Prepends a space character for a positive value; ignored if the + flag exists; by default a positive value does not have a prefix
0 (zero) | When the width option is specified, prepends zeros for numeric types; by default prepends spaces; for example, printf("%4X", 3) produces "   3", while printf("%04X", 3) produces "0003"
' (apostrophe) | The integer or exponent of a decimal has the thousands grouping separator applied
# (hash) | Alternate form: for g and G types, trailing zeros are not removed; for f, F, e, E, g, G types, the output always contains a decimal point; for o, x, X types, the text 0, 0x, 0X, respectively, is prepended to non-zero numbers
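A short sketch of several of these flags applied to one value is given below. show_flags is a hypothetical helper; the apostrophe flag is left out because it is a POSIX extension and its output is locale-dependent.

```c
#include <stdio.h>

/* Hypothetical helper: formats one value with several flags. */
int show_flags(char *buf, size_t size, int value)
{
    return snprintf(buf, size,
                    "[%5d] [%-5d] [%+d] [% d] [%05d] [%#x] [%#o]",
                    value,            /* width 5, right-aligned (default) */
                    value,            /* '-' flag: left-aligned           */
                    value,            /* '+' flag: explicit plus sign     */
                    value,            /* ' ' flag: space for positives    */
                    value,            /* '0' flag: zero padding           */
                    (unsigned)value,  /* '#' flag with x: 0x prefix       */
                    (unsigned)value); /* '#' flag with o: leading 0       */
}
```

For the value 42 this produces [   42] [42   ] [+42] [ 42] [00042] [0x2a] [052].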
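The width field specifies the minimum number of characters to output. For example, printf("%3d", 12) specifies a width of 3 and outputs 12 with a space on the left to output 3 characters. The call printf("%3d", 1234) outputs 1234, which is 4 characters long since that is the minimum width for that value even though the width specified is 3.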
If the width field is omitted, the output is the minimum number of characters for the value.
If the field is specified as *, then the width value is read from the list of values in the call. For example, printf("%*d", 3, 10) outputs 10 preceded by one space, where the second parameter, 3, is the width (matches with *) and 10 is the value to serialize (matches with d).
Though not part of the width field, a leading zero is interpreted as the zero-padding flag mentioned above, and a negative value is treated as the positive value in conjunction with the left-alignment flag also mentioned above.
The width field can be used to format values as a table (tabulated output). However, columns do not align if any value is larger than fits in the width specified. For example, notice that the last line value (1234) does not fit in the first column of width 3 and therefore the column is not aligned.
  1   1
 12  12
123 123
1234 123
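One way to avoid the misalignment shown above is to measure the widest value first and pass that width through the * form. The following is a sketch under that assumption, not a prescribed technique; decimal_width, column_width and format_row are hypothetical helper names.

```c
#include <stdio.h>

/* Number of characters "%d" needs for v. With a null buffer and size 0,
   C99 snprintf only reports the length and writes nothing. */
static int decimal_width(int v)
{
    return snprintf(NULL, 0, "%d", v);
}

/* Width of the widest value in the array (0 for an empty array). */
int column_width(const int *values, size_t count)
{
    int width = 0;
    for (size_t i = 0; i < count; i++) {
        int w = decimal_width(values[i]);
        if (w > width)
            width = w;
    }
    return width;
}

/* Formats one two-column row; each '*' consumes one int width argument. */
int format_row(char *buf, size_t size, int width, int a, int b)
{
    return snprintf(buf, size, "%*d %*d", width, a, width, b);
}
```

For the values 1, 12, 123 and 1234, column_width returns 4, so every row formatted with that width stays aligned.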
The precision field may be omitted, given as a numeric integer value, or given as a dynamic value that is passed as another argument when indicated by an asterisk (*). For example, printf("%.*s", 3, "abcdef") outputs abc.
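The meaning of the precision depends on the type it is applied to. The sketch below relies on the standard C behavior (digits after the decimal point for f, minimum number of digits for d, maximum number of characters for s); show_precision is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: applies a precision to three kinds of value. */
int show_precision(char *buf, size_t size,
                   double x, int n, const char *s, int max_chars)
{
    return snprintf(buf, size, "%.2f|%.3d|%.*s",
                    x,             /* f: two digits after the decimal point */
                    n,             /* d: at least three digits              */
                    max_chars, s); /* s: at most max_chars characters       */
}
```

With x = 3.14159, n = 7, s = "abcdef" and max_chars = 3, the result is 3.14|007|abc.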
Text | Description
hh | For integer types, causes printf to expect an int-sized integer argument which was promoted from a char.
h | For integer types, causes printf to expect an int-sized integer argument which was promoted from a short.
l | For integer types, causes printf to expect a long-sized integer argument. For floating-point types, this is ignored. float arguments are always promoted to double when used in a varargs call.
ll | For integer types, causes printf to expect a long long-sized integer argument.
L | For floating-point types, causes printf to expect a long double argument.
z | For integer types, causes printf to expect a size_t-sized integer argument.
j | For integer types, causes printf to expect an intmax_t-sized integer argument.
t | For integer types, causes printf to expect a ptrdiff_t-sized integer argument.
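The length modifiers can be illustrated by pairing each with an argument of the matching type. This is a sketch with arbitrarily chosen values; show_lengths is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: one argument per length modifier. */
int show_lengths(char *buf, size_t size)
{
    unsigned char uc = 200;
    short         sh = -7;
    long          lo = 100000L;
    long long     ll = 1234567890123LL;
    size_t        sz = 42;
    intmax_t      im = -5;
    ptrdiff_t     pd = -3;
    long double   ld = 1.5L;

    /* The modifier must match the argument type, otherwise the
       behavior is undefined. */
    return snprintf(buf, size, "%hhu %hd %ld %lld %zu %jd %td %.1Lf",
                    uc, sh, lo, ll, sz, im, pd, ld);
}
```

The formatted text is 200 -7 100000 1234567890123 42 -5 -3 1.5.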
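Platform-specific length options came to exist prior to widespread use of the ISO C99 extensions, including:
Text | Description | Platform
I | For signed integer types, causes printf to expect a ptrdiff_t-sized integer argument; for unsigned integer types, a size_t-sized integer argument | Win32/Win64
I32 | For integer types, causes printf to expect a 32-bit integer argument | Win32/Win64
I64 | For integer types, causes printf to expect a 64-bit integer argument | Win32/Win64
q | For integer types, causes printf to expect a 64-bit integer argument | BSD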
ISO C99 includes the inttypes.h header file, which defines a number of macros for platform-independent coding. For example, printf("%" PRId64, t); specifies decimal format for a 64-bit signed integer. Since the macros evaluate to a string literal, and the compiler concatenates adjacent string literals, the expression "%" PRId64 compiles to a single string.
Macros include:
Macro | Description
PRId32 | Typically equivalent to I32d (Win32/Win64) or d
PRId64 | Typically equivalent to I64d (Win32/Win64), lld (32-bit platforms) or ld (64-bit platforms)
PRIi32 | Typically equivalent to I32i (Win32/Win64) or i
PRIi64 | Typically equivalent to I64i (Win32/Win64), lli (32-bit platforms) or li (64-bit platforms)
PRIu32 | Typically equivalent to I32u (Win32/Win64) or u
PRIu64 | Typically equivalent to I64u (Win32/Win64), llu (32-bit platforms) or lu (64-bit platforms)
PRIx32 | Typically equivalent to I32x (Win32/Win64) or x
PRIx64 | Typically equivalent to I64x (Win32/Win64), llx (32-bit platforms) or lx (64-bit platforms)
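A minimal sketch of the macros in use follows. format_ids is a hypothetical helper; the macros and the fixed-width types come from the standard inttypes.h header.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats fixed-width integers portably. */
int format_ids(char *buf, size_t size,
               int64_t big, uint32_t small, uint64_t mask)
{
    /* Each macro expands to a string literal such as "ld" or "lld";
       adjacent literals are concatenated into one format string. */
    return snprintf(buf, size,
                    "big=%" PRId64 " small=%" PRIu32 " mask=%" PRIx64,
                    big, small, mask);
}
```

The same source compiles unchanged on platforms where int64_t is long and on platforms where it is long long, which is the point of the macros.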
Text | Description
% | Output a literal % character; does not accept flags, width, precision or length fields
d, i | (signed) int formatted as decimal; %d and %i are synonymous except when used with scanf
u | unsigned int formatted as decimal
f, F | double formatted as fixed-point; f and F differ only in how the strings for an infinite number or NaN are printed (inf, infinity and nan for f; INF, INFINITY and NAN for F)
e, E | double formatted in exponential notation d.ddde±dd; E results in E rather than e to introduce the exponent; the exponent always contains at least two digits; if the value is zero, the exponent is 00; in Windows, the exponent contains three digits by default, e.g. 1.5e002, but this can be altered by the Microsoft-specific _set_output_format function
g, G | double formatted as either fixed-point or exponential notation, whichever is more appropriate for its magnitude; g uses lower-case letters, G uses upper-case letters; this type differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included, and that the precision field specifies the total number of significant digits rather than the digits after the decimal; the decimal point is not included on whole numbers
x, X | unsigned int formatted as hexadecimal; x uses lower-case letters and X uses upper-case
o | unsigned int formatted as octal
s | null-terminated string
p | Pointer formatted in an implementation-defined way
a, A | double in hexadecimal notation, starting with 0x or 0X; a uses lower-case letters, A uses upper-case letters (added in C99)
n | Outputs nothing but writes the number of characters written so far into an integer pointer parameter; in Java this prints a newline
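The following sketch exercises several of the type characters. show_types and show_floats are hypothetical helpers; %p, %a and %n are omitted because their output is implementation-defined or, for %n, disabled on some platforms. The two-digit exponent assumes a C library that follows the standard minimum.

```c
#include <stdio.h>

/* Hypothetical helper: integer, character, string and literal-% types. */
int show_types(char *buf, size_t size)
{
    return snprintf(buf, size, "%d %u %x %X %o %c %s %%",
                    -5, 5u, 255u, 255u, 8u, 'A', "str");
}

/* Hypothetical helper: the same double in fixed, exponential and
   general notation. */
int show_floats(char *buf, size_t size, double x)
{
    return snprintf(buf, size, "%f|%e|%g", x, x, x);
}
```

show_types yields -5 5 ff FF 10 A str %, and show_floats with 1234.5 yields 1234.500000|1.234500e+03|1234.5 on a standard-conforming library.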
Some printf-like functions allow extensions to the escape-character-based mini-language, thus allowing the programmer to use a specific formatting function for non-builtin types. One is glibc's (now deprecated) register_printf_function. However, it is rarely used because it conflicts with static format string checking. Another is Vstr custom formatters, which allow adding multi-character format names.
Some applications (like the Apache HTTP Server) include their own printf-like function and embed extensions into it. However, these all tend to have the same problems that register_printf_function has.
The Linux kernel printk function supports a number of ways to display kernel structures using the generic %p specification, by appending additional format characters. For example, %pI4 prints an IPv4 address in dotted-decimal form. This allows static format string checking (of the %p portion) at the expense of full compatibility with normal printf.
Some compilers, like the GNU Compiler Collection, statically check the format strings of printf-like functions and warn about problems (when using the flags -Wall or -Wformat). GCC also warns about user-defined printf-style functions if the non-standard "format" __attribute__ is applied to the function.
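Applying the attribute to a user-defined function might look like the sketch below. PRINTF_LIKE and format_log are hypothetical names; the attribute syntax is the GCC/Clang extension, so the macro expands to nothing on other compilers. The body forwards its arguments to vsnprintf, the va_list counterpart of snprintf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro: enables format checking where the compiler
   supports the GNU "format" attribute, and is a no-op elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* The format string is parameter 3; the checked values start at 4. */
int format_log(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_log(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args); /* forwards the variadic values */
    va_end(args);
    return n;
}
```

With the attribute in place, a call such as format_log(buf, sizeof buf, "%s", 42) would be expected to draw a -Wformat warning from GCC, just as the equivalent printf call does.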
The %n functionality also makes printf accidentally Turing-complete even with a well-formed set of arguments. A game of tic-tac-toe written in the format string is a winner of the 27th IOCCC.
fprintf outputs to a file instead of standard output.
sprintf writes to a string buffer instead of standard output.
snprintf provides a level of safety over sprintf since the caller provides a length n that is the length of the output buffer in bytes (including space for the trailing nul).
asprintf provides for safety by accepting a string handle (char**) argument. The function allocates a buffer of sufficient size to contain the formatted text and outputs the buffer via the handle.
For each function of the family, including printf, there is also a variant that accepts a single va_list argument rather than a variable list of arguments. Typically, these variants start with "v". For example: vprintf, vfprintf, vsprintf.
Generally, printf-like functions return the number of bytes output or -1 to indicate failure.
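Checking the return value is how a caller detects both failure and truncation. The sketch below uses the C99 rule that snprintf returns the length the complete text would have had, regardless of the buffer size; format_checked and its return codes are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper.
   Returns 0 on success, 1 if the text was truncated, -1 on failure. */
int format_checked(char *buf, size_t size, const char *key, int value)
{
    int n = snprintf(buf, size, "%s=%d", key, value);

    if (n < 0)
        return -1;            /* output or encoding error */
    if ((size_t)n >= size)
        return 1;             /* buffer too small: text was cut short */
    return 0;
}
```

With a 4-byte buffer, the key "abc" and the value 12, the helper returns 1 and the buffer holds only abc plus the terminating nul; with a 16-byte buffer it returns 0 and the buffer holds abc=12.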