In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character ('\0', called NUL in ASCII). Alternative names are C string, which refers to the C programming language and ASCIIZ (note that C strings do not imply the use of ASCII).
The length of a C string is found by searching for the (first) NUL byte. This can be slow as it takes O( n) (linear time) with respect to the string length. It also means that a NUL cannot be inside the string, as the only NUL is the one marking the end.
At the time C (and the languages that it was derived from) was developed, memory was extremely limited, so using only one byte of overhead to store the length of a string was attractive. The only popular alternative at that time, usually called a "Pascal string" (a more modern term is "length-prefixed"), used a leading byte to store the length of the string. This allows the string to contain NUL and made finding the length need only one memory access (O(1) constant time), but limited string length to 255 characters (on a machine using 8-bit bytes). C designer Dennis Ritchie chose to follow the convention of NUL-termination, already established in BCPL, to avoid the limitation on the length of a string and because maintaining the count seemed, in his experience, less convenient than using a terminator.Dennis M. Ritchie (1993). The. Proc. 2nd History of Programming Languages Conf.
This had some influence on CPU instruction set design. Some CPUs in the 1970s and 1980s, such as the Zilog Z80 and the DEC VAX, had dedicated instructions for handling length-prefixed strings. However, as the NUL-terminated string gained traction, CPU designers began to take it into account, as seen for example in IBM's decision to add the "Logical String Assist" instructions to the ES/9000 520 in 1992.
FreeBSD developer Poul-Henning Kamp, writing in ACM Queue, would later refer to the victory of null-terminated strings over a 2-byte (not one-byte) length as "the most expensive one-byte mistake" ever.
The NUL termination has historically created security problems. A NUL byte inserted into the middle of a string will truncate it unexpectedly. A common bug was to not allocate the additional space for the NUL, so it was written over adjacent memory. Another was to not write the NUL at all, which was often not detected during testing because a NUL was already there by chance from previous use of the same block of memory. Due to the expense of finding the length, many programs did not bother before copying a string to a fixed-size Data buffer, causing a buffer overflow if it was too long.
The inability to store a NUL requires that string data and binary data be kept distinct and handled by different functions (with the latter requiring the length of the data to also be supplied). This can lead to code redundancy and errors when the wrong function is used.
The speed problems with finding the length can usually be mitigated by combining it with another operation that is O( n) anyway, such as in [[strlcpy]]. However, this does not always result in an intuitive API.
It is not possible to store every possible ASCII or UTF-8 string in a null-terminated string, as the encoding of the Null character is a zero byte. However, it is common to store the subset of ASCII or UTF-8 -- every character except the NUL character -- in null-terminated strings. Some systems use "modified UTF-8" which encodes the NUL character as two non-zero bytes (0xC0, 0x80) and thus allow all possible strings to be stored. (this is not allowed by the UTF-8 standard as it is a security risk. A C0,80 NUL might be seen as a string terminator in security validation and as a character when used)
UTF-16 uses 2-byte integers and as either byte may be zero, cannot be stored in a null-terminated byte string. However, some languages implement a string of 16-bit UTF-16 characters, terminated by a 16-bit NUL character. (Again the NUL character, which encodes as a single zero code unit, is the only character that cannot be stored. UTF-16 does not have any alternative encoding of zero).
On modern systems memory usage is less of a concern, so a multi-byte length is acceptable (if there are so many small strings that the space used by this length is a concern, and if there are enough duplicates then even a hash table will use less memory). Most replacements for C strings use a 32-bit or larger length value. Examples include the C++ Standard Template Library [[String (C++)|std::string]], the Qt QString, the MFC CString, and the C-based implementation CFString from Core Foundation as well as its Objective-C sibling NSString from Foundation Kit, both by Apple. More complex structures may also be used to store strings such as the rope.