A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. The term globally unique identifier (GUID) is also used.
When generated according to the standard methods, UUIDs are for practical purposes unique, without depending for their uniqueness on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is close enough to zero to be negligible.
Thus, anyone can create a UUID and use it to identify something with near certainty that the identifier does not duplicate one that has already been, or will be, created to identify something else. Information labeled with UUIDs by independent parties can therefore be later combined into a single database, or transmitted on the same channel, without needing to resolve conflicts between identifiers.
Adoption of UUIDs and GUIDs is widespread, with many computing platforms providing support for generating them, and for parsing their textual representation.
The four bits of digit M indicate the UUID version, and the one to three most significant bits of digit N indicate the UUID variant. In the example, M is 1 and N is a (10xx), meaning that the UUID is a variant 1, version 1 UUID; that is, a time-based DCE/RFC 4122 UUID.
|+ UUID record layout|
|Name||Length (bytes)||Length (hex digits)||Contents|
|time_low||4||8||integer giving the low 32 bits of the time|
|time_mid||2||4||integer giving the middle 16 bits of the time|
|time_hi_and_version||2||4||4-bit "version" in the most significant bits, followed by the high 12 bits of the time|
|clock_seq_hi_and_res clock_seq_low||2||4||1-3 bit "variant" in the most significant bits, followed by the 13-15 bit clock sequence|
|node||6||12||the 48-bit node id|
These fields correspond to those in version 1 and 2 time-based UUIDs, but the same 8-4-4-4-12 representation is used for all UUIDs, even for UUIDs which are constructed differently.
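As an illustration of the record layout, the following sketch splits a UUID into the fields listed in the table using Python's standard `uuid` module. The sample value is the one used in the byte-order examples in this article; it is not a genuine version 1 UUID, so the field names describe positions only, not meaningful time or clock values.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# (field name from the table, integer value, width in hex digits)
fields = [
    ("time_low", u.time_low, 8),
    ("time_mid", u.time_mid, 4),
    ("time_hi_and_version", u.time_hi_version, 4),
    ("clock_seq_hi_and_res", u.clock_seq_hi_variant, 2),
    ("clock_seq_low", u.clock_seq_low, 2),
    ("node", u.node, 12),
]

for name, value, digits in fields:
    # Zero-pad so that leading zeros in a field are not lost.
    print(f"{name:22s} {value:0{digits}x}")
```

Note that the table groups `clock_seq_hi_and_res` and `clock_seq_low` into one 4-digit group, which is why the text form has five groups for six fields.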
Microsoft GUIDs are sometimes represented with surrounding braces:
This format should not be confused with "Windows Registry format", which refers to the format within the curly braces.
For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff.
Other systems, notably Microsoft's marshalling of UUIDs in their COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian.
For example, 00112233-4455-6677-8899-aabbccddeeff is encoded as the bytes 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff.
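The two encodings can be compared with a short sketch. It assumes Python's standard `uuid` module, whose `bytes` attribute gives the big-endian form and whose `bytes_le` attribute gives the mixed-endian form; the manual slicing is only there to make the field-by-field swap visible.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

big_endian = u.bytes  # all fields in network byte order

# Reverse the first three fields (4, 2 and 2 bytes); leave the last 8 bytes as-is.
mixed_endian = (
    big_endian[3::-1]      # time_low, 4 bytes reversed
    + big_endian[5:3:-1]   # time_mid, 2 bytes reversed
    + big_endian[7:5:-1]   # time_hi_and_version, 2 bytes reversed
    + big_endian[8:]       # clock sequence and node, unchanged
)

print(big_endian.hex(" "))
print(mixed_endian.hex(" "))
```

The manual result matches `u.bytes_le`, so in practice the library attribute can be used directly.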
The other two variants, variants 1 and 2, are used by the current UUID specifications. Variant 1 UUIDs (10xx N=8..b, 2 bits) are referred to as RFC 4122, or "Leach-Salz" UUIDs, after the authors of the original Internet Draft. Variant 2 (110x N=c..d, 3 bits) is characterized in the RFC as "reserved, Microsoft Corporation backward compatibility", and was used for early GUIDs on the Microsoft Windows platform. Variant bits aside, the two variants are the same except that when reduced to a binary form for storage or transmission, variant 1 UUIDs use "network" (big-endian) byte order, while variant 2 GUIDs use "native" (little-endian) byte order. In their textual representations, variants 1 and 2 are the same except for the variant bits.
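A minimal sketch of how the version digit M and variant digit N can be read from the textual form follows. The function name `version_and_variant` and the sample strings are illustrative only. The bit patterns for variants 1 and 2 are those given above; the patterns for variant 0 (0xxx) and the reserved pattern (111x) are taken from RFC 4122.

```python
def version_and_variant(uuid_text):
    """Return (version digit M, variant) for an 8-4-4-4-12 UUID string."""
    hex_digits = uuid_text.replace("-", "")
    m = int(hex_digits[12], 16)  # first digit of the third group
    n = int(hex_digits[16], 16)  # first digit of the fourth group

    if n & 0b1000 == 0:
        variant = 0           # 0xxx, N = 0..7
    elif n & 0b0100 == 0:
        variant = 1           # 10xx, N = 8..b
    elif n & 0b0010 == 0:
        variant = 2           # 110x, N = c..d
    else:
        variant = "reserved"  # 111x, N = e..f
    return m, variant


print(version_and_variant("00000000-0000-1000-a000-000000000000"))
print(version_and_variant("00000000-0000-4000-c000-000000000000"))
```

The version digit is only meaningful when the variant is 1 or 2; for other variants the same bits may carry different content.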
When byte swapping is required to convert between the big-endian byte order of variant 1 and the little-endian byte order of variant 2, the fields above define the swapping. The first three fields are unsigned 32- and 16-bit integers and are subject to swapping, while the last two fields consist of uninterpreted bytes, not subject to swapping. This byte swapping applies even for version 3, 4, and 5 UUIDs, where the canonical fields do not correspond to the content of the UUID.
Note that while some important GUIDs, such as the identifier for the Component Object Model IUnknown interface, are nominally variant 2 UUIDs, many identifiers generated and used in Microsoft Windows software and referred to as "GUIDs" are standard variant 1 RFC 4122/DCE 1.1 network byte-order UUIDs, rather than little-endian variant 2 UUIDs. The current version of the Microsoft guidgen tool produces standard variant 1 UUIDs. Some Microsoft documentation states that "GUID" is a synonym for "UUID", as standardized in RFC 4122/DCE 1.1. RFC 4122 itself states that UUIDs "are also known as GUIDs". All this suggests that "GUID", while originally referring to a variant of UUID used by Microsoft, has become simply an alternative name for UUID, with both variant 1 and variant 2 GUIDs being extant.
Version 1 UUIDs are generated from a time and a node id (usually the MAC address); version 2 UUIDs are generated from an identifier (usually a group or user id), time, and a node id; versions 3 and 5 produce deterministic UUIDs generated by hashing a namespace identifier and name; and version 4 UUIDs are generated using a random or pseudorandom number.
A 13- or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node. With each version 1 UUID corresponding to a single point in space (the node) and time (intervals and clock sequence), the chance of two properly generated version 1 UUIDs being unintentionally the same is practically nil. Since the time and clock sequence total 74 bits, 2^74 (about 1.9×10^22, or 19 sextillion) version 1 UUIDs can be generated per node id, at a maximum average rate of 163 billion per second per node id.
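The capacity figures can be checked with a few lines of arithmetic. The sketch assumes a 60-bit timestamp counting 100-nanosecond intervals and a 14-bit clock sequence, which together give the 74 bits mentioned above; the constant names are illustrative.

```python
TIMESTAMP_BITS = 60            # count of 100 ns intervals
CLOCK_SEQ_BITS = 14            # clock sequence (14-bit case)
TICKS_PER_SECOND = 10_000_000  # one tick every 100 ns

# Every (timestamp, clock sequence) pair yields a distinct UUID for a given node id.
uuids_per_node = 2 ** (TIMESTAMP_BITS + CLOCK_SEQ_BITS)

# Per tick, each of the 2^14 clock sequence values can be used once.
max_rate_per_second = 2 ** CLOCK_SEQ_BITS * TICKS_PER_SECOND

print(f"{uuids_per_node:.3e} UUIDs per node id")
print(f"{max_rate_per_second:,} UUIDs per second per node id")
```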
In contrast to other UUID versions, version 1 and 2 UUIDs based on MAC addresses from network cards rely for their uniqueness in part on an identifier issued by a central registration authority, namely the Organizationally Unique Identifier (OUI) part of the MAC address, which is issued by the IEEE to manufacturers of networking equipment. The uniqueness of version 1 and 2 UUIDs based on network card MAC addresses also depends on network card manufacturers properly assigning unique MAC addresses to their cards, which like other manufacturing processes is subject to error.
Usage of the node's network card MAC address for the node id means that a version 1 UUID can be tracked back to the computer that created it. Documents can sometimes be traced to the computers where they were created or edited through UUIDs embedded into them by word processing software. This privacy hole was used when locating the creator of the Melissa virus.
RFC 4122 does allow the MAC address in a version 1 (or 2) UUID to be replaced by a random 48-bit node id, either because the node does not have a MAC address, or because it is not desirable to expose it. In that case, the RFC requires that the least significant bit of the first octet of the node id be set to 1. This corresponds to the multicast bit in MAC addresses, and setting it serves to differentiate UUIDs where the node id is randomly generated from those based on MAC addresses from network cards, which typically have unicast MAC addresses.
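A random node id with the multicast bit set might be produced as in the sketch below. The helper name `random_node_id` is illustrative; `uuid.uuid1` is the standard-library generator, and its `node` parameter accepts a 48-bit integer. In the 48-bit integer form, the least significant bit of the first octet is bit 40.

```python
import secrets
import uuid


def random_node_id():
    """Return a random 48-bit node id with the multicast bit set."""
    node = secrets.randbits(48)
    # The first octet is the most significant one; its lowest bit is bit 40.
    return node | (1 << 40)


node = random_node_id()
u = uuid.uuid1(node=node)

print(f"node id: {node:012x}")
print(u)
```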
Version 2 UUIDs are similar to version 1, except that the least significant 8 bits of the clock sequence are replaced by a "local domain" number, and the least significant 32 bits of the timestamp are replaced by an integer identifier meaningful within the specified local domain. On POSIX systems, local domain numbers 0 and 1 are for user ids (UIDs), and group ids (GIDs), respectively, and other local domain numbers are site-defined. On non-POSIX systems, all local domain numbers are site-defined.
The ability to include a 40-bit domain/identifier in the UUID comes with a tradeoff. On the one hand, 40 bits allow about 1 trillion domain/identifier values per node id. On the other hand, with the clock value truncated to the 28 most significant bits, compared to 60 bits in version 1, the clock in a version 2 UUID will "tick" only once every 429.49 seconds, a little more than 7 minutes, as opposed to every 100 nanoseconds for version 1. And with a clock sequence of only 6 bits, compared to 14 bits in version 1, only 64 unique UUIDs per node/domain/identifier can be generated per 7-minute clock tick, compared to 16,384 clock sequence values for version 1. Thus, version 2 may not be suitable for cases where UUIDs are required, per node/domain/identifier, at a rate exceeding about 1 per 7 seconds.
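The figures in this tradeoff follow from the bit widths alone, as the following arithmetic sketch shows. It assumes that the 32 low timestamp bits are lost (so one version 2 tick spans 2^32 intervals of 100 ns) and that 6 clock sequence bits remain; the variable names are illustrative.

```python
TICK_NS = 100  # version 1 clock resolution in nanoseconds

# Dropping the low 32 bits of the timestamp makes one tick 2^32 times longer.
v2_tick_seconds = (2 ** 32 * TICK_NS) / 1e9

# 8-bit local domain plus 32-bit identifier.
domain_identifier_values = 2 ** (8 + 32)

# 6 remaining clock sequence bits give 64 UUIDs per tick.
uuids_per_tick = 2 ** 6
seconds_per_uuid = v2_tick_seconds / uuids_per_tick

print(f"tick length: {v2_tick_seconds:.2f} s")
print(f"domain/identifier values: {domain_identifier_values:,}")
print(f"sustained rate: one UUID per {seconds_per_uuid:.2f} s")
```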
The namespace identifier is itself a UUID. The specification provides UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 LDAP; but any desired UUID may be used as a namespace designator.
To determine the version 3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Six or seven bits are then replaced by fixed values: the 4-bit version (e.g. 0011 for version 3), and the 2- or 3-bit UUID "variant" (e.g. 10 indicating an RFC 4122 UUID, or 110 indicating a legacy Microsoft GUID). Since 6 or 7 bits are thus predetermined, only 121 or 122 bits contribute to the uniqueness of the UUID.
Version 5 UUIDs are similar, but SHA-1 is used instead of MD5. Since SHA-1 generates 160-bit digests, the digest is truncated to 128 bits before the version and variant bits are inserted.
Version 3 and 5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, given the other, except by brute-force search.
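The hashing procedure described above can be sketched with Python's `hashlib` and compared against the standard library's `uuid.uuid3` and `uuid.uuid5`. The function name `name_based_uuid` and the sample name `example.org` are illustrative; the sketch assumes the name is encoded as UTF-8 and that the 2-bit RFC 4122 variant is wanted.

```python
import hashlib
import uuid


def name_based_uuid(namespace, name, version):
    """Build a version 3 (MD5) or version 5 (SHA-1) UUID by hand."""
    hasher = hashlib.md5 if version == 3 else hashlib.sha1
    # Hash the namespace UUID's big-endian bytes followed by the name.
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])  # SHA-1 output is truncated to 128 bits
    raw[6] = (raw[6] & 0x0F) | (version << 4)  # 4-bit version
    raw[8] = (raw[8] & 0x3F) | 0x80            # 2-bit variant "10"
    return uuid.UUID(bytes=bytes(raw))


v3 = name_based_uuid(uuid.NAMESPACE_DNS, "example.org", 3)
v5 = name_based_uuid(uuid.NAMESPACE_DNS, "example.org", 5)
print(v3)
print(v5)
```

Because the result depends only on the namespace and name, running the sketch again produces the same two UUIDs.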
Some pseudorandom number generators lack the entropy needed to produce sufficiently unpredictable numbers. For example, the WinAPI GUID generator, which uses a pseudorandom number generator, has been shown to produce UUIDs which follow a predictable pattern. RFC 4122 advises that "distributed applications generating UUIDs at a variety of hosts must be willing to rely on the random number source at all hosts. If this is not feasible, the namespace variant should be used."
Unlike version 1 and 2 UUIDs generated from MAC addresses, version 1 and 2 UUIDs that use randomly generated node ids, hash-based version 3 and 5 UUIDs, and random version 4 UUIDs can collide even without implementation problems, albeit with a probability so small that it can normally be ignored. This probability can be computed precisely based on analysis of the birthday problem.
For example, the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:
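A standard birthday-problem approximation that reproduces this figure is shown here, assuming that 122 of the 128 bits in a version 4 UUID are random and writing $n$ for the number of UUIDs generated and $p$ for the collision probability:

```latex
p(n) \approx 1 - e^{-n^{2} / (2 \cdot 2^{122})}
\quad\Longrightarrow\quad
n \approx \sqrt{2 \cdot 2^{122} \cdot \ln\frac{1}{1 - p}}

p = \tfrac{1}{2}: \qquad
n \approx \sqrt{2^{123} \ln 2} \approx 2.71 \times 10^{18}
```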
This number is equivalent to generating 1 billion UUIDs per second for about 86 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 43 exabytes, many times larger than the largest databases currently in existence, which are on the order of hundreds of petabytes (http://www.computerworld.com/article/2960642/cloud-storage/cerns-data-stores-soar-to-530m-gigabytes.html).
The smallest number of version 4 UUIDs which must be generated for the probability of finding a collision to be p is approximated by the formula:
Thus, for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.
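Both quoted figures can be reproduced numerically with the usual birthday-problem approximation, n ≈ sqrt(2 · 2^122 · ln(1/(1 − p))). The sketch below assumes 122 random bits per version 4 UUID; the function name is illustrative.

```python
import math

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def uuids_for_collision_probability(p, bits=RANDOM_BITS):
    """Approximate count of random UUIDs giving collision probability p."""
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for very small p.
    return math.sqrt(2 * 2 ** bits * -math.log1p(-p))


half = uuids_for_collision_probability(0.5)
one_in_a_billion = uuids_for_collision_probability(1e-9)

print(f"50% collision chance:      {half:.3e} UUIDs")
print(f"1-in-a-billion chance:     {one_in_a_billion:.3e} UUIDs")
print(f"years at 1e9 UUIDs/second: {half / 1e9 / (365.25 * 86400):.1f}")
```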
One of the uses of UUIDs in Solaris (using the Open Software Foundation implementation) is identification of a running operating system instance, for the purpose of pairing crash dump data with the Fault Management Event in the case of a kernel panic.
The random nature of standard version 3, 4, and 5 UUIDs and the ordering of the fields within standard version 1 and 2 UUIDs may create problems with database locality or performance when UUIDs are used as keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version 4 UUIDs being used as keys were modified to include a non-random suffix based on system time. This so-called "COMB" (combined time-GUID) approach made the UUIDs non-standard and significantly more likely to be duplicated, as Nilsson acknowledged, but Nilsson only required uniqueness within the application.
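The general idea of a time-based suffix can be sketched as follows. This is not Nilsson's exact layout, which is not described here: the choice of a 6-byte, big-endian millisecond counter in the last 48 bits is an assumption made for illustration, as is the function name `comb_uuid`.

```python
import time
import uuid


def comb_uuid(now_ms=None):
    """Random version 4 UUID whose last 6 bytes are a millisecond timestamp.

    Hypothetical layout for illustration; the result is non-standard.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    suffix = (now_ms & (2 ** 48 - 1)).to_bytes(6, "big")
    # Keep the first 10 bytes (including version and variant bits) random.
    return uuid.UUID(bytes=uuid.uuid4().bytes[:10] + suffix)


first = comb_uuid(1_000)
second = comb_uuid(2_000)
print(first)
print(second)
```

Identifiers created later carry a larger suffix, which is what lets an index that orders on those bytes append new keys instead of scattering them. The cost is that only 74 bits remain random, so duplicates become more likely.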