An email address identifies a mailbox to which email messages are delivered. A wide variety of formats were used in early email systems, but only a single format is used today, following the standards developed for Internet mail systems since the 1980s. This article uses the term email address to refer to the addr-spec defined in RFC 5322, not to the broader address that is commonly used; the difference is that an address may also contain a display name, a comment, or both.
An email address such as John.Smith@example.com is made up of a local-part, an @ symbol, then a case-insensitive domain. Although the standard requires the local part to be case-sensitive, it also urges that receiving hosts deliver messages in a case-independent fashion, e.g., that the mail system at example.com treat John.Smith as equivalent to john.smith; some mail systems even treat them as equivalent to johnsmith. Google, for example, says of Gmail: "...you can add or remove the dots from a Gmail address without changing the actual destination address; and they'll all go to your inbox...". Mail systems often limit their users' choice of name to a subset of the technically valid characters, and in some cases also limit which addresses it is possible to send mail to.
With the introduction of internationalized domain names, efforts are progressing to permit non-ASCII characters in email addresses.
The general format of an email address is local-part@domain, and a specific example is jsmith@example.com. An address consists of two parts. The part before the @ symbol (local-part) identifies the name of a mailbox. This is often the username of the recipient, e.g., jsmith. The part after the @ symbol (domain) is a domain name that represents the administrative realm for the mailbox, e.g., a company's domain name, example.com.
When delivering email, an SMTP client, e.g., a Mail User Agent (MUA) or Mail Transfer Agent (MTA), uses the Domain Name System (DNS) to look up a resource record (RR) for the recipient's domain (the part of the email address to the right of the @). If there is a mail exchanger record (MX record), it contains the name of the recipient's mail server; otherwise the SMTP client uses an address record (A or AAAA record). The MTA next connects to this server as an SMTP client. The local part of an email address has no significance for intermediate mail relay systems other than the final mailbox host. Email senders and intermediate relay systems must not assume it to be case-insensitive, since the final mailbox host may or may not treat it as such. A single mailbox may receive mail for multiple email addresses, if configured by the administrator. Conversely, a single email address may be an alias for a distribution list that reaches many mailboxes. Email aliases, electronic mailing lists, sub-addressing, and catch-all addresses, the latter being mailboxes that receive messages regardless of the local part, are common patterns for achieving a variety of delivery goals.
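The lookup order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the DNS query itself is left out, and the MX records are passed in as (preference, hostname) pairs as a resolver might return them. Lower preference values are tried first, as specified for MX records.

```python
from typing import List, Tuple


def recipient_domain(address: str) -> str:
    """Return the part of the address to the right of the last @."""
    return address.rpartition("@")[2]


def pick_mail_hosts(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Return the hosts to try, in order, when delivering mail to `domain`.

    `mx_records` holds (preference, hostname) pairs; fetching them from
    DNS is outside the scope of this sketch.
    """
    if not mx_records:
        # No MX record: fall back to the domain's own A or AAAA record.
        return [domain]
    # Sort by preference (lowest first) and drop any trailing root dot.
    return [host.rstrip(".") for _, host in sorted(mx_records)]
```

A real client would also randomize among hosts of equal preference and retry on failure; both are omitted here for brevity.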
The addresses found in the header fields of an email message are not directly used by mail exchangers to deliver the message. An email message also contains a message envelope that contains the information for mail routing. While envelope and header addresses may be equal, forged email addresses are often seen in email spam, phishing, and many other Internet-based scams. This has led to several initiatives which aim to make such forgeries easier to spot.
To indicate the message recipient, an email address may also have an associated display name for the recipient, which is followed by the address specification surrounded by angle brackets, for example: John Smith <john.smith@example.com>
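As an illustration, Python's standard library can split such a value into its display name and address specification, and build one from the two parts. The name and address used here are example values only.

```python
from email.utils import formataddr, parseaddr

# Split a header-style value into (display name, addr-spec).
display_name, addr_spec = parseaddr("John Smith <john.smith@example.com>")

# Build the header-style value from its two parts.
header_value = formataddr(("John Smith", "john.smith@example.com"))
```

Here `display_name` is "John Smith" and `addr_spec` is "john.smith@example.com"; `header_value` is the original string again.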
Earlier forms of email addresses on networks other than the Internet included other notations, such as that required by X.400, and the UUCP bang path notation, in which the address was given in the form of a sequence of computers through which the message should be relayed. Bang path notation was widely used for several years, but was superseded by the Internet standards promulgated by the Internet Engineering Task Force (IETF).
Note that some mail servers wildcard local parts, typically the characters following a plus and less often the characters following a minus, so fred+bah@domain and fred+foo@domain might end up in the same inbox as fred+@domain or even as fred@domain. This can be useful for tagging emails for sorting, see below, and for spam control. Braces { and } are also used in that fashion, although less often.
In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531, though even mail systems that support SMTPUTF8 and 8BITMIME may restrict which characters to use when assigning local-parts.
A local part is either a Dot-string or a Quoted-string; it cannot be a combination. Quoted strings and characters, however, are not commonly used. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
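The distinction can be made concrete with a small classifier. The following is a simplified sketch based on the ASCII-only grammar of RFC 5321 (atoms separated by single dots, or printable characters between double quotes with backslash escapes); the function name is hypothetical, and comments, length limits and internationalized characters are deliberately ignored.

```python
import re

# One "atext" character: letters, digits and the permitted specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# Dot-string: one or more atoms separated by single dots.
DOT_STRING = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

# Quoted-string: printable ASCII in double quotes; a backslash escapes
# the next character, so quotes and backslashes can appear inside.
QUOTED_STRING = re.compile(r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"')


def classify_local_part(local_part: str) -> str:
    """Return "dot-string", "quoted-string" or "invalid"."""
    if DOT_STRING.fullmatch(local_part):
        return "dot-string"
    if QUOTED_STRING.fullmatch(local_part):
        return "quoted-string"
    return "invalid"
```

Under these rules john.smith is a Dot-string and "john smith" (with the quotes) is a Quoted-string, while john..smith and the mixed form "john".smith are rejected.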
The local-part postmaster is treated specially—it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, therefore jsmith@example.com and JSmith@example.com specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent. Indeed, RFC 5321 warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where ... the Local-part is case-sensitive".
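A sender or relay that has to compare two addresses without knowing the receiving host's policy can only apply the rules stated above. The sketch below does exactly that; the function name is hypothetical, and it assumes both arguments are bare addresses without display names or comments.

```python
def same_mailbox(a: str, b: str) -> bool:
    """Conservatively decide whether two addresses name the same mailbox."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    # Domains are case-insensitive.
    if domain_a.lower() != domain_b.lower():
        return False
    # The postmaster local-part is the one case-insensitive exception.
    if local_a.lower() == "postmaster" and local_b.lower() == "postmaster":
        return True
    # All other local-parts must be assumed case-sensitive.
    return local_a == local_b
```

The receiving host itself may well treat jsmith and JSmith as the same mailbox; this comparison only reflects what an outside party is entitled to assume.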
Despite the wide range of special characters which are technically valid, organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-). However, the phrase stating this restriction is hidden, so one has to either check the availability of an invalid ID, e.g. me#1, or resort to an alternative display, e.g. no-style or source view, in order to read it. Common advice is to avoid some special characters in order to reduce the risk of rejected emails.
Comments are allowed in the domain as well as in the local-part; for example, john.smith@(comment)example.com and john.smith@example.com(comment) are equivalent to john.smith@example.com.
Addresses of this form, using various separators between the base name and the tag, are supported by several email services, including Runbox (plus), Gmail (plus), Rackspace Email (plus), Yahoo! Mail Plus (hyphen), Apple's iCloud (plus), Outlook.com (plus), ProtonMail (plus), FastMail (plus and Subdomain Addressing), MMDF (equals), Qmail and Courier Mail Server (hyphen). Postfix allows configuring an arbitrary separator from the legal character set.
The text of the tag may be used to apply filtering, or to create single-use or disposable email addresses (Gina Trapani, "Instant disposable Gmail addresses", 2005).
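How a receiving system might separate the base name from the tag can be sketched as follows. The function name and the default separator are assumptions made for illustration; which separator applies, if any, is decided by the receiving mail system, so a sender should not perform this split on someone else's address.

```python
from typing import Optional, Tuple


def split_subaddress(address: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Split an address into (base address, tag).

    The tag is None when the local-part contains no separator.
    """
    local, _, domain = address.rpartition("@")
    base, found, tag = local.partition(separator)
    if not found or not base:
        # No tag present (or nothing before the separator): leave as is.
        return address, None
    return f"{base}@{domain}", tag
```

For example, fred+foo@example.com yields the base address fred@example.com and the tag foo, which a filter could then use to file the message.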
In practice, the form validation of some web sites may reject special characters such as "+" in an email address – treating them, incorrectly, as invalid characters. This can lead to an incorrect user receiving an e-mail if the "+" is silently stripped by a website without any warning or error messages. For example, an email intended for the user-entered email address foo+bar@example.com could be incorrectly sent to foobar@example.com. In other cases a poor user experience can occur if some parts of a site, such as a user registration page, allow the "+" character whilst other parts, such as a page for unsubscribing from a site's mailing list, do not.
An email address is generally recognized as having two parts joined with an at-sign (@). However, the technical specification detailed in RFC 822 and subsequent RFCs is more extensive (see "I Knew How To Validate An Email Address Until I Read The RFC"). A regular expression can be used to check for all of these criteria, except that of bracketed nested comments (see Mail::RFC822::Address).
A syntactically correct email address does not guarantee that the mailbox exists. Many mail servers therefore use other techniques, checking the domain against the Domain Name System or using callback verification to check whether the mailbox exists. Such verification is, however, often disabled to avoid directory harvest attacks.
Assuring that an email address is of good quality requires a combination of validation techniques. Large websites, bulk mailers and spammers require fast algorithms that predict the validity of an email address. Such methods depend heavily on heuristic algorithms and statistical models (Jan Hornych, "Verification & Validation Techniques for Email Address Quality Assurance", University of Oxford, 2011).
Many websites evaluate the validity of email addresses differently than the standards specify, rejecting addresses containing valid characters, such as + and /, or enforcing arbitrary length limitations. RFC 3696 provides specific advice for validating Internet identifiers, including email addresses.
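A deliberately lenient check avoids these pitfalls by testing only structure and length and leaving the choice of characters to the receiving system. The sketch below is one such check, not a full validator: the function name is hypothetical, and the limits used are the 64-octet local-part limit and the 254-octet overall limit that follow from the SMTP specification.

```python
MAX_LOCAL_OCTETS = 64     # limit on the local-part
MAX_ADDRESS_OCTETS = 254  # longest address that fits in an SMTP path


def plausible_address(address: str) -> bool:
    """Return True if the address has a sane shape and length.

    Characters such as + and / are accepted; only the overall
    structure (local-part, @, domain) and the lengths are checked.
    """
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    if len(address.encode("utf-8")) > MAX_ADDRESS_OCTETS:
        return False
    if len(local.encode("utf-8")) > MAX_LOCAL_OCTETS:
        return False
    return True
```

With this check foo+bar@example.com and a/b@example.com are both accepted, while a string without an @ or with a 65-character local-part is rejected. Whether the mailbox actually exists can only be established by sending mail to it.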
HTML5 forms implemented in many browsers allow email address validation to be handled by the browser.
Email address internationalization provides for a much larger range of characters than many current validation algorithms allow, such as all Unicode characters above U+007F, encoded as UTF-8.
The IETF's EAI working group published RFC 6530, "Overview and Framework for Internationalized Email", which enables non-ASCII characters to be used in both the local-part and the domain of an email address. RFC 6530 provides for email based on the UTF-8 encoding, which permits the full repertoire of Unicode. RFC 6531 provides a mechanism for SMTP servers to negotiate transmission of the SMTPUTF8 content.
The basic EAI concepts involve exchanging mail in UTF-8. Though the original proposal included a downgrading mechanism for legacy systems, this has now been dropped. The local servers are responsible for the local-part of the address, whereas the domain would be restricted by the rules of internationalized domain names, though still transmitted in UTF-8. The mail server is also responsible for any mapping mechanism between the IMA form and any ASCII alias.
EAI enables users to have a localized address in a native language script or character set, as well as an ASCII form for communicating with legacy systems or for script-independent use. Applications that recognize internationalized domain names and mail addresses must have facilities to convert these representations.
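For the domain part, such a conversion can be sketched with Python's built-in idna codec. Two caveats apply: the codec implements the older IDNA 2003 rules, so this is an approximation of what a current application would do, and the local-part is left untouched because only the receiving mail system knows how to map it. The function names and the example domain are illustrative.

```python
def to_ascii_domain(address: str) -> str:
    """Convert the domain of an address to its ASCII (xn--) form."""
    local, _, domain = address.rpartition("@")
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


def to_unicode_domain(address: str) -> str:
    """Convert an ASCII (xn--) domain back to its Unicode form."""
    local, _, domain = address.rpartition("@")
    unicode_domain = domain.encode("ascii").decode("idna")
    return f"{local}@{unicode_domain}"
```

For example, `to_ascii_domain("user@bücher.example")` gives user@xn--bcher-kva.example, and `to_unicode_domain` reverses the conversion.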
Significant demand for such addresses is expected in China, Japan, Russia, and other markets that have large user bases in a non-Latin-based writing system.
For example, in addition to the .in top-level domain, the government of India in 2011 obtained approval for ".bharat" (from Bhārat Gaṇarājya), written in seven Brahmic scripts for use by Gujarati, Marathi, Bengali, Tamil, Telugu, Punjabi and Urdu speakers. Indian company XgenPlus.com claims to be the world's first EAI mailbox provider, and the Government of Rajasthan now supplies a free email account on the domain राजस्थान.भारत for every citizen of the state. The media house Rajasthan Patrika launched its IDN domain पत्रिका.भारत with contactable email.