URL

Several HTML elements, most notably the A element, may contain an attribute which takes a URL as value. URLs, Uniform Resource Locators, are addresses of Web documents. More generally, URLs can be used on the Web to refer to "objects" on the Web or in other information systems.

The general syntax of absolute URLs is the following:

scheme://host:port/path/filename

where

scheme

specifies the information system (technically speaking, the protocol) to be used to access the resource; possible values include the following:

`http`	a Web document (to be accessed using Hypertext Transfer Protocol, HTTP)
`ftp`	a resource to be retrieved using FTP (File Transfer Protocol), usually a file on a so-called FTP server
`file`	a file on a particular computer; a `file` URL is hardly useful on the Web
`gopher`	a file in a Gopher server
`mailto`	electronic mail address
`news`	a newsgroup or an article in Usenet news
`telnet`	for starting an interactive session via the Telnet protocol (which is part of TCP/IP)

host

is the Internet host name in domain notation, e.g. www.hut.fi (or sometimes a numerical IP address); notice that Web servers typically, but not necessarily, have domain names starting with www

:port

is the port number part, which can usually be omitted since each scheme has a reasonable default (80 for http); that is, omit it unless it is part of a URL which you got somewhere (or you really know what you are doing)

path

is a directory path within the host

filename

is a file name within the directory.
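As an illustration, the components described above can be picked apart with Python's standard `urllib.parse` module. This is only a sketch: the URL, including its port number and path, is made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to illustrate the general syntax.
url = "http://www.hut.fi:8080/home/jkorpela/index.html"
parts = urlsplit(url)

scheme = parts.scheme    # the protocol, here "http"
host = parts.hostname    # the host name in domain notation
port = parts.port        # the port number as an integer, or None if omitted

# Split the path component into the directory path and the file name.
directory, _, filename = parts.path.rpartition("/")

print(scheme, host, port, directory, filename)
```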

Warning: Although many browsers allow you to omit the part http:// when specifying the URL of a document to be visited, you must not omit it when writing a normal URL into an HTML document. (Otherwise browsers will try to interpret it as a relative URL.)
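The effect of the omission can be sketched with `urljoin` from Python's standard library, which resolves a reference against the URL of the containing document in roughly the way a browser does. The base URL below is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URL of the HTML document that contains the link.
base = "http://www.example.org/docs/page.html"

# With the scheme, the reference is absolute and the base is ignored.
with_scheme = urljoin(base, "http://www.hut.fi/")

# Without the scheme, the reference is taken as a relative path
# and resolved inside the directory of the base document.
without_scheme = urljoin(base, "www.hut.fi/")

print(with_scheme)      # http://www.hut.fi/
print(without_scheme)   # http://www.example.org/docs/www.hut.fi/
```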

Actually, this pattern is mainly for Web documents, i.e. http URLs. For other URLs, simplifications and special interpretations are applied. For example, a mailto URL is just of the form mailto:address where address is a normal Internet E-mail address like Jukka.Korpela@hut.fi (as specified in RFC 822). Please notice that appending anything to the E-mail address in a mailto URL is nonstandard and may result in lost mail without anyone noticing! (See also the discussion of mailto: URLs in the description of the A element.)
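A short sketch, again with Python's `urllib.parse`, shows that a mailto URL has no host, port or directory path in the sense of the general pattern; everything after the scheme is simply the address.

```python
from urllib.parse import urlsplit

parts = urlsplit("mailto:Jukka.Korpela@hut.fi")

mail_scheme = parts.scheme   # "mailto"
mail_host = parts.netloc     # empty: there is no //host part
mail_address = parts.path    # the E-mail address itself

print(mail_scheme, repr(mail_host), mail_address)
```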

An http URL can also carry a fragment identifier: the absolute URL is followed by the # sign and a name (which refers to a location within the document specified by the absolute URL). See the description of the A element for more information.
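The two parts can be separated with `urldefrag` from Python's standard library. The document name and the fragment name below are invented for the example.

```python
from urllib.parse import urldefrag

# Hypothetical URL with a fragment identifier.
document_url, fragment = urldefrag("http://www.hut.fi/doc.html#intro")

print(document_url)  # the absolute URL of the document
print(fragment)      # the name of the location within it
```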

It is safest to enclose URLs in quotes when writing them as attribute values in HTML.

For an overview of URLs, see W3C material on addressing.

As regards the technical specifications of the syntax of URLs, see RFC 1738 (absolute URLs) and RFC 1808 (relative URLs).

In particular, the specifications say that within a URL only a limited set of characters can be used as such:

alphanumeric characters (A to Z, a to z, 0 to 9)
the characters $-_.+!*'(),
the characters ;/?:@=&# provided that they are used in the special meaning reserved for them in the RFCs mentioned above.

Other characters must be encoded. (The characters ;/?:@=&# must also be encoded, if they are not used in the special meaning.) This encoding (which is defined by URL specifications, not HTML specifications) consists of using the percent sign followed by two hexadecimal digits, representing the code position. For example, tilde (~) should be presented as %7E and space as %20. (Violating the rules is much more likely to cause problems in the latter case than in the former.)
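The rule can be sketched in a few lines of Python. The function name `percent_encode` and its `reserved_ok` parameter are hypothetical, and encoding non-ASCII characters through UTF-8 is an assumption of this sketch, not something the rules above prescribe. Standard library helpers such as `urllib.parse.quote` follow later specifications and differ in detail (for instance, they leave the tilde unencoded), so the rule is spelled out by hand here.

```python
import string

# Characters that may be used as such, per the list above.
SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def percent_encode(text, reserved_ok=""):
    """Encode every character that is not allowed as such.

    reserved_ok lists reserved characters (from ;/?:@=&#) that the
    caller uses in their special meaning and wants left alone.
    """
    out = []
    for byte in text.encode("utf-8"):   # assumption: UTF-8 for non-ASCII
        ch = chr(byte)
        if ch in SAFE or ch in reserved_ok:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))  # percent sign + two hex digits
    return "".join(out)

print(percent_encode("~jkorpela"))                     # %7Ejkorpela
print(percent_encode("my file.html"))                  # my%20file.html
print(percent_encode("/a b/c.html", reserved_ok="/"))  # /a%20b/c.html
```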

When a URL occurs as an attribute value in HTML, there is another complication caused by the & character which may have special use in query form submissions. In principle, that character should be escaped as `&amp;` or as `&#38;` (there is a footnote in the HTML 2.0 specification about this) and browsers should process it so that the actual URL passed to the processing CGI script has that notation replaced by plain & character. (Notice that it must not be percent-encoded as %26, since it is used here in its special meaning as a separator. This is a confusing issue, and CGI scripts should really be written so that semicolon ; and not ampersand & is used as field separator.)
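As a sketch of the escaping step, Python's standard `html` module can write the ampersand as the entity `&amp;` when a URL is placed into an attribute value, and undo it again as a browser would. The query URL below is hypothetical.

```python
from html import escape, unescape

# Hypothetical URL with a query part using & as the field separator.
url = "http://www.example.org/cgi-bin/search?name=korpela&lang=en"

# Escape for use inside a quoted attribute value: & becomes &amp;
href = escape(url, quote=True)
tag = '<A HREF="{}">search</A>'.format(href)

# What the browser should pass on after processing the attribute value.
restored = unescape(href)

print(tag)
print(restored)
```

Note that only the HTML-level notation changes; the URL that finally reaches the server still contains the plain & character.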