HTML5's working draft gives clear definitions for things that were, until not so long ago, "word of mouth" or "tricks of the trade" and lacked any proper definition.
One such thing is how to resolve what is supposed to be a URI into an absolute URL.
Here's the relevant text from the HTML 5 draft of 24/11/2008:
The document base URL of a Document is the absolute URL obtained by running these steps:
- If there is no base element that is both a child of the head element and has an href attribute, then the document base URL is the document's address.
- Otherwise, let url be the value of the href attribute of the first such element.
- The document base URL is the result of the previous step if it was successful; otherwise it is the document's address.
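As a rough illustration, the base URL steps can be sketched in Python with the standard library's html.parser. This is a minimal sketch under several assumptions: the names BaseHrefFinder and document_base_url are hypothetical, the parser tracks open elements naively rather than building a real HTML5 DOM (so implied head elements are not inferred), and, since the quoted steps speak of "the result of the previous step if it was successful", the sketch assumes the href value is resolved against the document's address with urllib.parse.urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never get an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class BaseHrefFinder(HTMLParser):
    """Hypothetical helper: records the href of the first base element
    that is a direct child of head and has an href attribute."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if (tag == "base" and self.href is None
                and self.open_elements and self.open_elements[-1] == "head"):
            for name, value in attrs:
                if name == "href":
                    # A valueless href attribute still counts as present.
                    self.href = value or ""
                    break
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element, if any.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def document_base_url(html, document_address):
    finder = BaseHrefFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        # No qualifying base element: use the document's address.
        return document_address
    try:
        # Assumption: href is resolved relative to the document's address.
        return urljoin(document_address, finder.href)
    except ValueError:
        # Resolution failed: fall back to the document's address.
        return document_address


print(document_base_url('<html><head><base href="/x/"></head></html>',
                        "http://example.com/a/b"))  # prints http://example.com/x/
```

A base element placed in the body, or one without an href attribute, is skipped, which mirrors the "child of the head element and has an href attribute" condition.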
To resolve a URL to an absolute URL the user agent must use the following steps. Resolving a URL can result in an error, in which case the URL is not resolvable.
- Let url be the URL being resolved.
- Let document be the Document associated with url.
- Let encoding be the character encoding of document.
- If encoding is UTF-16, then change it to UTF-8.
- Let base be the base URL for url. (This is an absolute URL.)
- Parse url into its component parts.
- If parsing url resulted in a <host> component, then replace the matching substring of url with the string that results from expanding any sequences of percent-encoded octets in that component that are valid UTF-8 sequences into Unicode characters as defined by UTF-8. If any percent-encoded octets in that component are not valid UTF-8 sequences, then return an error and abort these steps.
  Apply the IDNA ToASCII algorithm to the matching substring, with both the AllowUnassigned and UseSTD3ASCIIRules flags set. Replace the matching substring with the result of the ToASCII algorithm.
  If ToASCII fails to convert one of the components of the string, e.g. because it is too long or because it contains invalid characters, then return an error and abort these steps. [RFC3490]
- If parsing url resulted in a <path> component, then replace the matching substring of url with the string that results from applying the following steps to each character other than U+0025 PERCENT SIGN (%) that doesn't match the original <path> production defined in RFC 3986:
  - Encode the character into a sequence of octets as defined by UTF-8.
  - Replace the character with the percent-encoded form of those octets. [RFC3986]
  For instance, if url was "//example.com/a^b☺c%FFd%z/?e", then the <path> component's substring would be "/a^b☺c%FFd%z/" and the two characters that would have to be escaped would be "^" and "☺". The result after this step was applied would therefore be that url now had the value "//example.com/a%5Eb%E2%98%BAc%FFd%z/?e".
- If parsing url resulted in a <query> component, then replace the matching substring of url with the string that results from applying the following steps to each character other than U+0025 PERCENT SIGN (%) that doesn't match the original <query> production defined in RFC 3986:
  - If the character in question cannot be expressed in the encoding encoding, then replace it with a single 0x3F octet (an ASCII question mark) and skip the remaining substeps for this character.
  - Encode the character into a sequence of octets as defined by the encoding encoding.
  - Replace the character with the percent-encoded form of those octets. [RFC3986]
- Apply the algorithm described in RFC 3986 section 5.2 Relative Resolution, using url as the potentially relative URI reference (R), and base as the base URI (Base). [RFC3986]
- Apply any relevant conformance criteria of RFC 3986 and RFC 3987, returning an error and aborting these steps if appropriate. [RFC3986] [RFC3987]
  For instance, if an absolute URI that would be returned by the above algorithm violates the restrictions specific to its scheme, e.g. a data: URI using the "//" server-based naming authority syntax, then user agents are to treat this as an error instead.
- Let result be the target URI (T) returned by the Relative Resolution algorithm.
- If result uses a scheme with a server-based naming authority, replace all U+005C REVERSE SOLIDUS (\) characters in result with U+002F SOLIDUS (/) characters.
- Return result.
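The path and query percent-encoding steps can be sketched in Python as follows. This is a sketch under stated assumptions, not the draft's own code: urllib.parse.urlsplit stands in for "parse url into its component parts" (it is not an exact RFC 3986 parser, and reassembling with urlunsplit can drop an empty trailing "?"), the PATH_SAFE and QUERY_SAFE sets are my reading of the RFC 3986 path and query productions, and percent_encode and encode_path_and_query are hypothetical names. The host and fragment are left untouched.

```python
import codecs
from urllib.parse import urlsplit, urlunsplit

# Characters RFC 3986 allows literally in a path:
# unreserved, sub-delims, ":" and "@", plus the "/" separator.
PATH_SAFE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~" "!$&'()*+,;=" ":@" "/"
)
# A query may additionally contain "?".
QUERY_SAFE = PATH_SAFE | {"?"}


def percent_encode(component, safe, encoding, replace_unencodable=False):
    out = []
    for ch in component:
        if ch == "%" or ch in safe:
            # "%" is never touched, so existing (even malformed) escapes survive.
            out.append(ch)
            continue
        try:
            octets = ch.encode(encoding)
        except UnicodeEncodeError:
            if not replace_unencodable:
                raise
            # Query step: a character the encoding cannot express becomes a literal "?".
            out.append("?")
            continue
        out.append("".join(f"%{b:02X}" for b in octets))
    return "".join(out)


def encode_path_and_query(url, document_encoding="utf-8"):
    # The draft switches UTF-16 documents to UTF-8.
    encoding = codecs.lookup(document_encoding).name
    if encoding.startswith("utf-16"):
        encoding = "utf-8"
    parts = urlsplit(url)
    path = percent_encode(parts.path, PATH_SAFE, "utf-8")
    query = percent_encode(parts.query, QUERY_SAFE, encoding, replace_unencodable=True)
    return urlunsplit(parts._replace(path=path, query=query))


print(encode_path_and_query("//example.com/a^b☺c%FFd%z/?e"))
# prints //example.com/a%5Eb%E2%98%BAc%FFd%z/?e
```

Running it on the draft's own example reproduces the value given in the text: "^" and "☺" are escaped, while "%FF" and the malformed "%z" are left alone because "%" is excluded from the encoding step.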
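The <host> step can be approximated in Python with the built-in "idna" codec, which implements the IDNA 2003 ToASCII algorithm of RFC 3490. This is a sketch with loudly stated assumptions: host_to_ascii is a hypothetical name, it expects a non-empty registered host name (not an IP literal such as "[::1]"), it decodes all escapes in one pass instead of expanding only the valid UTF-8 runs, and because the codec does not apply UseSTD3ASCIIRules, that flag is approximated by a letters/digits/hyphen check on the resulting ASCII labels.

```python
import re
from urllib.parse import unquote_to_bytes

# A hostname label under the STD3 rules: letters, digits and hyphens only.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]+")


def host_to_ascii(host):
    # Expand %XX escapes; escaped octets that are not valid UTF-8 are an error.
    # (Simplification: the whole host is decoded in one pass.)
    try:
        decoded = unquote_to_bytes(host).decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 in host {host!r}") from exc
    # Python's "idna" codec performs IDNA 2003 ToASCII on each label.
    try:
        ascii_host = decoded.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"ToASCII failed for {host!r}") from exc
    # The codec skips UseSTD3ASCIIRules, so approximate it on the output labels.
    for label in ascii_host.rstrip(".").split("."):
        if (not LDH_LABEL.fullmatch(label)
                or label.startswith("-") or label.endswith("-")):
            raise ValueError(f"label {label!r} violates STD3 rules")
    return ascii_host


print(host_to_ascii("b%C3%BCcher.example"))  # prints xn--bcher-kva.example
```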
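The last steps (relative resolution, then replacing backslashes for server-based schemes) can be sketched in Python too. Two assumptions are worth stating plainly: urllib.parse.urljoin is used as a stand-in for RFC 3986 section 5.2 (it only resolves schemes listed in urllib.parse.uses_relative), and SERVER_BASED_SCHEMES is a hypothetical, deliberately short list; the draft does not enumerate these schemes. The function name resolve_reference is also hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical list of schemes treated as using a server-based naming authority.
SERVER_BASED_SCHEMES = {"http", "https", "ftp"}


def resolve_reference(url, base):
    # RFC 3986 section 5.2 Relative Resolution, approximated by urljoin.
    result = urljoin(base, url)
    # For server-based schemes, every "\" in the result becomes "/".
    if urlsplit(result).scheme in SERVER_BASED_SCHEMES:
        result = result.replace("\\", "/")
    return result


print(resolve_reference("../c\\d?x", "http://example.com/a/b/"))
# prints http://example.com/a/c/d?x
```

Because the backslash replacement comes after resolution, a sequence like "..\" is not treated as a dot segment during resolution; it only turns into "../" in the returned string.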