Monday, November 24, 2008

How do you turn any URL into an absolute URL?

HTML5's working draft provides clear definitions for things that until not so long ago were "word of mouth" or "tricks of the trade" and lacked a proper specification.
One such thing is how to resolve what is supposed to be a URI into an absolute URL.

Here's the proposal from the HTML 5 draft from 24/11/2008:

The document base URL of a Document is the absolute URL obtained by running these steps:
  1. If there is no base element that is both a child of the head element and has an href attribute, then the document base URL is the document's address.
  2. Otherwise, let url be the value of the href attribute of the first such element.
  3. Resolve the url URL, using the document's address as the base URL (thus, the base href attribute isn't affected by xml:base attributes).
  4. The document base URL is the result of the previous step if it was successful; otherwise it is the document's address.
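
The four steps above can be sketched in Python. This is only a rough illustration, not what a browser does: it assumes Python's standard html.parser and urllib.parse modules, treats any base element seen between the head start and end tags as a child of head, and ignores a base element whose href has no value. The names BaseHrefFinder and document_base_url are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Remembers the href of the first <base href="..."> seen inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "base" and self.in_head and self.base_href is None:
            href = dict(attrs).get("href")
            if href is not None:
                self.base_href = href

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def document_base_url(html_text, document_address):
    finder = BaseHrefFinder()
    finder.feed(html_text)
    # Step 1: no usable base element, so the base URL is the document's address.
    if finder.base_href is None:
        return document_address
    # Steps 2-3: resolve the href against the document's address.
    try:
        return urljoin(document_address, finder.base_href)
    except ValueError:
        # Step 4: fall back to the document's address if resolving failed.
        return document_address
```

For a page at http://example.com/a/b.html whose head contains a base element with href "/docs/", this sketch returns http://example.com/docs/.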
To resolve a URL to an absolute URL the user agent must use the following steps. Resolving a URL can result in an error, in which case the URL is not resolvable.
  1. Let url be the URL being resolved.
  2. Let document be the Document associated with url.
  3. Let encoding be the character encoding of document.
  4. If encoding is UTF-16, then change it to UTF-8.
  5. Let base be the base URL for url. (This is an absolute URL.)
  6. Parse url into its component parts.
  7. If parsing url resulted in a host component, then replace the matching substring of url with the string that results from expanding any sequences of percent-encoded octets in that component that are valid UTF-8 sequences into Unicode characters as defined by UTF-8.
    If any percent-encoded octets in that component are not valid UTF-8 sequences, then return an error and abort these steps.
    Apply the IDNA ToASCII algorithm to the matching substring, with both the AllowUnassigned and UseSTD3ASCIIRules flags set. Replace the matching substring with the result of the ToASCII algorithm.
    If ToASCII fails to convert one of the components of the string, e.g. because it is too long or because it contains invalid characters, then return an error and abort these steps. [RFC3490]
  8. If parsing url resulted in a path component, then replace the matching substring of url with the string that results from applying the following steps to each character other than U+0025 PERCENT SIGN (%) that doesn't match the original path production defined in RFC 3986:
    1. Encode the character into a sequence of octets as defined by UTF-8.
    2. Replace the character with the percent-encoded form of those octets. [RFC3986]
    For instance, if url was "//example.com/a^b☺c%FFd%z/?e", then the path component's substring would be "/a^b☺c%FFd%z/" and the two characters that would have to be escaped would be "^" and "☺". The result after this step was applied would therefore be that url now had the value "//example.com/a%5Eb%E2%98%BAc%FFd%z/?e".
  9. If parsing url resulted in a query component, then replace the matching substring of url with the string that results from applying the following steps to each character other than U+0025 PERCENT SIGN (%) that doesn't match the original query production defined in RFC 3986:
    1. If the character in question cannot be expressed in the encoding encoding, then replace it with a single 0x3F octet (an ASCII question mark) and skip the remaining substeps for this character.
    2. Encode the character into a sequence of octets as defined by the encoding encoding.
    3. Replace the character with the percent-encoded form of those octets. [RFC3986]
  10. Apply the algorithm described in RFC 3986 section 5.2 Relative Resolution, using url as the potentially relative URI reference (R), and base as the base URI (Base). [RFC3986]
  11. Apply any relevant conformance criteria of RFC 3986 and RFC 3987, returning an error and aborting these steps if appropriate. [RFC3986] [RFC3987]
    For instance, if an absolute URI that would be returned by the above algorithm violates the restrictions specific to its scheme, e.g. a data: URI using the "//" server-based naming authority syntax, then user agents are to treat this as an error instead.
  12. Let result be the target URI (T) returned by the Relative Resolution algorithm.
  13. If result uses a scheme with a server-based naming authority, replace all U+005C REVERSE SOLIDUS (\) characters in result with U+002F SOLIDUS (/) characters.
  14. Return result.
A URL is an absolute URL if resolving it results in the same URL without an error.
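
Here is a minimal Python sketch of part of this algorithm, assuming the standard urllib.parse module. It covers only the path escaping of step 8 and the relative resolution of step 10, plus the "absolute URL" test from the last sentence. The host, query, conformance and backslash steps (7, 9, 11 and 13) are left out. The function names and the placeholder base URL are made up for this example.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters the RFC 3986 path production allows, besides the letters,
# digits and "-._~" that quote() never escapes. "%" is left alone (step 8).
PATH_SAFE = "/%:@!$&'()*+,;="


def escape_path(url):
    """Step 8: percent-encode (as UTF-8) path characters RFC 3986 does not allow."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path, safe=PATH_SAFE)))


def resolve(url, base):
    """Step 10: RFC 3986 relative resolution, delegated here to urljoin()."""
    return urljoin(base, escape_path(url))


def is_absolute(url, base="http://base.invalid/"):
    """A URL is absolute if resolving it gives back the same URL.

    The default base is an arbitrary placeholder chosen for this sketch.
    """
    return resolve(url, base) == url


print(escape_path("//example.com/a^b☺c%FFd%z/?e"))
print(resolve("../img/logo.png", "http://example.com/a/b/c.html"))
```

The first print reproduces the example from step 8, "//example.com/a%5Eb%E2%98%BAc%FFd%z/?e", and the second gives "http://example.com/a/img/logo.png".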

HTML tags with attributes that have link semantics

Static analysis of HTML pages for harvesting links should not focus only on the href attributes of anchor tags. Here's a short list of attributes that take a URI as a value, along with their HTML tag names:

Tag Name    Attribute Name
a           href
img         src
script      src
iframe      src
base        href
form        action
link        href
input       usemap
input       src
head        profile
frame       src
frame       longdesc
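
As a rough sketch of how such a list could drive a link harvester, here is a small Python example built on the standard html.parser and urllib.parse modules. The names LINK_ATTRIBUTES, LinkHarvester and harvest_links are made up for this example, and the base URL is simply passed in by the caller rather than computed from a base element.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag name -> attributes that carry a URI, taken from the list above.
LINK_ATTRIBUTES = {
    "a": ("href",),
    "img": ("src",),
    "script": ("src",),
    "iframe": ("src",),
    "base": ("href",),
    "form": ("action",),
    "link": ("href",),
    "input": ("usemap", "src"),
    "head": ("profile",),
    "frame": ("src", "longdesc"),
}


class LinkHarvester(HTMLParser):
    """Collects (tag, attribute, absolute URL) triples while parsing."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRIBUTES.get(tag, ())
        for name, value in attrs:
            # Skip attributes that are not in the list or have no value.
            if name in wanted and value:
                self.links.append((tag, name, urljoin(self.base_url, value)))


def harvest_links(html_text, base_url):
    harvester = LinkHarvester(base_url)
    harvester.feed(html_text)
    return harvester.links
```

Calling harvest_links(page_source, "http://example.com/dir/page.html") returns the links in document order, each already resolved against the given base URL.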

What does "to exacerbate" mean?

I'm reading a book, came across the inflected verb "exacerbated", and wondered what it means.
Answers.com says:
To increase the severity, violence, or bitterness of; aggravate: a speech that exacerbated racial tensions; a heavy rainfall that exacerbated the flood problems.
Let's see some examples of using this verb in a sentence:
Global Warming to Exacerbate China's Water Crisis
or
Smoking may exacerbate the increased risk of a blood vessel bursting inside the brain (intracerebral stroke) already faced by people with high blood pressure
So we see that it means to make things worse and more severe.