William Slawski writes on his blog, SEO by the Sea, about patents assigned to Google. It is worth reading and does make you feel like starting to read Google patents :-)
I am listing the ones that interest me most at this time (I might think otherwise tomorrow :-)
* 6,615,209 Detecting query-specific duplicate documents (Google, Inc.)
* 6,658,423 Detecting duplicate and near-duplicate files (Google, Inc.)
* 7,158,961 Methods and apparatus for estimating similarity (Google, Inc.)
* 7,366,718 Detecting duplicate and near-duplicate files (Google, Inc.)
I don't know if or when I will actually have the time to read them, but I'm listing them here for later, just in case.