Patent on How to Identify a Web Page as a Blog

In our continuing series of blog entries about patents on blog entries, here is an interesting patent on how to identify a web page as a blog. While human beings can usually recognize a blog when they are looking at one, web crawlers and other automated systems need a more systematic way to do so.

US7565350 B2

Identifying a web page as belonging to a blog


Abstract: A machine learning classifier is used to determine whether a web page belongs to a blog, based on a number of characteristics of web pages (e.g., presence of words such as permalink, or being hosted on a known blogging site). The classifier may be initially trained using human-judged examples. After classifying web pages as being blog pages, the blog pages may be further identified or categorized as top level blogs based on their URLs, for example.
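
To make the abstract a little more concrete, here is a rough sketch in Python of what such a pipeline might look like. It is an illustration only, not the method claimed in the patent: the abstract does not say which classifier is used, so a simple perceptron stands in for it. The host list, the cue words other than "permalink", the tiny hand-labeled training set, and the rule that a top level blog page has a root URL path are all assumptions made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; the abstract names only "permalink"
# and "a known blogging site" as example characteristics.
KNOWN_BLOG_HOSTS = ("blogspot.com", "wordpress.com", "typepad.com")
BLOG_WORDS = ("permalink", "trackback", "posted by")


def extract_features(url, html):
    """Turn a page into a small feature vector of 0.0/1.0 values."""
    host = urlparse(url).hostname or ""
    text = html.lower()
    features = [1.0 if word in text else 0.0 for word in BLOG_WORDS]
    on_blog_host = any(host == h or host.endswith("." + h) for h in KNOWN_BLOG_HOSTS)
    features.append(1.0 if on_blog_host else 0.0)
    return features


def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(examples, epochs=20):
    """Train on (features, label) pairs, where label is 1 (blog) or 0 (not)."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            predicted = 1 if score(weights, bias, x) > 0 else 0
            error = y - predicted
            if error:
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias


def is_blog(url, html, model):
    weights, bias = model
    return score(weights, bias, extract_features(url, html)) > 0


def is_top_level(url):
    """Assumed heuristic: a top level blog page has an empty or root path."""
    parsed = urlparse(url)
    return parsed.path in ("", "/") and not parsed.query


# Invented "human-judged" examples: (url, html, label).
JUDGED = [
    ("http://example.blogspot.com/2009/01/post.html", "<a>Permalink</a> posted by Ann", 1),
    ("http://example.com/journal/", "Trackback | Permalink", 1),
    ("http://example.com/products", "Buy now. Add to cart.", 0),
    ("http://example.org/about", "About our company", 0),
]
model = train_perceptron([(extract_features(u, h), y) for u, h, y in JUDGED])
```

With a model trained this way, a crawler could call is_blog(url, html, model) on each fetched page and then use is_top_level(url) to separate a blog's front page from its individual posts. A real system would need far more training examples and features than this toy version.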

– J.A.