Link Rot Matters
Link rot (or linkrot) is an informal term for the process by which hyperlinks that used to work stop working, sometimes as a result of changes to your own website and sometimes through no fault of your own. Over time, both on individual websites and across the Internet in general, a growing number of links point to web pages, servers or other resources that have become permanently unavailable. A link that no longer works is called a broken link, a dead link or a dangling link. Whatever the name, any link that doesn’t work is “rotten”.
Broken links annoy anyone who tries to follow them. They seriously disrupt the user experience and shake visitors’ confidence in the website that provided them. They typically live on for many years, alienating visitor after visitor after visitor… Sites containing them are regarded as unprofessional and unpleasant to navigate.
Think of the last time you were following a thread of information that led at long last to your grand prize, the information you were ultimately looking for. You clicked on the link expecting a successful conclusion to your search, and the page that promised you the exact information you were looking for was no longer there… FRUSTRATING!
Causes
A link may become broken for several reasons. The most common symptom of a dead link is a 404 error, which indicates that the web server responded but could not find the specific page requested.
Some news sites contribute greatly to the link rot problem by keeping recent news articles online where they are freely accessible, and then either removing them or moving them to a paid subscription area after a certain number of months. This causes a heavy loss of supporting links in sites discussing newsworthy events and using news sites as references.
Another type of dead link occurs when the server that hosts the target page stops working or relocates to a new domain name. In this case the browser may return a DNS error, or it may display a site unrelated to the content sought. The latter can occur when a domain name is allowed to lapse and is subsequently re-registered by another party. Domain names acquired in this manner are attractive to those who wish to exploit the stream of unsuspecting visitors, which artificially inflates hit counters and PageRank.
Expired domains that were formerly websites are also sought for parked domain monetization. These domains do not respond with an error code and special software is needed to detect them.
A link might also be broken by some form of blocking, such as content filters or firewalls. Dead links, commonplace across the Internet, can also originate on your own website when content is assembled, copied, or deployed without properly verifying the link targets, or is simply not kept up to date.
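The causes described in this section show up differently at the network level, and a small script can tell some of them apart. The following is a minimal sketch using only the Python standard library. The function names, the category labels, and the choice to treat 401/402/403 as "access restricted" (for example, an article moved behind a subscription) are illustrative assumptions, not part of any standard.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception raised by urlopen() to a rough cause of breakage."""
    # HTTPError is a subclass of URLError, so it must be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (404, 410):
            return "page missing"  # server answered, page not found
        if exc.code in (401, 402, 403):
            return "access restricted"  # assumption: paywall or blocking
        return f"HTTP error {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, socket.gaierror):
            return "DNS error"  # host name no longer resolves
    return "server unreachable"


def check_link(url, timeout=10.0):
    """Return "ok" or a rough cause of breakage for one URL (uses the network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "ok"
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return classify_failure(exc)
```

A result of "ok" only means that the server answered without an error code. A lapsed domain that now serves an unrelated or parked page also answers this way, so this sketch cannot detect it. Some servers reject HEAD requests; falling back to GET in that case is a reasonable refinement.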
Prevalence
The 404 “Not found” response is familiar to even the occasional Web user. A number of studies have examined the prevalence of link rot on the Web, in academic literature, and in digital libraries. Fetterly et al. (2003) found that about one link out of every 200 disappeared from the Web each week. McCown et al. (2005) found that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have shown link rot in academic literature to be even worse (Spinellis, 2003; Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that about 3% of the objects were no longer accessible after one year.
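To get a feel for what a weekly loss of one link in 200 means over longer periods, the rate can be converted into a half-life. The sketch below assumes that the loss rate stays constant and applies equally to all links; that is a simplifying assumption made here for illustration, and the function names are hypothetical.

```python
import math


def half_life(loss_per_period):
    """Periods until half the links are gone, assuming a constant loss rate."""
    if not 0.0 < loss_per_period < 1.0:
        raise ValueError("loss_per_period must be between 0 and 1")
    return math.log(0.5) / math.log(1.0 - loss_per_period)


def surviving_fraction(loss_per_period, periods):
    """Fraction of links still working after the given number of periods."""
    return (1.0 - loss_per_period) ** periods


weeks = half_life(1 / 200)
print(f"Half-life at 1 in 200 per week: {weeks:.0f} weeks ({weeks / 52:.1f} years)")
```

Under that assumption, a loss of one link in 200 per week corresponds to a half-life of roughly 138 weeks, or a little over two and a half years.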
Discovering
Detecting link rot for a given URL is difficult using automated methods. If a URL is accessed and returns an HTTP 200 (OK) response, it may be considered accessible, but the contents of the page may have changed and may no longer be relevant. Some web servers also return a soft 404: an error page delivered with a “200 OK” response instead of the 404 that would indicate the URL is no longer accessible.
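One common heuristic for spotting soft 404s is to request a second URL on the same site that almost certainly does not exist and compare the two responses. If the server answers the made-up URL with 200 as well, and the page under test looks nearly identical to that answer, the page is probably an error page in disguise. The sketch below shows only the comparison step; the function names and the 0.9 similarity threshold are assumptions chosen for illustration, and fetching the two pages is left to the caller.

```python
from difflib import SequenceMatcher
from urllib.parse import urljoin
from uuid import uuid4


def probe_url(url):
    """Build a sibling URL that almost certainly does not exist on the same site."""
    return urljoin(url, f"no-such-page-{uuid4().hex}")


def looks_like_soft_404(status, body, probe_status, probe_body, threshold=0.9):
    """Guess whether a 200 response is really a "not found" page.

    status/body belong to the URL under test; probe_status/probe_body belong
    to a request for probe_url(url) on the same site.
    """
    if status != 200:
        return False  # a real error code, so not a soft 404
    if probe_status != 200:
        return False  # the server reports missing pages honestly
    similarity = SequenceMatcher(None, body, probe_body).ratio()
    return similarity >= threshold
```

This remains a guess rather than a proof. Sites whose pages share a large template can trigger false positives, and the heuristic says nothing about a page whose content has changed and is no longer relevant.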
Combating
Because dead links give an unprofessional image and a jarring experience to both the linking and the linked-to websites, there is a pressing business need to eliminate them. Periodic, automated monitoring of all links is the best way to catch and repair broken links soon after they appear.
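The core of such monitoring is simple: collect every link on a page and test each target. Below is a minimal sketch using the Python standard library. The names LinkCollector, find_broken_links and is_alive are hypothetical; is_alive stands for whatever check you choose to trust (for example, an HTTP request that treats error codes as failures), and running the scan on a schedule is left to an external tool such as a job scheduler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def find_broken_links(page_url, html, is_alive):
    """Return the http(s) link targets on a page for which is_alive() is false."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    # dict.fromkeys() drops duplicates while keeping document order.
    targets = [
        url
        for url in dict.fromkeys(collector.links)
        if url.startswith(("http://", "https://"))
    ]
    return [url for url in targets if not is_alive(url)]
```

Passing the check in as a function keeps the scan itself free of network code, so it can be exercised with a stub before being pointed at a live site.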
Credits: excerpts derived from the Wikipedia article about “Link Rot”, text available under the Creative Commons Attribution-ShareAlike License.
Our forum Broken Link Checker – Commercial Service is available for questions and comments.