Alexa Internet uses crawling, archiving, categorizing, and data mining techniques to build the Related Links lists for millions of Web URLs. One technique is to analyze the links on crawled pages to find related sites. The day-to-day use of the Alexa service and Related Links by all Alexa users also helps build and refine the data: by looking at high-level trends within the millions of URL “paths” created by Alexa users, we can deduce relationships between Web sites. For example, if many users go directly from site A to site B, the two sites are likely to be related.

Next, all the URLs are checked to make sure they are live links. This removes links to pages that don’t exist (404 errors), as well as links to servers that aren’t available to the general Internet population, such as servers that are no longer active or that sit behind firewalls.
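A live-link check of this kind can be sketched with Python's standard library. This is a minimal illustration, not Alexa's actual checker: it assumes that an HTTP HEAD request is enough to decide whether a page exists, and the names `is_live` and `filter_live_links` are hypothetical. A 404 response, a refused connection, an unknown host, or a timeout (typical of a firewalled server) all count as "not live".

```python
import urllib.request
from typing import Callable, Iterable, List


def is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a non-error HTTP status.

    Assumption: a HEAD request is enough to decide whether a page exists.
    Some servers reject HEAD, so a real checker might retry with GET.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")  # ValueError if malformed
        # urlopen follows redirects and raises HTTPError for 4xx/5xx statuses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # HTTPError (e.g. 404) and URLError (refused connection, unknown host,
        # unreachable server) are both OSError subclasses, as are socket timeouts.
        return False


def filter_live_links(urls: Iterable[str],
                      check: Callable[[str], bool] = is_live) -> List[str]:
    """Keep only the URLs that pass `check`, preserving their order."""
    return [url for url in urls if check(url)]


# Offline usage with a stand-in checker (hypothetical URLs and results):
known = {"http://a.example/": True, "http://b.example/gone": False}
print(filter_live_links(known, check=known.get))  # ['http://a.example/']
```

Passing the checker as a parameter keeps the filtering logic testable without network access; in real use the default `is_live` would be called for each URL.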
Finally, once all of the relationships are established and the links are checked, the top Related Links for each URL are chosen automatically according to the strength of the relationship between the sites. Alexa Internet recrawls the Web on a regular basis and rebuilds the data to pull in new sites and to refine the relationships between existing sites. A new site with a strong relationship to a site automatically appears in that site’s Related Links list, displacing any sites with weaker relationships.
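The ranking step, and the way a rebuild lets a new site displace a weaker one, can be sketched as follows. The text does not say how strength is measured, so this sketch assumes a hypothetical measure: the number of times users went directly from one site to another in their paths. The list length of ten is inferred from the "greater than ten" remark below; the site names and the `count_transitions` and `related_links` functions are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Set


def count_transitions(paths: Iterable[Sequence[str]]) -> Dict[str, Counter]:
    """Count direct site-to-site moves across all user paths.

    strength[a][b] is how many times a user went straight from site a to site b
    (a hypothetical stand-in for relationship strength).
    """
    strength: Dict[str, Counter] = defaultdict(Counter)
    for path in paths:
        for src, dst in zip(path, path[1:]):
            if src != dst:  # staying on one site says nothing about related sites
                strength[src][dst] += 1
    return strength


def related_links(strength: Dict[str, Counter], site: str, top_n: int = 10,
                  live: Optional[Set[str]] = None) -> List[str]:
    """Return up to top_n sites most strongly related to `site`.

    If `live` is given, sites outside it are skipped (the live-link check).
    Ties are broken alphabetically so the result is deterministic.
    """
    candidates = [
        (other, count)
        for other, count in strength.get(site, Counter()).items()
        if live is None or other in live
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in candidates[:top_n]]


paths = [
    ["news.example", "weather.example"],
    ["news.example", "weather.example", "sports.example"],
    ["news.example", "sports.example"],
]
print(related_links(count_transitions(paths), "news.example", top_n=2))
# ['weather.example', 'sports.example']

# A later rebuild sees a new site that users now reach often from news.example.
paths += [["news.example", "maps.example"]] * 3
print(related_links(count_transitions(paths), "news.example", top_n=2))
# ['maps.example', 'weather.example']  (sports.example is displaced)
```

Because the list is rebuilt from scratch on each pass, nothing has to be removed explicitly: a site drops out simply because stronger relationships now fill the limited number of slots.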
Please note that because relationships between sites are ranked by strength, Related Links lists are not necessarily symmetric: Site A may appear in the list for Site B while Site B does not appear in the list for Site A. Generally, this happens when a site has more than ten strongly related sites, so that weaker relationships are crowded out of its list, or when the two sites’ content is not similar enough.
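The first cause of this asymmetry can be reproduced even when the strength score is identical in both directions. In the sketch below, a hypothetical hub site is strongly tied to eleven other sites, and a small site's only relationship is a weaker one with the hub. The data, the `top_related` function, and the use of one symmetric score per pair are all assumptions for illustration; the sketch does not model the content-similarity cause.

```python
from typing import Dict, FrozenSet, List


def top_related(strength: Dict[FrozenSet[str], int], site: str,
                top_n: int = 10) -> List[str]:
    """Rank the sites paired with `site` by strength and keep the top_n.

    `strength` is keyed by an unordered pair of sites, so the score itself
    is the same in both directions; any asymmetry comes from the cutoff.
    """
    scores = []
    for pair, score in strength.items():
        if site in pair:
            (other,) = pair - {site}  # the other member of the pair
            scores.append((other, score))
    scores.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in scores[:top_n]]


# Hypothetical data: a hub strongly tied to eleven sites,
# plus a small site whose only relationship is a weaker one with the hub.
strength = {frozenset({"hub.example", f"site{i:02d}.example"}): 20 + i for i in range(11)}
strength[frozenset({"hub.example", "niche.example"})] = 5

print(top_related(strength, "niche.example"))                   # ['hub.example']
print("niche.example" in top_related(strength, "hub.example"))  # False
```

The hub appears in the small site's list, but the small site is pushed out of the hub's ten slots by eleven stronger relationships.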