Google’s Progress in Fighting Link Spam
Those who say that links are on the decline as a ranking factor often point to the efforts by spammers to use illegitimate practices to acquire links and earn rankings that their sites don’t deserve. This was a huge problem for Google in the 2002 to 2013 time period. However, the tide in this battle started to turn in 2012.
First, Google began levying a wave of manual penalties. That alone sent shock waves through the SEO industry. The next major step was the release of the first version of Penguin on April 24, 2012. This was a huge step forward for Google.
As the next few years unfolded, Google invested heavily in new versions of Penguin and in manual penalties to refine how it dealt with people who obtain links illegitimately. This effort culminated in the release of Penguin 4.0 on September 23, 2016.
With the release of Penguin 4.0, Google’s confidence in its approach to links had become so high that the Penguin algorithm was no longer punishing sites for obtaining bad links. As of Penguin 4.0, the algorithm simply identifies links it considers bad and ignores them (causes them to have no ranking value).
This shift from penalizing sites with bad links to simply discounting those links reflects Google’s confidence that Penguin is finding a very large percentage of the bad links that it’s designed to find.
Of course, Google still uses manual penalties to address the types of illegitimate link-building practices that Penguin is not designed to catch.
How much progress has Google actually made? We still remember the Black Hat/White Hat panel in December 2008 at SES Chicago. Other panelists included Dave Naylor, Todd Friesen and Doug Heil. A couple of the panel members argued that buying links at the start of a campaign for a website was a requirement, and that it was irresponsible for an SEO professional not to do so.
How a decade changes things! It has been many years since any SEO in any venue has argued that buying links is a smart practice. In fact, you can’t find anyone making public recommendations about methods for obtaining links that violate Google’s Webmaster Guidelines. The entire industry for doing those types of things has been driven underground. Driven underground is not the same as “gone,” but it does show that Google’s ability to detect these schemes has become quite effective.
One last point, and it’s an important one. Why does Google have the Penguin algorithm, and why does it assess manual link penalties? The answer is simple: links are a major ranking factor, and Google wants to proactively address schemes to obtain links that violate its guidelines. Otherwise, it would not need to invest in fighting link spam.
Why Are Links a Valuable Signal?
Why is Google still using links? Why not simply switch to user engagement signals and social media signals? We won’t develop the entire reason why these signals are problematic here, but will share brief points about each:
- 1. Social Media Signals: Two major reasons: (1) Google can’t depend on signals from third-party platforms run by its competitors (Google and Facebook are not friends); and (2) major social media sites such as Facebook and LinkedIn have stopped sharing data on likes and shares. If the social media sites themselves don’t find these signals valuable, why should a search engine?
- 2. User Engagement Signals: Google likely finds ways to use these signals in some scenarios, but there are limitations to what it can do. Here is what Jeff Dean, the head of Google's machine learning efforts, said about them: “An example of a messier reinforcement learning problem is perhaps trying to use it in what search results should we show. There’s a much broader set of search results we can show in response to different queries, and the reward signal is a little noisy. If a user looks at a search result and likes it or doesn’t like it, that’s not that obvious.”
But now, let’s get to the core of the issue: Why are links such a great signal? It comes down to three major points:
- 1. Implementing a link requires a material investment on your part. You must own a website, and you must take the time to place the link on a web page. This may not be a huge investment, but it’s significantly more effort than dropping a link into a social media post.
- 2. When you implement a link, you are making a public endorsement that associates your brand with the web page you’re linking to. In addition, the link is static: it sits there in an enduring manner. A link in a social media post, by contrast, disappears from people’s feeds quickly, sometimes within minutes.
- 3. Now, here is the big one: When you implement a link on a page on your site, people might click on it and leave your site. In fact, you’re inviting them to do so.
Think about that last one for a few seconds more. A (non-advertisement) link on your site is an indication by you, as the publisher of the page, that the linked content offers enough value to your visitors, and will do enough to enhance your relationship with those visitors, that you’re willing to have people leave your site.
That’s what makes links an incredibly valuable signal.
Basic Methodology
After consulting with a couple of experts (Paul Berger and Per Enge) on the best approach, we calculated the Spearman correlation for the results of every query in our study, and then took the quadratic mean of those scores. We took this approach because the quadratic mean works with the square of the correlation values (where the correlation value is R, the quadratic mean uses R squared).
It’s actually the R squared value that carries meaning in statistics. For example, if R is 0.8, then R squared is 0.64, and you can say that about 64 percent of the variability in Y is explained by X. As Paul Berger explained it to me, there is no comparably meaningful sentence involving the correlation value R itself, but R squared gives you something meaningful to say about the relationship.
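As a quick numerical check of that interpretation, the sketch below (with made-up data; all numbers and variable names are illustrative, not from the study) fits a simple least-squares line and confirms that the fraction of variability it explains equals R squared:

```python
import math

# Hypothetical paired observations (illustrative numbers only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.6, 5.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * syy)  # the correlation value R

# Least-squares line y = a + b*x; in simple regression, the share of
# variability in y explained by x is exactly R squared
b = sxy / sxx
a = my - b * mx
residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
variance_explained = 1 - residual / syy  # equals r * r
```

So an R of 0.8 really does mean "64 percent of the variability explained," which is why the quadratic mean (built on R squared) is the more defensible way to average correlations.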
Here is a visual on how this calculation process works:

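The per-query calculation described above can be sketched in pure Python. This is a minimal illustration, not the study's actual code, and the query data below is invented: each query pairs a hypothetical link metric with a hypothetical ranking score for five results.

```python
import math

def rankdata(values):
    """Assign 1-based ranks to values, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def quadratic_mean(rs):
    """Root-mean-square of the per-query correlations (uses R squared)."""
    return math.sqrt(sum(r * r for r in rs) / len(rs))

# Made-up data: for each query, a link metric vs. a ranking score
queries = [
    ([10, 8, 6, 4, 2], [9, 7, 6, 5, 1]),
    ([3, 9, 5, 7, 1], [2, 8, 7, 6, 1]),
]
per_query_r = [spearman(x, y) for x, y in queries]
overall = quadratic_mean(per_query_r)
```

The quadratic mean is never less than the ordinary mean of the same scores, which is consistent with its role here: it aggregates the R squared values that actually carry the "variability explained" meaning.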
In addition to the different calculation approach, we also used a mix of query types. We tested commercial head terms, commercial long-tail terms, and informational queries. In fact, two-thirds of our queries were informational in nature.