Ever wondered what goes on inside search engine giants like Google, Bing, and Yahoo?
On January 27, 2023, the fourth-largest search engine in the world, Yandex, made headlines after a massive data leak (44GB, to be exact).
“So, the most popular search engine in Russia suffered a breach. How is it relevant to me?” you may ask.
Many news channels waved it off as a malicious act against the company and its customers’ data security, but digital experts saw it as a rare insight into how search engines operate.
And, more importantly, what content they prioritize and why.
The data leak made public what we initially thought were 1,922 ranking factors, but thanks to Ben Wills, the number was corrected to 17,853. Massive, right?
We went through all 1,922.
Read on to see what we found most interesting.
It goes without saying that Yandex is not Google.
But several notable factors make this leak relevant (and educational) outside Russia and inside your search ranking experiments:
That being said, what the Yandex source code leak reveals helps us, for the first time, better differentiate between assuming and knowing how search rankings work.
Generally, the Yandex ranking factors can be divided into three categories:
The biggest weighing factors used in the statistical models are:
Advertising on a page is seen as a negative factor. As a matter of fact, it is the factor with the highest negative ranking weight:
Multiple ad-related factors, like the number of ad placements on the page and whether the background is clickable, suggest Yandex doesn’t like pages where ads take up a large share of the visible screen.
The construction of the URL is another factor that Yandex takes into account. And more specifically:
Going back to the argument that Yandex isn’t Google and that this leak therefore won’t be valuable: these URL factors closely resemble the ones in Google’s URL structure guidelines.
There’s a lot to unpack here. As it turns out, Yandex has numerous page-level factors that play a role in building the SERP. Some of the most notable ones include:
Building on the page freshness factor, the ultimate combination would be a well-established website that has been active for a long time and frequently updates its content.
Also, Yandex judges the overall quality of a website by its clickability. In other words, how often do users click on the URL for a given search?
Another positive ranking factor is the domain name. Yandex gives a ranking boost to .COM domains.
We’ve already discussed content quality, but what about the overall page quality? Well, Yandex evaluates a page’s quality based on several factors:
There were several interesting user behavior takeaways from the leak that we need to discuss.
A key factor is the number of clicks and impressions a host receives overall. Yandex also takes into account whether a page is mobile-friendly and analyzes user behavior on mobile devices, including session duration and time spent on the page. Also, when visitors return to a website within the same month, that’s a positive ranking factor.
But probably the most fascinating one is that:
Pages that feature user reviews are given priority in the search results.
Yandex puts a lot of emphasis on prioritizing content that is geographically close to the user. So when two domain names are battling for the same search query, the one that’s closer to the user will get a ranking boost.
In terms of technical ranking factors, reducing the number of 4xx client errors and 5xx server errors will put you ahead in the SERP. On top of that, Yandex pays a lot of attention to crawl depth. That’s why you should make sure that no important page is more than two clicks away from your home page.
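To check the "two clicks from the home page" rule on your own site, a breadth-first search over the internal link graph is one simple option. The sketch below is only an illustration: the `site` dictionary, the function names, and the threshold of two clicks are assumptions made for this example and are not taken from the Yandex code.

```python
from collections import deque


def crawl_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page.

    `links` maps a page URL to the list of pages it links to (hypothetical
    data; in practice you would build it from a crawl of your own site).
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_deeper_than(links, home, max_depth=2):
    """List reachable pages that sit more than `max_depth` clicks from home."""
    depths = crawl_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_depth)


# Made-up site structure used only for illustration.
site = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/yandex-leak"],
    "/blog/yandex-leak": ["/blog/yandex-leak/factors"],
}

print(pages_deeper_than(site, "/"))
```

Pages that the search never reaches are not reported at all, so orphaned pages need a separate check against your sitemap.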
Yandex employs various measures to penalize the creation of referral chains, which artificially inflate the popularity of a website.
One such measure is to analyze the percentage of hyperlinked text, as excessive linking can indicate manipulative behavior. It also considers the quality of the links directing to a site and penalizes sites with a large number of paid or low-quality links.
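The "percentage of hyperlinked text" idea can be approximated on your own pages with Python's standard `html.parser` module. This is a rough sketch under stated assumptions: the leak does not tell us how Yandex computes the value, so counting non-whitespace characters inside `<a>` tags against all text on the page is our own simplification, and `LinkTextCounter` and `linked_text_ratio` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Count non-whitespace characters inside and outside <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0  # > 0 while we are inside an <a> tag
        self.link_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Assumption: whitespace is ignored so formatting does not skew the ratio.
        # Note that text inside <script> or <style> is also counted here.
        text = "".join(data.split())
        self.total_chars += len(text)
        if self.anchor_depth > 0:
            self.link_chars += len(text)


def linked_text_ratio(html):
    """Return the share of page text (0.0 to 1.0) that sits inside links."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    if counter.total_chars == 0:
        return 0.0
    return counter.link_chars / counter.total_chars


sample = "<p>abcd <a href='/x'>efgh</a></p>"
print(linked_text_ratio(sample))
```

A page whose ratio is far above that of comparable pages on your site is worth a manual look, although what counts as "excessive" is a judgment call the leak does not settle.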
Speaking of link quality, factors that contribute directly to the link quality are the number of redirects and how the links are constructed.
As with Google, websites that incorporate good SEO practices perform better than those that do not. Being easily discoverable is a surefire way to achieve a higher ranking on Yandex.
Unsurprisingly, pages that serve the search intent lead the results for the respective search query. Pages with the exact search query in their title tag and body text have an advantage. The use of synonyms is another positive signal that can lead to a ranking boost.
Another strong signal of a high-quality page is a link from Wikipedia. Yandex favors pages that Wikipedia links to and ranks them higher.
Websites that contain video content are prioritized. But there’s a catch: Yandex gives higher rankings to pages with videos hosted by Yandex (duh).
In terms of evaluating a page’s video content, the standard measure applies: the average watch time relative to the total length of the video.
And while all of these 11 findings provided some great insights, there were some debates in the SEO world about whether the leaked data is valuable or not.
From outright dismissal to in-depth analysis, SEOs expressed mixed opinions on the Yandex leak.
Kevin Indig summarized the most common objections and offered great input on the significance of the ranking factors.
Snippet from the article “SEOs are underestimating The Yandex leak” by Kevin Indig
Notable names in the SEO field, such as Ben Wills, Alex Buraks, and Mic King, rolled up their sleeves and dove deep to decode the leaked data.
Ben Wills was among the first experts to make sense of the source code and help us grasp the event's significance (even though he opened the Twitter thread a bit later).
At about the same time, Rob Ousbey shared an alpha version of an explorer tool for the Yandex code:
Mic King shared his first impressions live while going through the source code, telling us, “Don’t sleep on this code.”
SEO forums in Russia were no less crowded, and webmasters there shared insights similar to those of the Western SEO world. However, there was also a lot of talk about Yandex favoring its own products and services, as covered by Russian SEO expert Dan Taylor.
Will the Yandex source code leak change how you do SEO?
Probably not.
But it would be a huge mistake to dismiss it as unimportant and scroll past it.
Apparently, there are a lot of similarities between Yandex and Google. Hence, this leak can serve as a great starting point for more experiments and push you to focus on user experience and quality content even more.
So knuckle down and start testing.
Lora has spent the last 8 years developing content strategies that drive better user experiences for SaaS companies in the CEE region. In collaboration with WordPress subject-matter experts and the 2024 Web Almanac, she helps site owners close the gap between web performance optimization and real-life business results.