What the Yandex Code Leak Revealed About Search Engines

Last updated on Feb 15th, 2024 | 6 min

TL;DR: The Yandex source code leak revealed 17,853 ranking factors, initially thought to be 1,922. This leak is significant as it shows the similarities between Yandex and other major search engines like Google, including the use of PageRank and Okapi BM25 for ranking documents. It highlights factors such as page quality, user engagement, and the importance of mobile-friendliness. 


Ever wondered what goes on inside search engine giants like Google, Bing, and Yahoo?

On January 27, 2023, Yandex, the fourth-largest search engine in the world, made headlines after a massive data leak (44GB, to be exact).

“So, the most popular search engine in Russia suffered a breach. How is it relevant to me?” you may ask. 

What many news channels waved off as a malicious act against the company and its customers' data security, digital experts saw as a rare insight into how search engines operate.

And, more importantly, into what content they prioritize and why.

The data leak made public what we initially thought were 1,922 ranking factors, but thanks to Ben Wills, the number was corrected to 17,853. Massive, right?

Mic King tweet on Yandex

We went through all 1,922.

Read on to see what we found most interesting.


Can I use the learnings from the Yandex leak to rank higher on Google?

It goes without saying that Yandex is not Google.

But several notable factors make this leak relevant (and educational) outside Russia and inside your search ranking experiments:

  • There’s a ~70% match between Yandex and Google search results
  • Yandex uses PageRank (almost identical to the one in Google)
  • Yandex employs lots of ex-Googlers, and many speculate it was engineered in a similar fashion
  • Yandex follows the same information retrieval best practices as Google, like the inverted index or embeddings
  • Just like Google and Bing search engines, Yandex uses the Okapi BM25 ranking function to estimate the relevance of documents to a given search query
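
For readers unfamiliar with BM25, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not taken from the leaked code: the function name bm25_scores, the sample documents, and the parameter values k1=1.5 and b=0.75 are illustrative assumptions, and the IDF term uses a common smoothed variant.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with textbook Okapi BM25.

    k1 and b are commonly used default values, not values from any engine.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            tf = term_freq[term]
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Hypothetical sample documents for illustration only.
docs = [
    "yandex source code leak reveals ranking factors",
    "how to bake sourdough bread at home",
    "ranking factors for search engines explained",
]
scores = bm25_scores("ranking factors", docs)
print(scores)
```

The second document shares no terms with the query, so it scores zero. The third document contains the same query terms as the first but is shorter, so length normalization ranks it slightly higher.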

Alex Buraks tweet on Yandex

That being said, what the Yandex source code leak reveals helps us, for the first time, better differentiate between assuming and knowing how search rankings work.
 

Inside the Yandex code leak: 11 findings about how search engines operate

Generally, the Yandex ranking factors can be divided into three categories:

  • Static factors like inbound backlinks, inbound internal links, headers, ads ratio, etc. These relate to your website.
  • Dynamic factors like text relevance, keyword inclusions, etc. These relate to both your website and the search query.
  • User search-related factors like the user’s location, query language, intent modifiers, etc. These relate directly to the user query.

The most heavily weighted factors used in the statistical models are:

Yandex weighing factors

 

1. On-Page Advertising

Advertising on a page is seen as a negative factor. As a matter of fact, it is the factor with the highest negative ranking weight:

Mic King Tweet on Yandex ranking factors weight

Multiple ad-related factors, like the number of ad placements on the page and whether the background is clickable, suggest Yandex doesn’t like pages with a high ratio of ads to visible screen area.


2. URL-Level Factors

The construction of the URL is another factor that Yandex takes into account, specifically:

  • The presence of numbers in the URL
  • The number of trailing slashes (“/”) in the URL
  • The number of capital letters in the URL
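
To audit your own URLs against these three signals, a small Python sketch like the one below can help. The function name url_features and the returned keys are hypothetical, and how Yandex actually computes or weighs these values is not shown here; this only measures the three properties listed above.

```python
from urllib.parse import urlparse


def url_features(url):
    """Extract the three URL properties listed above (illustrative only)."""
    path = urlparse(url).path
    return {
        # True if any digit appears anywhere in the URL.
        "has_digits": any(ch.isdigit() for ch in url),
        # Number of "/" characters at the very end of the path.
        "trailing_slashes": len(path) - len(path.rstrip("/")),
        # Number of capital letters anywhere in the URL.
        "uppercase_letters": sum(ch.isupper() for ch in url),
    }


# Hypothetical example URL.
features = url_features("https://Example.com/Blog/Post-2023/")
print(features)
```

Running this over a sitemap export is a quick way to spot URLs with stray capitals or numeric IDs.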

As for the argument that Yandex isn’t Google, so this leak can’t be valuable: these URL factors closely resemble the ones in Google’s URL structure guidelines.


3. Page-Level Factors

There’s a lot to unpack here. As it turns out, Yandex has numerous page-level factors that play a role in building the SERP. Some of the most notable ones include:

  • Page freshness - especially for blog content and news websites. It’s a negative ranking factor if a content page is older than 10 years. So update your content frequently.
  • Last destination - Yandex rewards pages that end the user's search journey, meaning they have found what they are looking for. 
  • Healthy traffic source ratio - Yandex does not like pages that get traffic from a single source (e.g., organic search). For a page to rank high, it needs to get traffic from all kinds of sources - organic, paid, direct, etc. 
  • Content quality - It’s essential for your text to be original and not stuffed with keywords. It’s a ranking boost if your text has been cited or linked from external domains. Also, poor-quality content brings down the ranking potential of your good-quality content.
     

4. Website-Level Factors

Building on the page freshness factor, the ultimate combination would be a well-established website that has been active for a long time and frequently updates its content.

Yandex also judges the overall quality of a website by its clickability. In other words, how often do users click on its URL in the search results?

Another positive ranking factor is the domain name. Yandex gives a ranking boost to .COM domains.


5. Page Quality

We’ve already discussed content quality, but what about the overall page quality? Well, Yandex evaluates a page’s quality based on several factors:

  • Number of visits
  • Number of unique visitors
  • Time spent on a page
  • Number of actions taken on a page

 

6. User Behavior and Engagement

There were several interesting user behavior takeaways from the leak that we need to discuss. 

A key factor is the number of clicks and impressions a host receives overall. Yandex also takes into account whether a page is mobile-friendly and analyzes user behavior on mobile devices, including session duration and time spent on the page. Also, when visitors return to a website within the same month, that’s a positive ranking factor.

But probably the most fascinating one is that:

Pages that feature user reviews are given priority in the search results.
 

7. Host Rank and Location

Yandex puts a lot of emphasis on prioritizing content that is geographically close to the user. So when two domain names are battling for the same search query, the one that’s closer to the user will get a ranking boost. 

In terms of technical ranking factors, reducing the number of 4xx client errors and 5xx server errors will put you ahead in the SERP. On top of that, Yandex pays a lot of attention to crawl depth. So make sure that no important page is more than two clicks away from your home page.
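
One way to check click depth on your own site is a breadth-first search over the internal link graph. The sketch below assumes you already have that graph as a dictionary (for example, from a crawler export); the function name click_depths and the sample site are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # The first time BFS reaches a page is via a shortest click path.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/yandex-leak"],
    "/blog/yandex-leak": ["/blog/archive/2019/old-post"],
}
depths = click_depths(site, "/")
too_deep = [page for page, depth in depths.items() if depth > 2]
print(too_deep)
```

Pages that end up in too_deep are candidates for an extra internal link from the home page or a category page.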


8. Backlinks Quality

Yandex employs various measures to penalize the creation of referral chains, which artificially inflate the popularity of a website.

One such measure is to analyze the percentage of hyperlinked text, as excessive linking can indicate manipulative behavior. It also considers the quality of the links directing to a site and penalizes sites with a large number of paid or low-quality links.
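
To get a rough feel for the share of hyperlinked text on a page, you can compare the characters inside anchor tags with all visible text. The sketch below uses Python's standard html.parser module; the class name LinkTextRatio and the function hyperlinked_text_ratio are hypothetical, and this is only an approximation of the idea, not the formula Yandex uses.

```python
from html.parser import HTMLParser


class LinkTextRatio(HTMLParser):
    """Count text characters inside <a> tags versus all text characters."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.link_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        # Simplification: script and style contents are counted as text too.
        n = len(data.strip())
        self.total_chars += n
        if self.link_depth > 0:
            self.link_chars += n


def hyperlinked_text_ratio(html):
    parser = LinkTextRatio()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    if parser.total_chars == 0:
        return 0.0
    return parser.link_chars / parser.total_chars


# Hypothetical snippet: 4 of the 12 counted characters sit inside a link.
ratio = hyperlinked_text_ratio('<p>Read <a href="/a">more</a> here</p>')
print(round(ratio, 2))
```

A page where this ratio is unusually high compared with the rest of your site is worth a manual look.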

Speaking of link quality, the factors that contribute to it directly are the number of redirects and how the links are constructed.


9. Impact on Search Traffic

As with Google, websites that incorporate good SEO practices perform better than those that do not. Being easily discoverable is a surefire way to achieve a higher ranking on Yandex.

Unsurprisingly, pages that can serve the search intent are the leaders for the respective search query. Pages with the exact search query in their title tag and body text have an advantage. Also, the use of synonyms is another positive signal that can lead to a ranking boost. 


10. Wikipedia Boost

Another strong signal for a high-quality page is if it’s linked from Wikipedia. Yandex favors pages that are linked from Wikipedia and ranks them higher. 
 

11. Video Content

Websites that contain video content are prioritized. But there’s a catch: Yandex ranks pages with videos hosted by Yandex higher (duh).

In terms of evaluating a page’s video content, the standard measure applies: the average watch time relative to the total length of the video.

And while all of these 11 findings provided some great insights, there were some debates in the SEO world about whether the leaked data is valuable or not.
 

Reactions to the leaked ranking factors in the SEO world

From outright dismissal to in-depth analysis, SEOs expressed mixed opinions on the Yandex leak.

Kevin Indig summarized the most common objections and offered great input on the significance of the ranking factors.

Excerpt from Kevin Indig's article on Yandex data leak
Snippet from the article “SEOs are underestimating The Yandex leak” by Kevin Indig
 

Notable names in the SEO field, such as Ben Wills, Alex Buraks, and Mic King, rolled up their sleeves and dove deep to decode the leaked data.

Ben Wills is among the first experts to make sense of the source code and help us grasp the event's significance (even though he opened the Twitter thread a bit later).

Ben Wills tweet on Yandex

At about the same time, Rob Ousbey shared an alpha version of an explorer tool for the Yandex code:

Rob Ousbey tweet on Yandex

Mic King shared his first impressions live while going through the source code, telling us, “Don’t sleep on this code.”

Mic King tweet on Yandex


SEO forums in Russia were no less crowded, and webmasters there shared insights similar to those from the Western SEO world. However, there was also a lot of talk about Yandex favoring its products and services, as covered by Russian SEO expert Dan Taylor.
 

Conclusion

Will the Yandex source code leak change how you do SEO? 

Probably not.

But it would be a huge mistake to dismiss it as unimportant and scroll past it.

Apparently, there are a lot of similarities between Yandex and Google. Hence, this leak can serve as a great starting point for more experiments and push you to focus on user experience and quality content even more.

So knuckle down and start testing. 
 

Lora Raykova
User Experience Content Strategist

Lora has spent the last 8 years developing content strategies that drive better user experiences for SaaS companies in the CEE region. In collaboration with WordPress subject-matter experts and the 2024 Web Almanac, she helps site owners close the gap between web performance optimization and real-life business results.