OTTO is one of the largest online department stores in the Netherlands. With a wide range of more than 160.000 products, including women’s, men’s, and children’s fashion, multimedia, home, household, and garden appliances, customers can find almost everything in and around the house at OTTO.

Due to this large assortment, which must be up-to-date at all times, it is of great importance that all data is well-structured and that all processes run smoothly. The information that OTTO provides on its website changes constantly, which affects its organic search results. Changes to the website can therefore have significant, and sometimes disastrous, consequences for SEO scores. To gain a competitive advantage, it is important to keep close oversight of product descriptions, stock levels, prices, and so on, so that OTTO continues to score well on organic search keywords.

To keep track of the website’s health and to detect SEO-related problems at an early stage, OTTO wanted an in-house monitoring system that tracks this information over a longer period of time. With this challenge, they turned to Artefact.

The prerequisites of our monitoring system

After discussing the challenge with OTTO, we came up with several requirements for the monitoring system. It needs to:

  • Retrieve up-to-date data on a weekly basis;
  • Run fully automated;
  • Process large amounts of data;
  • Store data safely and in compliance with the GDPR;
  • Give full control over who retrieves which data, where, and when, and over who has access to it;
  • Offer a clear dashboard that shows changes at a glance, for SEO specialists as well as laymen;
  • Send push notifications when a major error is detected.

If the monitoring system met all these requirements, it would eventually be able to reduce the percentage of website errors affecting SEO scores from 10% to 5%.

The solution: an in-house crawler

We quickly found that a crawler would be the best solution for this need. A crawler is an algorithm that runs automated scans (i.e., crawls) of the technical health of a website. The results of the crawls are presented in a clear dashboard that serves as a strategic instrument to monitor and improve both the technical aspects and the content of the website. Although OTTO already used a crawler, the existing one did not meet all of our requirements. We therefore decided to build our own crawler, fully owned by OTTO, that provides up-to-date insights to help improve SEO scores. This crawler had to map where website errors (4XX and 5XX status codes) occurred on the website, so they could be traced and resolved quickly.
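
To make the idea concrete, the sketch below shows what such an error-mapping crawl can look like in Python: a breadth-first pass over internal links that records every 4XX/5XX response, together with the depth at which it was found. This is a minimal illustration, not OTTO’s production crawler; the start URL, page limit, and field names are assumptions.

```python
# Minimal sketch of an error-mapping crawl: breadth-first over internal
# links, recording every 4XX/5XX response and its depth. Illustrative
# only -- the start URL, page limit, and politeness settings are
# assumptions, not OTTO's setup.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.example-shop.nl/"  # hypothetical entry point
MAX_PAGES = 500                             # keep the sketch small

def crawl(start_url: str, max_pages: int) -> list[dict]:
    domain = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])  # (url, depth from the start page)
    seen = {start_url}
    errors = []

    while queue and len(seen) <= max_pages:
        url, depth = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            errors.append({"url": url, "status": None, "depth": depth})
            continue

        # Record client (4XX) and server (5XX) errors for the dashboard.
        if resp.status_code >= 400:
            errors.append({"url": url, "status": resp.status_code, "depth": depth})
            continue

        # Queue internal links only, so the crawl stays on the domain.
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

    return errors
```

Tracking the depth per URL in a crawl along these lines is also what makes it possible to report later on deeply nested no-index URLs.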

It’s important to be notified of errors quickly to optimize the crawlability of the website. Weekly automated audits by the crawler help identify and correct errors such as dead links and missing pages, and the automation is the key part: automated tools and scripts enable a speed of problem-solving that cannot be matched manually. On top of that, it saves OTTO time and costs.
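
As a rough sketch of that weekly loop, the snippet below takes the error list produced by the crawl above and pushes a notification once the error count crosses a threshold. The webhook URL and threshold are placeholders, and any scheduler (Cloud Scheduler, cron) could trigger the script once a week.

```python
# Sketch of the weekly follow-up: once the crawl has run, push a
# notification when the error count crosses a threshold. The webhook
# URL and threshold are placeholders, not OTTO's configuration.
import requests

ALERT_WEBHOOK = "https://example.com/seo-alerts"  # hypothetical endpoint
ERROR_THRESHOLD = 100                             # illustrative cut-off

def notify_if_needed(errors: list[dict]) -> None:
    """Send a push notification when too many errors were found."""
    if len(errors) <= ERROR_THRESHOLD:
        return
    dead_links = [e for e in errors if e["status"] == 404]
    requests.post(
        ALERT_WEBHOOK,
        json={
            "message": f"{len(errors)} website errors detected, "
                       f"including {len(dead_links)} dead links (404)."
        },
        timeout=10,
    )

# Weekly run, reusing crawl() from the sketch above:
# notify_if_needed(crawl(START_URL, MAX_PAGES))
```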

Implementation steps

We took several steps to build the crawler, some of which proved very useful, others less so. We will highlight the most important ones:

  • The first step was to obtain the ‘Google Cloud Certified Cloud Digital Leader’ certification from Google for the entire SEO team.

  • After obtaining the certification, we dove into Google Cloud and tested different server configurations (more storage with less RAM, or less storage with more RAM) and interfaces (GUI, headless) to optimize efficiency.

  • When the first working prototype was finished, we started testing the BigQuery connection to create the dashboard in Looker Studio (see the sketch after this list). The first test was successful but needed refinement, particularly in scalability. Together with our Data Engineering and Data Analytics teams, we built a proof of concept to check the feasibility of building our own crawler. The most important factors here were the scalability and precision of the data.

  • Next, we added a category filter and extra fields to the data to generate overviews that were easy to understand and download, for both SEO specialists and laymen.

  • Finally, we built a clear dashboard based on the Screaming Frog template.
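
The BigQuery step mentioned above could look roughly like this: each weekly batch of crawl results is appended to a table that the dashboard reads. The project, dataset, table, and schema below are placeholders, not OTTO’s actual configuration.

```python
# Sketch of the BigQuery step: append each weekly batch of crawl results
# to a table for the dashboard. The project, dataset, table, and schema
# are placeholders, not OTTO's actual configuration.
from datetime import date

from google.cloud import bigquery

TABLE_ID = "my-project.seo_monitoring.crawl_errors"  # hypothetical

def load_results(errors: list[dict]) -> None:
    client = bigquery.Client()
    rows = [
        {
            "crawl_date": date.today().isoformat(),
            "url": e["url"],
            "status": e["status"],
            "depth": e["depth"],
        }
        for e in errors
    ]
    # Streaming insert; returns per-row errors (an empty list on success).
    insert_errors = client.insert_rows_json(TABLE_ID, rows)
    if insert_errors:
        raise RuntimeError(f"BigQuery insert failed: {insert_errors}")
```

Looker Studio can then read such a table through its native BigQuery connector, so the dashboard reflects each new weekly batch as soon as it lands.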

Improving CTR and fixing website errors in minutes

The crawler has just been launched on OTTO’s website and is starting to gather data. Even though the crawler has not been operational for a long time, we have some preliminary results to share.

Obtained results

  • Detection of approximately 130.000 meta titles and descriptions that were wrong, missing, too short, or too long; tackling these issues leads to an improved CTR (a sketch of this check follows after this list);
  • A 50% decrease in 4XX pages;
  • A decrease in the percentage of 404 URLs from 6.6% to 3%;
  • A decrease in the number of no-index URLs with a depth of ≥6 from 6.200 to 0.
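
The meta title and description check behind the first result can be illustrated as follows. The length thresholds are common SEO rules of thumb, not OTTO’s exact criteria.

```python
# Sketch of the meta checks behind the first result: flag titles and
# descriptions that are missing, too short, or too long. The length
# thresholds are common SEO rules of thumb, not OTTO's exact criteria.
import requests
from bs4 import BeautifulSoup

TITLE_RANGE = (30, 60)         # characters, illustrative
DESCRIPTION_RANGE = (70, 155)  # characters, illustrative

def check_meta(url: str) -> list[str]:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    issues = []

    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    if not (TITLE_RANGE[0] <= len(title) <= TITLE_RANGE[1]):
        issues.append(f"title length {len(title)} outside {TITLE_RANGE}")

    tag = soup.find("meta", attrs={"name": "description"})
    description = (tag.get("content") or "").strip() if tag else ""
    if not (DESCRIPTION_RANGE[0] <= len(description) <= DESCRIPTION_RANGE[1]):
        issues.append(f"description length {len(description)} outside {DESCRIPTION_RANGE}")

    return issues
```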

Expected results

  • Improvements to the sitemap;
  • A decrease in the number of competing URLs by means of canonical tags and internal links;
  • A decrease in the number of indexed URLs that are being canonicalized;
  • An improved internal link structure;
  • Headers optimized for length, with duplicates on the same page prevented;
  • Page speed optimizations;
  • Orphan page optimizations.

The prerequisites that were set for the crawler have all been met. The biggest advantages of this crawler are that it is fully owned by OTTO and that the data no longer needs to be retrieved manually in batches, which saves a lot of time. In addition, we have full control over what the crawler does, who has access to it, and where the data is stored, in a GDPR-compliant manner.