
Combing through the content on your site and removing some of it may seem drastic, but content pruning, done properly, can have a positive impact on visibility and traffic.

The importance of quality content is a constant refrain in digital marketing. Quality pages enhance the user experience and establish authority with search engines. But since “quality content” is a subjective term, how can you be sure the content on your site is high-quality? The easiest way is to conduct a content audit.

A content audit can surface the low-performing content that is dragging down the quality of the rest of your site. It can also provide insights to improve your content marketing strategy in the long run. This guide will show you how to audit a site, with an emphasis on content pruning, using different tools depending on your level of access to the site’s analytics.

What is a Content Audit?

A content audit is an important content marketing strategy that involves looking at every single piece of content on your website. There are different outcomes and goals for a content audit, ranging from pruning low-performing content to discovering which topics resonate best with your audience.

Why Should You Do a Content Audit?

Content audits are beneficial for content marketers because they can lead to improved rankings, conversion rates, and more. The primary objective of any content audit, though, is to improve your SEO and content marketing efforts. Auditing your website’s existing content will open your eyes to a wide range of opportunities to increase overall site quality.

One recent content audit we ran involved pruning over 3,000 pages, which resulted in organic traffic improvements. Outdated and low-performing pages were removed from the site toward the end of April 2018.

The number of pages pruned was roughly 15% of the entire site at the time. Organic traffic went up about 50% after the project and has sustained itself since, providing evidence of the benefits of a proper content audit and content pruning.

Which Types of Sites Require Audits?

The only type of site that may not need a content audit is a freshly published one. Since the website is new, so are the content and the strategy. Most other sites could benefit from at least a small audit.

The size and age of your website will dictate the depth of the content audit needed. An audit of a large site (like news/publication sites) will take more time and budget, while a small site with only 50 posts can do this with relative ease. It also helps to be mindful of previous strategies and tactics that may no longer align with the current state of the business.

To quickly gauge how much content could be pruned from your site, do a simple Google search with the site: operator. For example, searching “site:website.com/blog/” shows the total number of blog pages Google has indexed.

Content Audit Steps

Depending on what kind of analytics access and tools you have, there are different methods you can use to conduct a content audit. We’ll cover three audit methods in this tutorial. The first uses Screaming Frog, the second uses Google Analytics, and the third uses neither.

Each method mentioned will use the SEO tool, Ahrefs. It is possible to use alternatives to Ahrefs, Screaming Frog, and Google Analytics for your content audit, but these are the most common and powerful tools available. We have no affiliation with any tool mentioned.

Gathering and Setting Up Your Data

No Screaming Frog Method (GA access, Ahrefs)

Tools Needed: Google Analytics, Ahrefs, Spreadsheet software
The first data collection method is for users with access to Google Analytics and Ahrefs.

  1. Open Google Analytics for the domain you plan to audit.
  2. Go to “BEHAVIOR” > “Site Content” > “All Pages.”
  3. Set the time frame to 12 months if the business is highly seasonal to get a full scope of traffic. This could be reduced to as short as three months if there is not much seasonality for the content.
  4. Set a filter to include only the specified content section you would like to audit, for example “/blog/.”
  5. Change the number of rows shown on the bottom to show all of the selected pages, because Google Analytics exports the data as it is currently displayed.
  6. Select “Export” and choose your desired file format.
  7. Open the exported file in your spreadsheet software (e.g. Microsoft Excel or Google Sheets).

  8. Delete the extra information in the top six rows and the “Day Index” rows at the bottom.
  9. Add a column next to the “Page” column, and use the CONCATENATE function to join the domain to each page’s path. The formula should look like this: =CONCATENATE("https://www.website.com",A2)
  10. Input the formula in one cell, then drag the fill handle down or double-click it to copy the formula for the rest of the dataset.

  11. Log into Ahrefs and navigate to “More” > “Batch analysis.”
  12. Paste your list of URLs into Ahrefs’ batch analysis tool, 200 at a time. Hit “Start Analysis” to run the URLs through the tool.
  13. Export the list and add the data to the spreadsheet containing the Google Analytics data, making sure the data lines up correctly with each URL.
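The spreadsheet steps above can also be sketched in code. This is a minimal Python sketch, assuming a Google Analytics CSV export with a “Page” column of paths; the domain and file name are placeholders:

```python
import csv

DOMAIN = "https://www.website.com"  # placeholder; replace with your own domain


def build_full_urls(csv_path):
    """Prepend the domain to each exported GA page path (the CONCATENATE step)."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        return [DOMAIN + row["Page"] for row in reader]


def chunk(urls, size=200):
    """Split the URL list into batches of 200 for Ahrefs' batch analysis tool."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

Each batch returned by `chunk` can then be pasted into the batch analysis tool one at a time.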

No Google Analytics Access Method (Ahrefs)

Tools Needed: Ahrefs, Spreadsheet software
Sometimes having access to a website’s Google Analytics isn’t an option. The following method shows you how to use an SEO tool like Ahrefs to pull data for a content audit. Keep in mind that tools like Ahrefs generate their data points based on search traffic estimates, so traffic numbers will not be exact. That is why it is best to gain analytics access whenever possible.

  1. Log into Ahrefs and input the domain. Make sure to include the directory of the specific content you want to audit, if applicable.
  2. Select “Top pages.”
  3. Sort by the lowest traffic, then export all the rows.

  4. Download the exported file from the file folder icon at the top right in Ahrefs.
  5. Open your spreadsheet software (e.g. Microsoft Excel or Google Sheets) and import the data.
  6. Go back into Ahrefs and navigate to “More” > “Batch analysis.”
  7. Run the URLs from your spreadsheet through the tool, 200 at a time.
  8. Export the data from the batch analysis and paste it into your original spreadsheet, making sure each row lines up.

Note: This method could produce inaccurate data for newer industries, since new keywords and jargon may not yet be in Ahrefs or any other SEO tool.

Screaming Frog Method (GA access, GSC access, Ahrefs)

Tools Needed: Screaming Frog, Google Analytics, Google Search Console, Ahrefs, Spreadsheet software
Having access to each of these tools will allow you to look at different layers of data from your site. With this information, you will be able to make better decisions in regard to your content. One thing to note is that the API data may not be 100% accurate, so always compare your data between platforms.

  1. Open Screaming Frog and start by clicking on the API tab found on the top right or in the “Configuration” menu, under “API Access.”
  2. Connect Google Analytics to the Gmail account that has access to the domain’s data.
  3. Select what segment you want to look at, either all users, organic traffic, or a custom segment you have created.
  4. Set the date range for 12 months and select the metrics to pull from Google Analytics. Some metrics to consider are unique pageviews, sessions, time on page, landing page views, and goals.
  5. Select the next API, Google Search Console. Connect the API by logging into the Gmail account with access to the domain’s data.
  6. Specify the dates for Google Search Console (the API has access to 16 months of data).
  7. Select the final API, Ahrefs, and plug in the API access token for your account.
  8. Set your metrics. A few to include are backlinks, referring domains, and social shares.
  9. Set up date extraction for your site. Go into the menu and select “Configuration” > “Custom” > “Extraction.” This is where you will create a custom extraction to pull dates. Label the extractor “Date,” select the appropriate scraping method, and input your syntax.

There are a number of different ways to pull dates from the site, which we have listed in the next section.

  10. Configure the spider. Go into “Configuration” > “Spider,” uncheck images, CSS, JavaScript, and SWF, then hit “OK.”
  11. Enter your domain and hit “Start.”

Note: If you plan on auditing a specific section of your site, like the blog, go into “Configuration” > “Include,” then enter the directory you plan to focus on (i.e. website.com/blog/.*)

  12. Wait. The time it takes to crawl all the pages will depend on the size of your site or audit.
  13. To pull the data from the crawl, go to the “Internal” tab, filter by “HTML,” and select “Export.”
  14. Import the data into your spreadsheet software of choice and double-check that the analytics data matches the Google Analytics web interface.
  15. Paste the URLs from your spreadsheet, up to 200 at a time, into Ahrefs’ batch analysis tool (“More” > “Batch analysis”) to pull the keywords metric.
  16. Export that list and paste it into your current dataset, keeping only the “Keywords” column (make sure the URLs and data line up accordingly).

Custom Extraction for Dates

Finding dates for your content audit is important. By extracting dates from the pages, you can avoid cutting any new content from the site.

Date is on the blog post

If the date is on the blog post, use the “CSSPath” or “XPath” extractor in Screaming Frog, under “Configuration” > “Custom” > “Extraction.” Right-click the date within the blog post in Google Chrome and select “Inspect.” Then right-click the HTML line containing the date and select either “Copy” > “Copy selector” or “Copy” > “Copy XPath.” Paste this syntax into the custom extraction section of Screaming Frog.

Date is not on the page

If there is no date on the page, take a look at the source code. Sometimes the date can be found in certain attributes, such as “datetime=.” To find out whether any of the pages have a datetime attribute, open a blog post and select “View Page Source.” Then use the “Find” option (command/control + F) to search for “datetime.”

If you spot the “datetime=” attribute in the page source, go into Screaming Frog, “Configuration” > “Custom” > “Extraction,” select the “Regex” method, and input the following syntax (the 10 dots match the 10 characters of a YYYY-MM-DD date):

datetime=".........."

Export the date info from the “Extraction” filter inside the “Custom” tab, and put that data into your spreadsheet. Clean up the data by creating a new column with the formula =RIGHT(A2,10), where A2 is the cell with your date output and 10 is the number of characters in the date (digits and hyphens).
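To see what that extraction and cleanup do, here is a minimal Python sketch of the same regex; the sample markup below is hypothetical:

```python
import re


def extract_date(html):
    """Mimic the Screaming Frog regex extraction: match the datetime attribute
    and capture the next 10 characters, which for a YYYY-MM-DD timestamp is
    just the date (the same trimming the =RIGHT(A2,10) formula performs)."""
    match = re.search(r'datetime="(.{10})', html)
    return match.group(1) if match else None


# Hypothetical snippet of page source:
sample = '<time datetime="2018-04-27T09:30:00">April 27, 2018</time>'
print(extract_date(sample))  # → 2018-04-27
```

Pages without the attribute simply return nothing, which mirrors the blank cells you would see in the crawl export.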

Neither method above works

If neither of the above methods works for the domain you plan to audit, you will have to go through the blog manually. Sort the posts by newest on the site and note those URLs for your audit. Alternatively, if you have a content inventory, editorial calendar, or something similar, you can reference that as well.

Filtering and Labeling Your Data

After you’ve gathered your data from Google Analytics, Ahrefs, or Screaming Frog, you’ll need to organize it. Start by creating two columns next to each URL: one to label the action for each page and one for notes.

Pageviews

Filter your data by the metric most valuable to you, such as traffic. This could mean focusing only on pages with under 50 pageviews (or any other number that makes sense for your site). This is a good base for the filtering process before looking through other sets of data.

Status Codes

Make sure that all the URLs you are looking through have a status code of 200 (OK), if crawled with Screaming Frog.

Fresh Content

Exclude any newer pages that may need some more time to gain momentum.

Social Shares

Consider keeping URLs with a large number of social shares; exclude these from your pruning list as you spot them.

Goal Completions

Take a look at the number of goals completed by each page. Omit any pages from your list that are converting.

Impressions

If you see a high number of impressions from GSC, maybe the metadata (title tag and meta description) and content can be improved. Label these as “improve,” if necessary.

Number of Keywords

For URLs ranking for a large number of keywords in the batch analysis, label them as “improve.”

Backlinks and Referring Domains

If there are any backlinks or referring domains, make sure to redirect those to a similar page. To find a similar page on Google, enter into the search bar “site:website.com topic.” Label the action as “redirect,” and note the target URL under the notes column on your sheet.

Thin Content

If you notice that there are thin content pages that could be merged together to create a bigger resource, label those as “consolidate.” Note the pages that should be consolidated on the sheet. Other pages that could be consolidated are ones with overlapping topics/information or duplicate content.

Label anything else as “remove.”

Content Audit Actions

Once all your data is filtered and labeled, the next step is to take action based on the label assigned to each page.

Improve

Some pages can be improved rather than wiped from your site. Use your judgment to decide what aspects of a page can be improved. If the page looks outdated, refresh the content with current data and information. If you spot thin pages or something that could use more explanation, expand the content.

Redirect

Set up 301 redirects from these URLs to pages that are relevant and topically similar.

Consolidate

Merge pages that overlap or have thin content. Optimize the combined content to focus on the overarching topic of the pages you plan to combine.

Remove

These are the pages that are dead weight compared to everything else on the site. Delete them by serving a 410 (Gone) status code. Also make sure to remove any internal links pointing to them and remove the deleted URLs from the sitemap. To find internal links to each URL, you can use Screaming Frog.

Wrapping Things Up

When working through your list of content, review each page diligently. If you see a page in the “remove” column that you believe is important and provides adequate information, consider surfacing it higher in the site structure or adding a noindex tag instead. Be creative and logical as you work through your audit, and think of solutions to improve your overall content strategy.

Conducting a content audit can sound like a daunting task, but by following the steps outlined above, you can get one step closer to a higher-quality, more trustworthy website.

Comments

  • In-depth article on content audit, Jon!

  • Thanks for this, Jon.

    If there is no “date” in post or source code, may be able to pull a non-exhaustive and imperfect list off G index with site:domain.com inurl:blog with Tools > Anytime > Custom range, e.g. https://www.google.com/search?q=site:siegemedia.com+inurl:seo&num=20&newwindow=1&tbs=cdr:1,cd_min:6/1/2018,cd_max:9/1/2018,sbd:1

    Untested though; I guess it’s possible that just pulls available date info and not first-indexed data.

    Also, there’s a 1 in 3 chance of it being a WordPress site, right? So you can usually check the archives, e.g. https://domain.com/2018/08/, and use the Scraper Chrome extension to pull those URLs. Since a lot are paginated and that’s annoying, the workaround could be something like Screaming Frog, configuration > include > /2018/08 as a catch-all, idk

    • Hey @jim_thornton:disqus, thanks for the input! I took a look at the custom range search filter you mentioned and it looks like that picks up on the published date from the source code, instead of the modified date. The archive method you shared is interesting as well, will put some thought into that one.

      Since each site is built differently, the method of looking through the source code may not be true for everybody. But I will update the post to include what to look out for and creative ways to snag those modified dates.

  • Per

    Thanks for great article!

    How would you go about refreshing all sub-pages of a 10-page site? I’m thinking about refreshing everything but the main page, which is up to date already (and ranking well). I could probably remove 4 out of 10 articles.

    Maybe this would be seen as too big a move by Google.

  • Great explanation, @jedelacruz:disqus.

    Basically, I have been auditing my content based on Time on Site, Bounce Rate, Views, Shares, Links and Comments.

    Those are my parameters to audit content.

    One thing I’ve also noticed: one piece of content ranked in the #1 position, so I created the same type of content, but it didn’t work.

    Google has so many ranking signals which one applies I/you/we don’t know.