Auditing your website’s canonical link elements
The ability to cite canonical URLs can be a great asset for an SEO. Unfortunately, it can also have negative impacts if used inappropriately. Between site migrations, the creation of new pages, switching to HTTPS, misuse of or bugs in a SaaS, domain name switches and just the everyday grind of making changes that are due yesterday, it’s not uncommon for an SEO or webmaster to use the canonical link element in a counterproductive fashion. Since it’s not visible and its effects are behind the scenes, incorrectly used canonicals can go unnoticed for years. Fortunately, finding bad canonicals can be rather simple.
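For reference, the canonical link element is a `<link rel="canonical" href="...">` tag in a page's `<head>`. The snippet below is a minimal sketch, using only Python's standard library, of how that element can be read out of a page's HTML. The `CanonicalFinder` class, the `find_canonicals` helper and the sample markup are hypothetical names and data made up for illustration; they are not part of either tool covered in this post.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page markup, for illustration only
sample = (
    '<html><head>'
    '<link rel="canonical" href="https://www.example.com/page/">'
    '</head><body></body></html>'
)
print(find_canonicals(sample))
```

A page that returns more than one value here, or a value you did not expect, is a good candidate for a closer look during the audit.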
A1 Website Analyzer
A1 Website Analyzer is a great way to crawl your site. There is a 30-day free trial with all of the functionality, so you can try it out and decide if you want to buy a license.
- Open A1 Website Analyzer
- Enter your website’s homepage and click Start Scan and let the crawl run
- If you only want to export URLs that indicate other URLs as canonical, go into the “Analyze website” tab, click the Data icon on the left and select “Show only URLs with “canonical” to other….” (if you want to include URLs that point to themselves as canonical, skip this step altogether).
- Click the “File” option on the upper left when the crawl is finished and choose “Export selected data as file…”
- Save the export
If you opted to only see URLs pointing to other URLs as canonical, then your export is finished. You will have a list of URLs that point to different URLs in their canonical element. If you opted to see all URLs….
The export will return a lot of columns, and the sheer amount of data you can play with is one of my favorite attributes of this tool. For the purpose of this tutorial, though, I’ve hidden most of the columns to better focus on the data relevant to this post.
- Path
This is the crawled URL that the data in the rest of the row pertains to.
- URL Flags
This displays pertinent header information, including whether the URL in the “Path” column contains a canonical element.
- Redirects.Path
If the URL in the “Path” column redirects, the URL it redirects to will display here. A1 Website Analyzer classifies the canonical link element as a redirect, so if the URL points to a different URL as its canonical, that URL will be displayed here.
We only want the URLs that have canonicals so we need to clear out any rows that don’t have a canonical link element in them. To do this:
- Highlight the “URL Flags” column
- Click “Sort & Filter” (in the “Home” tab) and choose “Filter”
- Click the down arrow that now appears in the URL Flags header
- Select Text Filters
- Choose “Contains”
- Type “canonical” and hit enter
This will filter for all of the URLs that contain a canonical link element.
If you’re simply trying to find URLs whose canonical links don’t match the URLs they sit on, you are done: those are the rows with a value in the “Redirects.Path” field. If the URL points to itself in this element, the “Redirects.Path” field will be empty.
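If you would rather skip the spreadsheet work, the same filtering can be sketched in Python. This is only a rough sketch: it assumes the export was saved as a CSV whose header row uses the column names “Path”, “URL Flags” and “Redirects.Path”. The function name `rows_with_other_canonical` and the sample rows (including the exact flag text) are made up for illustration, so check them against your own export.

```python
import csv
import io


def rows_with_other_canonical(csv_file):
    """Return (path, canonical_target) pairs for rows that are flagged as
    canonical and whose Redirects.Path value is not empty."""
    results = []
    for row in csv.DictReader(csv_file):
        flags = row.get("URL Flags") or ""
        target = (row.get("Redirects.Path") or "").strip()
        # Mirrors the Excel "Contains" text filter, which ignores case
        if "canonical" in flags.lower() and target:
            results.append((row["Path"], target))
    return results


# Made-up rows standing in for a real export (assumed column names)
sample_export = io.StringIO(
    "Path,URL Flags,Redirects.Path\n"
    "https://www.example.com/,canonical,\n"
    "https://www.example.com/a?ref=1,canonical,https://www.example.com/a\n"
    "https://www.example.com/b,,\n"
)

flagged = rows_with_other_canonical(sample_export)
for path, target in flagged:
    print(path, "->", target)
```

To run this against a real file, pass an open file instead of the sample, for example `open("a1_export.csv", newline="", encoding="utf-8")`, where the filename is whatever you chose when saving the export.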
Screaming Frog SEO Spider
Screaming Frog SEO Spider is another great way to crawl your site. The free version will allow you to crawl up to 500 pages, so for websites of that size or smaller, the free version will suffice. Screaming Frog doesn’t make it obvious when a URL points to a different URL as canonical, so this will require a little more Excel work, but it’s manageable. The basic steps are as follows:
- Open Screaming Frog SEO Spider
- Enter your website’s homepage and click Start
- Click the Directives tab when the crawl is finished
- Choose either “canonical” (all you need for this exercise) or “all” (if you want to do more with the export).
Once you’ve gotten your export, open it in Excel and, for the purposes of this tutorial, leave only the Address and Canonical Link Element 1 columns and clear out any rows that don’t have a canonical link element in them. To do this last part:
- Click the down arrow that now appears in the Canonical Link Element 1 header
- Select Text Filters
- Choose “Contains”
- Type “http” and hit enter
*This will filter out all of the URLs where no canonical was found, leaving only the rows that have a canonical URL.
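The goal of this filtering step, keeping only the rows whose Canonical Link Element 1 value contains “http”, can also be sketched in Python. This assumes the export was saved as a CSV with “Address” and “Canonical Link Element 1” headers. The function name `keep_rows_with_canonical` and the sample rows are hypothetical.

```python
import csv
import io


def keep_rows_with_canonical(csv_file):
    """Return (address, canonical) pairs for rows where a canonical URL was found."""
    kept = []
    for row in csv.DictReader(csv_file):
        canonical = (row.get("Canonical Link Element 1") or "").strip()
        # Same idea as filtering the column on text containing "http"
        if "http" in canonical.lower():
            kept.append((row["Address"], canonical))
    return kept


# Made-up rows standing in for a real export (assumed column names)
sample_export = io.StringIO(
    "Address,Canonical Link Element 1\n"
    "https://www.example.com/,https://www.example.com/\n"
    "https://www.example.com/about,\n"
    "https://www.example.com/a?ref=1,https://www.example.com/a\n"
)

kept = keep_rows_with_canonical(sample_export)
print(len(kept), "rows have a canonical link element")
```

Because the function builds a new list instead of hiding rows, the rows without a canonical are dropped for good rather than merely filtered from view.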
Now add a third column header (in my example, I named it “Canonical Check”).
In the row directly below the header (cell C2), enter a formula along the lines of =IF(A2=B2,"Same","Different").
This is telling Excel to display the text “Same” if the values in A2 and B2 (in this case, the URL and its respective canonical) are identical, or to display “Different” if they are in fact different URLs. To replicate this for the rest of the URLs, click into C2 and double-click (or click and drag) the small square on the lower right of the cell. This will fill in the formula for the other URLs.
This will make it easy to see which, if any, canonical URLs are different from the URLs where they reside (this can be good or bad depending on the reason for having the canonical link element there in the first place). You can further isolate what you’re looking for by highlighting the entire C column, clicking the Sort & Filter option on the upper right, and choosing either Sort A to Z (if you want the “Different” results showing first) or Sort Z to A (if you want the “Same” results showing first).
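The “Canonical Check” column and the A to Z sort described above can be sketched in Python as follows. The `canonical_check` function and the sample pairs are hypothetical. One difference worth knowing: Python's `==` is case-sensitive, whereas Excel's `=` comparison on text ignores case, so results can differ for URLs that vary only in capitalization.

```python
def canonical_check(rows):
    """Label each (address, canonical) pair as "Same" or "Different",
    then sort so the "Different" rows come first (A to Z on the label)."""
    checked = [
        (address, canonical, "Same" if address == canonical else "Different")
        for address, canonical in rows
    ]
    # sorted() is stable, so rows keep their original order within each label
    return sorted(checked, key=lambda row: row[2])


# Made-up (address, canonical) pairs, for illustration only
rows = [
    ("https://www.example.com/", "https://www.example.com/"),
    ("https://www.example.com/a?ref=1", "https://www.example.com/a"),
]

checked = canonical_check(rows)
for address, canonical, label in checked:
    print(label, address, canonical)
```

Passing `reverse=True` to `sorted()` would put the “Same” rows first instead, matching the Z to A option.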
*Alternatively, to delete the rows with no canonical rather than hide them, choose “Does Not Contain,” type “http,” hit enter, then delete the resulting rows. Once this is done, click the Filter icon on the Canonical Link Element 1 header and select “Clear Filter from Canonical Link Element 1.” This will altogether remove the rows where the URL contains no canonical instead of merely filtering them out.