How to Use Google Sheets for Web Scraping and Campaign Creation

We’ve all faced a situation where we needed to extract data from a website at some point. When working on a new account or campaign, you may not have the necessary data or information for ad creation, for example. In an ideal world, we would have all the information we need in an easy-to-import format like a CSV file, an Excel spreadsheet, or a Google Sheet. Unfortunately, that’s not always the case.

Those without web scraping tools, or the coding knowledge to use something like Python for the task, may have had to resort to manually copying and pasting hundreds or even thousands of entries. Not only is this time-consuming, but the chance of introducing an error or two when copying and pasting data by hand is quite high. It then takes even more time to review the document and ensure it’s error-free.

Fortunately, there is a better solution. And I’ll show you how we did it.

Introducing IMPORTXML

Let me introduce you to IMPORTXML, a function in Google Sheets. According to Google’s help page, IMPORTXML “imports data from various structured data types, including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.” Essentially, IMPORTXML lets you extract structured data from web pages without any coding knowledge.

For example, it is quick and easy to extract data such as page titles, descriptions, or links, as well as more complex information.

How can IMPORTXML help extract elements from a web page?

The IMPORTXML function itself is quite simple and requires only two arguments:

  • The URL of the page from which we want to extract or scrape information.
  • And the XPath of the element in which the data is contained.

XPath stands for XML Path Language and can be used to navigate through elements and attributes of an XML document.
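
To make the idea concrete, here is a minimal sketch of XPath-style navigation using Python’s standard library. Note that `xml.etree.ElementTree` implements only a limited XPath subset, while IMPORTXML accepts richer expressions, so this is an illustration of the concept rather than a replica of the Sheets function:

```python
# Illustration of XPath-style element navigation, using only Python's
# standard library (limited XPath subset).
import xml.etree.ElementTree as ET

# A tiny HTML-like document, kept well-formed as XML for this example.
doc = """
<html>
  <head><title>Google Sheets - Wikipedia</title></head>
  <body><h1>Google Sheets</h1></body>
</html>
"""

root = ET.fromstring(doc)

# ".//title" is ElementTree's spelling of the XPath query "//title".
title = root.find(".//title").text
print(title)  # Google Sheets - Wikipedia
```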

For example, to extract the page title from https://en.wikipedia.org/wiki/Google_Sheets, we would use:

=IMPORTXML("https://en.wikipedia.org/wiki/Google_Sheets", "//title")

This will return the value: Google Sheets - Wikipedia.

Or, if we’re looking for the page description, we can try this:

=IMPORTXML("https://www.searchenginejournal.com/","//meta[@name='description']/@content")

Here is a concise list of some common and useful XPath queries:

  • Page title: //title
  • Page meta description: //meta[@name='description']/@content
  • Page H1: //h1
  • Page links: //@href
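
As a rough sketch of what these queries return, here they are applied to a small sample page in Python. One caveat: ElementTree cannot select bare attributes the way “//@href” does, so attribute values are read with `.get()` on the matched elements instead; the sample HTML below is invented for illustration:

```python
# The common XPath queries listed above, approximated with Python's
# standard library on a made-up sample page.
import xml.etree.ElementTree as ET

html = """
<html>
  <head>
    <title>Example Page</title>
    <meta name="description" content="A sample description." />
  </head>
  <body>
    <h1>Welcome</h1>
    <a href="/first">First</a>
    <a href="/second">Second</a>
  </body>
</html>
"""

root = ET.fromstring(html)

title = root.find(".//title").text                              # //title
description = root.find(".//meta[@name='description']").get("content")
heading = root.find(".//h1").text                               # //h1
links = [a.get("href") for a in root.findall(".//a")]           # //@href, per <a> tag

print(title, description, heading, links)
```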

Seeing IMPORTXML in Action

Since we discovered IMPORTXML in Google Sheets, it has truly become one of our secret weapons to automate many of our daily tasks, from campaign and ad creation to content research, and much more.

Furthermore, when the function is combined with other formulas and add-ons, it can be used for more advanced tasks that would otherwise require sophisticated solutions and development, like Python-based tools.

But in this example, we’ll be looking at IMPORTXML in its most basic form: scraping data from a web page. Let’s take a closer look.

Suppose we’ve been tasked with creating a campaign for “Search Engine Journal”. They’d like us to advertise the 30 latest articles published under the PPC section of their website. A fairly simple task, you might say. Unfortunately, the editors are unable to provide us with the data and have kindly asked us to browse the website to find the necessary information for campaign creation.

As mentioned at the beginning of our article, one way to do this would be to open two browser windows, one with the website and the other with Google Sheets or Excel. Then, we would start copying and pasting information, article by article and link by link.

But by using IMPORTXML in Google Sheets, we can achieve the same result with little to no risk of error, in a fraction of the time. Here’s how to do it.

Step 1: Start with a new Google Sheets document

First, open a new blank Google Sheets document.


Step 2: Add the content you need to scrape

Add the URL of the page (or pages) from which we want to scrape the information. In our case, we’ll start with https://www.searchenginejournal.com/category/pay-per-click/

Step 3: Find the XPath

Find the XPath of the element whose content we want to import into our spreadsheet. In our example, let’s start with the titles of the 30 latest articles. In Chrome, hover over the title of any article, right-click, and select “Inspect”. This opens the Chrome Developer Tools panel. Make sure the article title is still selected and highlighted, then right-click again and choose “Copy” > “Copy XPath”.

Step 4: Extract the data into Google Sheets

Back in your Google Sheets document, enter the IMPORTXML function as follows:

=IMPORTXML(B1, "//*[starts-with(@id, 'title')]")

Here are a few things to note:

First, in our formula, we replaced the URL of the page with a reference to the cell where the URL is stored (B1).

Second, the XPath copied from Chrome will contain double quotes around attribute values (for example, //*[@id="title_1"]). Since the formula argument is itself wrapped in double quotes, these inner double quotes need to be changed to single quotes to avoid breaking the formula.

Note that in this case, since the id attribute changes for each article (title_1, title_2, etc.), we need to slightly modify the query and use “starts-with” to capture all elements on the page whose id begins with ‘title’.
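
To show what the “starts-with” predicate is doing, here is a sketch in Python on hypothetical markup with ids like title_1 and title_2 (not Search Engine Journal’s actual HTML). ElementTree has no starts-with support, so the equivalent filter is written out by hand:

```python
# Sketch of //*[starts-with(@id, 'title')]: keep only elements whose
# id attribute begins with "title". The markup below is illustrative.
import xml.etree.ElementTree as ET

page = """
<div>
  <a id="title_1" href="/ppc/article-one">Article One</a>
  <a id="title_2" href="/ppc/article-two">Article Two</a>
  <a id="sidebar_1" href="/about">About</a>
</div>
"""

root = ET.fromstring(page)

# Elements without an id (or with one that doesn't start with "title",
# like sidebar_1) are filtered out.
titles = [el.text for el in root.iter() if el.get("id", "").startswith("title")]
print(titles)  # ['Article One', 'Article Two']
```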

Here’s what it looks like in the Google Sheets document:

Google Sheets screenshot

And in an instant, here’s what the results look like once the query has loaded the data into the spreadsheet:

Google Sheets results screenshot

As you can see, the list returns all the articles featured on the page we just scraped (including my previous article on automation and using ad customizers to improve Google Ads campaign performance). You can apply the same method to extract any other pieces of information you need for setting up your ad campaign.

Now let’s add the destination page URLs, featured snippets for each article, and author names to our Sheets document. For the destination page URLs, we need to adjust the query to specify that we’re looking for the HREF element attached to the article title. Therefore, our query will look like this:

=IMPORTXML(B1, "//*[starts-with(@id, 'title')]/@href")

And voila! Immediately, we have the destination page URLs.

You can do the same for featured snippets and author names.
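
Continuing the sketch from before, the follow-up query //*[starts-with(@id, 'title')]/@href applies the same element filter but returns each match’s href attribute instead of its text. Again, the markup and ids are invented for illustration:

```python
# Sketch of //*[starts-with(@id, 'title')]/@href: same filter as before,
# but collecting the href attribute of each matched element.
import xml.etree.ElementTree as ET

page = """
<div>
  <a id="title_1" href="/ppc/article-one">Article One</a>
  <a id="title_2" href="/ppc/article-two">Article Two</a>
  <a id="sidebar_1" href="/about">About</a>
</div>
"""

root = ET.fromstring(page)
urls = [el.get("href") for el in root.iter() if el.get("id", "").startswith("title")]
print(urls)  # ['/ppc/article-one', '/ppc/article-two']
```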

Conclusion

And there you have it – a fully automated method, with little risk of error, for extracting data from (potentially) any web page, be it content and product descriptions, or e-commerce data such as product prices or shipping fees.

In an era where information and data can be the necessary advantage to achieve above-average results, the ability to scrape web pages and structured content easily and quickly can be invaluable. Plus, as we’ve seen above, IMPORTXML can help reduce execution times and the risk of errors.

Furthermore, this function is not just a great tool that can be exclusively used for PPC tasks, but it can also be very useful in many other projects that require web scraping, including SEO and content tasks.

Learn more tips and tricks about web scraping and campaign creation on Crawlan.com.
