How to Easily Extract Data from Websites into Google Sheets with IMPORTHTML

Video google sheet import html

Imagine stumbling upon an interesting table on a website and wanting to extract that tabular data into your spreadsheet for analysis. You could try manually copying and pasting, but that’s not very efficient. Fortunately, Google Sheets has a handy function called IMPORTHTML that allows you to do this easily. It will import the table and update your data at regular intervals to keep it up to date.

How does the IMPORTHTML function work in Google Sheets?

The IMPORTHTML function in Google Sheets searches for an HTML table or a specific list and extracts the data from it. You can use it to extract text from a table or a list. An HTML table is defined by the <table> tag, while a list is defined by the <ul> tags (for unordered lists) and <ol> tags (for ordered lists).

How to use the IMPORTHTML formula in Google Sheets

Before using the IMPORTHTML formula, let’s understand its syntax.

=IMPORTHTML(URL, query_type, index)

  • URL: The URL of the page, including the protocol (http:// or https://). Make sure to enclose the URL in double quotes.
  • query_type: Use “table” if you want to import a table, otherwise use “list” if you want to import a list.
  • index: The index of the table or list on the web page. It starts at 1. A table with index = 1 means it’s the first table, index = 2 means it’s the second table, and so on.

Importing data from a website into Google Sheets with IMPORTHTML

How to get the indices of tables/lists to extract data from a website into Google Sheets using the IMPORTHTML function

A page may contain one or more tables and/or lists. If you’re unsure how to find the indices of tables on an HTML page, follow the steps below:

Step 1

Open the “Developer Console” in your browser. For most Windows browsers, you can open the console by pressing F12. If you’re using a Mac, use Cmd+Opt+J for Chrome and Cmd+Opt+C for Safari. Note that for Safari, you’ll first need to enable the “Develop” menu.

Google Chrome Developer console

The precise appearance may vary depending on the version of Google Chrome you’re using. It may change from time to time but should be similar.

Step 2

Copy and paste the following code into the console to get the indices of all the tables:

var index = 1;
[].forEach.call(document.getElementsByTagName("table"), function(elements) {
    console.log("Index: " + index++, elements);
});

If you’re instead looking for all the indices of lists, you need to get all the elements with the <ul> or <ol> tags. The following code can be helpful:

var index = 1;
[].forEach.call(document.querySelectorAll("ul,ol"), function(elements) {
    console.log("Index: " + index++, elements);
});

Step 3

Press “Enter”. You’ll see numbers representing the indices displayed in the results. Hover your cursor over the result elements until the table/list you want to display is highlighted.

As you can see in the screenshot above, the highlighted table has an index of 6.

How to import an HTML table into Google Sheets

Let’s see how we can import an HTML table. We’ll extract the latest currency exchange rates from the Yahoo! Finance website into Google Sheets. Since the page contains only one table, we’ll use 1 as the index value.

Create a new blank spreadsheet in Google Sheets and give it a name, for example, “Currencies.” Then, copy and paste the following formula into cell “A1.”

=IMPORTHTML("https://finance.yahoo.com/currencies","table",1)

Then, press “Enter” and wait for the entire table to be populated in the spreadsheet.

In the image above, we can see that the IMPORTHTML function successfully extracted the latest exchange rate data into Google Sheets.

You may be interested in tracking exchange rate data. In that case, you might want to check out our tutorial on how to create a currency exchange rate tracker in Google Sheets without any programming.

How to import a list into Google Sheets

You can import a list using the same method. The only difference is to replace the word “table” with “list” in the parameter. The following steps explain how to extract data from a list containing programming languages starting with the letter “C.”

Create a new blank spreadsheet in Google Sheets and give it a name. Then, copy and paste the following formula into cell “C1”:

=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_programming_languages","list",7)

Press “Enter” and wait for the data to be extracted, as shown in the screenshot below:

Importing a list into Google Sheets using the IMPORTHTML formula

Other options for extracting data in Google Sheets

If you’re looking for an alternative method to retrieve data from a structure other than HTML tables and lists, here are a few Google Sheets functions you might want to try:

Function Description
IMPORTXML Import data from XML files and web pages
IMPORTRANGE Import data from another spreadsheet
IMPORTJSON Import JSON data from a URL
IMPORTDATA Import external data as plain text
IMPORTFEED Import the content of an RSS or Atom news feed

If you want to import data from other sources and applications, or even be able to load data via APIs without any programming, you can try Coupler.io.

Coupler.io is a data integration solution to automate data exports from accounting applications, CRMs, databases, and many other applications and sources including Microsoft Excel, Clockify, Shopify, Airtable, and more.

In addition to that, Coupler.io provides JSON integration to extract data via APIs into Google Sheets without any programming at all! Try Coupler.io with a free 14-day trial.

How to reference a cell in IMPORTHTML in Google Sheets

You might want to put the URL and other parameters in cells and then reference them when using the IMPORTHTML formula. In this case, you can more easily modify the parameters by changing the values in the cells.

Here’s an example:

All the parameters for the URL, query, and index are placed in cells B1, B2, and B3. So, you can easily write the IMPORTHTML formula as follows:

=IMPORTHTML(B1,B2,B3)

Now, let’s add the above formula to cell A3:

Adding the IMPORTHTML formula to a referenced cell

If you want to retrieve the latest historical data for the EUR/USD currency pair from this page: https://finance.yahoo.com/quote/EURUSD%3DX/history?p=EURUSD%3DX, you can place the EURUSD string in a cell – for example, B1. In this case, if you want to extract other currency data, you just need to change the value in B1. Here’s an example of how to reference cell B1 in the Google Sheets IMPORTHTML formula:

=IMPORTHTML("https://finance.yahoo.com/quote/" & B1 & "%3DX/history?p=" & B1 & "%3DX", "table", 1)

Now, let’s add the above formula to cell A3:

Referencing a cell in the IMPORTHTML formula

If you want to extract historical data for AUD/USD, change the value in B1 to AUDUSD, and your data will update automatically.

Tip: You can avoid typing B1 multiple times by using the SUBSTITUTE function. Here’s what the updated formula looks like:

=IMPORTHTML(SUBSTITUTE("https://finance.yahoo.com/quote/{{CURRENCY}}%3DX/history?p={{CURRENCY}}%3DX", "{{CURRENCY}}", B1), "table", 1)

How to use IMPORTHTML to import a portion of data from a table in Google Sheets

You may want to extract only a few columns or filter only the rows that meet specific criteria. You can accomplish these operations by using the QUERY function in combination with IMPORTHTML.

IMPORTHTML: Importing specific columns

Let’s assume you have a spreadsheet with an IMPORTHTML function that extracts the latest EUR/USD exchange rate data from a website into Google Sheets.

Now, you only want to retrieve the “Date” and “Close” columns, which correspond to the 1st and 5th columns. To do this, you can combine your existing formula with the QUERY function. Here’s an example:

=QUERY(IMPORTHTML("https://finance.yahoo.com/quote/EURUSD%3DX/history?p=EURUSD%3DX", "table", 1), "SELECT Col1, Col5")

By setting “SELECT Col1, Col5” in the QUERY function, you’ll get the following result:

Retrieving only Date and Close columns

IMPORTHTML: Importing specific rows

You can also retrieve specific rows. For example, here’s how to add a filter to our previous formula to retrieve only data with “Close” values greater than 1.2250:

=QUERY(IMPORTHTML("https://finance.yahoo.com/quote/EURUSD%3DX/history?p=EURUSD%3DX", "table", 1), "SELECT Col1, Col5 WHERE Col5 > 1.2250")

Now, let’s add another filter to retrieve only the top 3 rates. Here’s the formula:

=QUERY(IMPORTHTML("https://finance.yahoo.com/quote/EURUSD%3DX/history?p=EURUSD%3DX", "table", 1), "SELECT Col1, Col5 WHERE Col5 > 1.2250 ORDER BY Col5 DESC LIMIT 3")

Retrieving the top 3 rates

How to set a custom interval to automatically refresh IMPORTHTML in Google Sheets

By default, the refresh interval for IMPORTHTML in Google Sheets is one hour. However, you can speed up the refresh interval if desired. Since the formula recalculates when its arguments change, you can use it to force the refresh interval. The idea is to concatenate the original URL with a query string that changes periodically based on the time we set, for example, every 5 minutes. Here are the steps:

First, add a query string to the original URL

Assume that we have the following values in B1-B5. The IMPORTHTML formula is set in B5. Notice that a query string “?refresh=” & B4 is added to the original URL.

Adding a query string to the URL

We’re not done yet. Let’s move on to the next step.

Next, use a script and a trigger to automate the refresh

We’ll refresh the value of B4 every 5 minutes using a script and a trigger. As a result, the Google Sheets IMPORTHTML formula will also refresh at the same interval. Follow these instructions:

Step 1:
Go to the Script Editor (either Tools > Script Editor or Extensions > App Script).

Step 2:
Copy and paste the following code into the “Code.gs” file. Then, save your changes by clicking the disk icon in the toolbar.

function myFunction() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var cell = sheet.getRange("B4");
  var refresh = parseInt(cell.getValue().toString());
  var increment = refresh + 1;
  cell.setValue(increment);
}

Step 3:
Open the “Triggers” menu on the left, then click the “Add Trigger” button.

Step 4:
Set a trigger for myFunction to run every 5 minutes. You can optionally set the failure notification settings to “Notify me immediately” to receive immediate notification in case of errors.

Step 5:
Click the “Save” button. If prompted to allow the script to access your data, grant permission.

Step 6:
Run your script for the first time.

Now, you’ll see the data in your sheet refresh every 5 minutes. Even when your Google Sheets spreadsheet is closed, it will continue to refresh.

How many IMPORTHTML formulas can Google Sheets handle?

You can use IMPORTHTML in a Google Sheets spreadsheet as many times as you want. Previously, the limit was 50 for external data per Google Sheets spreadsheet, but Google removed this limitation in 2015. As Google Sheets is web-based, you may experience a decrease in speed if you have a large number of IMPORTHTML formulas in your spreadsheet, especially if your internet connection is slow.

How to extract non-public data from a website into Google Sheets using the IMPORTHTML function

You may want to extract data from a non-public URL of a website into Google Sheets. Unfortunately, you can’t do this using the IMPORTHTML function. As shown in the screenshot below, which shows what happens when you try to extract your LinkedIn contacts list.

#N/A Error - Could not fetch URL

The formula only works if the page is public and doesn’t require a login to access the data. You’ll get an #N/A Could not fetch url error message for non-public URLs.

What to do if the IMPORTHTML formula suddenly stops working in your Google Sheets spreadsheet

If your formula suddenly stops working, we recommend checking the following:

  • Check if there has been a change in the URL. Although rare, it’s possible that the page you’re extracting from has been moved to another URL.
  • Check if there has been a change in the protocol. For example, the site you’re extracting from now uses https instead of http, but the automatic redirection to https hasn’t been set up by the site owner.
  • Check if there has been a change in the index. The table or list with index = 9 could now have index = 8.

If you still can’t extract the desired data, it’s possible that the website owner now blocks robots/crawlers from reading the content of their website. Check the website’s robots.txt file by navigating to /robots.txt.

Error loading IMPORTHTML data in Google Sheets

This data loading error is one of the most common issues with IMPORTHTML. You’re highly likely to encounter it when trying to retrieve data from websites that use heavy scripts. These scripts take a long time to execute and can therefore pose security risks. This means that you can’t parse pages with JavaScript using IMPORTHTML in Google Sheets.

In this case, you can try finding another source with the necessary data or opt for another data import option, such as via the API using Coupler.io’s JSON importer. Good luck!

Home

Related posts