We use data from various sources for our data analysis works. In previous posts, we discussed how to import data from one Google Sheets to another, one sheet (tab) to another sheet(tab), and Import CSV files to Google Sheets. In this post, I will show you several methods to import data from website to Google Sheets.
Table of Contents
Why import data from website to Google Sheets
Why should you import data to Google Sheets from websites without copying and pasting them?
The answer depends on the type of data you are going to import. If you are going to use a small piece of data that does not change over time you can use the copy-paste method.
If you are to use large data sets, spread over several web pages, copy-paste method may not be effective. Because you can make mistakes while copying and pasting them manually. And also, time-consuming.
If you use website data that is updated over time, you have to go to the website every time it updates the value.
To overcome these issues, Google Sheets has several methods to pull data from websites.
Import HTML table data to Google Sheets
You can use the Google Sheets functions, IMPORTHTML, and IMPORTXML functions to import HTML table data from web pages.
You may also be interested in, How to pull data from Google Sheets to HTML table.
Import table data from websites to Google Sheets using the IMPORTHTML function
You can use this method to import data, that are publicly available. You cannot use this method to get data from web pages that require you to log in (authentication) to view the data.
The IMPORTHTML function has three parameters as below.
IMPORTHTML(url, query, index) URL - The URL of the web page that contains the HTML table QUERY - This function let you import both table and list data. In this, the query parameter is "table" INDEX - index starts from 1, and it defines which table in the HTML page needs to import. (1 represents the first table on the page, and 2 represents the second table, and so on)
Let’s say you want to import world population data from a Wikipedia page (https://en.wikipedia.org/wiki/World_population) which has several HTML tables on it.
If you want to import data in the fifth table of the page, you can use the IMPORTHTML function as shown below.
=IMPORTHTML("https://en.wikipedia.org/wiki/World_population","table",2)
How to find the index value (table number)
Sometimes it is difficult to find the index number of the table when there are multiple tables on the page. And also, there may be hidden tables in the table which you cannot identify by looking at the web page.
To find the table index you can use the following method.
- On the web page, Right click > click “Inspect“
- Go to the Console tab, paste the following code, and press Enter.
var i = 1; [].forEach.call(document.getElementsByTagName("table"),
function(x) { console.log(i++, x); });
Watch the video below for more details.
Import table data from websites to Google Sheets using the IMPORTXML function
In the previous method, we imported the entire table to Google Sheets from the web page. You may also need to import a part of the table such as a column, row, few columns, etc… In such cases, you can use the IMPORTXML function to import HTML table data to Google Sheets.
The IMPORTXML function also can import data from publicly available web pages only.
To use this method, you need to have a basic understanding of the HTML table structure. For more information on HTML tables, see https://www.w3schools.com/html/html_tables.asp.
The IMPORTXML function has two parameters.
IMPORTXML(url, xpath_query) URL - The URL of the web page that contains the HTML table XPATH_QUERY - The XPath query to run on the structured data
In the above function, the URL is the same as in the previous section.
The XPATH_QUERY is a little difficult to understand for non-programmers. To learn more about XPATH_QUERY, see https://www.w3schools.com/xml/xpath_intro.asp.
How to find XPATH for Google Sheets IMPORTXML function
When working with the IMPORTXML function, you need to find the XPATH for the HTML element that you need to import data from. To get the XPATH,
- In the browser, right-click on the HTML item you want to find the XPATH.
- Click Inspect. This will open up the developer tools of your browser. You will be landed near the code section relevant to the place you right-clicked.
- The relevant section of the page will highlight when you hover the mouse over the code.
- Right-click on the code section related to your element (i.e. <table…> for table).
- Then, go to Copy > Copy XPath
See the video demonstration below.
Below, I will show you some useful examples that you can use to import HTML table data with the IMPORTXML function. So, you can copy these examples to your Google Sheets and change the parameters as required.
Example 01: Import the entire table using the IMPORTXML function
Similar to the IMPORTHTML function, you can use the IMPORTXML function also to import the complete HTML table.
Let’s import a table from the same web page we discussed above.
You can copy the XPath to the table element as explained above. The XPath to table 4 should look like the one below. For this case, you should copy the XPath to the table element.
//*[@id="mw-content-text"]/div[1]/table[4]
If you use the above XPath directly for the XPATH_QUERY parameter, you will get all the table data in a single cell. To avoid this, you can write the XPATH_QUERY like below.
//*[@id="mw-content-text"]/div[1]/table[4]//tr
The //tr selects all the table rows (tr) that are descendants of the table[4] element. So, your IMPORTXML function should be as below.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/table[4]//tr
Example 02: Import only a specified range of rows using the IMPORTXML function
Unlike in the IMPORTHTML function, with the IMPORTXML function, you can specify the range of rows you need to import from the HTML table.
For example, if you want to import all the rows after row number 2, you can modify the above formula like below.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/table[4]//tr[position()>2]")
The parameter in the red color, calls predicates. It is used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
You can use predicates such as, [1], [last()], [last()-1] etc… to import a single row or range of rows.
You can also use these predicates in the following manner.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/table[4]//tr[position()>2 and position()<6]")
The predicates used in the above formula is [position()>2 and position()<6]
. This will return row numbers between 2 and 6.
Example 03: Import selected columns of the table using the IMPORTXML function
In the first example, we imported the entire HTML table, including the header row. You can use the Google Sheets QUERY function on this result to select the columns you required.
However, when tables contain merged cells or columns in the table’s header row, you will get several issues. Therefore, you should avoid header rows when setting the parameters for the IMPORTXML function. You can use the method explained in the above second example to avoid header rows.
The following example imports the first and third columns of the table.
=QUERY(IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/table[4]//tr[position()>1]"),"select Col2, Col4")
Pay attention to the text highlighted in the above formula. It is similar to Example-2 we discussed above. I have provided it as the data set for the query function. Then, the query is written to select columns 2 and 4.
Watch the following demonstration
Example 04: Import data in a single cell using the IMPORTXML function
You can also import data in a single cell of an HTML data table using this function.
To get the cell data correctly, you need to get the XPath directly to the text inside the cell.
The following formula import data in a single cell of the HTML table.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/table[4]/tbody/tr[2]/td[3]/text()")
Import HTML list from websites to Google Sheets
You can import HTML list data to Google Sheets using the same techniques used to import HTML tables.
Import HTML List data using IMPORTHTML function
The IMPORTHTML function support both HTML table and list data. To import the list data, you need to find the index number of the list you need to import. You can follow the same procedure to find the index number as we did for the table above.
The following formula import the first list of the web page.
=IMPORTHTML("https://en.wikipedia.org/wiki/World_population","list",1)
Import HTML List data to Google Sheets using the IMPORTXML function
The IMPORTXML function can also be used to import HTML List data to Google Sheets. Compared to the IMPORTHTML function, you can modify the XPath query to get more refined results. With the IMPORTHTML function, you could only import the whole list.
For this, you need to have a piece of basic knowledge of HTML List elements. You can learn HTML list elements from https://www.w3schools.com/html/html_lists.asp.
The following formula imports a list element from the page.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/div[32]/ul[1]//li")
You can use the other query parameters we discussed above when importing list data also.
A few examples are listed below.
The following formula imports the list excluding the first three items of the list.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/div[32]/ul[1]//li[position()>3]")
The following formula imports the hyperlinks in the list
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//*[@id='mw-content-text']/div[1]/div[32]/ul[1]//li//@href")
Watch the following demo video on importing list items with the IMPORTXML function.
Import heading elements from web pages to Google Sheets
You can use the IMPORTXML function to import heading elements in web pages to Google Sheets.
You can get a basic knowledge of HTML Heading elements from https://www.w3schools.com/html/html_headings.asp.
The following formula imports all the h2 headings on the page.
=IMPORTXML("https://en.wikipedia.org/wiki/World_population","//h2")
Import RSS Feed data from websites to Google Sheets
RSS feed is a standardized way of letting you access the latest content of a website (especially blogs and other websites that update regularly). Basically, this is a computer-readable format. You can access the RSS feed of a blog like this which is based on WordPress by adding /feed/ to the end of the blog URL (Ex. https://www.bpwebs.com/feed/).
Feed readers like Feedly let you import RSS feeds of your favorite websites. It displays the latest content of those websites in an organized manner.
You can create your own feed reader in Google Sheets using the IMPORTFEED function to read the latest content of your favorite websites.
The following formula imports all the contents of the RSS feed of this blog
=IMPORTFEED("https://www.bpwebs.com/feed/",,true)
The following formula import only the titles of the latest blog post of this blog.
=IMPORTFEED("https://www.bpwebs.com/feed/","items title",true)
The following formula imports both the title and the URL of the latest posts of this blog.
={IMPORTFEED("https://www.bpwebs.com/feed/","items title",true),IMPORTFEED("https://www.bpwebs.com/feed/","items url",true)}
Watch the following video that demonstrates the above formulas.
Import data from website APIs to Google Sheets
Some websites provide APIs (application programming interfaces) that allow other programs to get some information. With Google Apps Script, you can write programs to get data from such websites to Google Sheets.
For this, you need to have some knowledge of Google Apps Script.
In our blog post, Create A Crypto Portfolio Tracker in Google Sheets we created a Google Apps Script to obtain live crypto prices from CoinMarketCap through their APIs.
Depending on the type of API you may have to customize the Google Apps. However, from the blog post mentioned above, you can get some ideas on importing data to Google Sheets from APIs. You can use this method to import data even from web pages that need authentication to view data.
Wrapping Up
For our data analysis works, we get data from various sources. Websites are one such source from that we get data very often. In this post, we discussed several methods that you can use to import data from websites to Google Sheets. With these methods, you can import data directly to Google Sheets, without copying and pasting from source to Google Sheets. This will reduce the errors and it ensures you always have the latest data.