Liberating HTML Data Tables

Section author: Tony Hirst (psychemedia on Twitter)

It’s not uncommon to see small data sets published on the web using an HTML table element. If you have a quick click around Wikipedia, you’re likely to find a wide variety of examples. Some sites will use Javascript libraries to enhance the presentation or usability of a table, for example, by making columns sortable; but most of the time, we are faced with a flat HTML table, and the data locked in it.

In this section, we look at some quick tricks for liberating data from HTML tables on public webpages and turning them into something more useful.

Screenscraping HTML Tables Using Google Spreadsheets

The Google spreadsheet formula:


will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page, and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 1) as the target table for data scraping.

So for example, have a look at the following Wikipedia page – List of largest United Kingdom settlements by population (found using a search on Wikipedia for UK city population):

Grab the URL, fire up a new Google spreadsheet, and start to enter the formula =importHTML into one of the cells:

Autocompletion works a treat, so finish off the expression and add in the URL and table number:


The table numbers are not always obvious – start with 1 and increment the table number until you get the correct one.

As if by magic, a data table appears in the spreadsheet, pulled in directly from the Wikipedia page:

If the data in the HTML table is updated, the data in the spreadsheet will also be updated when you refresh or call the spreadsheet page.

Any questions? Got stuck? Ask School of Data!

Last updated on Sep 02, 2013.

  • Great recipe. Only one remark. For some reason (change in GoogleSpreadsheets interface?), the working formula is now =importHTML(“”,”table”,2). As the table is still goes second at the page, I suspect it’s because Google decided to start counting at 1. Or maybe I’m wrong. Any ideas?

    • Michael Bauer

      Good catch. I do think it’s due to some changes on the wiki site though (the TOC is now a table and the first table)

  • NRP

    thanks for sharing, liked it..!!! 😀

  • David Meyers

    Is it possible to do this if you don’t know the URL? For example, if I want to enter baseball player names into the cells in column A, and run a script to put their batting average in column B. Since I don’t know beforehand which players will be in the list, I can’t preset the URL for that player’s page of stats. Any suggestions?

    • Michael Bauer


      If you know the name-url combinations that would be possible. The URL doesn’t have to be hard-coded it can also be in a cell that is referenced in the formula.

  • Britney Muller

    Amazing! Can you do this through multiple pages?

    Thanks so much!

    • Britney Muller

      To clarify: Have it auto go through multiple pages?

    • SchoolofData

      Hello Britney, not it cannot, you can only get one table per formula.

      If you want to scrape several pages, you should take a look at or Webscraper is a Google Chrome extension and is a dedicated program.

      Cédric Lombion
      School of Data Community Manager

      • Britney Muller

        Awesome, Thanks so much Cédric!

  • Amrapalli

    Thank you so much, this is simply awesome. Work got reduced a lot.

Theme by Anders Norén