@idclark
Created January 5, 2012 01:44

Revisions

  1. idclark created this gist Jan 5, 2012.
    21 changes: 21 additions & 0 deletions Marathon_scraper
    @@ -0,0 +1,21 @@
    rm(list=ls())

    library(XML)
    library(ggplot2)
    library(reshape)


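    page_numbers <- 1:1430

    # URL template; sprintf() fills %d with the page number (the original hard-coded page=1)
    weburl <- "http://results.public.chicagomarathon.com/2011/index.php?page=%d&content=list&lang=EN&num_results=25&pid=list&search_sort_order=ASC&top_results=3&type=list"

    # For each page, parse all HTML tables and keep the one with the most rows
    tables <- lapply(page_numbers, function(i) {
      page_tables <- readHTMLTable(sprintf(weburl, i))
      n.rows <- sapply(page_tables, function(t) if (is.null(t)) 0L else nrow(t))
      page_tables[[which.max(n.rows)]]
    })

    # Stack the per-page tables into a single data frame
    times <- do.call(rbind, tables)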