@technickle
Created May 4, 2015 18:37
process-local-website-uris
# this script reads a CSV download from
# https://data.ny.gov/Government-Finance/New-York-State-Locality-Hierarchy-with-Websites/55k6-h6qq
# and parses each row's website URI to add path, hostname, and generic top-level domain (gTLD) columns
# the result is written to "locals-processed.csv" in the current directory

require 'uri'
require 'csv'   # lowercase: require is case-sensitive on most filesystems

    # load an array of (row) arrays from "locals.csv"
    localgovs = CSV.read("locals.csv")

# process each row of the array
localgovs.each do |x|

  # the URI is assumed to be in the 9th column (index 8)
  case x[8]

  # if this is the column header row, append the 3 new column headers
  when "Website"
    x.push("path", "host", "gTLD")

  # if this row has no website entry, append 3 blank values
  when nil
    x.push("", "", "")

  # otherwise assume a valid URI is available; parse it once and reuse it
  else
    uri = URI(x[8])
    x.push(uri.path)
    x.push(uri.host)
    # the gTLD is the last dot-separated segment of the host
    # (to_s guards against a nil host when the URI lacks a scheme)
    x.push(x.last.to_s.split(".").last)
  end
end

# write the results to "locals-processed.csv"
CSV.open("locals-processed.csv", "wb") do |csv|
  localgovs.each { |x| csv << x }
end
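
For reference, a minimal sketch of what the URI parsing above produces for a typical website value. The URL here is a hypothetical example, not a row from the dataset:

```ruby
require 'uri'

# parse a sample website value the way the script does
uri = URI("http://www.albanyny.gov/Government/Departments.aspx")

uri.host                  # hostname component: "www.albanyny.gov"
uri.path                  # path component: "/Government/Departments.aspx"
uri.host.split(".").last  # last dot-separated segment, treated as the gTLD: "gov"
```

Note that a scheme-less value such as "www.example.com" parses with a nil host (Ruby's URI treats the whole string as a path), which is why the script guards the gTLD extraction with to_s.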