@vintagesucks
Forked from chilts/alexa.js
Last active April 24, 2018 11:15
Fetch the Alexa top 1 million sites list directly from the server, unzip it, parse the CSV, and write each domain to a text file
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');
var fs = require('fs');

// Output file: one domain per line.
var wstream = fs.createWriteStream('alexa.txt');

// Stream the zip archive, unpack it on the fly, and parse each entry as CSV.
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
  .pipe(unzip.Parse())
  .on('entry', function (entry) {
    entry.pipe(csv2()).on('data', function (data) {
      // data is one parsed row; the second column holds the domain.
      wstream.write(data[1] + '\n');
    });
  });
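The row-to-domain step can also be tried on its own, without the network or the third-party modules. The following is a minimal sketch, assuming each line of the CSV has the form `rank,domain` with no header row; `extractDomains` and the sample rows are hypothetical and not part of the original gist.

```javascript
// Hypothetical helper (not in the original gist): pulls the second column
// out of "rank,domain" CSV text and returns the domains as an array.
function extractDomains(csvText) {
  return csvText
    .split(/\r?\n/)
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) {
      // Split on the first comma only: rank before it, domain after it.
      var comma = line.indexOf(',');
      return comma === -1 ? line.trim() : line.slice(comma + 1).trim();
    });
}

// Made-up sample rows, used only for illustration.
var sample = '1,example.com\n2,example.org\n';
console.log(extractDomains(sample).join('\n'));
```

This loads the whole text into memory, so it only suits small samples; the streaming version above is the better fit for the full one-million-row file.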