Skip to content

Instantly share code, notes, and snippets.

View andirey's full-sized avatar

Data diver andirey

  • Global
View GitHub Profile
@andirey
andirey / gist:14d47cdbbe04f083995d4b0b3ccc231a
Created March 8, 2018 10:41 — forked from conormm/r-to-python-data-wrangling-basics.md
R to Python: Data wrangling with dplyr and pandas
R to python useful data wrangling snippets
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
Whilse transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs
@andirey
andirey / off_diagonal.R
Created March 1, 2017 20:28 — forked from dsparks/off_diagonal.R
Modifying select off-diagonal items in a matrix
# A method for modifying only select off-diagonal items in a matrix
# From "Thierry" and "Ben Bolker"
# At http://stackoverflow.com/a/11759744/479554
# A sample matrix
size <- 6
mat <- matrix(seq_len(size ^ 2), ncol = size)
print(mat)
# A companion matrix that indicates how "off" a diagonal is:
@andirey
andirey / mice_imp.R
Created February 23, 2017 08:00 — forked from mick001/mice_imp.R
Imputing missing data with R; MICE package: Full article at http://datascienceplus.com/imputing-missing-data-with-r-mice-package/
# Using airquality dataset
data <- airquality
data[4:10,3] <- rep(NA,7)
data[1:5,4] <- NA
# Removing categorical variables
data <- airquality[-c(5,6)]
summary(data)
#-------------------------------------------------------------------------------
@andirey
andirey / classifier.py
Created May 5, 2016 17:43 — forked from zacstewart/classifier.py
Document Classification with scikit-learn
import os
import numpy
from pandas import DataFrame
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.cross_validation import KFold
from sklearn.metrics import confusion_matrix, f1_score
NEWLINE = '\n'
@andirey
andirey / tweet_dumper.py
Created April 11, 2016 07:13 — forked from yanofsky/LICENSE
A script to download all of a user's tweets into a csv
#!/usr/bin/env python
# encoding: utf-8
import tweepy #https://github.com/tweepy/tweepy
import csv
#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""