import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.{BasicResponseHandler, HttpClientBuilder}
import org.apache.spark.mllib.fpm.PrefixSpan
// sequence database
val sequenceDatabase = {
val url = "http://www.philippe-fournier-viger.com/spmf/datasets/SIGN.txt"
val client = HttpClientBuilder.create().build()
val request = new HttpGet(url)
val response = client.execute(request)
package com.databricks.spark.jira
import scala.io.Source
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.sources.{TableScan, BaseRelation, RelationProvider}
@FlorianMuellerklein
FlorianMuellerklein / gist:d761e09df7c770a93c17
Created January 11, 2015 21:00
gbdt feature transformation example
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
train_loc = 'train.csv'
test_loc = 'test.csv'
TREES = 30
NODES = 7
@pjankiewicz
pjankiewicz / gist:8ab7094d263bf0d4cfb8
Last active September 9, 2016 03:04
kaggle vazu
'''
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
@krishnanraman
krishnanraman / testconv.scala
Created August 2, 2014 01:27
How many numbers do you need to add to exceed 2000 ?
$ scald.rb --hdfs-local testconv.scala
compiling testconv.scala
scalac -classpath /Users/kraman/.sbt/boot/scala-2.9.3/lib/scala-library.jar:/Users/kraman/.sbt/boot/scala-2.9.3/lib/scala-compiler.jar:/Users/kraman/workspace/scalding/scalding-core/target/scala-2.9.3/scalding-core-assembly-0.11.1.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/hadoop-core-1.1.2.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/commons-codec-1.8.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/commons-configuration-1.9.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/jackson-asl-0.9.5.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/jackson-mapper-asl-1.9.13.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/commons-lang-2.6.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/slf4j-log4j12-1.6.6.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/log4j-1.2.15.jar:/var/folders/b_/17q0nsss269_2kf855mtg4_c0000gn/T/maven/commons-httpclient-3.1.jar:/var/folders/b
@johnynek
johnynek / scalding_alice.scala
Created July 18, 2014 17:15
Learn Scalding with Alice
/**
git clone https://github.com/twitter/scalding.git
cd scalding
./sbt scalding-repl/console
*/
import scala.io.Source
val alice = Source.fromURL("http://www.gutenberg.org/files/11/11.txt").getLines
// Add the line numbers, which we might want later
val aliceLineNum = alice.zipWithIndex.toList
@hellertime
hellertime / HiveSources.scala
Last active August 29, 2015 13:58
Using cascading-hive in Scalding
import scala.collection.JavaConversions._
import cascading.scheme.Scheme
import cascading.tap.SinkMode
import cascading.tuple.Fields
import com.twitter.scalding.{FixedPathSource, HadoopSchemeInstance, SchemedSource}
import org.apache.hadoop.mapred.{JobConf, OutputCollector, RecordReader}
trait HiveScheme extends SchemedSource {
// cascading-hive Schemes take two arrays as arguments
@huowa222
huowa222 / gist:9640856
Created March 19, 2014 12:40
recommendation opensource list
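Original article: http://in.sdo.com/?p=1707
I have collected and organized the open-source recommender systems that can currently be found on the internet, with some brief personal comments attached (not necessarily complete or accurate). This is a fairly comprehensive list so far, and I hope it helps everyone understand and master recommender systems. (Text by Chen Yunwen)
SVDFeature
Developed by students at Shanghai Jiao Tong University in C++; the code quality is very high. We used it when we took part in the KDD competition last year and found it good and convenient, and since it was written by our fellow countrymen, it gets the top recommendation!
Project URL:
http://svdfeature.apexlab.org/wiki/Main_Page
SVDFeature includes a very flexible matrix factorization recommendation framework that makes it easy to implement methods such as SVD and SVD++, the most accurate kind of single-model recommendation algorithm. The SVDFeature code is concise and can run fairly large-scale single-machine matrix factorization with relatively little memory.
It also includes a logistic regression model, which is convenient to use for ensembling.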
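As a rough illustration of the kind of matrix factorization described above, here is a minimal sketch in plain Python. It is not SVDFeature's API or implementation: it assumes a basic latent-factor model trained with stochastic gradient descent and L2 regularization, and the names `factorize`, `predict`, `rmse`, the toy `ratings` list, and all hyperparameter values are hypothetical, chosen only for this example.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error of the predictions over the known ratings."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Fit user factors P and item factors Q by SGD on (user, item, rating) triples.

    All defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    # Small positive random initialization of both factor matrices.
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data (hypothetical): (user index, item index, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0),
    (3, 0, 1.0), (3, 2, 4.0),
]

P0, Q0 = factorize(ratings, n_users=4, n_items=3, epochs=0)  # untrained factors
P, Q = factorize(ratings, n_users=4, n_items=3)              # trained factors
print("RMSE before training:", rmse(ratings, P0, Q0))
print("RMSE after training: ", rmse(ratings, P, Q))
```

Methods such as SVD++ extend this basic model with bias terms and implicit-feedback factors; the sketch leaves those out to keep the core update visible.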
@arunoda
arunoda / gist:7790979
Last active February 23, 2026 14:28
Installing SSHPass

Installing SSHPASS

SSHPass is a tiny utility that lets you provide the SSH password without using the prompt. This is very helpful for scripting. SSHPass is not good to use in a multi-user environment. If you use SSHPass on your development machine, it doesn't do anything evil.

Installing on Ubuntu

apt-get install sshpass

Installing on OS X

@austinogilvie
austinogilvie / Linear_Regression_With_Loess.ipynb
Created October 30, 2013 19:30
Fit a loess curve with Python. Posted to bitbucket by Jure Zbontar - http://bit.ly/1aIyNaH.