Skip to content

Instantly share code, notes, and snippets.

View bladereo's full-sized avatar

Wenjun YANG bladereo

  • Didi Chuxing
  • Beijing, China
View GitHub Profile
@akashpalrecha
akashpalrecha / an-inquiry-into-matplotlib-figures.ipynb
Last active December 27, 2024 14:38
An Inquiry into Matplotlib's Figures, Axes, subplots and the very amazing GridSpec!
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

Random Sampling Without Replacement

Last time we talked about rolling unbiased virtural dice, now let's turn to opening virtual booster packs in a collectible card game. If you're a nerd like me, you may be interested in reading about the specifics of how cards are distributed in [Magic: The Gathering][2] and [Yu-Gi-Oh!][3], but our main concern here is to randomly select n items from an array of N options. I'm told that it's also useful in scientific computing or whatever.

@abridgland
abridgland / gaussian-processes-1.ipynb
Last active August 24, 2025 14:36
A Jupyter notebook to accompany Intro to Gaussian Processes - Part I at http://bridg.land/posts/gaussian-processes-1
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@aparrish
aparrish / understanding-word-vectors.ipynb
Last active December 18, 2025 05:55
Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@aparrish
aparrish / spacy_intro.ipynb
Last active March 14, 2025 21:43
NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I am can't provide the appropriate credits.


Effective Engineer - Notes

What's an Effective Engineer?

@jlln
jlln / spark OneHot encoder.scala
Last active June 2, 2018 14:29
One-hot encoder for use with Spark DataFrames.
import scala.collection.JavaConverters._
import org.apache.spark.sql.types.{StructType,StructField,StringType}
import org.apache.spark.sql.Row
def identityMatrix(n:Int):Array[Array[String]]=Array.tabulate(n,n)((x,y) => if(x==y) "1" else "0")
def encodeStringOneHot(table:org.apache.spark.sql.DataFrame,column:String) = {
//Accepts the dataframe and the target column name. Returns a new dataframe in which the target column has been replaced with a one-hot/dummy encoding.
table.registerTempTable("temp")
@inkhorn
inkhorn / cellphone analysis.R
Created April 6, 2015 20:47
Cell Phone Analysis
library(jsonlite)
cp = fromJSON(txt = "Cell Phone Data.txt", simplifyDataFrame = TRUE)
num.atts = c(4,9,11,12,13,14,15,16,18,22)
cp[,num.atts] = sapply(cp[,num.atts], function (x) as.numeric(x))
cp$aspect.ratio = cp$att_pixels_y / cp$att_pixels_x
cp$isSmartPhone = ifelse(grepl("smart|iphone|blackberry", cp$name, ignore.case=TRUE) == TRUE | cp$att_screen_size >= 4, "Yes", "No")
@hadley
hadley / ds-training.md
Created March 13, 2015 18:49
My advise on what you need to do to become a data scientist...

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

@neubig
neubig / lstm-lm.py
Last active August 23, 2017 09:18
This is a minimal implementation of training for a language model using long short-term memory (LSTM) neural networks
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This is a simplified implementation of the LSTM language model (by Graham Neubig)
#
# LSTM Neural Networks for Language Modeling
# Martin Sundermeyer, Ralf Schlüter, Hermann Ney
# InterSpeech 2012
#
# The structure of the model is extremely simple. At every time step we