@jasonrudolph
Created September 21, 2012 18:42
Rough Notes from Strange Loop 2012

About

This gist is a collection of my rough notes from Strange Loop 2012.

Follow me on GitHub or Twitter to get updates as they're posted.

I'm posting these notes immediately after each talk. Expect typos, formatting glitches, incomplete thoughts, and ...

In-memory Databases - the Future is Now!

Michael Stonebraker

Abstract

--- rough notes ---

  • "not your father's transaction processing"
  • how does this fit into big data?
    • big volume - I have too much data
    • big velocity - data is coming at me too fast
    • big variety
  • focusing on high velocity today
  • in 1985, 1,000 transactions/second seemed like an amazing stretch goal
  • today, you can do that on your iPhone
  • TP (Transaction Processing) is now a much broader problem (New TP)
    • massively multiplayer games
    • social networking
    • real time ad placement (i.e., you have 1 millisecond to decide which ad to show me)
    • real time couponing
    • etc.
    • sensor tagging generates new TP applications
      • marathon runners
      • taxicab
      • dynamic traffic routing
      • car insurance "by the drink"
      • mobile social networking
      • ...
      • and TP volumes are ginormous!!
      • serious need for speed and scalability
    • wall street electronic trading
    • real-time fraud detection
    • micro transactions (e.g., buying a soda with your iPhone)
  • In all cases
    • workload is a mix of updates and queries
    • coming at you like a firehose
    • still an ACID problem
      • don't lose my data
      • make sure it's correct
    • tends to break traditional solutions
  • put differently
    • you need to ingest a firehose in real-time
    • you need to process, validate, enrich and respond in real-time (i.e., update)
    • you often need real-time analytics
  • if your data doesn't fit in main memory now, then wait a couple of years and it will
    • yes, Facebook is an exception; you are not Facebook
    • main memory is going down in price faster than the size of TP data is growing
  • 2007 paper "Through the OLTP Looking Glass" found that traditional databases (e.g., Oracle, DB2, etc.) spend less than 10% of their time doing "useful work." The rest is the overhead of record-level locking, latching, recovery, and buffer pool management.
  • How do we go faster?
    • Main memory deployment gets rid of buffer pool (which eliminates 25% of the overhead); leaves other 75% of overhead intact
  • Solution Choices
    • OldSQL - legacy RDBMS vendors
      • Code lines date from the 1980s
      • 30 years worth of "bloatware"
      • Mediocre performance on New TP
      • Slow because they spend all of their time on overhead
      • Would have to re-architect their legacy code to do better
      • They all face "The Innovator's Dilemma"
      • They'll ultimately drift off into the sunset
    • NoSQL - Give up SQL and ACID for performance
      • Give up SQL? That's throwing away 30 years of RDBMS experience.
      • "Stored procedures are good! One round trip from app to DBMS rather than one round trip per record. Move the code to the data, not the other way around." (Editor's note: Hmm. Datomic anyone?)
      • Give up ACID? Can you guarantee that you won't need it tomorrow?
      • Eventual consistency does not mean "eventual consistency"; eventual consistency means "creates garbage"
      • Eventual consistency only works if the changes are allowed to happen in any order (i.e., commutative)
      • Appropriate for:
        • non-transactional systems
        • single record transactions that are commutative
    • NewSQL
      • Preserve SQL and ACID
      • Get performance from a new architecture
      • Scale by running on a cluster of nodes
      • Automatic sharding (parallelism)
      • Focus on OLTP workload
        • a few high volume transaction signatures; implement as stored procedures
        • occasional ad-hoc transactions
      • Issue #1: Buffer pool; Solution: Run in memory
      • Issue #2: Write ahead log; Solution: replication and tandem-style failover (and fail back); You need HA anyway for New TP
      • Issue #3: Multithreading; Solution: Don't do it, or get rid of all shared data structures
      • Issue #4: Record-level locking; Solution: run to completion in timestamp order => no locking
  • VoltDB is NewSQL
    • Open source
    • 70x faster than OldSQL, running on the same hardware
    • 5x faster than Cassandra on VoltDB key-value layer
    • Runs a subset of SQL
    • Scales linearly to 384 cores
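
The note above that eventual consistency only works when changes are commutative can be illustrated with a small sketch. The `:add` and `:set` operations and the `apply-op` / `apply-all` helpers are hypothetical names invented for this example; they are not from the talk or from any particular database.

```clojure
;; Hypothetical update operations on a single integer balance.
;; :add is commutative; :set is not.
(defn apply-op [balance [op amount]]
  (case op
    :add (+ balance amount)
    :set amount))

(defn apply-all [balance ops]
  (reduce apply-op balance ops))

;; Two replicas receive the same updates, but in different orders.
(def adds  [[:add 5] [:add -2] [:add 10]])
(def mixed [[:add 5] [:set 0] [:add 10]])

;; Commutative updates converge regardless of arrival order.
(= (apply-all 100 adds)
   (apply-all 100 (reverse adds)))
;; => true

;; Non-commutative updates can leave the replicas disagreeing.
[(apply-all 100 mixed)
 (apply-all 100 (reverse mixed))]
;; => [10 5]
```

With only `:add` operations both replicas end at 113; once a `:set` is mixed in, the final value depends on arrival order.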

Functional Design Patterns

Stuart Sierra

Abstract

--- rough notes ---

  • The concept of design patterns has been largely overlooked in dynamic languages.
    • Norvig (1998) - 16 of the 23 patterns in the GoF book have qualitatively simpler implementation in Lisp or Dylan than in C++ for at least some uses of each pattern
  • "Are there any Haskell programmers in the audience? Yeah: you're gonna be pissed." (i.e., not gonna talk about monads)
  • Pattern-Oriented Software Architecture
    • categories
      • architectural patterns
      • design patterns
      • idioms (specific to a single programming language)
    • we're going to focus somewhere in the middle between design patterns and idioms

State Patterns

  • State/event pattern
    • about
      • state is derived from previous state + input
      • many diverse inputs
      • need to recover past states
      • need to visualize intermediate states
    • implementation
      • update-state function takes the current state and an event and returns the new state
    • can recreate any past state by reducing over events
    • great for logging (i.e., you store every event that happened in the system)
    • offers great flexibility for how you will store state in your system
      • one extreme: never store state; you can always rebuild it from events
  • Consequences pattern
    • about
      • each event can trigger multiple events
      • generated events cause state changes
      • need to visualize intermediate states
    • implementation
      • function takes the current state and an event, and returns a sequence of consequences
      • the consequences might be more events
    • consequence functions do not compose naturally; you have to update state in between
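
A minimal sketch of the two state patterns above, assuming a hypothetical counter domain. The event shapes (`:inc`, `:dec`, `:reset`), the threshold of 10, and the `consequences` and `process` functions are invented for illustration; only the `update-state` name comes from the notes. It uses `update`, which requires Clojure 1.7 or later.

```clojure
;; State/event: the new state is a function of the old state and an event.
(defn update-state
  "Takes the current state and an event; returns the new state."
  [state event]
  (case (:type event)
    :inc   (update state :count + (:by event 1))
    :dec   (update state :count - (:by event 1))
    :reset (assoc state :count 0)
    state))

(def initial-state {:count 0})

(def events
  [{:type :inc} {:type :inc :by 5} {:type :dec} {:type :reset} {:type :inc}])

;; The current state is a reduction over the full event log.
(reduce update-state initial-state events)
;; => {:count 1}

;; Any past state is a reduction over a prefix of the log.
(reduce update-state initial-state (take 3 events))
;; => {:count 5}

;; Every intermediate state, e.g. for visualization.
(map :count (reductions update-state initial-state events))
;; => (0 1 6 5 0 1)

;; Consequences: given the state and an event, return follow-on events.
;; Hypothetical rule: an :inc that reaches 10 or more triggers a :reset.
(defn consequences [state event]
  (if (and (= :inc (:type event))
           (>= (:count state) 10))
    [{:type :reset}]
    []))

;; Consequence functions do not compose directly, so the state is
;; updated between each event and the events it triggers.
(defn process [state event]
  (let [new-state (update-state state event)]
    (reduce process new-state (consequences new-state event))))

(process {:count 8} {:type :inc :by 5})
;; => {:count 0}
```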

Data Building Patterns

  • Accumulator pattern
    • large collection of inputs; maybe larger than memory
    • small or scalar result
    • lazy sequences (i.e., map, mapcat, filter, etc.)
    • reduce is the universal accumulator
  • Map-reduce pattern
    • input is linear; maybe larger than one disk
    • motivated by things that were historically true, but are becoming less so:
      • disks are (were?) slow and local to one machine
      • networks are (were?) slow
    • prediction that map-reduce will become less important as the constraints above become less and less true
    • not quite map and reduce
  • Reduce/combine Pattern
  • Recursive expansion pattern
    • build up result out of primitives
    • build abstractions in layers
    • recurse until there is no more work left to do
    • e.g., macroexpansion, Datomic transaction functions
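
The accumulator and recursive expansion patterns above might look like the following sketch. The function names and the `:add`, `:add-twice`, and `:add-four-times` operations are hypothetical, chosen only to show the shape of each pattern.

```clojure
;; Accumulator: a lazy pipeline over a large input, reduced to a scalar.
;; (range) stands in for an input that may be larger than memory.
(defn sum-of-even-squares [n]
  (->> (range)
       (map #(* % %))
       (filter even?)
       (take n)
       (reduce + 0)))

(sum-of-even-squares 3)
;; => 20   (0 + 4 + 16)

;; Recursive expansion: higher-level forms are rewritten into
;; lower-level ones until only primitives remain. Here [:add n]
;; is the primitive; the other two forms are layered on top of it.
(defn expand-1
  "Expands one form into a sequence of lower-level forms."
  [[op n :as form]]
  (case op
    :add-four-times [[:add-twice n] [:add-twice n]]
    :add-twice      [[:add n] [:add n]]
    [form]))

(defn expand-all
  "Expands repeatedly until a pass changes nothing."
  [forms]
  (let [expanded (mapcat expand-1 forms)]
    (if (= expanded forms)
      forms
      (recur expanded))))

(expand-all [[:add-four-times 3]])
;; => ([:add 3] [:add 3] [:add 3] [:add 3])
```

The `expand-1` / `expand-all` split mirrors `macroexpand-1` and `macroexpand` in Clojure: one function does a single step, the other repeats it until there is no more work.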

Flow Control Patterns

  • Pipeline pattern
    • process with many discrete steps

    • similar "shape" of data at each step; usually a map or record

    • only one execution path

    • example:

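        ;; Declared up front so large-process can refer to the steps
        ;; before they are defined.
        (declare subprocess-a subprocess-b subprocess-c)

        ;; Thread the input through each step, in order.
        (defn large-process [input]
          (-> input
              subprocess-a
              subprocess-b
              subprocess-c))

        ;; Each step takes the data and returns it in the same shape.
        (defn subprocess-a [data]
          ; ...
          data)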
      
    • useful to be able to easily see the exact order of the steps; show it to business users to vet that you've ordered it correctly?

  • Wrapper pattern
    • about
      • process with many discrete steps
      • one main execution path
      • possible branch at each step
    • Clojure's Ring library is an example
    • implementation
      • input: a function
      • output: a function that does something before and/or after the given function; might not call the given function at all
  • Token pattern
    • about
      • may need to cancel an operation, but ...
      • the operation itself is not an identity
    • implementation
      • create a fn that performs the operation
      • returns a fn that allows you to cancel (i.e., undo) the operation
    • examples:
      • the scheduled thread pool in Java
      • Clojure watches
  • Observer pattern
    • yes, it's a GoF pattern
    • register an observer fn with stateful container
    • examples:
      • Clojure watches
  • Strategy pattern
    • yes, another GoF pattern
    • many processes with a similar structure
    • need extension points for future variations
    • examples:
      • Clojure protocols; very similar to the original OO approach
      • Clojure multimethods; dispatch on input
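
A rough sketch of the wrapper, token, observer, and strategy patterns above. All handler, wrapper, and shape names are hypothetical; `add-watch`, `remove-watch`, `defmulti`, and `defmethod` are standard `clojure.core`, and `update` requires Clojure 1.7 or later. The wrappers only imitate the style of Ring middleware and do not use Ring itself.

```clojure
(require '[clojure.string :as str])

;; Wrapper: takes a function and returns a function that acts before
;; and/or after it, and may not call it at all.
(defn wrap-require-user [handler]
  (fn [request]
    (if (:user request)
      (handler request)
      {:status 401 :body "unauthorized"})))

(defn wrap-shout [handler]
  (fn [request]
    (update (handler request) :body str/upper-case)))

(defn hello [request]
  {:status 200 :body (str "hello, " (:user request))})

(def app (-> hello wrap-shout wrap-require-user))

(app {:user "gary"})  ; => {:status 200, :body "HELLO, GARY"}
(app {})              ; => {:status 401, :body "unauthorized"}

;; Token + Observer: register an observer fn on a stateful container
;; and return a fn (the token) that cancels the registration.
(defn observe! [reference f]
  (let [k (gensym "watch")]
    (add-watch reference k (fn [_ _ old-val new-val] (f old-val new-val)))
    (fn cancel [] (remove-watch reference k))))

(def counter (atom 0))
(def seen (atom []))
(def cancel-observer
  (observe! counter (fn [_ new-val] (swap! seen conj new-val))))

(swap! counter inc)   ; observed
(cancel-observer)
(swap! counter inc)   ; no longer observed
@seen                 ; => [1]

;; Strategy: a multimethod that dispatches on its input is the
;; extension point; new variations are added with defmethod.
(defmulti area :shape)
(defmethod area :circle [{:keys [r]}] (* Math/PI r r))
(defmethod area :rect [{:keys [w h]}] (* w h))

(area {:shape :rect :w 2 :h 3})  ; => 6
```

Because `wrap-require-user` is the outermost wrapper in `app`, it can reject the request before `wrap-shout` or `hello` ever run, which is the "might not call the given function at all" case.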

A Whole New World

Gary Bernhardt

Abstract

--- rough notes ---

Gary introduces a side project that he has been working on part-time for a little over a year, including ...

a text editor, called "aneditor"

  • modal (like Vim)
  • terminal only
  • "much more powerful than Vim"
  • not an IDE
  • layers
    • overlay an orthogonal "layer" on top of the source code
    • examples:
      • diff layer
      • crash layer => overlay backtrace onto the source code
      • performance layer => overlay time profiling information onto the source code
  • interactions
    • the user experience:
      • one keystroke shows you all the code that the current line interacts with
      • renders a graphical tree-like display in the terminal
      • can navigate through the graph
      • uses GraphViz
    • Useful questions you can answer
      • what code does this test hit?
      • what code does this request hit?
      • what code might hit this crash point?

a terminal, called "anterminal"

  • raster graphics
  • 24-bit color
  • italics, bold, underline
  • momentary keypresses

And, we've all been punk'd. All lies. None of the things above actually exist.

  • We were surprised (right?) when Gary said that he wrote his own terminal. Why? Why is it surprising that someone would write a terminal?
    1. shipping culture => our "shipping culture" is poisonous to infrastructure replacement, even when the infrastructure sucks
    2. legacy and paralysis
      • we overlook the things that have existed our entire programming careers (e.g., terminals)
      • we overlook their limitations and whether those limitations make sense in today's world
  • Advocating for thinking, hammock time, prototyping
  • Our "shipping culture" and its incremental changes don't allow for fundamental improvements (i.e., rethinkings) like the ones Gary described above

Editor's note: With Light Table, Catnip, aneditor/anterminal, and Bret Victor's upcoming talk, it seems like "Re-imagining Your Development Environment" will be one of the themes of Strange Loop 2012

@gmarik

gmarik commented Sep 24, 2012

Thank you!

@dotemacs

Yea, thanks for this

@rippinrobr

Thanks for doing this Jason.

@gutomcosta

Thank you!

@jbrechtel

Thanks

@jbrechtel

Thanks!

@kivikakk

👍 A great help!

@benjaminballard

This is great. Thanks!

@dgfitch

dgfitch commented Nov 13, 2012

Awesome, stumbled over this from hearing about the Computer Whisperer talk. Unfortunately, it doesn't look like that one will get released until next April, but for anyone else looking for video of the Strange Loop talks, https://thestrangeloop.com/news/strange-loop-2012-video-schedule
