Skip to content

Instantly share code, notes, and snippets.

@quarterdome
Last active August 29, 2015 14:25
Show Gist options
  • Select an option

  • Save quarterdome/292785b60a685a600cba to your computer and use it in GitHub Desktop.

Select an option

Save quarterdome/292785b60a685a600cba to your computer and use it in GitHub Desktop.

Revisions

  1. Matt Nemenman revised this gist Jul 25, 2015. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -20,6 +20,7 @@ merged as (
    null as state,
    null as action,
    null as contract,
    null as source_app,
    1 as is_first_event
    from web.traffic_source

    @@ -35,6 +36,7 @@ merged as (
    state,
    action,
    contract,
    source_app,
    0 as is_first_event
    from api.interest
    where
  2. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,5 @@
    -- This SQL snippet defines the session as any event that happened after
    -- traffic_source event, and before the next traffic_source event
    -- This SQL snippet attributes an interest to traffic source. Any interest
    -- event is attributed to previous traffic_source event.

    with

  3. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -19,6 +19,7 @@ merged as (
    null as interest_id,
    null as state,
    null as action,
    null as contract,
    1 as is_first_event
    from web.traffic_source

    @@ -33,6 +34,7 @@ merged as (
    interest_id,
    state,
    action,
    contract,
    0 as is_first_event
    from api.interest
    where
    @@ -77,6 +79,9 @@ select
    sent_at,
    user_id,
    interest_id,
    state,
    action,
    contract,
    interest_traffic_source,
    interest_campaign
    from interest_with_attribution
  4. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,5 @@
    -- This snippet defines the session as any event that happened after
    --- traffic_source event, and before the next traffic_source event
    -- This SQL snippet defines the session as any event that happened after
    -- traffic_source event, and before the next traffic_source event

    with

  5. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -17,6 +17,8 @@ merged as (
    name as traffic_source_name,
    campaign,
    null as interest_id,
    null as state,
    null as action,
    1 as is_first_event
    from web.traffic_source

    @@ -29,6 +31,8 @@ merged as (
    null as traffic_source_name,
    null as campaign,
    interest_id,
    state,
    action,
    0 as is_first_event
    from api.interest
    where
  6. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 19 additions and 0 deletions.
    19 changes: 19 additions & 0 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,14 @@
    -- This snippet defines the session as any event that happened after
    --- traffic_source event, and before the next traffic_source event

    with

    -- Merge traffic_sources and interest events in one event pool.
    -- * pad with 'null' every field that is not common between these two events
    -- * add a new field 'is_first_event'; set it to 1 for traffic_source events
    -- and to 0 for interest events
    -- * add a new field 'type' with event type: interest or traffic_source.

    merged as (
    select
    'traffic_source' as type,
    @@ -26,6 +35,9 @@ merged as (
    source_app in ('TabletApartmentList', 'ApartmentList', 'MobileApartmentList')
    ),

    -- Add a new column with session_id per user. Partition table by user id,
    -- and calculare running sum of is_first_event. This gives you a unique id.

    interest_with_session as (
    select
    *,
    @@ -36,6 +48,11 @@ interest_with_session as (
    from merged
    ),

    -- Copy traffic_source campaign and name to all subsequent events.
    -- * Now partition by user_id, and session_id
    -- * Use first_value() window function, to copy the value from first
    -- event in partition to all other events.

    interest_with_attribution as (
    select
    *,
    @@ -50,6 +67,8 @@ interest_with_attribution as (
    from interest_with_session
    )

    -- Now filter out irrelevant columns, and filter keep only interest rows.

    select
    sent_at,
    user_id,
  7. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 0 additions and 3 deletions.
    3 changes: 0 additions & 3 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -10,7 +10,6 @@ merged as (
    null as interest_id,
    1 as is_first_event
    from web.traffic_source
    where sent_at > '2015-07-20'

    union

    @@ -25,8 +24,6 @@ merged as (
    from api.interest
    where
    source_app in ('TabletApartmentList', 'ApartmentList', 'MobileApartmentList')
    and
    sent_at > '2015-07-21'
    ),

    interest_with_session as (
  8. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -10,7 +10,7 @@ merged as (
    null as interest_id,
    1 as is_first_event
    from web.traffic_source
    where sent_at > '2015-07-21'
    where sent_at > '2015-07-20'

    union

  9. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -39,7 +39,7 @@ interest_with_session as (
    from merged
    ),

    interest_with_attribution (
    interest_with_attribution as (
    select
    *,
    first_value(campaign) over
  10. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 20 additions and 9 deletions.
    29 changes: 20 additions & 9 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -2,6 +2,7 @@ with

    merged as (
    select
    'traffic_source' as type,
    sent_at,
    user_id,
    name as traffic_source_name,
    @@ -14,6 +15,7 @@ merged as (
    union

    select
    'interest' as type,
    sent_at,
    user_id,
    null as traffic_source_name,
    @@ -35,21 +37,30 @@ interest_with_session as (
    order by sent_at
    rows between unbounded preceding and current row ) as session_id
    from merged
    ),

    interest_with_attribution (
    select
    *,
    first_value(campaign) over
    ( partition by user_id, session_id
    order by sent_at
    rows between unbounded preceding and current row) as interest_campaign,
    first_value(traffic_source_name) over
    ( partition by user_id, session_id
    order by sent_at
    rows between unbounded preceding and current row) as interest_traffic_source
    from interest_with_session
    )

    select
    sent_at,
    user_id,
    interest_id,
    first_value(campaign) over
    ( partition by user_id, session_id
    order by sent_at
    rows between unbounded preceding and current row) as interest_campaign,
    first_value(traffic_source_name) over
    ( partition by user_id, session_id
    order by sent_at
    rows between unbounded preceding and current row) as interest_traffic_source
    from interest_with_session
    interest_traffic_source,
    interest_campaign
    from interest_with_attribution
    where type = 'interest'

    ;

  11. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -38,7 +38,9 @@ interest_with_session as (
    )

    select
    *,
    sent_at,
    user_id,
    interest_id,
    first_value(campaign) over
    ( partition by user_id, session_id
    order by sent_at
  12. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 0 additions and 2 deletions.
    2 changes: 0 additions & 2 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -2,7 +2,6 @@ with

    merged as (
    select
    event_id,
    sent_at,
    user_id,
    name as traffic_source_name,
    @@ -15,7 +14,6 @@ merged as (
    union

    select
    event_id,
    sent_at,
    user_id,
    null as traffic_source_name,
  13. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -43,10 +43,12 @@ select
    *,
    first_value(campaign) over
    ( partition by user_id, session_id
    order by sent_at) as interest_campaign,
    order by sent_at
    rows between unbounded preceding and current row) as interest_campaign,
    first_value(traffic_source_name) over
    ( partition by user_id, session_id
    order by sent_at) as interest_traffic_source
    order by sent_at
    rows between unbounded preceding and current row) as interest_traffic_source
    from interest_with_session

    ;
  14. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 8 additions and 7 deletions.
    15 changes: 8 additions & 7 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -5,7 +5,7 @@ merged as (
    event_id,
    sent_at,
    user_id,
    name,
    name as traffic_source_name,
    campaign,
    null as interest_id,
    1 as is_first_event
    @@ -18,7 +18,7 @@ merged as (
    event_id,
    sent_at,
    user_id,
    null as name,
    null as traffic_source_name,
    null as campaign,
    interest_id,
    0 as is_first_event
    @@ -31,9 +31,7 @@ merged as (

    interest_with_session as (
    select
    event_id,
    sent_at,
    interest_id,
    *,
    sum(is_first_event) over
    ( partition by user_id
    order by sent_at
    @@ -43,9 +41,12 @@ interest_with_session as (

    select
    *,
    first_value() over
    first_value(campaign) over
    ( partition by user_id, session_id
    order by sent_at) as interest_campaign,
    first_value(traffic_source_name) over
    ( partition by user_id, session_id
    order by sent_at) as interest_campaign
    order by sent_at) as interest_traffic_source
    from interest_with_session

    ;
  15. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -4,6 +4,7 @@ merged as (
    select
    event_id,
    sent_at,
    user_id,
    name,
    campaign,
    null as interest_id,
    @@ -16,6 +17,7 @@ merged as (
    select
    event_id,
    sent_at,
    user_id,
    null as name,
    null as campaign,
    interest_id,
  16. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -3,21 +3,21 @@ with
    merged as (
    select
    event_id,
    sent_at
    sent_at,
    name,
    campaign,
    null,
    null as interest_id,
    1 as is_first_event
    from web.traffic_sources
    from web.traffic_source
    where sent_at > '2015-07-21'

    union

    select
    event_id,
    sent_at,
    null,
    null,
    null as name,
    null as campaign,
    interest_id,
    0 as is_first_event
    from api.interest
  17. Matt Nemenman revised this gist Jul 23, 2015. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -21,8 +21,10 @@ merged as (
    interest_id,
    0 as is_first_event
    from api.interest
    where source_app in ('TabletApartmentList', 'ApartmentList', 'MobileApartmentList')
    where sent_at > '2015-07-21'
    where
    source_app in ('TabletApartmentList', 'ApartmentList', 'MobileApartmentList')
    and
    sent_at > '2015-07-21'
    ),

    interest_with_session as (
  18. Matt Nemenman created this gist Jul 23, 2015.
    50 changes: 50 additions & 0 deletions interest_attribution.sql
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,50 @@
    with

    merged as (
    select
    event_id,
    sent_at
    name,
    campaign,
    null,
    1 as is_first_event
    from web.traffic_sources
    where sent_at > '2015-07-21'

    union

    select
    event_id,
    sent_at,
    null,
    null,
    interest_id,
    0 as is_first_event
    from api.interest
    where source_app in ('TabletApartmentList', 'ApartmentList', 'MobileApartmentList')
    where sent_at > '2015-07-21'
    ),

    interest_with_session as (
    select
    event_id,
    sent_at,
    interest_id,
    sum(is_first_event) over
    ( partition by user_id
    order by sent_at
    rows between unbounded preceding and current row ) as session_id
    from merged
    )

    select
    *,
    first_value() over
    ( partition by user_id, session_id
    order by sent_at) as interest_campaign
    from interest_with_session

    ;