See also:
| Service | Type | Storage | Limitations |
|---|---|---|---|
| Amazon DynamoDB | NoSQL (key-value) | 25 GB | |
| Amazon RDS | MySQL, PostgreSQL, MariaDB, Oracle, MS SQL Server | | |
| Azure SQL Database | MS SQL Server | | |
| 👉 Clever Cloud | PostgreSQL, MySQL, MongoDB, Redis | 256 MB (PostgreSQL) | Max 5 connections (PostgreSQL) |
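The 5-connection cap on Clever Cloud's free PostgreSQL tier is easy to blow past with a naive "one connection per request" design. A minimal sketch of a bounded connection pool that can never exceed the limit (using a plain `queue.Queue`; `make_conn` is a stand-in for a real driver call such as `psycopg2.connect`, which is assumed and not shown):

```python
import queue

class BoundedPool:
    """A fixed-size pool: at most max_size connections ever exist."""

    def __init__(self, make_conn, max_size=5):
        # Pre-open exactly max_size connections; the queue enforces the cap.
        self._slots = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._slots.put(make_conn())

    def acquire(self, timeout=None):
        # Blocks (or raises queue.Empty on timeout) instead of opening
        # a sixth connection past the provider's limit.
        return self._slots.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._slots.put(conn)
```

Callers that `acquire()` when all five connections are checked out simply wait, which is usually the behavior you want against a hard provider-side limit.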
Jennifer Zhao | SQL Problems Collection | 2018-02-22 22:14:27
```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
)
```
```scala
import org.apache.spark.sql.SparkSession

object SparkSessionS3 {
  // Create a Spark session with optimizations to work with Amazon S3 (s3a).
  // The original snippet was truncated after the access key; the secret-key
  // line and getOrCreate are an assumed completion following the s3a convention.
  def getSparkSession: SparkSession = {
    val spark = SparkSession
      .builder
      .appName("my spark application name")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.hadoop.fs.s3a.access.key", "my access key")
      .config("spark.hadoop.fs.s3a.secret.key", "my secret key")
      .getOrCreate()
    spark
  }
}
```
```python
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime

args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 27),
    'provide_context': True,
}

# The original snippet was truncated after the args dict; a minimal DAG
# and task using it (dag_id and bash_command are assumed for illustration):
dag = DAG(dag_id='example_dag', default_args=args, schedule_interval=None)

task = BashOperator(task_id='print_date', bash_command='date', dag=dag)
```
```shell
# start with this: https://gist.github.com/modqhx/c761020d987fd21929c64e69af27af2f
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz && tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz
mv spark-2.1.0-bin-hadoop2.7 /usr/local/spark2.1
echo 'export SPARK_HOME=/usr/local/spark2.1' >> ~/.bashrc
echo 'export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bashrc
# Install java and set JAVA_HOME if not set already from here: https://www.digitalocean.com/community/tutorials/how-to-install-java-on-ubuntu-with-apt-get
sudo apt-get update
```
FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I can't provide the appropriate credits.
```shell
wget https://storage.googleapis.com/golang/go1.7.linux-armv6l.tar.gz
tar -C /usr/local -xzf go1.7.linux-armv6l.tar.gz
export PATH=$PATH:/usr/local/go/bin
```
```scala
// This example shows how to use row_number and rank to create
// a dataframe of precipitation values associated with a zip and date
// from the closest NOAA station
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// mocked NOAA weather station data
case class noaaData(zip: String, station: String, date: Long, value: String = null, distance: Int)

val t = Seq(
```
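The truncated Spark snippet above ranks candidate stations by distance within each (zip, date) partition and keeps the closest one. The same row_number-per-partition idea can be sketched in plain Python (sample rows are made up for illustration): sort by the partition key plus the ordering column, then take the first row of each group.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample rows: (zip, station, date, precipitation_value, distance)
rows = [
    ("10001", "STN_A", 20170101, "0.2", 5),
    ("10001", "STN_B", 20170101, "0.4", 2),
    ("10001", "STN_C", 20170101, "0.1", 9),
    ("94105", "STN_D", 20170101, "0.0", 3),
]

# Equivalent of Window.partitionBy(zip, date).orderBy(distance) with
# row_number() == 1: sort by (zip, date, distance), then keep the first
# row of each (zip, date) group, i.e. the closest station.
rows.sort(key=itemgetter(0, 2, 4))
closest = [next(group) for _, group in groupby(rows, key=itemgetter(0, 2))]
```

Here `closest` keeps `STN_B` for zip 10001 (distance 2) and `STN_D` for zip 94105, mirroring what the window function selects per partition.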
```yaml
# Add this snippet to the top of your playbook.
# It will install python2 if missing (but checks first so no expensive repeated apt updates)
# gwillem@gmail.com
- hosts: all
  gather_facts: False
  tasks:
    - name: install python 2
      raw: test -e /usr/bin/python || (apt -y update && apt install -y python-minimal)