smrgit / kMeans_in_BQ.sql
Last active October 23, 2021 14:43
kMeans using JavaScript UDFs in BigQuery
CREATE TEMPORARY FUNCTION
-- In this function, we work on arrays of values.
-- We also define a set of helper functions 'inside' kMeans.
-- *heavily borrowing from https://github.com/NathanEpstein/clusters* --
kMeans(x ARRAY<FLOAT64>, -- ESR1 gene expression
y ARRAY<FLOAT64>, -- EGFR gene expression
iterations FLOAT64, -- the number of iterations
omarish / Load_CSV_to_Vertica.md
Last active December 8, 2019 02:44
Quick primer on loading a large CSV file into a Vertica Database.

Load a Large CSV into Vertica

Here's an efficient way to load a large dataset into Vertica: split it into multiple pieces, then load the pieces in parallel.

Note that this only makes sense if your Vertica cluster has a single node. On a multi-node cluster, there are more efficient ways of doing this.

For this example, the large CSV file will be called large_file.csv. If your file is under 1 GB, it probably makes sense to skip the splitting and load it with a single COPY command.
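
As a rough illustration of the splitting step, here is a minimal Python sketch. The function name split_csv, the piece_NNNN.csv naming scheme, and the rows-per-piece parameter are hypothetical choices made for this example, not something taken from the primer, which may well use a different tool. The sketch assumes one record per line (no quoted fields containing newlines) and drops the header row so that every piece holds only data rows.

```python
import os


def split_csv(path, out_dir, rows_per_piece, has_header=True):
    """Split the CSV at `path` into pieces of at most `rows_per_piece` rows.

    Assumption: each record sits on exactly one line. Returns the list of
    piece paths, in the same order as the rows appear in the source file.
    """
    os.makedirs(out_dir, exist_ok=True)
    pieces = []
    out = None
    # newline="" keeps the original line endings untouched on read and write.
    with open(path, "r", newline="") as src:
        if has_header:
            next(src, None)  # skip the header so no piece starts with it
        for i, line in enumerate(src):
            if i % rows_per_piece == 0:
                # Start a new piece every `rows_per_piece` rows.
                if out is not None:
                    out.close()
                piece_path = os.path.join(out_dir, f"piece_{len(pieces):04d}.csv")
                pieces.append(piece_path)
                out = open(piece_path, "w", newline="")
            out.write(line)
    if out is not None:
        out.close()
    return pieces
```

Calling split_csv("large_file.csv", "pieces", 1_000_000) would write pieces/piece_0000.csv, pieces/piece_0001.csv, and so on. Each piece could then be loaded with its own COPY command, with several of them running at the same time.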