Skip to content

Instantly share code, notes, and snippets.

@powertech2nd
Created March 5, 2026 11:47
Show Gist options
  • Select an option

  • Save powertech2nd/2c7e7aca5080327aa1c8bc4aee5b92fc to your computer and use it in GitHub Desktop.

Select an option

Save powertech2nd/2c7e7aca5080327aa1c8bc4aee5b92fc to your computer and use it in GitHub Desktop.

06 — The "Assembly Line" (Complex Aggregation)

Goal: Chain multiple operations together — filter, join, calculate — in a single query. Key Idea: The Aggregation Pipeline is a series of stages. Each stage transforms the data and passes it to the next.


The Pipeline Mental Model

Collection → [Stage 1] → [Stage 2] → [Stage 3] → Result
               $match     $lookup     $group
              (WHERE)     (JOIN)    (GROUP BY)

Think of it like a factory assembly line: raw documents go in, each stage does one job, and the final result comes out the other end.


Common Pipeline Stages

Stage SQL Equivalent What It Does
$match WHERE Filter documents
$lookup LEFT JOIN Join another collection
$group GROUP BY Aggregate/summarize data
$project SELECT col Include/exclude/reshape fields
$sort ORDER BY Sort the results
$limit LIMIT Keep only N results
$skip OFFSET Skip N results (for pagination)
$unwind (no direct equiv.) Flatten an array field
$count COUNT(*) Count documents

Full Example: Customer Order Total

Task: Get total order value for "Customer A" with product names

db.orders.aggregate([

  // Stage 1: Filter  (WHERE customer = "Customer A")
  { $match: { customer: "Customer A" } },

  // Stage 2: Join    (LEFT JOIN products ON orders.product_id = products._id)
  {
    $lookup: {
      from: "products",
      localField: "product_id",
      foreignField: "_id",
      as: "product_details"
    }
  },

  // Stage 3: Summarize  (GROUP BY customer, SUM(price))
  {
    $group: {
      _id: "$customer",                   // Group by this field
      total: { $sum: "$price" },          // Sum of all prices
      orderCount: { $sum: 1 }             // Count of orders
    }
  }

])

🔍 Breaking Down $group

$group: {
  _id: "$customer",        // The field to group by ($ prefix = field reference)
  total: { $sum: "$price" } // Accumulator: sum up all price values
  //      ↑
  //   Accumulator operators:
  //   $sum, $avg, $min, $max, $count, $first, $last, $push (array)
}

More Accumulator Examples

$group: {
  _id: "$category",
  averagePrice: { $avg: "$price" },
  cheapest:     { $min: "$price" },
  mostExpensive:{ $max: "$price" },
  allNames:     { $push: "$name" }   // Collect all names into an array
}

Sorting and Limiting Results

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customer", total: { $sum: "$price" } } },
  { $sort: { total: -1 } },   // -1 = descending (highest first)
  { $limit: 5 }               // Top 5 customers only
])

Equivalent SQL:

SELECT customer, SUM(price) AS total
FROM Orders
WHERE status = 'completed'
GROUP BY customer
ORDER BY total DESC
LIMIT 5;

💡 Notes

  • Stages execute in order — the sequence matters.
  • $match early in the pipeline = better performance (filters data before other stages run).
  • You can have as many stages as needed.
  • $ prefix before a field name (like "$price") means "the value of this field" — it's a field reference.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment