Goal: Chain multiple operations together — filter, join, calculate — in a single query. Key Idea: The Aggregation Pipeline is a series of stages. Each stage transforms the data and passes it to the next.
Collection → [Stage 1] → [Stage 2] → [Stage 3] → Result
$match $lookup $group
(WHERE) (JOIN) (GROUP BY)
Think of it like a factory assembly line: raw documents go in, each stage does one job, and the final result comes out the other end.
| Stage | SQL Equivalent | What It Does |
|---|---|---|
$match |
WHERE |
Filter documents |
$lookup |
LEFT JOIN |
Join another collection |
$group |
GROUP BY |
Aggregate/summarize data |
$project |
SELECT col |
Include/exclude/reshape fields |
$sort |
ORDER BY |
Sort the results |
$limit |
LIMIT |
Keep only N results |
$skip |
OFFSET |
Skip N results (for pagination) |
$unwind |
(no direct equiv.) | Flatten an array field |
$count |
COUNT(*) |
Count documents |
db.orders.aggregate([
// Stage 1: Filter (WHERE customer = "Customer A")
{ $match: { customer: "Customer A" } },
// Stage 2: Join (LEFT JOIN products ON orders.product_id = products._id)
{
$lookup: {
from: "products",
localField: "product_id",
foreignField: "_id",
as: "product_details"
}
},
// Stage 3: Summarize (GROUP BY customer, SUM(price))
{
$group: {
_id: "$customer", // Group by this field
total: { $sum: "$price" }, // Sum of all prices
orderCount: { $sum: 1 } // Count of orders
}
}
])
$group: {
_id: "$customer", // The field to group by ($ prefix = field reference)
total: { $sum: "$price" } // Accumulator: sum up all price values
// ↑
// Accumulator operators:
// $sum, $avg, $min, $max, $count, $first, $last, $push (array)
}
$group: {
_id: "$category",
averagePrice: { $avg: "$price" },
cheapest: { $min: "$price" },
mostExpensive:{ $max: "$price" },
allNames: { $push: "$name" } // Collect all names into an array
}
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: { _id: "$customer", total: { $sum: "$price" } } },
{ $sort: { total: -1 } }, // -1 = descending (highest first)
{ $limit: 5 } // Top 5 customers only
])
Equivalent SQL:
SELECT customer, SUM(price) AS total
FROM Orders
WHERE status = 'completed'
GROUP BY customer
ORDER BY total DESC
LIMIT 5;
- Stages execute in order — the sequence matters.
$matchearly in the pipeline = better performance (filters data before other stages run).- You can have as many stages as needed.
$prefix before a field name (like"$price") means "the value of this field" — it's a field reference.