MongoDB Aggregations (Day 14)
Before reading this article, I highly recommend reading the following previous parts of the series:
- MongoDB - Day 1 (Introduction To MongoDB)
- MongoDB - Day 2 (Install MongoDB in Windows)
- MongoDB - Day 3 (Database Basics)
- MongoDB - Day 4 (Basics of Collection)
- MongoDB - Day 5 (Data Types in MongoDB)
- MongoDB - Day 6 (Insert Method)
- MongoDB - Day 7 (Find Method Part 1)
- MongoDB - Day 8 (Find Method Part 1)
- MongoDB- Day 9 (Update Method)
- MongoDB- Day10 (Remove Method)
- MongoDB - Day 11 (Collection Methods)
- MongoDB - Day12 (Cursor Methods)
- MongoDB - Day13 (Indexing)
Introduction
Aggregation functions perform operations on groups of documents and return the computed result. Aggregation works mainly to group the data of multiple documents and perform various operations on the grouped data and returns a single or multiple results.
Types of aggregate functions
MongoDB performs aggregate operations in one of the following three ways.
- Single-purpose aggregate methods and commands.
- Pipeline.
- Map Reduce
First, we will create a “Demo” collection and insert the following data into that collection.
- {
- "_id": ObjectId("55dc6c94b32228b6ef8753c0"),
- "Name": "Pankaj Choudhary",
- "Age": 21,
- "Salary": 25000
- }
- {
- "_id": ObjectId("55dc6ca8b32228b6ef8753c1"),
- "Name": "Sandeep Jangid",
- "Age": 22,
- "Salary": 27000
- }
- {
- "_id": ObjectId("55dc6cb9b32228b6ef8753c2"),
- "Name": "Rahul Prajapat",
- "Age": 23,
- "Salary": 37000
- }
- {
- "_id": ObjectId("55dc6ccab32228b6ef8753c3"),
- "Name": "Sanjeev Baldia",
- "Age": 22,
- "Salary": 28000
- }
- {
- "_id": ObjectId("55dc6cdcb32228b6ef8753c4"),
- "Name": "Narendra Sharma",
- "Age": 25,
- "Salary": 25000
- }
- {
- "_id": ObjectId("55dc6cf5b32228b6ef8753c5"),
- "Name": "Nitin Yadav",
- "Age": 28,
- "Salary": 35000
- }
- {
- "_id": ObjectId("55dc6d09b32228b6ef8753c6"),
- "Name": "Omveer Choudhary",
- "Age": 32,
- "Salary": 37000
- }
Single-Purpose aggregate methods and commands
As implied by its name, single-purpose aggregate methods are used for specific aggregation operations on sets of data. Single-purpose aggregate methods are less complex but with limited scope compared to pipeline and map reduce operations. Single-purpose aggregate methods and commands provide straightforward semantics for common data processing options.
The following are some important single-purpose operations.
Count
The Count operation takes a number of documents and depending on the match query returns the count of the documents. In MongoDB, the count command and cursor.count() method are used to do the count operations.
Example 1
- db.Demo.count()
7
In this example Demo.count() is a cursor method that returns the count of all the documents present in the Demo collection.
Example 2
- db.runCommand({count:'Demo',query:{Salary:{$gt:30000}}})
{ "n" : 3, "ok" : 1 }
In this example, the count command takes all the documents in which the value of the Salary field is greater than 30000 and returns the count.
Distinct
The Distinct operation takes a document and depending on the match query returns the unique values for a field. In MongoDB the cursor.distinct() method and the distinct command performs the distinct operations.
Example 1
- db.Demo.distinct("Salary")
[ 25000, 27000, 37000, 28000, 35000 ]
In this example, we use the distinct cursor method that returns the distinct value for the “Salary” field.
Example 2
- db.runCommand({distinct:'Demo',key:"Salary",query:{Salary:{$gt:30000}}})
- {
- "values":
- [
- 37000,
- 35000
- ],
- "stats":
- {
- "n": 3,
- "nscanned": 0,
- "nscannedObjects": 7,
- "timems": 598,
- "planSummary": "COLLSCAN"
- },
- "ok": 1
- }
Group
The Group operation takes a number of documents and depending on the match query creates a group of fields grouped by their value and finally returns an array of documents with the computed result for each group. In MongoDB, the group command and the cursor.group() methods do the group operations.
Example 1
- db.Demo.group({key:{Age:1},reduce:function(cur,result){result.Salary+=cur.Salary},initial:{Salary:0}})
- [
- {
- "Age": 21,
- "Salary": 25000
- },
- {
- "Age": 22,
- "Salary": 55000
- },
- {
- "Age": 23,
- "Salary": 37000
- },
- {
- "Age": 25,
- "Salary": 25000
- },
- {
- "Age": 28,
- "Salary": 35000
- },
- {
- "Age": 32,
- "Salary": 37000
- }]
Example 2
- db.runCommand({group:{ns:'Demo',key:{Salary:1},cond:{Salary:{$lt:30000}},$reduce:function(cur,resu
- lt){result.Age+=cur.Age},initial:{Age:10}}})
- {
- "retval":
- [
- {
- "Salary": 25000,
- "Age": 56
- }, {
- "Salary": 27000,
- "Age": 32
- }, {
- "Salary": 28000,
- "Age": 32
- }
- ],
- "count": NumberLong(4),
- "keys": NumberLong(3),
- "ok": 1
- }
Aggregate Pipeline
Pipeline means the possibility to execute an operation on some input and use the output as the input for the next command and so on. An aggregate pipeline is a framework modeled on the concept of data processing pipelines. In aggregate pipelines, documents enter a multi-stage pipeline that transforms the documents into an aggregated result. The pipeline operation uses the match query to fetch the exact documents and grouping to generate the group of documents. Pipeline operations provide tools for sorting documents by specified fields. A pipeline operation contains many stages, like filters that operate like queries and the transformation of documents that modify the form of output. Mainly, aggregate commands operate on a single collection and pass the entire collection into the aggregation pipeline.
Aggregate pipelines are an alternative to map-reduce. It may be an appropriate choice for aggregate operations because map-reduce are unnecessarily complex. But pipelines have some limited behavior on values types and result size.
Syntax
db.Collection_Name.aggregate({Pipeline expression})
Parameter
Pipeline expression: Pipeline expressions specify the transformation to apply on the input documents. A pipeline expression only operates on the current documents of the pipeline stage.
In an aggregate pipeline, documents pass through many stages and each stage transforms the documents into another form. It is not recommended that each pipeline stage produce a document with respect to each document. Some stages might reduce some documents and might generate new documents.
A pipeline aggregate contains a number of stages, each stage takes some documents as input and does operations on these documents and generates output. The following are the possible stages in an aggregate pipeline:
Stage |
Description |
$project | Select specific fields from a collection |
$match | Specifies the selection criteria, to reduce the amount of documents |
$group | Used to divide the documents into various groups |
$sort | Sort the documents |
$skip | Used to skip a number of documents |
$limit | Defines the number of documents in an output result |
$unwind | Unwinds the documents using arrays |
$redact | Reshapes each document by restricting the content for each document |
$out | It is the last stage of a pipeline that writes the resulting documents of the aggregate pipeline to a collection. |
Let's see an example to understand the concept of aggregate pipelines.
Example 1
- db.Demo.aggregate([{$group:{_id:"$Age",Salary_Is:{$sum:"$Salary"}}}])
- {
- "_id": 32,
- "Salary_Is": 37000
- }
- {
- "_id": 25,
- "Salary_Is": 25000
- }
- {
- "_id": 23,
- "Salary_Is": 37000
- }
- {
- "_id": 22,
- "Salary_Is": 55000
- }
- {
- "_id": 28,
- "Salary_Is": 35000
- }
- {
- "_id": 21,
- "Salary_Is": 25000
- }
Select Age, sum(Salary) from Demo group by Age.
- db.Demo.aggregate
- ([{
- $group: {
- _id: "$Age",
- Sum_Salary: {
- $sum: "$Salary"
- },
- Avg_Salary: {
- $avg: "$Salary"
- },
- Min
- _Salary: {
- $min: "$Salary"
- },
- Max_Salary: {
- $max: "$Salary"
- }
- }
- }])
- {
- "_id": 32,
- "Sum_Salary": 37000,
- "Avg_Salary": 37000,
- "Min_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "_id": 25,
- "Sum_Salary": 25000,
- "Avg_Salary": 25000,
- "Min_Salary": 25000,
- "Max_Salary": 25000
- }
- {
- "_id": 23,
- "Sum_Salary": 37000,
- "Avg_Salary": 37000,
- "Min_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "_id": 22,
- "Sum_Salary": 55000,
- "Avg_Salary": 27500,
- "Min_Salary": 27000,
- "Max_Salary": 28000
- }
- {
- "_id": 28,
- "Sum_Salary": 35000,
- "Avg_Salary": 35000,
- "Min_Salary": 35000,
- "Max_Salary": 35000
- }
- {
- "_id": 21,
- "Sum_Salary": 25000,
- "Avg_Salary": 25000,
- "Min_Salary": 25000,
- "Max_Salary": 25000
- }
Select Age, sum(Salary), avg(Salary), max(Salary), min(Salary) from Demo group by Age.
Example 3
- db.Demo.aggregate
- ([{
- $group:
- {
- _id: "$Age",
- Sum_Salary:
- {
- $sum: "$Salary"
- },
- Avg_Salary:
- {
- $avg: "$Salary"
- },
- Min
- _Salary:
- {
- $min: "$Salary"
- },
- Max_Salary:
- {
- $max: "$Salary"
- }
- }
- },
- {
- $match:
- {
- _id:
- {
- $gt: 21,
- $lt: 27
- }
- }
- }])
- {
- "_id": 25,
- "Sum_Salary": 25000,
- "Avg_Salary": 25000,
- "Min_Salary": 25000,
- "Max_Salary": 25000
- }
- {
- "_id": 23,
- "Sum_Salary": 37000,
- "Avg_Salary": 37000,
- "Min_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "_id": 22,
- "Sum_Salary": 55000,
- "Avg_Salary": 27500,
- "Min_Salary": 27000,
- "Max_Salary": 28000
- }
The following query also generate the same output
- db.Demo.aggregate
- ([{
- $match:
- {
- Age:
- {
- $gt: 21,
- $lt: 27
- }
- }
- }, {
- $group:
- {
- _id: "$Age",
- Sum_Salary:
- {
- $sum: "$Salary"
- },
- Avg_Salary:
- {
- $avg: "$Salary"
- },
- Min_Salary:
- {
- $min: "$Salary"
- },
- Max_Salary:
- {
- $max: "$Salary"
- }
- }
- }])
Select Age, sum(Salary), avg(Salary), max(Salary), min(Salary) from Demo Where Age>21 And Age<27 group by Age
Example 4
- db.Demo.aggregate
- ([{
- $limit: 3
- },
- {
- $group:
- {
- _id: "$Age",
- Sum_Salary:
- {
- $sum: "$Salary"
- },
- Avg_Salary:
- {
- $avg: "
- $Salary "
- },
- Min_Salary:{$min:"
- $Salary "},Max_Salary:{$max:"
- $Salary "}
- }
- }])
- {
- "_id": 23,
- "Sum_Salary": 37000,
- "Avg_Salary": 37000,
- "Min_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "_id": 22,
- "Sum_Salary": 27000,
- "Avg_Salary": 27000,
- "Min_Salary": 27000,
- "Max_Salary": 27000
- }
- {
- "_id": 21,
- "Sum_Salary": 25000,
- "Avg_Salary": 25000,
- "Min_Salary": 25000,
- "Max_Salary": 25000
- }
Important Point
The order of the stages in the aggregate pipeline is very important. Like db.Demo.aggregate($limit,$group) and db.Demo.aggregate($group,$limit) the methods don’t provide the same result.
Now we will execute the previous method but interchange the order of the “limit” and “group” stages.
- db.Demo.aggregate
- ([{
- $group:
- {
- _id: "$Age",
- Sum_Salary:
- {
- $sum: "$Salary"
- },
- Avg_Salary:
- {
- $avg: "$Salary"
- },
- M
- in_Salary:
- {
- $min: "$Salary"
- },
- Max_Salary:
- {
- $max: "$Salary"
- }
- }
- }, {
- $limit: 3
- }])
- {
- "_id": 32,
- "Sum_Salary": 37000,
- "Avg_Salary": 37000,
- "Min_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "_id": 25,
- "Sum_Salary": 25000,
- "Avg_Salary": 25000,
- "Min_Salary": 25000,
- "Max_Salary": 25000
- }
- {
- "_id": 23,
- "Sum_Salary": 37000,
- "Avg_Salary": 37000,
- "Min_Salary": 37000,
- "Max_Salary": 37000
- }
Example 5
- db.Demo.aggregate
- ([{
- $group:
- {
- _id: "$Age",
- Sum_Salary: {
- $sum: "$Salary"
- },
- Avg_Salary: {
- $avg: "$Salary"
- },
- M
- in_Salary: {
- $min: "$Salary"
- },
- Max_Salary: {
- $max: "$Salary"
- }
- }
- }, {
- $project: {
- Sum_Salary: 1,
- Max_Salary: 1,
- _id: 0
- }
- }, {
- $limit: 4
- }])
- {
- "Sum_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "Sum_Salary": 25000,
- "Max_Salary": 25000
- }
- {
- "Sum_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "Sum_Salary": 55000,
- "Max_Salary": 28000
- }
Example 6
- db.Demo.aggregate([{
- $group: {
- _id: "$Age",
- Sum_Salary: {
- $sum: "$Salary"
- },
- Avg_Salary: {
- $avg: "$Salary"
- },
- M
- in_Salary: {
- $min: "$Salary"
- },
- Max_Salary: {
- $max: "$Salary"
- }
- }
- }, {
- $project: {
- Sum_Salary: 1,
- Max_Salary: 1,
- _id: 0
- }
- }, {
- $limit: 4
- }, {
- $skip: 2
- }])
- {
- "Sum_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "Sum_Salary": 55000,
- "Max_Salary": 28000
- }
Example 7
- db.Demo.aggregate([{
- $group: {
- _id: "$Age",
- Sum_Salary: {
- $sum: "$Salary"
- },
- Avg_Salary: {
- $avg: "$Salary"
- },
- M
- in_Salary: {
- $min: "$Salary"
- },
- Max_Salary: {
- $max: "$Salary"
- }
- }
- }, {
- $project: {
- Sum_Salary: 1,
- Max_Salary: 1,
- _id: 0
- }
- }, {
- $limit: 4
- }, {
- $sort: {
- Sum_Salary: 1
- }
- }])
- {
- "Sum_Salary": 25000,
- "Max_Salary": 25000
- }
- {
- "Sum_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "Sum_Salary": 37000,
- "Max_Salary": 37000
- }
- {
- "Sum_Salary": 55000,
- "Max_Salary": 28000
- }
Example 8
- db.Demo.aggregate([{
- $group: {
- _id: "$Age",
- Sum_Salary: {
- $sum: "$Salary"
- },
- Avg_Salary: {
- $avg: "$Salary"
- },
- M
- in_Salary: {
- $min: "$Salary"
- },
- Max_Salary: {
- $max: "$Salary"
- }
- }
- }, {
- $project: {
- Sum_Salary: 1,
- Max_Salary: 1,
- _id: 0
- }
- }, {
- $redact: {
- $cond: {
- if: {
- $gt: ["$Sum_Salary", 30000]
- },
- then: "$$PRUNE",
- else: "$$DESCEND"
- }
- }
- }])
- {
- "Sum_Salary": 25000,
- "Max_Salary": 25000
- }
- {
- "Sum_Salary": 25000,
- "Max_Salary": 25000
- }
Map-Reduce
Map-Reduce is another way to do aggregation. Map-Reduce is a combination of two operations. The first part of Map-Reduce is “Map”, that processes each document and retrieves one or more objects for each input document. The second part of Map-Reduce is “Reduce” that combines the result of the Map operations. Map-Reduce handles large amounts of data into aggregate results.
How Map-Reduce operation Works
MongoDB uses a JavaScript function in map-reduce aggregation. Map-Reduce aggregates may return the documents as output or may write the result to a collection. Map-Reduce uses the JavaScript functions that provide high flexibility.
Let us look at some examples for Map-Reduce aggregations.
Example 1
To perform the the mapReduce function for any collection, we must perform the following three steps. Now we will understand these procedures using the “Demo” collection.
Step 1
Define the map function.
- function()
- {
- emit(this.Age,this.Salary);
- } ;
In the function “this” defines the current collection name. This function maps the Salary to Age for each document and emits the Age and Salary pairs.
Step 2
Define the reduce function.
- function(key,values)
- {
- return Array.sum(values);
- }
Define the reduce function with the two arguments, key, and values. The values parameter is an array type that is emitted by the map function and contains salary values and grouped by Age. This function reduces the values array to a sum of its elements.
Step 3
Output Collection.
- { out:"My_Coll"}
The combined form of these three steps is:
- db.Demo.mapReduce(function()
- {
- emit(this.Age, this.Salary);
- },
- function(key, values)
- {
- return Array.sum(v alues);
- },
- {
- query:
- {
- Age:
- {
- $gt: 20
- }
- },
- out: "My_Coll"
- })
- {
- "result" : "My_Coll",
- "timeMillis" : 4,
- "counts" :
- {
- "input" : 7,
- "emit" : 7,
- "reduce" : 1,
- "output" : 6
- },
- "ok" : 1
- }
The result of the preceding query will be stored in the “My_Coll” collection so we can use the find() method to retrieve the result from the “My_Coll” collection.
- db.My_Coll.find().pretty()
- {
- "_id": 21,
- "value": 25000
- }
- {
- "_id": 22,
- "value": 55000
- }
- {
- "_id": 23,
- "value": 37000
- }
- {
- "_id": 25,
- "value": 25000
- }
- {
- "_id": 28,
- "value": 35000
- }
- {
- "_id": 32,
- "value": 37000
- }
Step 1
- var map_func = function()
- {
- emit(this.Age, this.Salary);
- };
- var reduce_func = function(key, values)
- {
- return Array.sum(values);
- };
- db.Demo.mapReduce(map_func, reduce_func,
- {
- query:
- {
- Age:
- {
- $gt: 23
- }
- },
- out: "My_Coll"
- })
- {
- "result": "My_Coll",
- "timeMillis": 4,
- "counts":
- {
- "input": 3,
- "emit": 3,
- "reduce": 0,
- "output": 3
- },
- "ok": 1
- }
- db.My_Coll.find().pretty()
- {
- "_id": 25,
- "value": 25000
- }
- {
- "_id": 28,
- "value": 35000
- }
- {
- "_id": 32,
- "value": 37000
- }
- db.Demo.mapReduce(
- function()
- {
- emit(this.Age, this.Salary);
- },
- function(key, values)
- {
- return Array.avg(v alues);
- }, {
- query:
- {
- Age:
- {
- $gt: 20
- },
- Salary:
- {
- $gte: 24000
- }
- },
- out: "My_Coll",
- limit: 4,
- sort:
- {
- Salary: 1
- }
- })
- {
- "result": "My_Coll",
- "timeMillis": 5,
- "counts":
- {
- "input": 4,
- "emit": 4,
- "reduce": 1,
- "output": 3
- },
- "ok": 1
- }
- }
- db.My_Coll.find().pretty()
- {
- "_id": 21,
- "value": 25000
- }
- {
- "_id": 22,
- "value": 27500
- }
- {
- "_id": 25,
- "value": 25000
- }