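DAY10 An Introduction to the Types of MongoDB Aggregation

We've finally reached day ten, and it's time for the more fun (X), painful (O) topic of aggregation. This is where MongoDB's distinctive features really start to show.

A MongoDB aggregation is built from two kinds of stages: the first kind filters data, and the second kind aggregates it and computes statistics. In plain terms, one picks the data to process and the other shapes that data into the form you want to see. Keep in mind that there is no required order between the two and no limit on how many times each appears: [filter -> aggregate] or [aggregate -> filter -> aggregate], you can arrange them however you like.

A basic example looks like this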

db.employee.aggregate([
   { $match: { status: "A" } },
   { $group: { _id: "$employee_id", total: { $sum: "$field" } } }
])
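To make the two kinds of stages concrete, here is a minimal sketch in plain JavaScript that mimics what the $match and $group stages above do, using an in-memory array instead of a real collection. The sample documents are made up for illustration; only the field names (status, employee_id, field) come from the example.

```javascript
// Hypothetical sample documents; only the field names follow the example above.
const employees = [
  { employee_id: 1, status: "A", field: 10 },
  { employee_id: 1, status: "A", field: 5 },
  { employee_id: 2, status: "B", field: 7 },
  { employee_id: 2, status: "A", field: 3 },
];

// Filtering stage, like { $match: { status: "A" } }: keep matching documents only.
const matched = employees.filter((doc) => doc.status === "A");

// Aggregating stage, like { $group: { _id: "$employee_id", total: { $sum: "$field" } } }:
// bucket documents by employee_id and add up "field" inside each bucket.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.employee_id, (totals.get(doc.employee_id) ?? 0) + doc.field);
}
const grouped = [...totals].map(([_id, total]) => ({ _id, total }));

console.log(grouped); // [ { _id: 1, total: 15 }, { _id: 2, total: 3 } ]
```

This is only an analogy for how the stages hand data to each other; the real pipeline runs inside the MongoDB server.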

Types of MongoDB Aggregation

MongoDB aggregation comes in three types. The latter two are both more flexible and customizable, but they work in different ways.

Single purpose
Aggregation pipeline
Map-reduce function

Single purpose

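Single-purpose aggregation is the most basic kind. For example, to query the non-duplicated values of a field you would normally reach for a distinct statement, and MongoDB is no different: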

db.employee.distinct("field_name")
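For readers who think in plain JavaScript, the idea behind distinct can be sketched with a Set. The documents and the department field below are hypothetical stand-ins for "field_name", not part of the original example.

```javascript
// Hypothetical documents; "department" stands in for "field_name".
const employees = [
  { name: "amy", department: "sales" },
  { name: "bob", department: "hr" },
  { name: "cat", department: "sales" },
];

// Collect each value once, in first-seen order.
const distinctDepartments = [...new Set(employees.map((doc) => doc.department))];

console.log(distinctDepartments); // [ 'sales', 'hr' ]
```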

This is what's called single purpose. That's 1/3 of the material done already. Easy, right?

Map-reduce function

As the name suggests, this is the function-based approach.

Map:

var mapFunc = function(){
  emit(this.employee_id, this.employee_field);
};

Reduce:

var reduceFunc = function(empId, fields){
  return Array.sum(fields);
};

Finally, put them together

db.employee.mapReduce(
  mapFunc,
  reduceFunc,
  { out: "example_map_reduce"}
);

db.example_map_reduce.find()

{"_id": 10001, "value": 33}
{"_id": 10002, "value": 144}
{"_id": 10003, "value": 15}
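If it isn't obvious how emit and the reduce function fit together, the following plain JavaScript sketch imitates the two phases in memory. The emit stand-in and the sample documents are my own assumptions for illustration; Array.sum is a helper provided by the mongo shell rather than standard JavaScript, so the sketch uses reduce() instead.

```javascript
// Hypothetical sample documents using the field names from the example above.
const employees = [
  { employee_id: 10001, employee_field: 30 },
  { employee_id: 10001, employee_field: 3 },
  { employee_id: 10003, employee_field: 15 },
];

// Stand-in for the server-side emit(): collect every value under its key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Same shape as mapFunc above; MongoDB binds `this` to each document.
const mapFunc = function () {
  emit(this.employee_id, this.employee_field);
};

// Same idea as reduceFunc above, written without the shell-only Array.sum.
const reduceFunc = function (empId, fields) {
  return fields.reduce((sum, n) => sum + n, 0);
};

// Map phase: run mapFunc once per document.
for (const doc of employees) mapFunc.call(doc);

// Reduce phase: collapse each key's list of values into one value.
const results = [...emitted].map(([key, values]) => ({
  _id: key,
  value: reduceFunc(key, values),
}));

console.log(results); // [ { _id: 10001, value: 33 }, { _id: 10003, value: 15 } ]
```

The real engine is less tidy than this: it may call the reduce function several times for one key, and it skips the call for a key that has only a single value, which is why a reduce function should return the same type of value that the map function emits.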

Aggregation pipeline

db.employee.aggregate([
  { $group: { _id: "$employee_id", value: { $sum: "$employee_field" }}},
  { $out: "example_map_reduce_2"}
])

db.example_map_reduce_2.find()

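With an aggregation pipeline, as the name suggests, every operation takes place in a series of pipes, and the size and shape of each pipe depends on what you need. A common arrangement looks like this (though there is no fixed order):

$match find the target data => ($unwind deconstruct array data) => $group aggregate & compute statistics => $project reshape how the data is presented => $out write the output

There are also stages such as $sort and $limit that aren't all listed here.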
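One way to picture the "pipes" is as a list of functions where each one receives the previous one's output. The sketch below is plain JavaScript, not MongoDB's implementation, and the orders data plus the helper names (match, unwind, group, project) are hypothetical choices made for this illustration.

```javascript
// Hypothetical documents; none of these names come from the article.
const orders = [
  { customer: "amy", status: "A", items: [{ price: 5 }, { price: 7 }] },
  { customer: "bob", status: "A", items: [{ price: 4 }] },
  { customer: "amy", status: "B", items: [{ price: 100 }] },
];

// Every stage takes an array of documents and returns a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const unwind = (field) => (docs) =>
  docs.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));
const group = (keyOf, valueOf) => (docs) => {
  const sums = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    sums.set(key, (sums.get(key) ?? 0) + valueOf(doc));
  }
  return [...sums].map(([_id, total]) => ({ _id, total }));
};
const project = (shape) => (docs) => docs.map(shape);

// Roughly: $match => $unwind => $group => $project
const pipeline = [
  match((doc) => doc.status === "A"),
  unwind("items"),
  group((doc) => doc.customer, (doc) => doc.items.price),
  project((doc) => ({ customer: doc._id, spent: doc.total })),
];

// Feed the documents through the stages in the order they are listed.
const output = pipeline.reduce((docs, stage) => stage(docs), orders);

console.log(output); // [ { customer: 'amy', spent: 12 }, { customer: 'bob', spent: 4 } ]
```

Because the stages are just entries in an array, reordering, repeating, or dropping one changes the result without touching the others, which is the property the pipeline model relies on.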

To summarize the characteristics of the latter two:

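map-reduce
- Highly flexible
- Functions can be reused
- Hard to maintain
- Poor performance
aggregation pipeline
- Code is hard to reuse
- Lots of code, which makes it hard to read
- Performance is simply faster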

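Going by what users reported online in the past, map-reduce was simply slower. It has gone through several versions since then, but I had already chosen the aggregate approach at the time, so I never dug any deeper into map-reduce. MongoDB has since deprecated map-reduce (as of version 5.0) in favor of the aggregation pipeline. If you have used it on a recent version, feel free to share your experience.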

Aggregation Operators

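MongoDB aggregate doesn't have a huge number of operators, and almost all of them get used often. I'll go through what each one does and how to use it, but before that, let's prepare some test data.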

db.getCollection('movie').insertMany([
{"name": "movieA", "language": "en-gb", "rating": 8, "totalCost": 30000000, "producer": "companyA"},
{"name": "movieB", "language": "en-gb", "rating": 5, "totalCost": 10000000, "producer": "companyA"},
{"name": "movieC", "language": "zh-tw", "rating": 6, "totalCost": 25000000, "producer": "companyA"},
{"name": "movieD", "language": "zh-tw", "rating": 8, "totalCost": 10000000, "producer": "companyB"},
{"name": "movieE", "language": "zh-tw", "rating": 9, "totalCost": 6000000, "producer": "companyC"},
])

$sort

Sorts documents by a given field. For example, to sort by rating in descending order:

db.movie.aggregate([{"$sort" : { "rating" : -1 }}])

{
    "_id" : ObjectId("6120c79d2976f517181ffefa"),
    "name" : "movieE",
    "language" : "zh-tw",
    "rating" : 9.0,
    "totalCost" : 6000000.0,
    "producer" : "companyC"
}

{
    "_id" : ObjectId("6120c79d2976f517181ffef6"),
    "name" : "movieA",
    "language" : "en-gb",
    "rating" : 8.0,
    "totalCost" : 30000000.0,
    "producer" : "companyA"
}

{
    "_id" : ObjectId("6120c79d2976f517181ffef9"),
    "name" : "movieD",
    "language" : "zh-tw",
    "rating" : 8.0,
    "totalCost" : 10000000.0,
    "producer" : "companyB"
}

{
    "_id" : ObjectId("6120c79d2976f517181ffef8"),
    "name" : "movieC",
    "language" : "zh-tw",
    "rating" : 6.0,
    "totalCost" : 25000000.0,
    "producer" : "companyA"
}

{
    "_id" : ObjectId("6120c79d2976f517181ffef7"),
    "name" : "movieB",
    "language" : "en-gb",
    "rating" : 5.0,
    "totalCost" : 10000000.0,
    "producer" : "companyA"
}

$limit

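Sets how many documents you want back. For example, to get only the two highest-rated movies:

db.movie.aggregate([
  {"$sort" : { "rating" : -1 }},
  {"$limit" : 2}
])

Result (movieA and movieD are tied at a rating of 8, and MongoDB does not guarantee which of the two comes second)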

{
    "_id" : ObjectId("6120c79d2976f517181ffefa"),
    "name" : "movieE",
    "language" : "zh-tw",
    "rating" : 9.0,
    "totalCost" : 6000000.0,
    "producer" : "companyC"
}

{
    "_id" : ObjectId("6120c79d2976f517181ffef9"),
    "name" : "movieD",
    "language" : "zh-tw",
    "rating" : 8.0,
    "totalCost" : 10000000.0,
    "producer" : "companyB"
}

A pipeline runs its stages in the order they are written. So what happens if we flip them around?

db.movie.aggregate([
  {"$limit" : 2},
  {"$sort" : { "rating" : -1 }}
])

Result

{
    "_id" : ObjectId("6120c79d2976f517181ffef6"),
    "name" : "movieA",
    "language" : "en-gb",
    "rating" : 8.0,
    "totalCost" : 30000000.0,
    "producer" : "companyA"
}

{
    "_id" : ObjectId("6120c79d2976f517181ffef7"),
    "name" : "movieB",
    "language" : "en-gb",
    "rating" : 5.0,
    "totalCost" : 10000000.0,
    "producer" : "companyA"
}

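As you can see, the "top two" movies we got back are not what we had in mind. The reason is that this time the pipeline took the first two documents and only then sorted them. So pay close attention to where each stage sits in your pipeline.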
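The same effect can be reproduced outside the database. Here is a small plain JavaScript sketch that runs both orderings against a trimmed copy of the test data (only name and rating are kept); the helper names are hypothetical and this is not how MongoDB executes the stages internally.

```javascript
// Trimmed copy of the movie test data, in insertion order.
const movies = [
  { name: "movieA", rating: 8 },
  { name: "movieB", rating: 5 },
  { name: "movieC", rating: 6 },
  { name: "movieD", rating: 8 },
  { name: "movieE", rating: 9 },
];

// Stand-ins for { $sort: { rating: -1 } } and { $limit: n }.
const sortByRatingDesc = (docs) => [...docs].sort((a, b) => b.rating - a.rating);
const limit = (n) => (docs) => docs.slice(0, n);

// Apply the stages one after another, in the order given.
const run = (stages, docs) => stages.reduce((acc, stage) => stage(acc), docs);

const sortThenLimit = run([sortByRatingDesc, limit(2)], movies).map((m) => m.name);
const limitThenSort = run([limit(2), sortByRatingDesc], movies).map((m) => m.name);

console.log(sortThenLimit); // [ 'movieE', 'movieA' ]
console.log(limitThenSort); // [ 'movieA', 'movieB' ]
```

In this sketch the tie between movieA and movieD is settled by insertion order because JavaScript's sort is stable; MongoDB makes no such promise for tied values, so the second document it returns may differ.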

Tomorrow we'll look at some slightly more complex operators.

This series is also published on my personal blog, Pie Note.
