DAY11 MongoDB 深入聚合与常见问题

MongoDB 的运算子前面有提到过，那是属於查询用的，本篇还会再提到一些运算子，专门是给 aggregate 使用。$sort与$limit在昨天讲完了，今天继续...

$project

指定取出那些栏位，例如我只想知道评价最高的电影名称，可以只显示 name 栏位。语法如下：

db.movie.aggregate(
{"$sort" : { "rating" : -1 }},
{"$limit" : 1},
{"$project": {"name": 1}})

结果：

/* 1 */
{
    "_id" : ObjectId("6120c79d2976f517181ffefa"),
    "name" : "movieE"
}

嗯...好像不如预期，多了 _id 栏位。其实这个栏位是预设都会查出来的，需要特别关闭它，语法也很简单。

{"$project": {_id: 0, "name": 1}}

$match

基本上就是 find 指令的条件，应用在 aggregation 就是查询符合这些条件的资料。
我们来查询 producer 是 "companyA" 且评价大於 6 的电影。

{"$match": {"producer": "companyA", "rating": { $gte: 6 }}}

结果

/* 1 */
{
    "_id" : ObjectId("6120c79d2976f517181ffef6"),
    "name" : "movieA",
    "language" : "en-gb",
    "rating" : 8.0,
    "totalCost" : 30000000.0,
    "producer" : "companyA"
}

/* 2 */
{
    "_id" : ObjectId("6120c79d2976f517181ffef8"),
    "name" : "movieC",
    "language" : "zh-tw",
    "rating" : 6.0,
    "totalCost" : 25000000.0,
    "producer" : "companyA"
}

$group & $sum

一般我们熟知的 group by 语法，使用指定的栏位进行分群。
$sum 也一并在这个范例使用。
假设我们要计算每间发行商的成本总和...

db.movie.aggregate([
{ "$group" :
    {
        _id: "$producer",
        "totalCost" : {"$sum":"$totalCost"}
    }
}
])

结果

/* 1 */
{
    "_id" : "companyB",
    "totalCost" : 10000000.0
}

/* 2 */
{
    "_id" : "companyA",
    "totalCost" : 65000000.0
}

/* 3 */
{
    "_id" : "companyC",
    "totalCost" : 6000000.0
}

栏位名称目前是只能用预设的 _id，可以使用 project 语法来改变。

$lookup

lookup 就是关联式资料库 join 的概念，当然还有很多新增的功能，这个可能之後再单独开文章去详细解释应用。简单来说就是跨表(collection)查询资料。
我们先新增 producer collection，里面放的是每间公司负责人的名称。之後我们希望查询每部电影背後出品公司的资讯，本文目的是解说功能，就不新增太多栏位了。

新增 producer 资料语法

db.getCollection('producer').insertMany([
{"companyName": "companyA", "pic": "Thrall"},
{"companyName": "companyB", "pic": "Arthas"},
{"companyName": "companyC", "pic": "Jaina"},
])

look up 语法

db.movie.aggregate([
{ "$lookup" :
    {
        from: "producer",
        localField: "producer",
        foreignField: "companyName",
        as: "companyDetail"
    }},
{ "$project": {_id:0, language:0, producer:0}}
])

结果

/* 1 */
{
    "name" : "movieA",
    "rating" : 8.0,
    "totalCost" : 30000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffefe"),
            "companyName" : "companyA",
            "pic" : "Thrall"
        }
    ]
}

/* 2 */
{
    "name" : "movieB",
    "rating" : 5.0,
    "totalCost" : 10000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffefe"),
            "companyName" : "companyA",
            "pic" : "Thrall"
        }
    ]
}

/* 3 */
{
    "name" : "movieC",
    "rating" : 6.0,
    "totalCost" : 25000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffefe"),
            "companyName" : "companyA",
            "pic" : "Thrall"
        }
    ]
}

/* 4 */
{
    "name" : "movieD",
    "rating" : 8.0,
    "totalCost" : 10000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181ffeff"),
            "companyName" : "companyB",
            "pic" : "Arthas"
        }
    ]
}

/* 5 */
{
    "name" : "movieE",
    "rating" : 9.0,
    "totalCost" : 6000000.0,
    "companyDetail" : [
        {
            "_id" : ObjectId("6120d3e92976f517181fff00"),
            "companyName" : "companyC",
            "pic" : "Jaina"
        }
    ]
}

聚合大致上就介绍到这，否则篇幅可能会占太多，有问题再发问或私信罗！

常见聚合问题 1

每间公司第一部电影是什麽
每间公司电影总成本是多少
每间公司出了几部电影
列出分别的电影名称与年份

扣除效能，在商业需求上应该很常遇到这样的问题

db.movie.aggregate([
  { $match: { rating: { $gte:1 }} },
  { $group: {
              _id: '$producer',
              'totalCount': {$sum:1},
              'totalCost': {$sum: '$totalCost'},
              'totalMovies': {$push: {'name':'$name'} } ,
              'totalMovies_withLang': {$push: {'name': '$name', 'lang': '$language'} } ,                
            }
  }
])

常见聚合问题 2 - 条件统计数量

超过评分 7 分电影数量有多少？

db.movie.aggregate([
  { $match: { rating: { $gte:7 } }},
  { $count: 'HigherThan 7 rating movies count' }
])

常见聚合问题 3 - 阵列统计

假设每部电影都有一个 tags 阵列栏位 ["Action", "Eastern", "Western", "Historical", "Fantasy", "Drama", "Horror", "Thriller", "Science"] 会随机包含一到多个元素。

总共有几种元素
每种元素出现次数

db.movie.aggregate([
  { $unwind: { path: '$tags' }},
  { $group: { _id: '$tags', 'Counts': {$sum:1} } }
])

本系列文章会同步发表於我个人的部落格 Pie Note

<<: Day.3 天秤的两端

>>: [Day11] 在 GCP 上建立 VM 与布署 API 程序

DAY11 MongoDB 深入聚合与常见问题

DAY11 MongoDB 深入聚合与常见问题

$project

$match

$group & $sum

$lookup

常见聚合问题 1

常见聚合问题 2 - 条件统计数量

常见聚合问题 3 - 阵列统计

Day 13 阿里云架设网站-弹性负载 & CDN

[Day28] Scrum失败经验谈 – 我的压力，团队倍感压力

CI/CD - Drone 五分钟成为终极工具人

MacOS读取蓝牙摇杆讯号，利用python修改pynput程序码实现 - 3.修改pynput

[DAY 25] _STM32 看门狗简介_独立看门狗(1)

[Day 16] 阿嬷都看得懂的通用 .html 档案结构

.NET Core第16天_AnchorTagHelper的使用

Day 27 - 工作满一年了，该离职吗?

小测试

大数据平台：丛集管理