冒险村19 - Activerecord-import

19 - Activerecord-import

在写入资料的时候,时常会有大量资料一次 import 的情况,可能是汇入整个 csv 档案到资料表、或者是接收 api 将资料汇入。可能会使用 transaction 的方式来写入确保资料都有正常汇入。

举例来说,一次想入一笔资料:

  2.times.each do |time|
    Post.create(
      title: "Post test #{time}",
      content: "mock content #{time}",
      is_available: true,
      user_id: 1
    )
  end

  TRANSACTION (0.2ms)  BEGIN
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 0"], ["content", "mock content 0"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:40:31.090718"], ["updated_at", "2021-10-02 08:40:31.090718"]]
  TRANSACTION (0.6ms)  COMMIT

  TRANSACTION (0.1ms)  BEGIN
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 1"], ["content", "mock content 1"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:40:31.093480"], ["updated_at", "2021-10-02 08:40:31.093480"]]
  TRANSACTION (0.2ms)  COMMIT

这样子的写法会得到每一笔都会是一个 transaction,所以可能改成用一个 transaction 来包,只会有一次 transaction

  Post.transaction do
    2.times.each do |time|
      Post.create(
        title: "Post test #{time}",
        content: "mock content #{time}",
        is_available: true,
        user_id: 1
      )
    end
  end

  TRANSACTION (0.1ms)  BEGIN
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 0"], ["content", "mock content 0"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:44:53.350348"], ["updated_at", "2021-10-02 08:44:53.350348"]]
  User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "content", "is_available", "user_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["title", "Post test 1"], ["content", "mock content 1"], ["is_available", true], ["user_id", 1], ["created_at", "2021-10-02 08:44:53.352100"], ["updated_at", "2021-10-02 08:44:53.352100"]]
  TRANSACTION (0.6ms)  COMMIT

不过如果资料大量的时候,SQL 就会一直 n+1 的 INSERT,也可能因为 transaction 把资料表锁住造成其他人要使用会需要排队的情况。

所以这篇要来介绍 Activerecord-Import gem:

gem install

  # Gemfile
  gem "activerecord-import"

Example

  values = 
    2.times.each_with_object([]) do |time, array|
      array << {
        title: "Post test #{time}",
        content: "mock content #{time}",
        is_available: true,
        user_id: 1
      }
    end
  
  # values => [
    # {:title=>"Post test 0", :content=>"mock content 0", :is_available=>true, :user_id=>1},
    # {:title=>"Post test 1", :content=>"mock content 1", :is_available=>true, :user_id=>1}
  # ]

  Post.import values, validate: true

  Post Create Many (3.4ms)  INSERT INTO "posts" ("title","content","is_available","user_id","created_at","updated_at") VALUES ('Post test 0','mock content 0',TRUE,1,'2021-10-02 08:57:48.839905','2021-10-02 08:57:48.839905'),('Post test 1','mock content 1',TRUE,1,'2021-10-02 08:57:48.839905','2021-10-02 08:57:48.839905') RETURNING "id"
  => #<struct ActiveRecord::Import::Result failed_instances=[], num_inserts=1, ids=[19, 20], results=[]>

会发现只会有一笔 INSERT 且可以指定参数 validate 来看是否需要验证,不指定则 default 为 true

注: 不需验证的资料在 import 时速度约快有验证的 5 倍左右

再者一个满重要的问题,当如果输入的资料其实其他栏位都相同,或者是差不多,但需要一个新的写入时间来记录是新的一笔资料的这种情况,或者是想覆盖原本的资料,正常的情况可能会做 delete + create,不过这个套件有提供 Duplicate Key Update 来让 MySQL 做 ON DUPLICATE KEY UPDATE,或者是 PG 做 ON CONFLICT DO UPDATE,相关可参考各资料库 Duplicate Key Update 的写法。

注: 须建立 uniq index 将重复资料改为 upsert

参考来源

My blog


<<:  招募

>>:  [Day 26] BDD - 组合技

[Day - 27] React Bootstrap 套件心得分享

本节大纲 前言 使用原因 如何使用 前言 Bootstrap 是目前很受欢迎的前端框架之一,透过 B...

Day30 铁人赛後心得

真的很感谢 ITHOME 的团队,办这次的比赛,我也没有想到第一次打铁人文就报双主题,而且都挑战成功...

那些被忽略但很好用的 Web API / Geolocation

我的字典里没有放弃,因为已锁定你 现在有不少网站都有地图相关的功能,而为了解决地图绘制、路线运算、...

加点GCP – cloud function

Golang 加点GCP – cloud function 如果想要开始做点其他事情,就必须脱离本机...

聊聊 SaaS 有哪些吧!

前阵子看到了 XChange 成立了天使创投,合夥人 David 分享了他对 SaaS 的看法让我也...