Kafka's Library [Book8] - Manually Re-electing the Partition Leader in Kafka

“I cannot make you understand. I cannot make anyone understand what is happening inside me. I cannot even explain it to myself.”
― Franz Kafka, The Metamorphosis

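Today picks up from the previous day, where we simulated one broker going down and then recovering. After the recovery, two Partition Leaders sit on a single broker, which goes against our goal of spreading traffic evenly across the brokers, so today is a short demonstration of how to manually re-elect the Partition Leader.

  • Every Kafka partition has one Leader, and every Leader has zero or more followers
  • The Leader handles all reads and writes for the partition; the followers fetch data from the Leader, just like an ordinary consumer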

1. First, check the topic's current assignment

$ kafka-topics --describe --zookeeper --topic topicWithThreeBroker

Topic: topicWithThreeBroker	TopicId: BAocHAwHR_STmwAUlI3YMw	PartitionCount: 3	ReplicationFactor: 2	Configs:
	Topic: topicWithThreeBroker	Partition: 0	Leader: 1	Replicas: 1,0	Isr: 1,0
	Topic: topicWithThreeBroker	Partition: 1	Leader: 2	Replicas: 2,1	Isr: 1,2
	Topic: topicWithThreeBroker	Partition: 2	Leader: 2	Replicas: 0,2	Isr: 2,0
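The preferred replica of a partition is the first broker in its Replicas list, and a preferred election tries to hand leadership back to that broker. As a minimal sketch (Python, with the partition lines from the describe output pasted in as a string; the helper names `parse_partitions` and `non_preferred` are made up for illustration), this checks which partitions are currently led by a non-preferred broker:

```python
import re

# Partition lines copied from the `kafka-topics --describe` output above.
DESCRIBE_OUTPUT = """\
Topic: topicWithThreeBroker Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: topicWithThreeBroker Partition: 1 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: topicWithThreeBroker Partition: 2 Leader: 2 Replicas: 0,2 Isr: 2,0
"""

# Matches one partition line; \s+ accepts both tabs and spaces.
LINE_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_partitions(text):
    """Return one dict per partition line found in `text` (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a partition line
        rows.append({
            "partition": int(m.group(1)),
            "leader": int(m.group(2)),
            "replicas": [int(b) for b in m.group(3).split(",")],
            "isr": [int(b) for b in m.group(4).split(",") if b],
        })
    return rows


def non_preferred(rows):
    """Partitions whose current leader is not the first entry in Replicas."""
    return [r["partition"] for r in rows if r["leader"] != r["replicas"][0]]


rows = parse_partitions(DESCRIBE_OUTPUT)
print(non_preferred(rows))  # → [2]
```

Only partition 2 shows up, which is the partition we expect a preferred election to move.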

2. Next, create a JSON file listing the topic and the partitions whose leader should be re-elected

$ vim leader_election.json
{ "partitions": [
    { "topic": "topicWithThreeBroker", "partition": 0 },
    { "topic": "topicWithThreeBroker", "partition": 1 },
    { "topic": "topicWithThreeBroker", "partition": 2 }
  ]
}
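For topics with many partitions, typing this file by hand gets tedious. A small sketch of generating the same structure in Python follows; the function name `election_payload` is an assumption made for this example, and only the `partitions` / `topic` / `partition` keys come from the file above:

```python
import json


def election_payload(topic, partitions):
    """Build the {"partitions": [...]} structure used in leader_election.json."""
    return {"partitions": [{"topic": topic, "partition": p} for p in partitions]}


payload = election_payload("topicWithThreeBroker", range(3))
text = json.dumps(payload, indent=2)
print(text)

# To save it under the file name used in this step:
# with open("leader_election.json", "w") as f:
#     f.write(text)
```

Going through `json.dumps` also guarantees the brackets and braces are balanced, which is easy to get wrong when editing the file in vim.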

3. Re-elect the Leader with kafka-leader-election.sh

$ kafka-leader-election --path-to-json-file leader_election.json --election-type preferred --bootstrap-server :9092

Successfully completed leader election (PREFERRED) for partitions topicWithThreeBroker-2
Valid replica already elected for partitions topicWithThreeBroker-2

Here we can see that only partition 2 went through a re-election

4. Check the topic's state after the re-election

$ kafka-topics --describe --zookeeper --topic topicWithThreeBroker

Topic: topicWithThreeBroker	TopicId: BAocHAwHR_STmwAUlI3YMw	PartitionCount: 3	ReplicationFactor: 2	Configs:
	Topic: topicWithThreeBroker	Partition: 0	Leader: 1	Replicas: 1,0	Isr: 1,0
	Topic: topicWithThreeBroker	Partition: 1	Leader: 2	Replicas: 2,1	Isr: 1,2
	Topic: topicWithThreeBroker	Partition: 2	Leader: 0	Replicas: 0,2	Isr: 2,0

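Here we can see that Broker2 originally had two Partition Leaders crowded onto it, and after the re-election they are evenly spread again. This keeps any single machine from carrying a disproportionate share of the load.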
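To make the before/after comparison concrete, here is a small sketch that tallies leaders per broker. The two dictionaries are copied from the Leader column of the describe outputs; the helper name `leaders_per_broker` is made up for this example:

```python
from collections import Counter

# partition -> leader broker id, read from the two describe outputs
before = {0: 1, 1: 2, 2: 2}
after = {0: 1, 1: 2, 2: 0}


def leaders_per_broker(leaders, brokers=(0, 1, 2)):
    """Count how many partitions each broker leads, including brokers with none."""
    counts = Counter(leaders.values())
    return {b: counts.get(b, 0) for b in brokers}


print(leaders_per_broker(before))  # → {0: 0, 1: 1, 2: 2}
print(leaders_per_broker(after))   # → {0: 1, 1: 1, 2: 1}
```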

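If we run the election again at this point, the message says there is no partition that needs a re-election: every partition is already led by its preferred replica, so the partition leaders are already evenly distributed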

$ kafka-leader-election --path-to-json-file leader_election.json --election-type preferred --bootstrap-server :9092
Valid replica already elected for partitions

KafkaController's automatic election strategy when adding partitions

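In the previous day's simulation, after we shut down broker0, it was actually KafkaController that automatically elected a new leader for partition2. After every Isr change it also notifies each broker to update its metadataCache. Beyond that, when partitions are added to a topic, KafkaController is again the one that performs the automatic election and assignment.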
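As a rough mental model of that automatic election (a simplified sketch, not Kafka's actual controller code), the new leader can be thought of as the first broker in the partition's Replicas list that is both alive and in the Isr. The function name `elect_leader` and the example Isr sets are assumptions for illustration:

```python
def elect_leader(replicas, isr, live_brokers):
    """Return the first assigned replica that is alive and in the ISR, else None."""
    for broker in replicas:
        if broker in live_brokers and broker in isr:
            return broker
    return None


# Partition 2 has Replicas 0,2. With broker 0 down, only broker 2 is left in the ISR.
print(elect_leader([0, 2], isr={2}, live_brokers={1, 2}))  # → 2

# With broker 0 back and in sync, the same rule would pick broker 0 again.
print(elect_leader([0, 2], isr={0, 2}, live_brokers={0, 1, 2}))  # → 0

# No live in-sync replica: this model elects nobody.
print(elect_leader([0, 2], isr={0}, live_brokers={1, 2}))  # → None
```

The first call matches what we saw after broker0 went down (partition 2 ended up led by broker 2), and the second matches the result of the preferred election earlier in this post.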

1. The topic starts in the same assignment state as above

$ kafka-topics --describe --zookeeper --topic topicWithThreeBroker

Topic: topicWithThreeBroker	TopicId: BAocHAwHR_STmwAUlI3YMw	PartitionCount: 3	ReplicationFactor: 2	Configs:
	Topic: topicWithThreeBroker	Partition: 0	Leader: 1	Replicas: 1,0	Isr: 1,0
	Topic: topicWithThreeBroker	Partition: 1	Leader: 2	Replicas: 2,1	Isr: 1,2
	Topic: topicWithThreeBroker	Partition: 2	Leader: 2	Replicas: 0,2	Isr: 2,0

2. Add 6 partitions to the topic topicWithThreeBroker, bringing the total to 9

$ kafka-topics --zookeeper --topic topicWithThreeBroker --alter --partitions 9

WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Adding partitions succeeded!

3. Check the result of the automatic election

$ kafka-topics --describe --zookeeper --topic topicWithThreeBroker

Topic: topicWithThreeBroker	TopicId: BAocHAwHR_STmwAUlI3YMw	PartitionCount: 9	ReplicationFactor: 2	Configs:
	Topic: topicWithThreeBroker	Partition: 0	Leader: 1	Replicas: 1,0	Isr: 1,0
	Topic: topicWithThreeBroker	Partition: 1	Leader: 2	Replicas: 2,1	Isr: 2,1
	Topic: topicWithThreeBroker	Partition: 2	Leader: 2	Replicas: 0,2	Isr: 2,0
	Topic: topicWithThreeBroker	Partition: 3	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: topicWithThreeBroker	Partition: 4	Leader: 2	Replicas: 2,1	Isr: 2,1
	Topic: topicWithThreeBroker	Partition: 5	Leader: 0	Replicas: 0,2	Isr: 0,2
	Topic: topicWithThreeBroker	Partition: 6	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: topicWithThreeBroker	Partition: 7	Leader: 2	Replicas: 2,1	Isr: 2,1
	Topic: topicWithThreeBroker	Partition: 8	Leader: 0	Replicas: 0,2	Isr: 0,2

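We can see that KafkaController's default assignment strategy is to spread partitions evenly across the brokers: the six new partitions (3 through 8) give brokers 0, 1, and 2 two leaders each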
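A quick way to check this claim is to count, per broker, both the current leaders and the preferred leaders (the first entry in Replicas). The sketch below hard-codes the nine rows from the describe output; the variable names are chosen just for this example:

```python
from collections import Counter

# (partition, leader, replicas) copied from the describe output above
ASSIGNMENT = [
    (0, 1, [1, 0]),
    (1, 2, [2, 1]),
    (2, 2, [0, 2]),
    (3, 1, [1, 2]),
    (4, 2, [2, 1]),
    (5, 0, [0, 2]),
    (6, 1, [1, 2]),
    (7, 2, [2, 1]),
    (8, 0, [0, 2]),
]

# Who leads each partition right now.
current = Counter(leader for _, leader, _ in ASSIGNMENT)
# Who would lead after a preferred election (first replica in each list).
preferred = Counter(replicas[0] for _, _, replicas in ASSIGNMENT)

print(sorted(current.items()))    # → [(0, 2), (1, 3), (2, 4)]
print(sorted(preferred.items()))  # → [(0, 3), (1, 3), (2, 3)]
```

The preferred leaders come out at three per broker. The current leaders are slightly off only because partition 2 is still led by broker 2 in this run, which is the same imbalance the manual preferred election fixes.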

