Understanding Kafka Topics and Partitions
I am starting to learn Kafka for enterprise solution purposes.
During my reading, a few questions came to mind:
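- When a producer is producing a message, it specifies the topic it wants to send the message to, is that right? Does it care about partitions?
- When a subscriber is running, does it specify its group id so that it can be part of a cluster of consumers of the same topic, or of several topics that this group of consumers is interested in?
- Does each consumer group have a corresponding partition on the broker, or does each consumer have one?
- Are the partitions created by the broker, and therefore not a concern for the consumers?
- Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
- What happens when a message is deleted from the queue? For example: the retention was 3 hours, then the time passes. How is the offset handled on both sides?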
This post already has answers, but I am adding my view with a few pictures from Kafka Definitive Guide.
Before answering each question, here is an overview of the producer components.
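1. When a producer is producing a message - It will specify the topic it wants to send the message to, is that right? Does it care about partitions?
The producer decides which target partition a message is placed in, based on:
- The partition id, if one is specified within the message
- hash(key) % number of partitions, if no partition id is given
- Round robin, if neither a partition id nor a message key is available, meaning only the value is present (clients from Kafka 2.4 on default to a sticky strategy that fills a batch for one partition before moving to the next)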
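The three rules above can be sketched in a few lines of Python. This is only an illustration, not the Kafka client code: `choose_partition` is a hypothetical helper, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to the key bytes.

```python
import itertools
import zlib
from typing import Optional

# Shared counter used to rotate keyless messages across partitions.
_round_robin = itertools.count()


def choose_partition(num_partitions: int,
                     partition: Optional[int] = None,
                     key: Optional[bytes] = None) -> int:
    """Pick a target partition following the three rules in the text."""
    if partition is not None:
        # Rule 1: an explicit partition id always wins.
        if not 0 <= partition < num_partitions:
            raise ValueError("partition id out of range")
        return partition
    if key is not None:
        # Rule 2: hash the key (crc32 is an assumption; Kafka uses murmur2).
        return zlib.crc32(key) % num_partitions
    # Rule 3: no partition id and no key, so rotate through the partitions.
    return next(_round_robin) % num_partitions
```

The property that matters for rule 2 is that the same key always lands in the same partition as long as the partition count does not change, which is what gives per-key ordering.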
2. When a subscriber is running - Does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?
You should always configure group.id unless you are using the simple assignment API and you don’t need to store offsets in Kafka. In that case the consumer will not be part of any group.
3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?
Within one consumer group, each partition is processed by only one consumer. The possible scenarios are:
- Fewer consumers than topic partitions: multiple partitions are assigned to some of the consumers in the group.
- As many consumers as topic partitions: each consumer is mapped to exactly one partition.
- More consumers than topic partitions: the extra consumers are assigned no partition and stay idle (Consumer 5 in the original figure), so this is not effective.
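The three scenarios can be reproduced with a simplified round-robin assignment. This is a sketch under stated assumptions: `assign_partitions` is a hypothetical helper, and Kafka's real assignors (range, round robin, sticky) are more involved than dealing partitions out in turn.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int],
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to the consumers of one group, one at a time."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        # Each partition goes to exactly one consumer in the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With four partitions, two consumers each get two partitions, four consumers each get one, and a fifth consumer ends up with an empty list, which is the idle case described above.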
4. Are the partitions created by the broker, therefore not a concern for the consumers?
Consumer should be aware of the number of partitions, as was discussed in question 3.
5. Since this is a queue with an offset for each partition, is it responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
Kafka (to be specific, the Group Coordinator) takes care of the offset state by producing a message to an internal __consumer_offsets topic. This behavior can be switched to manual by setting enable.auto.commit to false. In that case consumer.commitSync() and consumer.commitAsync() can be helpful for managing offsets.
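To make the manual-commit flow concrete, here is a small in-memory model. It does not talk to Kafka: `OffsetStore` is a hypothetical stand-in for the __consumer_offsets topic, `consume` is a hypothetical helper, and starting from offset 0 when nothing has been committed is a simplifying assumption.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


class OffsetStore:
    """Hypothetical in-memory stand-in for the __consumer_offsets topic."""

    def __init__(self) -> None:
        self._committed: Dict[Tuple[str, TopicPartition], int] = {}

    def commit(self, group_id: str, tp: TopicPartition, next_offset: int) -> None:
        # The committed value is the offset of the NEXT record to read.
        self._committed[(group_id, tp)] = next_offset

    def committed(self, group_id: str, tp: TopicPartition) -> int:
        # Assumption: start from 0 when the group has never committed.
        return self._committed.get((group_id, tp), 0)


def consume(log: List[str], store: OffsetStore, group_id: str,
            tp: TopicPartition, max_records: int) -> List[str]:
    """Read from the committed position, then commit after processing."""
    start = store.committed(group_id, tp)
    batch = log[start:start + max_records]
    # Processing of the batch would happen here, before the commit.
    store.commit(group_id, tp, start + len(batch))
    return batch
```

Because offsets are stored per group and per partition, a second group reading the same partition keeps its own position and starts from the beginning.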
More about Group Coordinator:
- It is one of the brokers in the cluster, elected on the Kafka server side.
- Consumers interact with the Group Coordinator for offset commits and offset fetch requests.
- Each consumer sends periodic heartbeats to the Group Coordinator.
6. What happens when a message is deleted from the queue? - For example: The retention was for 3 hours, then the time passes, how is the offset being handled on both sides?
If a consumer starts after the retention period, messages are consumed according to the auto.offset.reset configuration, which can be latest or earliest. In practice the effect is the same as latest (start processing new messages), because all the earlier messages have expired by that time. Retention is a topic-level configuration.
Let's take those in order :)
1 - When a producer is producing a message - It will specify the topic it wants to send the message to, is that right? Does it care about partitions?
By default, the producer doesn't care about partitioning. You have the option to use a customized partitioner for better control, but it's totally optional.
2 - When a subscriber is running - Does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?
Yes, consumers join (or create if they're alone) a consumer group to share load. No two consumers in the same group will ever receive the same message.
3 - Does each consumer group have a corresponding partition on the broker or does each consumer have one?
Neither. All consumers in a consumer group are assigned a set of partitions, under two conditions : no two consumers in the same group have any partition in common - and the consumer group as a whole is assigned every existing partition.
4 - Are the partitions created by the broker, therefore not a concern for the consumers?
They're not, but you can see from 3 that it's totally useless to have more consumers than existing partitions, so it's your maximum parallelism level for consuming.
5 - Since this is a queue with an offset for each partition, is it responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
Yes, consumers save an offset per topic per partition. This is totally handled by Kafka, no worries about it.
6 - What happens when a message is deleted from the queue? - For example: The retention was for 3 hours, then the time passes, how is the offset being handled on both sides?
If a consumer ever requests an offset that is not available for a partition on the brokers (for example, due to deletion), it enters an error mode and ultimately resets itself for this partition to either the most recent or the oldest message available (depending on the auto.offset.reset configuration value), then continues working.
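The reset rule can be written down as a small function. This is a sketch of the decision only, not client code: `resolve_position` and its parameters are hypothetical names, with `log_start` meaning the oldest retained offset and `log_end` the offset the next produced message will receive.

```python
def resolve_position(requested: int, log_start: int, log_end: int,
                     auto_offset_reset: str = "latest") -> int:
    """Return the offset a consumer should actually read from."""
    if log_start <= requested <= log_end:
        # The requested offset still exists (or is the log end): use it.
        return requested
    if auto_offset_reset == "earliest":
        # Fall back to the oldest message still retained.
        return log_start
    if auto_offset_reset == "latest":
        # Skip ahead and only read messages produced from now on.
        return log_end
    if auto_offset_reset == "none":
        # With "none" the error is surfaced instead of resetting.
        raise LookupError("offset out of range and no reset policy")
    raise ValueError("unknown auto.offset.reset value")
```

Note that when retention has removed every message, `log_start` equals `log_end`, so earliest and latest resolve to the same position.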
Kafka uses the concept of a topic to bring order into the message flow.
To balance the load, a topic may be divided into multiple partitions and replicated across brokers.
A partition is an ordered, immutable sequence of messages that is continually appended to, i.e. a commit log.
Messages in a partition have a sequential id number (the offset) that uniquely identifies each message within the partition.
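Partitions allow a topic's log to scale beyond a size that fits on a single server (a broker), and they act as the unit of parallelism.
The partitions of a topic are distributed over the brokers in the Kafka cluster, where each broker handles data and requests for its share of the partitions.
Each partition is replicated across a configurable number of brokers to ensure fault tolerance.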
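The commit-log view of a partition can be modelled with a list whose indexes play the role of offsets. This is a toy model with hypothetical names (`PartitionLog`, `append`, `read`); it ignores replication, retention, and storage on disk.

```python
from typing import List


class PartitionLog:
    """Toy append-only log: the list index acts as the message offset."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, value: str) -> int:
        # New messages only ever go at the end; existing ones never change.
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset: int, max_records: int = 10) -> List[str]:
        # Reading does not remove anything, so many readers can share a log.
        return self._records[offset:offset + max_records]


# A topic with two partitions is two independent logs.
topic = [PartitionLog(), PartitionLog()]
```

Offsets are only unique within one partition: the first message appended to each partition of `topic` gets offset 0, which is why ordering is guaranteed per partition and not across the whole topic.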
It is explained well in this article: http://codeflex.co/what-is-apache-kafka/
Reference URL: https://stackoverflow.com/questions/38024514/understanding-kafka-topics-and-partitions