Why?

Decoupling, improved scalability, increased availability, better performance

Questions

message format, size, type (text/media)

repeated consumption

message order

data retention

number of producers and consumers

delivery semantics ⇒ {at-least, at-most, exactly} once

target throughput, latency
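
The delivery-semantics question can be made concrete with a small simulation. This is only a sketch under stated assumptions: the function `consume`, its `commit_first` flag, and the single simulated crash at `crash_at` are hypothetical names for illustration, not part of any real queue API. It shows how the order of "commit offset" and "process message" decides whether a crash loses a message (at-most-once) or redelivers it (at-least-once).

```python
def consume(log, commit_first, crash_at=0):
    """Consume every message in `log`, simulating one consumer crash.

    commit_first=True  -> offset is committed before processing (at-most-once)
    commit_first=False -> offset is committed after processing (at-least-once)
    The crash happens once, at offset `crash_at`, between the two steps.
    """
    committed = 0      # next offset to read, as if kept in state storage
    processed = []     # messages the application actually handled
    crashed = False
    while committed < len(log):
        offset = committed
        msg = log[offset]
        crash_now = offset == crash_at and not crashed
        if commit_first:
            committed = offset + 1
            if crash_now:
                crashed = True
                continue          # crashed before processing: message is lost
            processed.append(msg)
        else:
            processed.append(msg)
            if crash_now:
                crashed = True
                continue          # crashed before commit: message is redelivered
            committed = offset + 1
    return processed


at_most_once = consume(["a", "b", "c"], commit_first=True)
at_least_once = consume(["a", "b", "c"], commit_first=False)
```

Here `at_most_once` misses `"a"`, while `at_least_once` contains `"a"` twice. Exactly-once behaviour generally needs something extra on top of at-least-once, such as idempotent processing or de-duplication on the consumer side.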

Messaging Models

Point-to-point

Publish-Subscribe
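
The difference between the two models can be sketched in a few lines. The classes `PointToPointQueue` and `Topic` below are hypothetical in-memory toys, assumed only for illustration: in point-to-point a message is removed once one consumer receives it, while in publish-subscribe every subscriber tracks its own position and sees every message.

```python
from collections import deque


class PointToPointQueue:
    """Each message is delivered to exactly one consumer, then removed."""

    def __init__(self):
        self._messages = deque()

    def send(self, msg):
        self._messages.append(msg)

    def receive(self):
        # Returns the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Messages stay in a log; each subscriber reads at its own offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # subscriber name -> next index to read

    def subscribe(self, name):
        self._offsets.setdefault(name, 0)

    def publish(self, msg):
        self._log.append(msg)

    def poll(self, name):
        # Return everything this subscriber has not seen yet.
        start = self._offsets[name]
        self._offsets[name] = len(self._log)
        return self._log[start:]


queue = PointToPointQueue()
queue.send("m1")
first = queue.receive()    # one consumer gets "m1"
second = queue.receive()   # nothing left for anyone else

topic = Topic()
topic.subscribe("a")
topic.subscribe("b")
topic.publish("m1")
seen_by_a = topic.poll("a")
seen_by_b = topic.poll("b")
```

Because the topic keeps messages in a log instead of deleting them on delivery, repeated consumption and data retention become policy choices rather than side effects of reading.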

Clients

Producer: pushes messages to specific topics.

Consumer group: subscribes to topics and consumes messages.
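
A rough sketch of what the two client roles might do internally. Both routing rules are assumptions made for illustration, not something these notes specify: the producer picks a partition by hashing a message key (`pick_partition`), and a consumer group spreads partitions round-robin over its members so that each partition has one reader within the group (`assign_partitions`). `zlib.crc32` is used because Python's built-in `hash()` for strings varies between runs.

```python
import zlib


def pick_partition(key: str, num_partitions: int) -> int:
    """Producer side: map a message key to a partition, stably across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Consumer-group side: give each partition to exactly one group member."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Messages with the same key always land in the same partition,
# which keeps their relative order within that partition.
p1 = pick_partition("user-1", 4)
p2 = pick_partition("user-1", 4)

group_assignment = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
```

With this assignment `c1` reads partitions 0 and 2, and `c2` reads partitions 1 and 3. Adding a third consumer would only require recomputing the assignment.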

Core service and storage

  1. Broker: holds multiple partitions. A partition holds a subset of messages for a topic.
    1. Storage
      1. Data storage: messages are persisted in data storage in partitions.
      2. State storage: consumer states are managed by state storage.
      3. Metadata storage: configuration and properties of topics are persisted in metadata storage.
  2. Coordination service
    1. Service discovery: which brokers are alive.
    2. Leader election: exactly one broker in the cluster is elected as the active controller. The active controller is responsible for assigning partitions.
    3. Apache ZooKeeper [2] or etcd [3] is commonly used to elect the controller.
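
The pieces listed above can be tied together in a toy in-memory model. Everything here is a hypothetical sketch: the `Cluster` class, its method names, the "lowest live broker id wins" election rule, and the round-robin partition assignment are assumptions for illustration. A real deployment would delegate liveness and election to a coordination service such as ZooKeeper or etcd.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    data: dict = field(default_factory=dict)        # (topic, partition) -> messages
    state: dict = field(default_factory=dict)       # (group, topic, partition) -> next offset
    metadata: dict = field(default_factory=dict)    # topic -> configuration
    live_brokers: set = field(default_factory=set)  # ids reported alive by service discovery

    def create_topic(self, topic, num_partitions):
        # Metadata storage records the topic; data storage gets one log per partition.
        self.metadata[topic] = {"partitions": num_partitions}
        for p in range(num_partitions):
            self.data[(topic, p)] = []

    def controller(self):
        # Assumed election rule: the lowest live broker id is the active controller.
        return min(self.live_brokers) if self.live_brokers else None

    def partition_owners(self, topic):
        # The controller's job: spread partitions round-robin over live brokers.
        brokers = sorted(self.live_brokers)
        count = self.metadata[topic]["partitions"]
        return {p: brokers[p % len(brokers)] for p in range(count)}

    def append(self, topic, partition, msg):
        log = self.data[(topic, partition)]
        log.append(msg)
        return len(log) - 1  # offset of the stored message

    def poll(self, group, topic, partition):
        # State storage remembers how far each consumer group has read.
        key = (group, topic, partition)
        log = self.data[(topic, partition)]
        batch = log[self.state.get(key, 0):]
        self.state[key] = len(log)
        return batch


cluster = Cluster(live_brokers={1, 2, 3})
cluster.create_topic("orders", 4)
owners = cluster.partition_owners("orders")
controller_before = cluster.controller()
cluster.live_brokers.discard(1)        # broker 1 dies
controller_after = cluster.controller()
```

When broker 1 disappears from the live set, the controller role moves to broker 2, and a fresh call to `partition_owners` would reassign the partitions among the remaining brokers.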