Best Practices
Production considerations for event-driven systems, including monitoring, schema evolution, and reliability patterns.
Architecture Patterns
Use CQRS for Complex Domains
Separate command and query models when read and write requirements differ significantly. This enables independent scaling and optimization of each side.
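Here is a minimal in-memory sketch of the split. The `OrderPlaced`, `OrderCommandService`, and `CustomerSpendView` types are hypothetical. The command side validates and stores orders, then publishes an event. A projection on the query side maintains a denormalized view shaped for one query. A list of callbacks stands in for a Kafka topic, and both sides share one process only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Event emitted by the command side (hypothetical type for illustration).
record OrderPlaced(String orderId, String customerId, long amountCents) {}

// Write side: validates commands and keeps the authoritative order records.
class OrderCommandService {
    private final Map<String, OrderPlaced> writeStore = new HashMap<>();
    private final List<Consumer<OrderPlaced>> subscribers = new ArrayList<>();

    void subscribe(Consumer<OrderPlaced> subscriber) {
        subscribers.add(subscriber);
    }

    void placeOrder(String orderId, String customerId, long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        OrderPlaced event = new OrderPlaced(orderId, customerId, amountCents);
        writeStore.put(orderId, event);
        // Stand-in for publishing the event to a Kafka topic.
        subscribers.forEach(s -> s.accept(event));
    }
}

// Read side: a denormalized view optimized for one query (total spend per customer).
class CustomerSpendView {
    private final Map<String, Long> totalByCustomer = new HashMap<>();

    void on(OrderPlaced event) {
        totalByCustomer.merge(event.customerId(), event.amountCents(), Long::sum);
    }

    long totalSpent(String customerId) {
        return totalByCustomer.getOrDefault(customerId, 0L);
    }
}
```

The view changes only in response to events, so it can be rebuilt or scaled without touching the write path. The cost is that reads are eventually consistent with writes.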
Event Sourcing for Audit Trails
Store state as events when you need complete audit trails, time-travel debugging, or the ability to rebuild state from history.
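The sketch below shows the core idea using hypothetical order events. Current state is never stored. It is always derived by replaying the event history through a pure `apply` function. Replaying only a prefix of the history reconstructs the state at that earlier point, and this is what makes time-travel debugging possible.

```java
import java.util.List;

// Hypothetical events in an order's lifecycle.
interface OrderEvent {}
record OrderCreated(String orderId, long amountCents) implements OrderEvent {}
record ItemAdded(long amountCents) implements OrderEvent {}
record OrderCancelled() implements OrderEvent {}

// State derived from events; it is never persisted directly.
record OrderState(String orderId, long totalCents, boolean cancelled) {

    static final OrderState EMPTY = new OrderState(null, 0, false);

    // Pure transition: current state + one event -> next state.
    OrderState apply(OrderEvent event) {
        if (event instanceof OrderCreated c) {
            return new OrderState(c.orderId(), c.amountCents(), false);
        } else if (event instanceof ItemAdded a) {
            return new OrderState(orderId, totalCents + a.amountCents(), cancelled);
        } else if (event instanceof OrderCancelled) {
            return new OrderState(orderId, totalCents, true);
        }
        throw new IllegalArgumentException("Unknown event: " + event);
    }

    // Rebuild state from history; pass a prefix of the list to see an earlier state.
    static OrderState replay(List<? extends OrderEvent> events) {
        OrderState state = EMPTY;
        for (OrderEvent e : events) {
            state = state.apply(e);
        }
        return state;
    }
}
```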
Idempotent Consumers
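Design consumers to handle duplicate messages safely. Kafka consumers typically get at-least-once delivery. If a consumer crashes or a rebalance happens before an offset is committed, the same message can be delivered again. Use a unique message ID, or a natural business key such as the order ID, to detect and skip duplicates.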
@KafkaListener(topics = "purchase-orders")
public void consume(PurchaseOrderDTO order) {
    // Order already stored: this is a redelivery, so do nothing.
    if (orderRepository.existsById(order.getOrderId())) {
        log.info("Duplicate order, skipping: {}", order.getOrderId());
        return;
    }
    processOrder(order);
}
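The `existsById` check above is a check-then-act sequence. If two deliveries of the same message are processed concurrently, both can pass the check. One way to close that gap is to make "check for the ID" and "record the ID" a single atomic step. The plain-Java sketch below does this, assuming each message carries a unique ID. `OrderMessage` and `IdempotentOrderConsumer` are hypothetical. In production the same idea is usually a unique constraint on a processed-messages table, written in the same transaction as the business change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical message shape: every message carries a producer-assigned unique ID.
record OrderMessage(String messageId, String orderId) {}

class IdempotentOrderConsumer {
    // Thread-safe set of handled IDs. A real system would persist this durably.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger processedCount = new AtomicInteger();

    /** Returns true if the message was processed, false if it was a duplicate. */
    boolean consume(OrderMessage message) {
        // add() is atomic and returns false when the ID is already present,
        // so checking and recording happen in one step with no race window.
        if (!processedIds.add(message.messageId())) {
            return false;
        }
        try {
            processOrder(message);
        } catch (RuntimeException e) {
            // Forget the ID so a redelivery can retry the failed message.
            processedIds.remove(message.messageId());
            throw e;
        }
        return true;
    }

    private void processOrder(OrderMessage message) {
        processedCount.incrementAndGet(); // stand-in for real business logic
    }

    int processedCount() {
        return processedCount.get();
    }
}
```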
Kafka Best Practices
Partition by Business Key
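Partition messages by customerId, orderId, or another business key. Kafka guarantees ordering only within a partition, and messages with the same key go to the same partition. As a result, all events for one customer or order are processed in order.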
kafkaTemplate.send("purchase-orders", order.getCustomerId(), order);
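For a non-null key, Kafka's default partitioner picks the partition as `murmur2(keyBytes)` made non-negative, modulo the partition count. The sketch below illustrates the resulting property: the same key always maps to the same partition, so its messages keep their send order. In this sketch `String.hashCode()` stands in for murmur2, so the partition numbers will not match a real cluster. `Keyed` and `KeyPartitioner` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A message with a business key (for example a customer ID) and a payload.
record Keyed(String key, String value) {}

class KeyPartitioner {
    private final int numPartitions;

    KeyPartitioner(int numPartitions) {
        if (numPartitions < 1) throw new IllegalArgumentException("need at least one partition");
        this.numPartitions = numPartitions;
    }

    // Deterministic key -> partition mapping. String.hashCode() stands in for
    // Kafka's murmur2 hash; masking the sign bit keeps the result non-negative.
    int partitionFor(String key) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Append each message to its partition's log; order is preserved per partition.
    Map<Integer, List<String>> route(List<Keyed> messages) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (Keyed m : messages) {
            partitions.computeIfAbsent(partitionFor(m.key()), p -> new ArrayList<>()).add(m.value());
        }
        return partitions;
    }
}
```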
Use Consumer Groups
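Enable horizontal scaling by adding consumers to the same group. Kafka assigns each partition to exactly one consumer in the group, so the partition count caps how many consumers can do useful work. Any extra consumers sit idle.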
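Kafka offers several partition assignment strategies: range, round-robin, sticky, and cooperative-sticky. The sketch below is a simplified round-robin over a single topic, using hypothetical consumer names. It shows that each partition gets exactly one owner within the group, and that consumers beyond the partition count receive nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoundRobinAssignor {
    // Hand partitions 0..numPartitions-1 to consumers in turn.
    // Every partition ends up with exactly one owner in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) throw new IllegalArgumentException("group has no consumers");
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```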
Monitor Consumer Lag
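Track how far each consumer group is behind the latest message in each partition, measured as the log-end offset minus the committed offset. Lag that is high or keeps growing indicates a processing bottleneck.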
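For each partition, lag is the log-end offset minus the group's committed offset. A group's total lag is the sum across its partitions. The sketch below computes both from offset maps. In practice you would fetch these maps from the broker, for example with Kafka's `AdminClient` or the `kafka-consumer-groups` CLI. The class name and the "treat a missing commit as offset 0" rule are assumptions for illustration.

```java
import java.util.Map;

class LagCalculator {
    // Messages written to the partition but not yet committed by the group.
    static long partitionLag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    // Sum lag over partitions. A partition with no committed offset is treated as
    // offset 0 here (an assumption: the real start depends on auto.offset.reset
    // and the log start offset).
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            total += partitionLag(e.getValue(), committed);
        }
        return total;
    }

    // Alert on sustained growth across samples rather than one high reading.
    static boolean isGrowing(long[] lagSamples) {
        for (int i = 1; i < lagSamples.length; i++) {
            if (lagSamples[i] <= lagSamples[i - 1]) return false;
        }
        return lagSamples.length > 1;
    }
}
```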
Plan for Schema Evolution
Use Avro with Schema Registry to enforce backward/forward compatibility. Design schemas to evolve without breaking consumers, for example by giving every new field a default value.
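Under Avro's backward-compatibility rule, a consumer on the new schema can read data written with the old schema as long as every newly added field declares a default. Removing fields is allowed. Schema Registry performs this kind of check when a new schema version is registered, according to the subject's configured compatibility level. The sketch below models only that rule, using a hypothetical `Field` record that holds a name and whether a default exists. It ignores type changes and aliases, which the real check also considers.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified view of an Avro field: its name and whether the schema declares a default.
record Field(String name, boolean hasDefault) {}

class BackwardCompatibility {
    // The new (reader) schema can read old (writer) data if every field the old
    // schema lacks has a default to fill in. Fields removed in the new schema are ignored.
    static boolean isBackwardCompatible(List<Field> oldSchema, List<Field> newSchema) {
        Set<String> oldNames = oldSchema.stream().map(Field::name).collect(Collectors.toSet());
        return newSchema.stream()
                .filter(f -> !oldNames.contains(f.name()))
                .allMatch(Field::hasDefault);
    }
}
```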
Testing Strategies
Unit Test Services with Mocks
Test business logic independently by mocking repositories and external dependencies.
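The sketch below uses a hand-written in-memory fake in place of a mocking library such as Mockito. `OrderRepository`, `OrderService`, and the "reject duplicates and non-positive amounts" rule are all hypothetical. Because the service depends on an interface, a test can substitute the fake and exercise the business logic without a database or broker.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency the service talks to; production code would back this with a database.
interface OrderRepository {
    boolean existsById(String orderId);
    void save(String orderId, long amountCents);
}

class OrderService {
    private final OrderRepository repository;

    OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    // Business rule under test: duplicates and non-positive amounts are rejected.
    boolean placeOrder(String orderId, long amountCents) {
        if (amountCents <= 0 || repository.existsById(orderId)) {
            return false;
        }
        repository.save(orderId, amountCents);
        return true;
    }
}

// Test double: keeps orders in a map so assertions can inspect what the service did.
class InMemoryOrderRepository implements OrderRepository {
    final Map<String, Long> saved = new HashMap<>();

    public boolean existsById(String orderId) {
        return saved.containsKey(orderId);
    }

    public void save(String orderId, long amountCents) {
        saved.put(orderId, amountCents);
    }
}
```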
Integration Test with Embedded Kafka
Use Spring Kafka's @EmbeddedKafka to start an in-process broker and test producer/consumer interactions end-to-end.
Test Error Scenarios
Verify retry logic, dead letter topics, and error handling work correctly.
Production Considerations
Authentication and Authorization
Add proper security with Spring Security, OAuth2, or JWT tokens. This workshop uses simplified security for learning.
Monitoring and Alerting
- Consumer lag metrics
- Message processing rates
- Error rates and dead letter topic monitoring
- System health checks
Separate Read/Write Databases
For high-scale systems, use separate databases optimized for each workload:
- Write DB: PostgreSQL/MySQL for transactional consistency
- Read DB: Elasticsearch/MongoDB for query performance
- Sync: Kafka events propagate changes from the write database to the read database
Configure Retention Policies
Set appropriate retention for Kafka topics based on use case:
- Event logs: Long retention (30-90 days or infinite)
- Transient messages: Short retention (1-7 days)
- Compacted topics: Keep the latest value per key
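Kafka expresses these choices as per-topic configs. `retention.ms` sets the retention time, where `-1` means no time limit. `cleanup.policy` is either `delete` or `compact`. The sketch below only builds those config maps in plain Java. The class name and day counts are illustrative values taken from the list above. In practice the map would be supplied when the topic is created, for example through `AdminClient` or Spring Kafka's `TopicBuilder`.

```java
import java.time.Duration;
import java.util.Map;

class RetentionPolicies {
    // Long-lived event log: keep 90 days (use "-1" for no time-based deletion).
    static Map<String, String> eventLog() {
        return Map.of(
                "cleanup.policy", "delete",
                "retention.ms", String.valueOf(Duration.ofDays(90).toMillis()));
    }

    // Transient messages: keep 7 days.
    static Map<String, String> transientMessages() {
        return Map.of(
                "cleanup.policy", "delete",
                "retention.ms", String.valueOf(Duration.ofDays(7).toMillis()));
    }

    // Compacted topic: the log cleaner keeps at least the latest record for each key.
    static Map<String, String> compacted() {
        return Map.of("cleanup.policy", "compact");
    }
}
```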
Reliability Patterns
Retries with Exponential Backoff
Retry failed operations with increasing delays to handle transient failures.
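Below is a minimal retry helper. `RetryWithBackoff` is a hypothetical class and not any particular library's API; Spring Kafka and Resilience4j both ship production versions. The delay after the n-th failure is `initialDelay * multiplier^(n-1)`, capped at `maxDelay`. The sleep function is injected so tests can record delays instead of waiting. Many implementations also add random jitter so that clients do not retry in lockstep.

```java
import java.time.Duration;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

class RetryWithBackoff {
    private final int maxAttempts;
    private final long initialDelayMs;
    private final double multiplier;
    private final long maxDelayMs;
    private final LongConsumer sleeper; // injectable so tests need not actually wait

    RetryWithBackoff(int maxAttempts, Duration initialDelay, double multiplier,
                     Duration maxDelay, LongConsumer sleeper) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
        this.initialDelayMs = initialDelay.toMillis();
        this.multiplier = multiplier;
        this.maxDelayMs = maxDelay.toMillis();
        this.sleeper = sleeper;
    }

    // Real sleeper for production use; restores the interrupt flag if interrupted.
    static void threadSleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during backoff", e);
        }
    }

    // Delay after the n-th failed attempt: initial * multiplier^(n-1), capped at maxDelay.
    long delayAfterFailure(int n) {
        double delay = initialDelayMs * Math.pow(multiplier, n - 1);
        return (long) Math.min(delay, (double) maxDelayMs);
    }

    // Retries every RuntimeException; a production version would retry only
    // errors known to be transient (timeouts, unavailable broker, and so on).
    <T> T call(Supplier<T> operation) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleeper.accept(delayAfterFailure(attempt));
                }
            }
        }
        throw last;
    }
}
```

In production you would pass `RetryWithBackoff::threadSleep` as the sleeper.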
Circuit Breakers
Prevent cascading failures by stopping requests to failing services temporarily.
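The sketch below implements the classic three-state breaker (closed, open, half-open) with a hypothetical API; libraries such as Resilience4j provide hardened implementations. After `failureThreshold` consecutive failures, the breaker opens and returns the fallback without calling the dependency. Once `openDurationMs` has passed, it lets one trial call through. The result of that call decides whether the circuit closes again or reopens.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openDurationMs;
    private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0;

    CircuitBreaker(int failureThreshold, long openDurationMs, LongSupplier clockMs) {
        this.failureThreshold = failureThreshold;
        this.openDurationMs = openDurationMs;
        this.clockMs = clockMs;
    }

    synchronized State state() {
        return state;
    }

    // synchronized keeps the state machine consistent; it also serializes calls,
    // which a production breaker would avoid.
    synchronized <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMs.getAsLong() - openedAtMs < openDurationMs) {
                return fallback.get(); // fail fast: do not touch the failing dependency
            }
            state = State.HALF_OPEN; // cool-down elapsed: let one trial call through
        }
        try {
            T result = operation.get();
            state = State.CLOSED;
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // A failed trial call, or too many failures in a row, (re)opens the circuit.
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAtMs = clockMs.getAsLong();
            }
            return fallback.get();
        }
    }
}
```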
Dead Letter Topics
Send messages that fail after max retries to a dead letter topic for manual investigation.
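The sketch below shows the routing decision. `DeadLetter` and `DeadLetterRouter` are hypothetical types, and an in-memory list stands in for the dead letter topic. The `<topic>.DLT` name follows a common convention; Spring Kafka's `DeadLetterPublishingRecoverer` uses it by default, for example. Treat it as an assumption for your own setup. Storing the error alongside the payload is what makes later manual investigation practical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// What gets parked for investigation: destination, payload, and why it failed.
record DeadLetter(String topic, String payload, int attempts, String error) {}

class DeadLetterRouter {
    private final int maxAttempts;
    // Stand-in for a producer that writes to the dead letter topic.
    private final List<DeadLetter> deadLetters = new ArrayList<>();

    DeadLetterRouter(int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
    }

    // Returns true if the handler eventually succeeded, false if the message was dead-lettered.
    boolean handle(String topic, String payload, Consumer<String> handler) {
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(payload);
                return true;
            } catch (RuntimeException e) {
                lastError = e; // backoff between attempts omitted for brevity
            }
        }
        // Give up: record the message and its error so the consumer can move on.
        deadLetters.add(new DeadLetter(topic + ".DLT", payload, maxAttempts, lastError.getMessage()));
        return false;
    }

    List<DeadLetter> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```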
Health Checks
Implement health endpoints that verify connectivity to Kafka, databases, and other dependencies.
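In a Spring Boot service, this is typically done by registering Actuator `HealthIndicator` beans. The plain-Java sketch below uses hypothetical names and shows only the aggregation logic. Each dependency check runs independently. A check that throws is reported as DOWN instead of failing the endpoint. The overall status is UP only when every dependency is UP.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

class CompositeHealthCheck {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    CompositeHealthCheck register(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    // Run every check; one that throws counts as DOWN rather than crashing the endpoint.
    Map<String, String> details() {
        Map<String, String> result = new LinkedHashMap<>();
        checks.forEach((name, check) -> {
            boolean up;
            try {
                up = check.getAsBoolean();
            } catch (RuntimeException e) {
                up = false;
            }
            result.put(name, up ? "UP" : "DOWN");
        });
        return result;
    }

    // Overall status is UP only if every registered dependency is UP.
    String status() {
        return details().containsValue("DOWN") ? "DOWN" : "UP";
    }
}
```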
Summary
Building production-ready event-driven systems requires careful attention to:
- Idempotency and retry handling
- Schema evolution and compatibility
- Monitoring and observability
- Security and authentication
- Scalability and performance
This workshop provides a foundation for understanding these patterns. Apply these principles when building real-world systems.
Key Takeaways
- Design for idempotency and retries from day one
- Monitor consumer lag continuously
- Plan schema evolution before going to production
- Test failure scenarios thoroughly