Managed Apache Kafka on AWS
Interested in Kafka on AWS? Instead of going with Amazon MSK, you can save by choosing true open source Apache Kafka, fully managed and hosted on AWS by Instaclustr.
Apache Kafka: The Platform for Building Real-Time Streaming Data Pipelines and Applications
Our fully managed Apache Kafka service can be hosted on AWS, giving you a reliable and highly available alternative to Amazon MSK. Instaclustr Managed Apache Kafka on AWS enables you to build fast, scalable distributed systems for real-time streaming.
While some turn to Amazon Managed Streaming for Apache Kafka, you can host Apache Kafka on AWS without committing to Amazon as your managed service provider, which helps you avoid long-term vendor lock-in. Our service deploys your open source Apache Kafka cluster on AWS, fully managed by Instaclustr, so you get highly available clusters for your streaming data with pre-configured and fully optimized settings. Our Apache Kafka optimization techniques are based on best practices developed through thorough testing against a range of real-world use cases.
Our 10-part Anomalia Machina blog series showcases a detailed anomaly detection application that we deployed on Amazon EKS and integrated with a massive-scale Apache Kafka and Cassandra data pipeline on AWS, all through the Instaclustr Managed Platform. The series highlights best practices, performance tuning, and monitoring and tracing capabilities. Above all, it demonstrates how to architect a massively scalable Kafka-Cassandra data pipeline that processes billions of transactions per day and detects anomalies in them.
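To make "detecting anomalies" a little more concrete, here is a minimal sketch of one common lightweight approach: a two-sided CUSUM (cumulative sum) check over a sliding window of recent events for each ID. It assumes the baseline mean and standard deviation are estimated from that window. The window size, the `k` and `h` thresholds, and the names `cusum_alarm` and `AnomalyDetector` are hypothetical illustrations, not the series' actual implementation.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW = 50  # hypothetical: number of recent events kept per ID


def cusum_alarm(history, new_value, k=0.5, h=5.0):
    """Return True if new_value, appended to history, trips a two-sided CUSUM alarm.

    k is the allowance (slack) and h the decision threshold, both in
    standard deviations of the window. Both defaults are assumptions.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a baseline yet
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # avoid dividing by zero on a constant window
    s_hi = s_lo = 0.0
    for x in list(history) + [new_value]:
        z = (x - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)  # accumulates upward drift
        s_lo = max(0.0, s_lo - z - k)  # accumulates downward drift
    return s_hi > h or s_lo > h


class AnomalyDetector:
    """Keeps a bounded in-memory history per ID and checks each new event."""

    def __init__(self, window=WINDOW):
        self.histories = defaultdict(lambda: deque(maxlen=window))

    def check(self, key, value):
        hist = self.histories[key]
        anomalous = cusum_alarm(hist, value)  # check against the window first
        hist.append(value)  # then add the event to the ID's history
        return anomalous


detector = AnomalyDetector()
for amount in [10, 11, 9, 10, 10, 11, 9, 10]:
    detector.check("card-1", amount)
print(detector.check("card-1", 50))  # a sudden spike for this ID
```

In a real pipeline the per-ID history would more likely live in a database such as Cassandra than in process memory; the sketch keeps it in a dictionary for simplicity.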
Explore the 10-part series by category.
-
Best Practices and Benchmarking
In this blog, we introduce the main motivation behind the project, and cover functionality and initial test results beginning with Cassandra.
-
Automatic Provisioning Using Instaclustr’s Provisioning API
Learn how to provision Cassandra and Kafka clusters automatically with Instaclustr’s provisioning API.
-
Load Generation for Cassandra/Kafka Clusters
In this post, we generate high-volume load for Kafka, the log aggregation system that operates via a publish-subscribe mechanism.
-
Prototyping
We add metrics to compute and report CPU utilization, memory usage, event production rate, and producer latency.
-
Application Monitoring with Prometheus
We explore how to better understand an open source system using Prometheus for distributed metrics monitoring.
-
Application Tracing with OpenTracing
In this post, we look at another way of increasing visibility into a system using OpenTracing for distributed tracing.
-
Kubernetes Cluster Creation and Application Deployment
We explore deploying the Anomalia Machina application on Kubernetes with the help of Amazon EKS.
-
Production Application Deployment with Kubernetes
We deploy the instrumented application in a cloud production environment.
-
Anomaly Detection at Scale
We test the application to see how anomaly detection scales on small Kafka and Cassandra Instaclustr production clusters.
-
Final Results
Our final blog of the Anomalia Machina series focuses on scaling the application out from 3 to 48 Cassandra nodes. The scale results were impressive: 574 CPU cores (across the Cassandra, Kafka, and Kubernetes clusters), 2.3 million writes/s into Kafka at peak, and 220,000 anomaly checks per second (sustainable). In total, the application handled a massive 19 billion anomaly checks per day.
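The daily figure follows directly from the sustained rate, assuming 220,000 checks per second is held for a full 24 hours: 220,000 checks/s × 86,400 s/day ≈ 19 billion checks/day. A quick check of the arithmetic:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def checks_per_day(checks_per_second):
    """Total checks if a per-second rate is sustained for a whole day."""
    return checks_per_second * SECONDS_PER_DAY


daily = checks_per_day(220_000)
print(f"{daily:,}")  # 19,008,000,000, i.e. about 19 billion
```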