
Managed Apache Kafka on AWS

Interested in running Kafka on AWS? Instead of Amazon MSK, you can save with true open source Apache Kafka, fully managed by Instaclustr and hosted on AWS.


Apache Kafka: The Platform for Building Real-Time Streaming Data Pipelines and Applications

Our fully managed Apache Kafka service can run on AWS, giving you a reliable, highly available alternative to Amazon MSK. Instaclustr Managed Apache Kafka on AWS enables you to build fast, scalable distributed systems for real-time streaming.

Amazon Managed Streaming for Apache Kafka (MSK) is one option, but you can host Apache Kafka on AWS without making Amazon your managed service provider, which helps you avoid long-term vendor lock-in. Our service deploys your open source Apache Kafka cluster on AWS, fully managed by Instaclustr, giving you highly available clusters for your streaming data with pre-configured, optimized settings. Our Apache Kafka optimization techniques are based on best practices developed through extensive testing against a range of real-world use cases.

Kafka and Cassandra in Action: Explore How We Built a Massively Scalable Anomaly Detection Application

This 10-part blog series walks through an anomaly detection application that we deployed on Amazon EKS and integrated with a large-scale Apache Kafka and Cassandra data pipeline on AWS, all running on the Instaclustr Managed Platform. The series covers best practices, performance tuning, and monitoring and tracing, and above all shows how to architect a massively scalable Kafka-Cassandra data pipeline that detects anomalies across billions of daily transactions.
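To make the shape of such a pipeline concrete, here is a minimal Python sketch of the per-event flow: take an event, look up recent history for its key, run an anomaly check, then store the event. It is not the series' code. The event list stands in for a Kafka topic, `InMemoryStore` stands in for a Cassandra table, and the z-score test in `is_anomaly` (with its `threshold`, `min_history`, and `HISTORY_SIZE` values) is a swapped-in detector chosen for illustration; all of these names are hypothetical.

```python
import statistics
from collections import defaultdict, deque

HISTORY_SIZE = 50  # hypothetical: how many recent values to keep per key


class InMemoryStore:
    """Hypothetical stand-in for a Cassandra table of recent values per key."""

    def __init__(self, history_size=HISTORY_SIZE):
        self._rows = defaultdict(lambda: deque(maxlen=history_size))

    def write(self, key, value):
        self._rows[key].append(value)

    def read_history(self, key):
        return list(self._rows[key])


def is_anomaly(history, value, threshold=3.0, min_history=10):
    """Swapped-in z-score test; not necessarily the detector used in the series."""
    if len(history) < min_history:
        return False  # too little history to judge
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean  # any change from a constant history is flagged
    return abs(value - mean) / stdev > threshold


def process(events, store):
    """Check each (key, value) event against its history, then store it.

    `events` stands in for records consumed from a Kafka topic.
    """
    anomalies = []
    for key, value in events:
        if is_anomaly(store.read_history(key), value):
            anomalies.append((key, value))
        store.write(key, value)
    return anomalies
```

In a real deployment the loop body would run inside a Kafka consumer and `write`/`read_history` would be Cassandra queries; keeping the detector a pure function of the history makes it easy to test on its own.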

Explore the 10-part series by category.

Best Practices and Benchmarking

In this post, we introduce the main motivation behind the project and cover its functionality and initial test results, beginning with Cassandra.

Automatic Provisioning Using Instaclustr’s Provisioning API

Learn how to provision Cassandra and Kafka clusters automatically with Instaclustr’s provisioning API.

Load Generation for Cassandra/Kafka Clusters

In this post, we generate high-volume load for Kafka, the distributed log system that operates on a publish-subscribe model.
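As a rough illustration of constant-rate load generation, the sketch below builds a schedule of synthetic events at a fixed target rate and then paces a `send` callback to that schedule. The function names, the random key/value scheme, and the `send` callback (which in practice would wrap a Kafka producer) are hypothetical, not taken from the series.

```python
import random
import time


def generate_load(rate_per_sec, duration_sec, num_keys=1000, seed=42):
    """Return (send_offset_sec, key, value) tuples paced at a fixed rate.

    Hypothetical scheme: uniformly random integer keys, Gaussian values.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    total = int(rate_per_sec * duration_sec)
    interval = 1.0 / rate_per_sec
    return [
        (i * interval, rng.randrange(num_keys), rng.gauss(0.0, 1.0))
        for i in range(total)
    ]


def run(schedule, send):
    """Call send(key, value) for each event at its scheduled offset.

    `send` is a placeholder; a real generator would hand records to a Kafka producer.
    """
    start = time.monotonic()
    for offset, key, value in schedule:
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait until this event's time slot
        send(key, value)
```

Separating the schedule from the pacing loop keeps the rate logic independent of the client library. A single Python loop like this will not reach very high rates on its own, so large load tests typically run many generator instances in parallel.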

Prototyping

We added metrics to compute and report CPU utilization, memory usage, event production rate, and producer latency.
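Two of those metrics, event production rate and producer latency, can be derived from per-event send and acknowledgement timestamps. The sketch below assumes such `(send_time, ack_time)` pairs are recorded somewhere (a hypothetical input format) and reports throughput plus nearest-rank p50/p99 latency; CPU and memory reporting are left out.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def producer_metrics(samples):
    """Summarize hypothetical (send_time, ack_time) pairs given in seconds."""
    if not samples:
        raise ValueError("no samples")
    latencies = sorted(ack - send for send, ack in samples)
    # Measurement window: first send to last acknowledgement.
    window = max(ack for _, ack in samples) - min(send for send, _ in samples)
    return {
        "events_per_sec": len(samples) / window if window > 0 else float("inf"),
        "p50_latency_ms": percentile(latencies, 50) * 1000,
        "p99_latency_ms": percentile(latencies, 99) * 1000,
    }
```

Reporting tail percentiles alongside the median matters for producers, because a small fraction of slow sends can hide behind a healthy average.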

Application Monitoring with Prometheus

We explore how to better understand an open source system using Prometheus for distributed metrics monitoring.
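Prometheus collects metrics by scraping an HTTP endpoint that serves them in its plain-text exposition format. The standard-library-only sketch below renders a few metrics in that format and serves them at `/metrics`; the metric names and values are hypothetical, and in practice the official `prometheus_client` library would normally handle this for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Render (name, type, help, value) tuples in Prometheus text exposition format."""
    lines = []
    for name, mtype, help_text, value in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical application metrics; a real app would update these as it runs.
METRICS = [
    ("anomaly_checks_total", "counter", "Total anomaly checks performed.", 0),
    ("producer_latency_seconds", "gauge", "Most recent producer latency.", 0.0),
]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics for Prometheus to scrape."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding the host and port as a scrape target in the Prometheus configuration would let Prometheus poll these values; port 8000 is an arbitrary choice here.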

Application Tracing with OpenTracing

In this post, we look at another way of increasing visibility into a system using OpenTracing for distributed tracing.

Kubernetes Cluster Creation and Application Deployment

We explore deploying the Anomalia Machina application on Kubernetes with the help of Amazon EKS.

Production Application Deployment with Kubernetes

We deploy the instrumented application in a cloud production environment.

Anomaly Detection at Scale

We test the application to see how anomaly detection scales on small Instaclustr production Kafka and Cassandra clusters.

Final Results

Our final post in the Anomalia Machina series focuses on scaling the application out from 3 to 48 Cassandra nodes. The results were impressive: 574 CPU cores (across the Cassandra, Kafka, and Kubernetes clusters), a peak of 2.3 million writes/s into Kafka, and a sustainable 220,000 anomaly checks per second. In total, the application handled a massive 19 billion anomaly checks per day.
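The daily figure follows directly from the sustained rate: a day has 86,400 seconds, so 220,000 checks per second works out to just over 19 billion checks per day, as this quick Python check shows.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
checks_per_second = 220_000           # sustained rate reported in the final post
checks_per_day = checks_per_second * SECONDS_PER_DAY
print(f"{checks_per_day:,}")          # prints 19,008,000,000
```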


Best Practices for Running Managed Kafka on AWS

The first step in deploying open source Apache Kafka on AWS is choosing the right Amazon EC2 instance type for your Kafka nodes (brokers). This choice determines your cluster's performance and throughput, as well as the cost of running it on AWS, and it often involves a trade-off between the two.
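To make the cost/performance trade-off concrete, here is a simplified Python sizing sketch. For each candidate instance type it estimates how many brokers a target produce rate needs, assuming every produced byte is written once per replica, and picks the option with the lowest hourly cluster cost. The `InstanceOption` fields, per-broker throughput figures, and prices are all assumptions you would replace with your own benchmark results and current AWS pricing; real sizing also has to account for storage, network, partition counts, and consumer load.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    name: str                # hypothetical label, not a real EC2 type
    hourly_cost: float       # assumed cost per broker-hour
    write_mb_per_sec: float  # assumed sustainable per-broker write throughput


def brokers_needed(option, produce_mb_per_sec, replication_factor):
    """Simplified model: each produced byte is written replication_factor times."""
    total_writes = produce_mb_per_sec * replication_factor
    count = math.ceil(total_writes / option.write_mb_per_sec)
    # Kafka needs at least replication_factor brokers to place every replica.
    return max(count, replication_factor)


def cheapest_option(options, produce_mb_per_sec, replication_factor=3):
    """Return (option, broker_count, hourly_cost) with the lowest hourly cost."""
    best = None
    for opt in options:
        n = brokers_needed(opt, produce_mb_per_sec, replication_factor)
        cost = n * opt.hourly_cost
        if best is None or cost < best[2]:
            best = (opt, n, cost)
    return best
```

For example, with two hypothetical options, `small` (0.2 per hour, 50 MB/s) and `large` (0.7 per hour, 200 MB/s), a 100 MB/s produce rate at replication factor 3 needs 6 small brokers (1.2 per hour) or 3 large ones (2.1 per hour), so the sketch picks `small`. These figures are placeholders, not measurements.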