Complete guide to OpenSearch in 2024
OpenSearch is an open source search and analytics suite forked from Elasticsearch 7.10 and maintained by the Linux Foundation.
What is OpenSearch
OpenSearch is an open source search and analytics suite forked from Elasticsearch 7.10. Originally maintained by Amazon, it is now a project of the Linux Foundation. It is used for full-text search, log analytics, and application monitoring, among other workloads. With a RESTful API that exchanges JSON, OpenSearch serves as a platform for enterprise search and data visualization.
It includes data ingestion, indexing, and near-real-time search. Its distributed architecture scales horizontally to handle large volumes of data. Users can apply its security model to control access and use OpenSearch Dashboards to build dashboards for faster data insights.
Brief history of OpenSearch
OpenSearch began as a fork of Elasticsearch 7.10 and Kibana 7.10 after Elastic announced in January 2021 that it was moving both products to a more restrictive license. Amazon Web Services (AWS), together with several other companies invested in search and analytics, created the open-source project to keep developing that codebase.
The project has quickly evolved, incorporating community feedback to offer enhanced features. Its commitment to open-source principles ensures that OpenSearch remains free to use, modify, and distribute, promoting wider adoption and continual improvement.
Key features of OpenSearch
Built-In Search Capabilities
OpenSearch offers built-in search capabilities that support complex queries and real-time indexing. These capabilities enable users to perform efficient full-text searches, aggregations, and geospatial queries quickly. The system’s scalability ensures that it can manage vast datasets without compromising performance.
In addition to its foundational search features, OpenSearch also supports various search plugins and modules, extending its functionality.
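To make the query capabilities above concrete, here is a minimal sketch of a search request body that combines a full-text match with a terms aggregation. The field names (message, status) are hypothetical and assume that status is mapped as a keyword field; the body would be sent to the _search endpoint of an index.

```python
import json


def build_search_with_aggregation(text, field="message", group_by="status", size=5):
    """Build a search body: a full-text match plus a terms aggregation.

    The field names are hypothetical; substitute fields from your own mapping.
    """
    return {
        "size": size,  # number of matching documents to return
        "query": {"match": {field: text}},  # analyzed full-text search
        "aggs": {
            "by_group": {  # arbitrary label for this aggregation
                "terms": {"field": group_by, "size": 10}
            }
        },
    }


body = build_search_with_aggregation("timeout error")
print(json.dumps(body, indent=2))
```

The response to such a request would contain both the top matching documents and a bucket count per distinct status value.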
Data Prepper
Data Prepper simplifies data ingestion by transforming raw data into structured formats compatible with OpenSearch. This feature streamlines the data onboarding process, making it easier to index and analyze data. Data Prepper offers support for common data sources and preprocessing tasks, enhancing its utility in data workflows.
This feature integrates with various data pipelines, enabling automatic data transformation and normalization. This interoperability ensures that data is consistently prepared and optimized for indexing, reducing the complexity and overhead associated with data management.
Trace Analytics
Trace Analytics in OpenSearch allows for the detailed analysis of distributed trace data, helping users understand application behavior and performance. By collecting and visualizing traces, this feature aids in identifying bottlenecks and improving application debugging and optimization processes.
Trace Analytics is particularly useful in microservices architectures where tracing inter-service communication is challenging.
Index Management
Index management in OpenSearch, provided by the Index State Management (ISM) plugin, keeps data indexing efficient and organized. This feature includes policies for automated index handling, such as rollover, retention, and deletion, enabling smarter resource utilization and data lifecycle management.
Through configuration rules, users can automate index operations, optimizing system performance. These policies help in managing indices based on size, age, and other metrics, simplifying administrative tasks and maintaining a streamlined search environment.
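As an illustration of such a policy, the sketch below builds an Index State Management (ISM) policy body that rolls an index over by size or age and deletes it later. The index pattern, thresholds, and state names are assumptions chosen for the example, and rollover also requires a rollover alias to be configured on the index, which is not shown here.

```python
import json


def build_ism_policy(index_pattern="logs-*", rollover_size="50gb",
                     rollover_age="7d", delete_after="30d"):
    """Return an ISM policy body with a 'hot' state and a 'delete' state.

    All names and thresholds are illustrative defaults, not recommendations.
    """
    return {
        "policy": {
            "description": "Roll over by size or age, then delete old indices",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    # Roll over when the index is large enough or old enough.
                    "actions": [
                        {"rollover": {"min_size": rollover_size,
                                      "min_index_age": rollover_age}}
                    ],
                    # Move to the delete state once the index reaches this age.
                    "transitions": [
                        {"state_name": "delete",
                         "conditions": {"min_index_age": delete_after}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to new indices matching the pattern.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


policy = build_ism_policy()
print(json.dumps(policy, indent=2))
```

A body like this would typically be sent with a PUT request to _plugins/_ism/policies/ followed by a policy name of your choosing.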
OpenSearch vs Elasticsearch: What are the differences?
While OpenSearch and Elasticsearch share a common origin, significant differences have emerged since the fork in 2021. Here are the key distinctions:
1. Licensing
Starting with version 7.11, Elasticsearch moved away from the Apache 2.0 License to dual licensing under the Server Side Public License (SSPL) and the Elastic License, both of which restrict certain uses of the software, particularly offering it as a managed service. In contrast, OpenSearch is licensed under the Apache 2.0 License, which keeps it free and open source and encourages community contributions and use without those restrictions.
2. Governance and Community
OpenSearch is governed by a broad community of contributors now led by the Linux Foundation (and originally AWS), with an open and transparent decision-making process. This model promotes collaborative development and inclusivity. Elasticsearch, on the other hand, is developed by Elastic NV, with a more centralized governance model which can limit external contributions and influence.
3. Features and Plugins
Since the fork, OpenSearch has developed features and plugins that differentiate it from Elasticsearch. Notable additions include security plugins with fine-grained access controls, alerting capabilities, and Data Prepper for enhanced data ingestion.
4. Compatibility and Ecosystem
OpenSearch maintains compatibility with the last open source version of Elasticsearch (7.10), ensuring that existing tools and integrations continue to function. However, as both projects evolve, this compatibility may diverge. OpenSearch also emphasizes backward compatibility within its own updates, reducing the risk of breaking changes for users.
5. Performance and Scalability
Both OpenSearch and Elasticsearch are designed for high performance and scalability. However, OpenSearch focuses on optimizing performance for large-scale deployments, particularly in AWS environments. It includes enhancements for better resource management and efficiency, catering to enterprise-level needs.
6. Support and Commercial Offerings
AWS provides commercial support for OpenSearch through its managed service, Amazon OpenSearch Service. This includes professional support, SLAs, and additional enterprise features. Elastic offers similar services for Elasticsearch through its Elastic Cloud offering, but with different pricing and support models.
Learn more in our detailed guide to OpenSearch vs Elasticsearch (coming soon)
How OpenSearch works: Architecture and components
OpenSearch is built on a distributed, scalable architecture that ensures high availability and fault tolerance. Its key components include clusters, nodes, indices, and shards.
Clusters
An OpenSearch cluster is a collection of one or more nodes that work together to store and search data. A multi-node cluster provides redundancy and load balancing, so that the failure of a single node does not take the system down. Each cluster is identified by a cluster name, which nodes use to find and join the cluster.
Nodes
Nodes are individual servers that make up a cluster. Depending on its assigned roles, a node stores data, takes part in indexing and search, or coordinates the cluster. There are different types of nodes in OpenSearch:
- Cluster manager nodes (called master nodes before OpenSearch 2.0): Responsible for managing the cluster’s metadata and state, including creating and deleting indices and tracking node availability.
- Data nodes: Store data and perform data-related operations like indexing and searching.
- Coordinating nodes (sometimes called client nodes): Act as load balancers that receive search requests, distribute them to the appropriate data nodes, and merge the results. They do not store data themselves.
Indices
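Indices are logical namespaces that hold related documents in OpenSearch. An index is loosely comparable to a table in a relational database management system (RDBMS), with each document playing the role of a row. Every index has a mapping that defines the fields its documents contain and how they are indexed. Earlier Elasticsearch versions let an index hold multiple mapping types, but types were deprecated before the fork and removed in OpenSearch 2.0.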
Shards
Shards are the fundamental units of storage in OpenSearch. Each index is divided into smaller units called shards, which can be distributed across multiple nodes. Sharding allows OpenSearch to handle large volumes of data efficiently by distributing the load across the cluster.
There are two types of shards:
- Primary shards: The original shards where data is initially written.
- Replica shards: Copies of primary shards that provide redundancy and increase search performance by allowing read operations to be distributed.
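The relationship between primary and replica shards can be sketched with a little arithmetic. The example below builds the index settings for a hypothetical index and counts how many shard copies the cluster must place; number_of_shards and number_of_replicas are the standard index settings, while the helper names are made up for this illustration.

```python
def index_settings(primaries, replicas):
    """Return an index-creation body with the given shard layout."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,   # fixed when the index is created
                "number_of_replicas": replicas,  # copies of each primary shard
            }
        }
    }


def total_shard_copies(primaries, replicas):
    """Each primary shard plus its replicas must be placed on a node."""
    return primaries * (1 + replicas)


settings = index_settings(primaries=3, replicas=1)
print(total_shard_copies(3, 1))  # → 6
```

Because a replica is never allocated to the same node as its primary, a layout with one replica needs at least two data nodes before every copy can be assigned.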
Indexing and Searching
OpenSearch uses an indexing mechanism that allows data to be quickly ingested and made searchable. When a document is indexed, it is parsed and stored in the appropriate index and shard. The indexing process includes tokenization, where text is broken down into searchable terms, and various analyzers can be applied to process and store these terms efficiently.
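The idea behind tokenization and term lookup can be shown with a toy inverted index. This is a deliberately simplified sketch in plain Python, not how OpenSearch or Lucene implement analysis or storage: the analyzer here only lowercases text and splits it on non-alphanumeric characters.

```python
import re
from collections import defaultdict


def analyze(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Disk usage is HIGH",
    2: "High latency on search",
    3: "Disk replaced",
}
index = build_inverted_index(docs)
print(sorted(index["high"]))  # → [1, 2]
print(sorted(index["disk"]))  # → [1, 3]
```

Looking up a term is then a dictionary access rather than a scan of every document, which is the property that makes indexed full-text search fast.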
Searching in OpenSearch is efficient due to its distributed nature. Queries are distributed across the relevant shards, and results are aggregated and returned to the user. OpenSearch supports various types of queries, including term queries, range queries, and full-text searches.
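The three query types mentioned above can be combined in a single bool query. The following is a minimal sketch of such a request body; the field names (message, status, @timestamp) are assumptions for illustration and need to match your own index mapping.

```python
import json


def build_bool_query(text, status, since):
    """Combine a full-text match with exact-term and range filters."""
    return {
        "query": {
            "bool": {
                # Scored full-text search on an analyzed text field.
                "must": [{"match": {"message": text}}],
                # Filters narrow the results without affecting the score.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        }
    }


query = build_bool_query("connection refused", "error", "now-1h")
print(json.dumps(query, indent=2))
```

Placing the term and range clauses under filter rather than must is a common design choice, because filter clauses do not contribute to relevance scoring.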
Security and Monitoring
OpenSearch includes security features such as fine-grained access controls, encryption, and auditing. Users can define detailed permissions to control access to indices and documents, ensuring data security. Additionally, OpenSearch provides monitoring and alerting capabilities.
Learn more in our detailed guide to OpenSearch architecture
Tips from the expert
Kassian Wren
Open Source Technology Evangelist
In my experience, here are tips that can help you better utilize OpenSearch:
- Leverage custom analyzers: Create custom analyzers to improve the search relevancy based on your specific data and use cases. Use token filters and character filters to fine-tune the search behavior.
- Implement index lifecycle policies: Use Index State Management (ISM), the OpenSearch counterpart to index lifecycle management, to design policies that manage the size and performance of your indices. Automate index rollover, deletion, and other maintenance tasks to optimize storage and performance.
- Optimize shard allocation: Balance the number of primary and replica shards based on your query load and data redundancy needs. Avoid over-sharding to reduce overhead and improve search performance.
- Utilize bulk indexing: Use the bulk API for indexing large datasets. This approach minimizes the overhead of individual indexing requests and significantly improves indexing speed.
- Monitor cluster health with custom dashboards: Create custom dashboards in OpenSearch Dashboards to monitor critical metrics such as indexing rate, query latency, and resource usage. This helps in proactively managing cluster health.
- Implement custom PKI for SSL/TLS: For enhanced security in OpenSearch, use your own public key infrastructure (PKI) to set up SSL/TLS. This approach requires some initial effort, but it gives you control over certificate issuance and rotation for both node-to-node and REST-layer communications.
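Two of the tips above, custom analyzers and bulk indexing, lend themselves to a short sketch. The first function builds index settings for a custom analyzer, and the second builds a newline-delimited payload for the _bulk endpoint. The analyzer, filter, and index names are hypothetical; html_strip, stop, standard, and lowercase are built-in analysis components.

```python
import json


def custom_analyzer_settings():
    """Index settings defining a custom analyzer (all names are illustrative)."""
    return {
        "settings": {
            "analysis": {
                "char_filter": {"strip_html": {"type": "html_strip"}},
                "filter": {
                    "english_stop": {"type": "stop", "stopwords": "_english_"}
                },
                "analyzer": {
                    "clean_english": {
                        "type": "custom",
                        "char_filter": ["strip_html"],  # runs before tokenizing
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stop"],  # runs after
                    }
                },
            }
        }
    }


def bulk_payload(index_name, docs):
    """Build an NDJSON body: one action line, then one document line, per doc."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_payload("logs", [("1", {"msg": "started"}), ("2", {"msg": "stopped"})])
print(payload, end="")
```

The payload would be sent in a single POST request to _bulk with the Content-Type header set to application/x-ndjson, so that many documents share the cost of one round trip.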
What is Amazon OpenSearch Service?
Amazon OpenSearch Service is a managed service provided by AWS that simplifies deploying, operating, and scaling OpenSearch clusters in the cloud. It eliminates the complexity of managing infrastructure, allowing users to focus on their core applications and data analytics.
The service offers automated provisioning, software patching, backup, recovery, and monitoring, ensuring that OpenSearch clusters are secure and performant. Users can scale their clusters up or down based on demand, taking advantage of AWS’s infrastructure.
Amazon OpenSearch Service also integrates with other AWS services, such as AWS Lambda, Amazon Kinesis, and Amazon CloudWatch, enabling data ingestion, processing, and monitoring workflows. Additionally, it supports features like anomaly detection, alerting, and machine learning integration to enhance data analysis and insights.
What is Amazon OpenSearch Serverless?
Amazon OpenSearch Serverless is a serverless option within the Amazon OpenSearch Service that allows users to run search and analytics workloads without managing any servers. This model abstracts the underlying infrastructure, providing automatic scaling and high availability without the need for manual configuration.
In a serverless setup, users define their data sources and search requirements, and the service automatically allocates resources to meet performance and capacity needs. This approach simplifies operations, reduces costs by charging only for actual usage, and eliminates the overhead of provisioning and maintaining clusters.
Amazon OpenSearch Serverless is ideal for dynamic or unpredictable workloads where traffic patterns can vary significantly. It ensures consistent performance by dynamically adjusting resources in real-time, providing a hassle-free experience for managing search and analytics applications.
Tutorial: Getting started with OpenSearch
In this tutorial, you will learn how to set up and run an OpenSearch cluster using Docker. This guide will take you through the necessary steps, from preparing your environment to accessing OpenSearch Dashboards.
Prerequisites
Before starting, ensure you have Docker and Docker Compose installed on your machine. You can download and install them from their respective websites.
Step 1: Disable Memory Paging and Swapping
To improve performance, disable memory paging and swapping on your host machine and raise the kernel limit on memory map areas. Run the following commands:
Disable memory swapping:
    sudo swapoff -a

Edit the sysctl configuration file to set the maximum map count:

    sudo vi /etc/sysctl.conf

Add the following line to the file:

    vm.max_map_count=262144

Reload the kernel parameters:

    sudo sysctl -p
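If you want to confirm that the new limit took effect, a small check such as the following can help. It is a sketch that assumes a Linux host exposing the value at /proc/sys/vm/max_map_count; the function names are made up for this example.

```python
from pathlib import Path

REQUIRED_MAP_COUNT = 262144  # the value set in the step above


def map_count_is_sufficient(raw_value, required=REQUIRED_MAP_COUNT):
    """Return True if the kernel setting meets the required minimum."""
    return int(raw_value.strip()) >= required


def read_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current setting, or return None where the file is absent."""
    p = Path(path)
    return p.read_text() if p.exists() else None


raw = read_map_count()
if raw is None:
    print("vm.max_map_count is not exposed on this system")
elif map_count_is_sufficient(raw):
    print("vm.max_map_count is high enough")
else:
    print("vm.max_map_count is too low")
```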
Step 2: Download the Docker Compose File
You will need a Docker Compose file to define and create the containers in your cluster. Download the sample Compose file provided by the OpenSearch Project:
Using curl:

    curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/2.15/assets/examples/docker-compose.yml

Using wget:

    wget https://raw.githubusercontent.com/opensearch-project/documentation-website/2.15/assets/examples/docker-compose.yml
Step 3: Start Your OpenSearch Cluster
Navigate to the directory containing the downloaded docker-compose.yml file. Set up a custom admin password by editing the docker-compose.yml file and then start the cluster:
Open the docker-compose.yml file and, under the environment key of each OpenSearch node, add your chosen admin password after the equals sign:

    environment:
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=
Create and start the cluster as a background process (with Docker Compose v2, the equivalent command is docker compose up -d):

    docker-compose up -d
Step 4: Verify the Cluster
To confirm that the containers are running, use the following command:
    docker-compose ps

You should see output similar to this:

    Name                    Command                             State   Ports
    ----------------------------------------------------------------------
    opensearch-node1        /usr/local/bin/docker-entr ...      Up      0.0.0.0:9200->9200/tcp,0.0.0.0:9600->9600/tcp
    opensearch-dashboards   /usr/local/bin/dumb-init -- / ...   Up      0.0.0.0:5601->5601/tcp
Step 5: Query the OpenSearch REST API
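Verify that the service is running by querying the OpenSearch REST API. The -k flag tells curl to skip TLS certificate verification, which is needed because the cluster presents a demo certificate that your system does not trust, and the -u flag passes the admin username and password. Append the password you set in Step 3 after the colon:

    curl https://localhost:9200 -ku admin: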
The response should confirm the installation was successful. The version fields reflect the release you installed, so your values will differ from this sample:

    {
      "name" : "opensearch-node1",
      "cluster_name" : "docker-cluster",
      "cluster_uuid" : "XXXXXXXXXXXXXX",
      "version" : {
        "number" : "1.2.3",
        "build_type" : "tar",
        "build_hash" : "XXXXXXXXXXXXXXXXXXX",
        "build_date" : "2021-XX-XXTXX:XX:XX.XXXXXXZ",
        "build_snapshot" : false,
        "lucene_version" : "8.8.2",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "The OpenSearch Project: https://opensearch.org/"
    }
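The same check can be made from a script. The sketch below uses only the Python standard library and mirrors the curl command: it sends HTTP basic authentication and skips certificate verification, which is acceptable for a local demo cluster only. The function names are hypothetical, and the password is whatever you set in Step 3.

```python
import base64
import json
import ssl
import urllib.request


def build_request(url, username, password):
    """Create a request carrying an HTTP basic authentication header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def insecure_context():
    """TLS context that skips certificate checks (like curl -k). Demo use only."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_cluster_info(url, username, password):
    """Return the parsed JSON document served at the cluster root."""
    request = build_request(url, username, password)
    with urllib.request.urlopen(request, context=insecure_context(), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, to be run while the tutorial cluster is up:
# info = fetch_cluster_info("https://localhost:9200", "admin", "your-password")
# print(info["version"]["number"])
```

For anything beyond a local test, you would load the cluster's CA certificate into the TLS context instead of disabling verification.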
Step 6: Access OpenSearch Dashboards
Open a web browser and go to http://localhost:5601. Log in using the default username admin and the password you set in the docker-compose.yml file.
By following these steps, you will have a fully operational OpenSearch cluster running on your machine, ready for you to explore and utilize its search and analytics capabilities.
Learn more in our detailed guide to OpenSearch tutorial (coming soon)
Empowering organizations with comprehensive support for OpenSearch
At Instaclustr, our mission is to empower organizations with the most comprehensive support for OpenSearch. We believe that successful implementation of this open-source search and analytics engine lies at the intersection of world-class managed services and expert assistance.
- Managed Services: We take over the management of your OpenSearch clusters’ underlying infrastructure to ensure high availability, scalability, and security. This means that you can dedicate your resources to your core business objectives instead of infrastructure management.
- Expert Assistance: From cluster configuration to performance tuning, our team of experienced engineers is ready to help. We are well-versed in OpenSearch and can provide valuable insights and recommendations to optimize your clusters, whether it’s fine-tuning query performance, optimizing index settings, or resolving stability problems.
- 24×7 Monitoring and Support: With round-the-clock monitoring and support, we detect and address any potential issues promptly, minimizing downtime and ensuring smooth operation of your OpenSearch clusters.
Experience the Instaclustr difference today. Schedule a free consultation with our OpenSearch experts and let us help you optimize your OpenSearch environment.
For more information please see: