What is OpenSearch?

OpenSearch is an open source search and analytics suite used to derive insights from large volumes of data. As a fork of Elasticsearch and Kibana, it offers many of the same features, released under the Apache 2.0 license and developed with community-driven input.

Initially developed by Amazon Web Services (AWS), OpenSearch has become a scalable search engine platform for diverse use cases including log analytics, full-text searches, and monitoring large volumes of data across various domains.

OpenSearch is largely compatible with the Elasticsearch 7.10 API, the version it was forked from, which eases the transition for users familiar with Elasticsearch. OpenSearch supports near-real-time data processing and analytics. It also incorporates data visualization tools, security functionalities, and the ability to handle complex queries and aggregations.

How OpenSearch works

OpenSearch operates as a distributed system, leveraging a cluster-based architecture to handle large datasets. The core functionality revolves around its ability to index, search, and analyze data using a combination of nodes, indices, and shards:

  • Cluster and nodes: An OpenSearch cluster is made up of one or more nodes, where each node is an instance of OpenSearch. Nodes collaborate to manage data and perform tasks like indexing and query execution. Nodes are assigned roles, such as cluster manager nodes (formerly called master nodes) for managing cluster state and data nodes for storing and processing data.
  • Indices and shards: Data in OpenSearch is organized into indices, which are logical collections of documents. Each index is divided into shards to enable parallel processing. Shards are distributed across nodes, ensuring scalability and fault tolerance. By using replicas—copies of primary shards—OpenSearch provides high availability and resilience.
  • Indexing data: When data is ingested into OpenSearch, it is stored in a structured format using an inverted index, which optimizes search operations. OpenSearch supports multiple ingestion methods, such as APIs, log pipelines, and connectors to external sources like databases.
  • Query execution: OpenSearch supports a query DSL (domain-specific language) for performing full-text searches, aggregations, and filtering. Queries are distributed across the cluster, and results are aggregated before being returned to the user. This parallelized approach ensures high performance, even for complex queries.
  • Data visualization and monitoring: OpenSearch Dashboards provide tools for creating interactive visualizations and dashboards. These tools allow users to explore their data, identify patterns, and monitor system metrics in real time.
  • Security features: OpenSearch includes built-in security capabilities like user authentication, access control, and data encryption. These features are critical for protecting sensitive information and ensuring compliance with industry standards.
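
The interplay of shards, the inverted index, and distributed query execution described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the class names are hypothetical, CRC32 stands in for the hash OpenSearch applies to a document's routing value, and there are no replicas, scoring, or analyzers beyond lowercasing.

```python
import re
import zlib
from collections import defaultdict


class Shard:
    """One shard: stores documents plus a term -> document-ID inverted index."""

    def __init__(self):
        self.docs = {}
        self.inverted = defaultdict(set)

    def index(self, doc_id, text):
        self.docs[doc_id] = text
        # Split the text into lowercase terms and record which document has them.
        for term in re.findall(r"\w+", text.lower()):
            self.inverted[term].add(doc_id)

    def search(self, term):
        return self.inverted.get(term.lower(), set())


class ToyIndex:
    """An index divided into a fixed number of primary shards."""

    def __init__(self, num_shards=3):
        self.shards = [Shard() for _ in range(num_shards)]

    def _route(self, doc_id):
        # Stand-in routing: hash of the document ID modulo the shard count.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        self.shards[self._route(doc_id)].index(doc_id, text)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return sorted(hits)


index = ToyIndex(num_shards=3)
index.index("1", "Disk latency spike on node a")
index.index("2", "Node b restarted")
index.index("3", "Latency back to normal")

print(index.search("latency"))  # ['1', '3']
print(index.search("node"))     # ['1', '2']
```

Because a document's shard is derived from a hash modulo the number of primary shards, changing that number would send existing documents to different shards, which is one reason the primary shard count is chosen when the index is created.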

Tutorial 1: Installing OpenSearch using Helm

This tutorial provides a step-by-step guide for deploying OpenSearch in a Kubernetes cluster using Helm, a package manager for Kubernetes. Helm simplifies the deployment and management of OpenSearch by using predefined configurations and templates.

Prerequisites

Before you begin, ensure the following:

  • A Kubernetes cluster is set up and running with at least 8 GiB of memory. A smaller allocation, such as 4 GiB, may result in deployment failure.
  • Helm is already installed on your system. Refer to the Helm documentation for installation steps.

Step 1: Add the OpenSearch Helm repository

Start by adding the OpenSearch Helm repository to your Helm configuration:

OpenSearch tutorial screenshot 1

Update the list of available charts from all repositories:

OpenSearch tutorial screenshot 2

Step 2: Search for OpenSearch Helm charts

You can verify the availability of OpenSearch Helm charts by running:

OpenSearch tutorial screenshot 3

The command will return a list of available charts:

Step 3: Deploy OpenSearch

To deploy OpenSearch with the default configuration, execute the following command:

OpenSearch tutorial screenshot 4

Step 4: Customize the deployment (optional)

For custom configurations, you can create a customvalues.yaml file with your desired settings. For example, to set a custom admin password for OpenSearch versions 2.12 or later, include the following in your customvalues.yaml file:

Deploy OpenSearch with the custom configurations:

Step 5: Verify the deployment

Check the status of the deployed pods:

Sample output:

OpenSearch tutorial screenshot 5

Step 6: Access the OpenSearch shell

To interact directly with the OpenSearch cluster, use the following command:

OpenSearch tutorial screenshot 6

Step 7: Test OpenSearch

Finally, test that OpenSearch is running by sending an API request to the cluster:

Sample response:

OpenSearch tutorial screenshot 7

By following these steps, you should have a fully functioning OpenSearch deployment in your Kubernetes environment.
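
For readers who want to script steps 1 to 5, the sketch below assembles the Helm and kubectl command lines as argument lists and prints them. It is a sketch rather than the exact commands from the original screenshots: the release name `my-deployment` is a hypothetical choice, and the repository URL is assumed to be the OpenSearch project's published Helm chart repository.

```python
import shlex

REPO_NAME = "opensearch"
# Assumed chart repository URL; verify against the OpenSearch documentation.
REPO_URL = "https://opensearch-project.github.io/helm-charts/"


def helm_steps(release="my-deployment", values_file=None):
    """Return the commands for steps 1-5 as argument lists."""
    install = ["helm", "install", release, f"{REPO_NAME}/opensearch"]
    if values_file:
        # Step 4: apply custom settings from a values file.
        install += ["--values", values_file]
    return [
        ["helm", "repo", "add", REPO_NAME, REPO_URL],  # step 1: add the repo
        ["helm", "repo", "update"],                    # step 1: refresh charts
        ["helm", "search", "repo", REPO_NAME],         # step 2: list charts
        install,                                       # step 3 or 4: deploy
        ["kubectl", "get", "pods"],                    # step 5: check pods
    ]


for argv in helm_steps(values_file="customvalues.yaml"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to execute it against a real cluster. Printing first makes it easy to review the commands before anything is deployed.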

Tips from the expert

Kassian Wren

Open Source Technology Evangelist

Kassian Wren is an Open Source Technology Evangelist specializing in OpenSearch. They are known for their expertise in developing and promoting open-source technologies, and have contributed significantly to the OpenSearch community through talks, events, and educational content.

In my experience, here are tips that can help you better utilize and optimize OpenSearch for search, analytics, and large-scale deployments:

  • Fine-tune shard allocation for better performance: Avoid oversharding by carefully estimating the size of your data and choosing an optimal number of primary and replica shards. Oversharding leads to increased resource usage, while undersharding can limit performance. Use _cat/shards to monitor shard distribution.
  • Optimize the inverted index with proper mappings: By default, OpenSearch creates an inverted index for all fields, which can increase storage costs and slow down searches. Use explicit mappings to mark fields as keyword, text, or disabled to suit your queries and data retrieval needs.
  • Use ISM (Index State Management) for data retention: Index State Management is OpenSearch’s counterpart to Elasticsearch’s ILM. Implement ISM policies to move indices automatically through states such as hot, warm, cold, and delete. This is especially useful for log analytics, where old data becomes less relevant, and it reduces the need for manual housekeeping.
  • Leverage Painless scripting for advanced query logic: Use OpenSearch’s scripting capabilities (Painless) for custom scoring, data transformations, or filtering logic that cannot be achieved using standard queries. Be cautious about script execution overhead to avoid performance bottlenecks.
  • Cache query results for repetitive searches: OpenSearch provides query caching for frequent, identical queries. Use request_cache and ensure you structure queries efficiently to take advantage of caching, especially in dashboards with repetitive aggregations.
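
The mapping and caching tips above can be made concrete with a short sketch that builds the request bodies as plain dictionaries. The index name `logs` and all field names are hypothetical; the settings shown (`index: false`, `enabled: false`, and the `request_cache` query parameter) are standard options, but check them against the documentation for your OpenSearch version.

```python
import json


def build_index_body(shards=1, replicas=1):
    """Index creation body with explicit mappings (hypothetical log fields)."""
    return {
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas}
        },
        "mappings": {
            "properties": {
                # Analyzed for full-text search.
                "message": {"type": "text"},
                # Exact values for filtering, sorting, and aggregations.
                "service": {"type": "keyword"},
                # Kept in the document source but not added to the inverted index.
                "trace_id": {"type": "keyword", "index": False},
                # Object stored as-is; its contents are not parsed or indexed.
                "raw_payload": {"type": "object", "enabled": False},
            }
        },
    }


def cached_aggregation(field):
    """Path and body for an aggregation-only search that opts into the request cache."""
    path = "logs/_search?request_cache=true"
    body = {"size": 0, "aggs": {"by_value": {"terms": {"field": field}}}}
    return path, body


print(json.dumps(build_index_body(shards=3), indent=2))
path, body = cached_aggregation("service")
print("GET", path)
print(json.dumps(body))
```

Setting `size` to 0 keeps the response to aggregation results only, which is the kind of request dashboards repeat most often and the kind the request cache is designed for.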

Tutorial 2: Creating and searching for documents in Amazon OpenSearch Service

Set up Amazon OpenSearch Service

To call Amazon OpenSearch Service APIs from the command line, install and configure the AWS CLI. The web console does not require it, but the CLI simplifies scripting and automation.

  • Install the AWS CLI: Follow the AWS CLI installation guide to set up the CLI on your system.
  • Configure the AWS CLI: Use the aws configure command to securely set your access keys, preferred AWS region, and output format:

    Example configuration for a named profile:

  • Verify the setup: Run a simple command to confirm:

    OpenSearch tutorial screenshot 8

Add a document to the index

You can add documents using tools like cURL, Postman, or the OpenSearch Dashboards developer console.

  1. Access OpenSearch Dashboards: Navigate to your OpenSearch Dashboards URL (e.g., https://<your-domain-endpoint>/_dashboards/) and log in.
  2. Add a document: Use the PUT command to create an index and add a document:
  3. Response:

    OpenSearch tutorial screenshot 9
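
Outside the Dashboards console, the same PUT can be sent from a script. The sketch below builds the request with Python's standard library without sending it. The endpoint, the index name `veggies`, and the document fields are hypothetical placeholders, and authentication is deliberately left out because it depends on how your domain is secured.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your own domain endpoint.
BASE_URL = "https://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request to the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# PUT <index>/_doc/<id> creates the index if needed and stores the document
# under the given ID.
req = build_request(
    "PUT",
    "veggies/_doc/1",
    {"name": "beet", "color": "red", "classification": "root"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; in practice you would first add the credentials or request signing your domain requires.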

Create a document with an automatically generated ID

OpenSearch can generate IDs for documents if not explicitly provided.

  1. Use POST to add a document:
  2. Response:

    OpenSearch tutorial screenshot 10

Update a document using the POST command

To update an existing document, use the document’s ID in a POST request.

  1. Create a document:
  2. Update the document:
  3. Response:

    OpenSearch tutorial screenshot 11
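
The requests from the two sections above differ only in path and body shape, which the following sketch makes explicit. The index name and fields are hypothetical. Note the two flavors of POST update: posting to `_doc/<id>` replaces the whole document, while posting to `_update/<id>` with a `doc` wrapper merges the given fields into the existing document. Which one the original screenshots used is not recoverable, so both are shown.

```python
import json


def create_with_auto_id(index, doc):
    # No ID in the path: OpenSearch generates one and returns it as "_id".
    return ("POST", f"{index}/_doc", doc)


def replace_document(index, doc_id, doc):
    # Full replacement: fields missing from `doc` are dropped.
    return ("POST", f"{index}/_doc/{doc_id}", doc)


def partial_update(index, doc_id, changes):
    # Partial update: `changes` are merged into the stored document.
    return ("POST", f"{index}/_update/{doc_id}", {"doc": changes})


def as_console(request):
    """Render a request in the format used by the Dashboards dev console."""
    method, path, body = request
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


print(as_console(create_with_auto_id("veggies", {"name": "kale", "color": "green"})))
print(as_console(replace_document("veggies", 42, {"name": "sweet corn", "color": "yellow"})))
print(as_console(partial_update("veggies", 42, {"classification": "grain"})))
```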

Perform bulk actions

The _bulk API allows multiple actions in a single request, reducing overhead.

Example bulk request:

OpenSearch tutorial screenshot 12

Each action consists of two JSON lines: metadata and data. The exception is delete, which needs only the metadata line.
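
The line-oriented format of a _bulk request body is easy to get wrong by hand, so here is a minimal sketch of a helper that assembles it. The helper name, index name, and documents are hypothetical; the format itself (one JSON object per line, with a trailing newline at the end of the body) is what the _bulk API expects.

```python
import json


def build_bulk_body(actions):
    """Build a newline-delimited _bulk body.

    `actions` is a list of (action, metadata, data) tuples; pass None as
    data for actions such as delete that have no data line.
    """
    lines = []
    for action, metadata, data in actions:
        lines.append(json.dumps({action: metadata}))  # metadata line
        if data is not None:
            lines.append(json.dumps(data))            # data line
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "veggies", "_id": "7"}, {"name": "kale", "color": "green"}),
    ("update", {"_index": "veggies", "_id": "7"}, {"doc": {"classification": "leafy"}}),
    ("delete", {"_index": "veggies", "_id": "1"}, None),
])
print(body, end="")
```

The body is sent with `POST _bulk` and the content type `application/x-ndjson`.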

Search for documents

OpenSearch supports basic and advanced search queries.

Basic search:

Searches for documents where the name field starts with “o”.

OpenSearch tutorial screenshot 13

Advanced search:

OpenSearch tutorial screenshot 14

Sorted search:

  • Recreate the index with sortable fields:

    OpenSearch tutorial screenshot 15

  • Perform a sorted query:

    OpenSearch tutorial screenshot 16
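
The three kinds of search in this section can be sketched as request paths and bodies. Everything named here is an assumption for illustration: the index `veggies`, the fields `name` and `color`, and the particular bool query used as the "advanced" example are hypothetical and may differ from the original screenshots.

```python
import json
from urllib.parse import urlencode

INDEX = "veggies"  # hypothetical index name


def basic_search_path(field, prefix):
    """URI search: match documents whose field starts with `prefix`."""
    return f"{INDEX}/_search?" + urlencode({"q": f"{field}:{prefix}*"})


def advanced_search_body(text, color):
    """Query DSL: full-text match on name, filtered by an exact color value."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": [{"term": {"color": color}}],
            }
        }
    }


def sortable_mapping():
    """Mapping that makes the fields sortable by typing them as keyword."""
    fields = ("name", "color", "classification")
    return {"mappings": {"properties": {f: {"type": "keyword"} for f in fields}}}


def sorted_search_body(field, order="asc"):
    """Return all documents, ordered by a keyword field."""
    return {"query": {"match_all": {}}, "sort": [{field: {"order": order}}]}


print("GET", basic_search_path("name", "o"))
print(json.dumps(advanced_search_body("onion", "purple")))
print(json.dumps(sortable_mapping()))
print(json.dumps(sorted_search_body("name")))
```

The index is recreated before sorting because analyzed text fields are not sortable by default, whereas keyword fields are.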

 

Instaclustr for OpenSearch: Unlocking the power of scalable search and analytics

Instaclustr for OpenSearch offers managed services that take the complexities of deploying and maintaining this robust platform off your plate. With expertise in open source technologies, Instaclustr ensures that your OpenSearch clusters are optimized, scalable, and always available, allowing you to focus on what truly matters: building great products and improving your customer experience.

When it comes to using OpenSearch, Instaclustr provides several key benefits:

  1. Fully managed OpenSearch
    Get end-to-end management for your OpenSearch deployment, including setup, scaling, monitoring, and routine maintenance. We handle the grunt work so your team can stay focused on scaling your business.
  2. High availability & scalability
    Achieve high availability with multi-node configurations that provide failover capabilities. Whether you’re running a small application or scaling to enterprise-grade workloads, Instaclustr helps your OpenSearch clusters grow seamlessly with your business needs.
  3. Open source expertise
    Instaclustr champions open source technology, eliminating vendor lock-in and offering transparent pricing. This commitment ensures that your OpenSearch deployment is always community-driven, independently audited, and aligned with the latest developments in the ecosystem.
  4. World-class support
    Receive 24/7 support from seasoned engineers who specialize in OpenSearch. No chatbots; just real people ready to solve real problems and help you make the most of your investment.
  5. Monitoring and optimization
    Instaclustr doesn’t just manage your OpenSearch cluster—we actively monitor and optimize its performance. Advanced analytics and proactive alerts mean you’re always one step ahead when it comes to performance issues.
