Ready for another round of OpenSearch lessons? Let’s dive into OpenSearch 102!
You’ve done a great job with OpenSearch 101. You’ve set up an OpenSearch cluster, connected to it, indexed your data, and even initiated a lexical search. Remember when you searched for the term ‘manager’ in the title field? OpenSearch was only looking for that exact word. This, my friends, is a lexical term search.
No wiggle room for misspellings! This is a rapid-fire search mechanism that seeks out exact matches of the search term in a tokenized version of the search text, leaving no room for errors. Term search has a wide range of use cases, many of which you encounter in your day-to-day life on the web, such as autocomplete and type-ahead functionality. There are, however, some ways to build wiggle room into your term searches: some allow wildcard or regex matching, and some implement algorithms that give a “fuzzy” search to help with misspellings, for instance.
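To make that concrete, here is a minimal sketch of what those query bodies can look like in the OpenSearch query DSL, assuming a title field like the one you searched in 101 (the field name and search values are just examples):

// Exact-match term query: only documents whose tokenized title
// contains exactly 'manager' will match.
const termQuery = {
  query: {
    term: {
      title: 'manager'
    }
  }
}

// Fuzzy query: adds wiggle room by tolerating small typos,
// so a misspelling like 'managr' can still find 'manager'.
const fuzzyQuery = {
  query: {
    fuzzy: {
      title: {
        value: 'managr',
        fuzziness: 'AUTO'
      }
    }
  }
}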
However, if you want results that not only exactly match your search terms but are also merely similar to them, then welcome to the world of lexical full-text search. A tad more lenient, full-text search tokenizes not only the search text but also the search term, and returns any relevant intersections of the two groups of tokens. Because these intersections have different levels of overlap, each result carries a relevance score: the engine’s confidence that this overlap is what you’re looking for, based only on the tokenized text it has.
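If you’re curious what that tokenization actually looks like, you can ask the cluster directly with the analyze API. This is an optional aside, a small sketch assuming the same client object from the 101 post and the default standard analyzer:

// Ask OpenSearch how it tokenizes a piece of text.
// With the standard analyzer, this prints tokens like
// 'senior', 'project', 'manager' (lowercased, split on whitespace/punctuation).
// (Run this inside an async function, e.g. your start() function.)
const analyzeResponse = await client.indices.analyze({
  body: {
    analyzer: 'standard',
    text: 'Senior Project Manager'
  }
})
console.log(analyzeResponse.body.tokens.map((t) => t.token))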
Now that we’ve covered the vocabulary, let’s dig into some code. Open the OpenSearch-101 code folder that you either created during the 101 post or can grab from GitHub.
First, you’re going to need a few things to get started with full-text search. Dummy text that you can perform more complex searches on is a good starting point. Create a placeholder_text.txt file in your code directory and paste the text from here into the file. Now you have many (well, 8) paragraphs of text to index and search. It’s not a terribly interesting read for us humans, but that’s not what we’re here for.
Now you’re going to clean up the code to match our new use case.
First, at the top of the file with the other require statements, you’ll want to require in the ‘fs’ module, which allows us to read from the file system. It should look like this:
'use strict'

const fs = require('fs')
require('dotenv').config()

var host = process.env.OPENSEARCH_HOST
Next, use find and replace to replace ‘devrel-team’ with ‘opensearch-102’. This will change the index name (and related references) to opensearch-102, as you can see in this example:
const start = async () => {
  try {
    // Check the cluster health
    const clusterHealthResponse = await client.cluster.health({})
    printResponse('Get Cluster Health', clusterHealthResponse)

    // Check if the 'opensearch-102' index exists
    const indexExistsResponse = await client.indices.exists({ index: 'opensearch-102' })

    if (indexExistsResponse.statusCode === 200) {
      // Delete the 'opensearch-102' index if it exists
      const deleteIndexResponse = await client.indices.delete({ index: 'opensearch-102' })
      printResponse('Delete existing `opensearch-102` Index', deleteIndexResponse)
    }
  } catch (error) {
    console.error('Error:', error.message)
  }
}
Now, you’re going to remove the cluster health call; it’s not necessary for this demo. That’s these three lines:
// Check the cluster health
const clusterHealthResponse = await client.cluster.health({})
printResponse('Get Cluster Health', clusterHealthResponse)
The code remains the same until you get to the start() function. Before you run this function, you want to pull in the placeholder text with the ‘fs’ module you included earlier. To do this, we’ll use fs.readFileSync(), then toString() and split() to end up with an array of paragraphs. This code looks like:
// pull the dummy text into an array that we can pass to the OpenSearch cluster
let placeholderText = fs.readFileSync('./placeholder_text.txt').toString().split('\n')
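Depending on how your placeholder file is formatted, splitting on newlines can leave empty strings between paragraphs. If that happens, a quick filter (an optional tweak, not part of the original walkthrough) cleans things up:

// Drop any blank lines so only real paragraphs get indexed
placeholderText = placeholderText.filter((line) => line.trim().length > 0)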
Now that you have the data, it’s time to index it into your OpenSearch cluster. First, you should set the mapping for the new data. Change the createIndexResponse() call to look like:
const start = async () => {
  try {
    // Create the `opensearch-102` index
    const createIndexResponse = await client.indices.create({
      index: 'opensearch-102',
      body: {
        mappings: {
          properties: {
            lineNumber: { type: 'integer' },
            text: { type: 'text' }
          },
        },
      },
    });
  } catch (error) {
    console.log(error.message);
  }
}
So you’ll be passing in a line number and a line of text for each document you index.
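If you want to double-check that the mapping was applied the way you expect, you can optionally read it back from the cluster. This is just a quick sanity-check sketch, not part of the original walkthrough:

// Fetch and print the mapping for the new index
// (run inside an async function, after the index is created)
const mappingResponse = await client.indices.getMapping({ index: 'opensearch-102' })
console.log(JSON.stringify(mappingResponse.body, null, 2))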
Now that the opensearch-102 index is created, you can index your placeholder data. Go to the indexText() function and modify it to suit the new placeholder data:
const indexText = async () => {
  await (async () => {
    for (let i = 0; i < placeholderText.length; i++) {
      let response = await client.index({
        index: 'opensearch-102',
        id: i,
        body: {
          lineNumber: i + 1,
          text: placeholderText[i]
        },
      });
      printResponse(`Added index ID ${i}:`, response)
    }
  })()
}
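Indexing one document per request is perfectly fine for eight paragraphs, but if you ever point this at a much larger file, the bulk API can do the same work in a single round trip. Here’s a rough sketch of that alternative (not part of the original code; it assumes the same placeholderText array and printResponse helper):

// Alternative: index every paragraph in one bulk request.
// The body alternates action lines and document lines.
const indexTextBulk = async () => {
  const body = placeholderText.flatMap((text, i) => [
    { index: { _index: 'opensearch-102', _id: i } },
    { lineNumber: i + 1, text }
  ])
  const response = await client.bulk({ body })
  printResponse('Bulk index', response)
}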
The next change comes in the searchText() function. You want to print more than ‘element._source’ in order to see the relevance scores for your full-text search. After the change the function should look as follows:
const searchText = async (query) => {
  const response = await client.search({
    index: 'opensearch-102',
    body: query,
  });
  console.log('\nSearch Results:');
  response.body.hits.hits.forEach((element) => {
    console.log(element);
  });
}
Note the change from ‘term’ to ‘match’; this tells OpenSearch that this is not a term search, but a full-text search.
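If your code builds the query object before calling searchText(), that change looks roughly like the sketch below (the search phrase here is just an example; use whatever you’d like to hunt for in the placeholder text):

// Full-text query: 'match' tokenizes the search phrase and scores
// documents by how well their tokens overlap with it.
const query = {
  query: {
    match: {
      text: 'manage the supply chain'
    }
  }
}

searchText(query)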
That’s all the code changes; if you want to make sure everything matches, the completed code is in this gist. Go ahead and run your program with:
node index.js
You should see many added index results like the following:
Then you should start to see your search results:
As you can see, each result comes with a relevance score that tells you how closely it matches the original search phrase.
Try changing the search phrase in the code and see what you can come up with!
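If you want to experiment a little further, one easy variation (not covered above, but part of the same query DSL) is match_phrase, which only matches when the tokens appear together, in order:

// match_phrase requires the tokens to appear as a contiguous phrase,
// which usually narrows the results compared to a plain match.
const phraseQuery = {
  query: {
    match_phrase: {
      text: 'supply chain'
    }
  }
}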
In the end, when it comes to lexical searches, you have two major types: term and full-text. While term search is very literal, looking only for exact matches of the search term, full-text search is a bit more flexible, tokenizing both the search term and the search text and adding relevance scores to its results to show how close each match is. Best of all, full-text search is built right into OpenSearch, and you can turn a client that makes term search calls into one that makes full-text search calls by changing just a few lines of code.
This wraps up OpenSearch 102! In the next part of our series, OpenSearch 103, we’ll get into dashboards and querying your OpenSearch data. Until then, feel free to spin up your very own OpenSearch cluster with our free trial.
Happy coding!