NoSQL tutorial: Taking your first steps in the NoSQL world

What is NoSQL?

NoSQL, an acronym for “Not Only SQL,” refers to a broad class of database management systems that diverge from traditional relational databases’ structured tabular format. These systems handle unstructured or semi-structured data, making them suitable for big data and real-time web applications.

NoSQL databases offer flexibility with dynamic schema design, enabling quick adaptation to changing data requirements without restructuring. Unlike relational databases, which have traditionally relied on vertical scaling, NoSQL databases often excel in distributed environments and horizontal scaling.

This scaling ability makes them suited for modern applications demanding high availability, fault tolerance, and low latency. The diverse types of NoSQL databases address different storage and retrieval needs, such as document storage, key-value pairs, wide columns, and graphs.
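
To make horizontal scaling concrete, here is a minimal Python sketch of hash-based sharding, where each key is assigned to one of several nodes. The node names and the `node_for_key` helper are hypothetical and exist only for illustration; real systems commonly use consistent hashing or token ranges rather than a plain modulo.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick the node that owns `key` by hashing it (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
keys = ["user:1", "user:2", "user:3", "user:4"]

# The same key always maps to the same node, so any client can locate it.
placement = {key: node_for_key(key, nodes) for key in keys}
```

With plain modulo sharding, adding or removing a node remaps most keys, which is one reason distributed databases tend to prefer ring-based schemes.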

Key types of NoSQL databases

NoSQL databases are usually grouped by the data model they use to store and retrieve data.

Document-oriented databases

Document-oriented databases store data as collections of documents rather than traditional tables. Each document is typically a JSON or BSON data structure, capable of containing nested fields and arrays. This allows for a more natural data representation, often mirroring complex real-world structures without the need for table joins.

In document-oriented databases, each document can be indexed for fast querying, supporting complex data retrieval processes. They are well-suited for applications needing flexible schemas, such as content management systems and e-commerce platforms. MongoDB and CouchDB are examples of popular document-based database systems.
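
As a rough illustration of the document model, the Python sketch below represents documents as dictionaries with nested arrays and queries them without joins. The sample products and the helpers `find_by_tag` and `average_rating` are made up for this example and do not correspond to any particular database's API.

```python
# A small "collection" of product documents; nested data lives inside each record.
products = [
    {
        "_id": "p1",
        "name": "Trail Shoe",
        "price": 89.0,
        "tags": ["outdoor", "running"],
        "reviews": [{"user": "ana", "rating": 5}, {"user": "li", "rating": 4}],
    },
    {
        "_id": "p2",
        "name": "Desk Lamp",
        "price": 35.0,
        "tags": ["home"],
        # No "reviews" field: documents in one collection may differ in shape.
    },
]

def find_by_tag(collection, tag):
    """Return documents whose "tags" array contains `tag`."""
    return [doc for doc in collection if tag in doc.get("tags", [])]

def average_rating(doc):
    """Average the embedded review ratings, or None if there are no reviews."""
    reviews = doc.get("reviews", [])
    if not reviews:
        return None
    return sum(r["rating"] for r in reviews) / len(reviews)
```

Because the reviews are embedded in the product document, reading a product and its ratings takes a single lookup rather than a join across tables.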

Key-value databases

Key-value databases operate on the simplest NoSQL model, where data is stored as a collection of unique keys and corresponding values. This model is useful for simple data retrieval operations and scales across distributed systems. Examples like Redis and DynamoDB are often used in caching and session management scenarios due to their speed and scalability.

The design of key-value databases enables fast read and write operations, suitable for applications requiring quick data access. However, they generally lack complex query capabilities, which restricts their use cases to scenarios where querying is simple.
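
The session management use case can be sketched with a tiny in-memory key-value store that expires entries after a time-to-live. The `SessionStore` class below is a hypothetical stand-in written for this article, not the interface of Redis or DynamoDB.

```python
import time

class SessionStore:
    """In-memory key-value store with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at)
        self._clock = clock  # injectable so expiry can be checked without sleeping

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value
```

Note that the only access path is the key itself: there is no way to ask for "all sessions belonging to user X" without scanning every entry, which reflects the limited query capabilities described above.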

Wide-column stores

Wide-column stores, such as Apache Cassandra and HBase, store data in rows and columns like traditional databases but offer a dynamic column architecture. Each row can have a large number of columns stored together, which allows for high data density and efficient storage for time-series and big data applications.

These databases handle massive volumes of data across many servers efficiently, making them suitable for applications that require high write throughput, like real-time data analytics. They make it easier to store and access large datasets without predefined schemas.
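
One way to picture the wide-column layout is as a map from row key to a set of named columns, where each row holds only the columns it needs. The Python sketch below models a time-series table this way; the sensor data and the `write` and `read_slice` helpers are assumptions for illustration, not how Cassandra or HBase store data internally.

```python
from collections import defaultdict

# row key -> {column name -> value}; each row keeps only the columns it uses.
table = defaultdict(dict)

def write(row_key, column, value):
    table[row_key][column] = value

def read_slice(row_key, start, end):
    """Return the row's columns whose names fall in [start, end], in sorted order."""
    row = table.get(row_key, {})
    return [(col, row[col]) for col in sorted(row) if start <= col <= end]

# Time-series usage: one row per sensor, one column per reading timestamp.
write("sensor-1", "2024-01-01T00:00", 20.5)
write("sensor-1", "2024-01-01T00:05", 20.7)
write("sensor-1", "2024-01-01T00:10", 21.0)
write("sensor-2", "2024-01-01T00:00", 18.2)  # a different row with fewer columns
```

Reading a time range for one sensor touches a single row and a contiguous run of columns, which is the access pattern these stores are built around.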

Graph databases

Graph databases store data in nodes, edges, and properties, supporting complex relationships inherent in social networks, recommendation engines, and fraud detection systems. They represent entities as nodes and relationships as edges, linking data in a way that emphasizes connections. Neo4j and Amazon Neptune are leading examples of graph databases.

These databases can rapidly navigate and analyze interconnected data, making them useful for applications with deep relationship queries, such as route optimization and network infrastructure mapping.
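
A relationship query such as "who is within two hops of this user?" can be sketched as a breadth-first traversal over an adjacency list. The `follows` graph and the `within_hops` function below are hypothetical; a graph database would express this in its own query language and use its own storage engine.

```python
from collections import deque

# Adjacency list: node -> set of directly connected nodes (hypothetical data).
follows = {
    "ana": {"ben", "cy"},
    "ben": {"dee"},
    "cy": {"dee", "eli"},
    "dee": {"fay"},
    "eli": set(),
    "fay": set(),
}

def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # report only the other nodes
    return distance
```

In a relational schema the same question would typically need one self-join per hop, whereas here each additional hop is just another step of the traversal.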

Multi-model databases

Multi-model databases integrate several NoSQL data models under a single platform, allowing users to choose the best model for their use case without needing multiple databases. Depending on the product, they can handle some combination of document, graph, key-value, and wide-column data. Examples include ArangoDB and OrientDB.

These databases offer the advantage of a unified infrastructure while supporting multiple data models, reducing the complexity involved in maintaining different databases. They are beneficial in scenarios where a single application requires the characteristics of multiple data representations.

Tips from the expert

Ritam Das

Solution Architect

Ritam Das is a trusted advisor with a proven track record in translating complex business problems into practical technology solutions, specializing in cloud computing and big data analytics.

In my experience, here are tips that can help you better leverage NoSQL databases:

Avoid over-indexing in NoSQL: While indexing improves query performance, it can significantly slow down write operations in NoSQL databases like MongoDB. Ensure you only create indices on fields that are queried frequently, and monitor their impact on performance over time.
Optimize schema design for query patterns: NoSQL databases encourage denormalized schema designs, so tailor your document or key-value structure to match your most common query patterns. Avoid unnecessary joins by embedding related data directly, but be mindful of document size limits, especially in MongoDB.
Implement schema versioning for long-term flexibility: While NoSQL databases provide schema flexibility, managing evolving data structures can become complex. Maintain schema versioning within your documents or key-value stores to ensure backward compatibility as your data model evolves.
Leverage CQRS (Command Query Responsibility Segregation) patterns: For read-heavy applications, consider separating your command and query operations across different NoSQL databases optimized for each task. This can enhance both write and read performance, especially when you need fast, distributed data writes and complex read queries.
Use in-memory caching alongside NoSQL: Integrate an in-memory cache (like Redis or Memcached) to complement your NoSQL database for frequent reads. This can reduce the load on the primary database and accelerate response times, particularly for read-heavy workloads.
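
The schema versioning tip above can be sketched as a small upgrade-on-read routine. This is one possible approach under stated assumptions: the `schema_version` field name, the version numbers, and the `name` to `first_name`/`last_name` change are all invented for the example.

```python
CURRENT_VERSION = 2

def upgrade_v1_to_v2(doc):
    """v1 stored a single "name"; v2 splits it into first and last name."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update({"first_name": first, "last_name": last, "schema_version": 2})
    return upgraded

UPGRADES = {1: upgrade_v1_to_v2}  # from-version -> upgrade function

def load(doc):
    """Upgrade a stored document step by step until it matches CURRENT_VERSION."""
    version = doc.get("schema_version", 1)  # assume unversioned documents are v1
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc
```

Upgrading on read lets old and new documents coexist in the same collection, so a data model change does not require rewriting every stored record at once.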

NoSQL database tutorial: Getting started with Apache Cassandra

Apache Cassandra is a highly scalable, distributed NoSQL database designed for handling large volumes of data across multiple nodes. It offers high availability, fault tolerance, and decentralized architecture, making it suitable for real-time analytics and big data applications. In this tutorial, you’ll set up Cassandra using Docker, create a simple keyspace and table, and execute basic queries using the Cassandra Query Language (CQL).

These instructions were adapted from the Cassandra documentation.

Step 1: Pull the Cassandra Docker image

To start, ensure you have Docker installed. Then, pull the latest Cassandra image:

docker pull cassandra:latest

Step 2: Start a Cassandra container

Create a dedicated Docker network for Cassandra:

docker network create cassandra

Then, run a Cassandra container within this network:

docker run --rm -d --name cassandra --hostname cassandra --network cassandra cassandra

Step 3: Create a keyspace and table

Cassandra uses keyspaces as the top-level container for data. You’ll create a keyspace and a shopping_cart table using CQL.

First, save the following CQL script in a file named data.cql:

-- Create a keyspace
CREATE KEYSPACE IF NOT EXISTS store 
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };

-- Create a table
CREATE TABLE IF NOT EXISTS store.shopping_cart (
    userid text PRIMARY KEY,
    item_count int,
    last_update_timestamp timestamp
);

-- Insert some data
INSERT INTO store.shopping_cart (userid, item_count, last_update_timestamp) 
VALUES ('9876', 2, toTimeStamp(now()));

INSERT INTO store.shopping_cart (userid, item_count, last_update_timestamp) 
VALUES ('1234', 5, toTimeStamp(now()));
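
The replication settings in the script use SimpleStrategy with a replication factor of 1, so each row is stored on a single node. SimpleStrategy places the first replica on the node that owns the partition key's token and any further replicas on the next nodes around the ring, ignoring datacenter and rack layout. The Python sketch below imitates that placement under simplifying assumptions: one token per node, hypothetical node names, and MD5 in place of Cassandra's default Murmur3 partitioner.

```python
import bisect
import hashlib

def token(value):
    """Map a string to a ring position (MD5 here; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def replicas_for(partition_key, ring, replication_factor):
    """ring: sorted list of (token, node). Walk clockwise from the key's token."""
    tokens = [t for t, _ in ring]
    # The first node whose token is >= the key's token owns the key; wrap at the end.
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical three-node ring with one token per node.
ring = sorted((token(name), name) for name in ["node-a", "node-b", "node-c"])

one_copy = replicas_for("9876", ring, 1)      # matches replication_factor 1
three_copies = replicas_for("9876", ring, 3)  # every node holds the row
```

On the single-container setup in this tutorial there is only one node, so a replication factor of 1 is the only value that can be fully satisfied.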

Step 4: Load the CQL script into Cassandra

Use the cqlsh (Cassandra Query Language Shell) to execute the script:

docker run --rm --network cassandra -v "$(pwd)/data.cql:/scripts/data.cql" \
-e CQLSH_HOST=cassandra -e CQLSH_PORT=9042 -e CQLVERSION=3.4.6 nuvo/docker-cqlsh

If the Cassandra container is still initializing, this command may fail. Wait a few seconds and retry.
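
If you prefer to automate the wait, a generic retry helper is one option. The Python sketch below is an assumption about how you might script it, not part of the Cassandra tooling; `retry` and its parameters are hypothetical names.

```python
import time

def retry(action, attempts=5, delay_seconds=5.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` runs out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original failure
            sleep(delay_seconds)  # give the container more time to start
```

In practice, `action` could be a function that runs the docker command from this step with `subprocess.run(..., check=True)`, so that a non-zero exit code raises an exception and triggers another attempt.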

Step 5: Interact with Cassandra using CQLSH

You can start an interactive CQL shell with:

docker run --rm -it --network cassandra nuvo/docker-cqlsh cqlsh cassandra 9042 --cqlversion='3.4.5'

Once connected, you’ll see a prompt like:

Connected to Test Cluster at cassandra:9042.
[cqlsh 5.0.1 | Cassandra 4.0.4 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>

Step 6: Read data from the table

Run the following query in CQLSH to retrieve stored data:

SELECT * FROM store.shopping_cart;

Step 7: Insert more data

You can insert additional records into the shopping_cart table using:

INSERT INTO store.shopping_cart (userid, item_count) VALUES ('4567', 20);

Step 8: Clean up

When finished, stop the Cassandra container and remove the network:

docker kill cassandra
docker network rm cassandra

This tutorial covered the basics of setting up Apache Cassandra with Docker, creating a keyspace and table, and executing simple queries. Cassandra’s distributed architecture makes it a powerful choice for applications requiring high availability and scalability.

NetApp Instaclustr: Empowering NoSQL database management

Businesses are dealing with vast amounts of information that require efficient and scalable storage solutions. Traditional relational databases may not always be the best fit for handling the velocity, volume, and variety of modern data. As a result, NoSQL databases have emerged as a popular alternative, offering flexibility, scalability, and high performance.

NetApp Instaclustr is a managed service provider that specializes in open source technologies, offering managed solutions for Apache Cassandra, Apache Kafka®, OpenSearch®, PostgreSQL® and more. As a leader in managed open source, Instaclustr provides a reliable and scalable infrastructure for deploying, managing, and scaling NoSQL databases in the cloud. We simplify the complexities of managing NoSQL databases, allowing organizations to focus on their core business objectives.

Some of the key benefits of Instaclustr for NoSQL database management include:

Simplified deployment and management of NoSQL databases, providing a hassle-free experience for organizations. We’ll take care of the underlying infrastructure, ensuring high availability, data replication, and disaster recovery, while allowing developers to focus on application development.
Seamless horizontal scalability. Instaclustr leverages NoSQL’s renowned horizontal scalability and enables organizations to scale their databases up or down based on demand, ensuring optimal performance even during peak workloads.
Robust security features to protect sensitive data. We provide encryption at rest and in transit, role-based access control, and integration with various identity providers. Additionally, we help organizations meet compliance requirements by providing audit logs and facilitating data governance.
Comprehensive monitoring and support services, allowing organizations to gain insights into their database performance and health. We offer proactive monitoring, alerting, and troubleshooting capabilities, ensuring minimal downtime and quick issue resolution.
Optimized costs through flexible pricing models. With pay-as-you-go options, businesses can easily scale resources based on their needs and pay only for what they use. This eliminates the need for upfront infrastructure investments and provides cost predictability.

Instaclustr use cases:

NoSQL databases, such as Apache Cassandra, are well-suited for real-time analytics, enabling organizations to process and analyze large volumes of data in real time. Instaclustr provides a reliable infrastructure for deploying and managing Cassandra clusters, empowering businesses to derive valuable insights from their data.
With the proliferation of IoT devices, organizations face the challenge of managing and processing large volumes of data generated by these devices. NoSQL databases, coupled with Instaclustr’s scalability and performance, offer an ideal solution for efficiently storing and processing IoT data.
NoSQL databases excel at handling high-volume transactional workloads. We ensure the availability, scalability, and performance required to support mission-critical applications that demand rapid data processing and low-latency responses.

Instaclustr offers a powerful and reliable managed service platform for deploying and managing NoSQL databases. By leveraging the scalability, flexibility, and performance of NoSQL databases, organizations can handle the challenges posed by modern data requirements. Whether it’s real-time analytics, IoT data management, or high-volume transactional systems, Instaclustr empowers businesses to harness the full potential of NoSQL databases, enabling them to drive innovation and achieve their data management goals.
