20 Advanced MongoDB Techniques with Practical Examples

Suneel Kumar
Mobile App Circular
17 min read · Mar 16, 2024

MongoDB is a popular NoSQL document database that offers a flexible and scalable way to store and retrieve data. While MongoDB’s query language and data model are relatively straightforward, there are many advanced techniques and features that can help you unlock the full potential of this powerful database. In this article, we’ll explore 20 advanced MongoDB techniques, each accompanied by practical examples to help you understand and apply them in real-world scenarios.

Aggregation Pipeline

The aggregation pipeline is a powerful feature in MongoDB that allows you to perform complex data transformations and analytics directly within the database. It consists of a series of stages that process documents in a data stream, enabling you to filter, group, sort, and reshape data in various ways.

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $unwind: "$items" },
  {
    $group: {
      _id: "$items.product_id",
      total_quantity: { $sum: "$items.quantity" },
      total_revenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } }
    }
  },
  { $sort: { total_revenue: -1 } },
  { $limit: 5 }
])

In this example, we use the aggregation pipeline to analyze order data and retrieve the top 5 best-selling products based on total revenue. The pipeline stages include:

  • $match: Filters the documents to include only completed orders.
  • $unwind: Deconstructs the items array field, creating a separate document for each item.
  • $group: Groups the documents by product_id and calculates the total_quantity and total_revenue for each product.
  • $sort: Sorts the documents by total_revenue in descending order.
  • $limit: Limits the output to the top 5 documents.
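
To confirm that the initial $match stage can use an index rather than scanning the whole collection, you can run the same pipeline through explain. A quick sketch, assuming an index on status is created first:

db.orders.createIndex({ status: 1 })

// The winning plan for the $match stage should show an IXSCAN on { status: 1 }
db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed" } },
  { $unwind: "$items" },
  // ... remaining stages as above
])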

Indexing

Indexing is a crucial aspect of optimizing query performance in MongoDB. By creating appropriate indexes on relevant fields, you can significantly improve the speed of data retrieval and reduce query execution times.

db.customers.createIndex({ email: 1 }, { unique: true })
db.orders.createIndex({ customer_id: 1, order_date: -1 })

In the first example, we create a unique index on the email field of the customers collection. This index ensures that duplicate email addresses are not allowed and enables efficient lookups and queries based on the email field.

In the second example, we create a compound index on the customer_id and order_date fields of the orders collection. This index can be used to efficiently query orders for a specific customer, sorted by the order date in descending order.
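
As a quick illustration of that compound index at work, a query that filters on customer_id and sorts by order_date descending can be answered from the index alone, without an in-memory sort (the customer value here is just an example):

// Served by the { customer_id: 1, order_date: -1 } index: filter and sort order both match
db.orders.find({ customer_id: 123 }).sort({ order_date: -1 }).limit(10)

// explain() should show an IXSCAN with no separate SORT stage in the winning plan
db.orders.find({ customer_id: 123 }).sort({ order_date: -1 }).explain("executionStats")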

Geospatial Queries

MongoDB provides native support for geospatial data and queries, allowing you to store and query location-based data efficiently. This is particularly useful for applications that deal with maps, location-based services, or any data that involves geographic coordinates.

db.locations.createIndex({ coordinates: "2dsphere" })

db.locations.find({
  coordinates: {
    $nearSphere: {
      $geometry: {
        type: "Point",
        coordinates: [-73.935242, 40.730610]
      },
      $maxDistance: 5000
    }
  }
})

In this example, we first create a 2dsphere index on the coordinates field of the locations collection. This index is optimized for geospatial queries and supports calculations on spherical geometries.

We then perform a geospatial query to find all locations within a 5,000-meter radius of the specified coordinates (longitude: -73.935242, latitude: 40.730610). The $nearSphere operator is used to perform this proximity search, and the $maxDistance option limits the results to locations within the specified distance.
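
If you need every location inside an area rather than the nearest ones, $geoWithin with $centerSphere is a common companion query. Its radius is expressed in radians, so the distance is divided by the Earth's approximate radius in meters; a sketch using the same point:

// All locations within roughly 5 km of the point (5000 m / ~6378100 m Earth radius)
db.locations.find({
  coordinates: {
    $geoWithin: {
      $centerSphere: [[-73.935242, 40.730610], 5000 / 6378100]
    }
  }
})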

Text Search

MongoDB’s text search capabilities allow you to perform full-text searches on string fields, enabling you to find relevant documents based on keywords or phrases.

db.articles.createIndex({ content: "text" })

db.articles.find(
  { $text: { $search: "mongodb query optimization" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

In this example, we create a text index on the content field of the articles collection. This index enables efficient full-text searches on the content field.

We then perform a text search query using the $text operator and the $search option to find articles that match the terms "mongodb query optimization". The $meta projection operator is used to include the textScore value, which represents the relevance score of each matching document.

Finally, we sort the results based on the textScore value in descending order, placing the most relevant documents first.
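
By default, $search treats the string as individual terms and matches documents containing any of them. To require an exact phrase, wrap it in escaped quotes, and a leading minus sign excludes a term; for example:

// Match the exact phrase "query optimization" and exclude articles mentioning "sharding"
db.articles.find(
  { $text: { $search: "\"query optimization\" -sharding" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })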

Change Streams

Change streams are a powerful feature in MongoDB that allows you to monitor and react to real-time data changes in your collections. This can be particularly useful for building event-driven applications, implementing real-time analytics, or synchronizing data across different systems.

const pipeline = [
  { $match: { operationType: { $in: ["insert", "update", "replace"] } } }
]

const changeStream = db.orders.watch(pipeline)

changeStream.on("change", next => {
  console.log("Change occurred:", next)
  // Perform actions based on the change event
})

In this example, we create a change stream on the orders collection using the watch method. The $match stage in the pipeline filters the change events to include only insert, update, and replace operations.

We then set up an event listener on the change stream using the on method. Whenever a change event occurs, the callback function is executed, allowing you to perform specific actions based on the change event data.

Change streams can be particularly useful in scenarios such as real-time order tracking, maintaining audit logs, or synchronizing data across distributed systems.
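
Change streams are also resumable: every event carries a resume token in its _id field, which can be persisted and handed back to watch() so that no events are missed across restarts. A minimal sketch building on the stream above:

let resumeToken

changeStream.on("change", next => {
  resumeToken = next._id  // persist this token somewhere durable in a real application
})

// After a restart, continue from where the previous stream left off
const resumedStream = db.orders.watch(pipeline, { resumeAfter: resumeToken })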

Transactions

MongoDB 4.0 introduced support for multi-document transactions, providing a way to ensure data integrity and consistency across multiple operations. Transactions guarantee that all operations within a transaction are treated as a single atomic unit, ensuring that either all operations succeed or none of them are applied.

const session = db.getMongo().startSession()
const sessionDb = session.getDatabase("mydb")

try {
  session.startTransaction()

  const orderResult = sessionDb.orders.insertOne({ customer_id: 123, items: [...] })
  sessionDb.inventory.updateMany({ product_id: { $in: [...] } }, { $inc: { quantity: -1 } })

  session.commitTransaction()
  console.log("Transaction committed successfully. Order ID:", orderResult.insertedId)
} catch (error) {
  session.abortTransaction()
  console.error("Transaction aborted due to error:", error)
} finally {
  session.endSession()
}

In this example, we start a new session using db.getMongo().startSession() and initiate a transaction using session.startTransaction(). The operations inside the transaction go through session.getDatabase("mydb"), which binds them to the session.

Within the transaction, we perform two operations: inserting a new order document into the orders collection and updating the inventory quantities for the products in the order.

If both operations succeed, we commit the transaction using session.commitTransaction(). If any error occurs during the transaction, we abort the transaction using session.abortTransaction(), ensuring that no partial data changes are applied.

Finally, we end the session using session.endSession() in a finally block, so the session is released whether the transaction commits or aborts.
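
In driver code, the callback form is often preferable because it retries transient transaction errors automatically. A minimal sketch with the Node.js driver, assuming client is an already connected MongoClient and using empty placeholder arrays where the example above elides the real items:

const session = client.startSession()
try {
  await session.withTransaction(async () => {
    const orders = client.db("mydb").collection("orders")
    const inventory = client.db("mydb").collection("inventory")

    // Every operation in the callback must be passed the session
    await orders.insertOne({ customer_id: 123, items: [] }, { session })
    await inventory.updateMany({ product_id: { $in: [] } }, { $inc: { quantity: -1 } }, { session })
  })
} finally {
  await session.endSession()
}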

Sharding

Sharding is a horizontal scaling technique in MongoDB that allows you to distribute data across multiple shards (partitions) based on a shard key. This enables you to scale out your database horizontally, increasing its capacity and performance as your data grows.

sh.enableSharding("mydb")
sh.shardCollection("mydb.orders", { order_date: 1, _id: 1 })

In this example, we first enable sharding for the mydb database using sh.enableSharding("mydb").

We then shard the orders collection using sh.shardCollection("mydb.orders", { order_date: 1, _id: 1 }). The shard key is defined as a compound index on the order_date and _id fields, with order_date being the primary shard key and _id being the secondary shard key.

With this configuration, MongoDB will distribute the orders collection across multiple shards based on the order_date field, with each shard containing a range of order dates. The _id field is added to the shard key to ensure unique distribution of documents with the same order_date value.
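
Once the collection is sharded, you can verify how the data is actually distributed; a quick check in mongosh:

// Cluster-wide view: shards, databases, and chunk distribution
sh.status()

// Per-shard document counts and data size for the orders collection
db.orders.getShardDistribution()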

Capped Collections

Capped collections are fixed-size collections in MongoDB that maintain insertion order and support high-throughput operations. They are useful for scenarios such as logging, caching, or maintaining a rolling window of data.

db.createCollection("logs", { capped: true, size: 1000000, max: 1000 })

db.logs.insertOne({ message: "Log entry 1", timestamp: new Date() })
db.logs.insertOne({ message: "Log entry 2", timestamp: new Date() })
// ...

In this example, we create a capped collection named logs using db.createCollection("logs", { capped: true, size: 1000000, max: 1000 }). The capped option is set to true to create a capped collection, the size option specifies the maximum size of the collection in bytes (1,000,000 bytes or 1 MB), and the max option sets the maximum number of documents allowed in the collection (1,000 documents).

We can then insert documents into the logs collection using db.logs.insertOne({ ... }). Once the collection reaches its maximum size or document limit, the oldest documents will be automatically removed to make room for new insertions, maintaining a rolling window of the most recent log entries.
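
Two quick checks that come in handy with capped collections, sketched in mongosh:

// Confirm the collection is capped
db.logs.isCapped()

// Capped collections preserve insertion order, so a reverse natural-order
// scan returns the most recent log entries first
db.logs.find().sort({ $natural: -1 }).limit(5)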

Partial Indexes

Partial indexes in MongoDB allow you to create indexes on a subset of documents based on a specified filter criteria. This can be useful for optimizing queries on specific subsets of data and reducing the overall index size and maintenance overhead.

db.orders.createIndex(
  { status: 1, order_date: -1 },
  { partialFilterExpression: { status: "completed" } }
)

In this example, we create a partial index on the orders collection using db.orders.createIndex({ status: 1, order_date: -1 }, { partialFilterExpression: { status: "completed" } }). The index includes the status and order_date fields, but it is only applied to documents where the status field is equal to "completed".

This partial index can be particularly useful for optimizing queries that frequently filter on the status field, as the index will only contain documents with a "completed" status, reducing the index size and improving query performance.
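
One caveat worth showing: the query planner only uses a partial index when the query filter is guaranteed to be satisfied by the partialFilterExpression, so the filter must include the status: "completed" condition:

// Can use the partial index: the filter matches the partialFilterExpression
db.orders.find({ status: "completed", order_date: { $gte: ISODate("2024-01-01") } })

// Cannot use the partial index: orders with other statuses are not indexed
db.orders.find({ order_date: { $gte: ISODate("2024-01-01") } })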

Collations

Collations in MongoDB provide language-specific string comparison rules, enabling you to handle text data in a culturally relevant manner. This is especially important when dealing with multilingual data or sorting and comparing strings based on specific language rules.

db.createCollection("products", {
collation: {
locale: "en_US",
strength: 2
}
})

db.products.insert({ name: "Café" })
db.products.insert({ name: "cafe" })

db.products.find().sort({ name: 1 })

In this example, we create a new collection named products with a collation configuration { locale: "en_US", strength: 2 }. The locale option specifies the language rules to be used (in this case, English - United States), and the strength option determines the level of comparison (2 for case-insensitive comparison).

We then insert two documents with different casing for the name field: "Café" and "cafe".

When we execute db.products.find().sort({ name: 1 }), the documents will be sorted in case-insensitive order based on the specified collation rules, ensuring that strings are compared and sorted correctly according to the English language rules.
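
A collation can also be supplied per operation, overriding (or standing in for) the collection default; for example, a case-insensitive lookup on a single query:

// Case-insensitive match and sort for this query only
db.products.find({ name: "cafe" })
  .collation({ locale: "en_US", strength: 2 })
  .sort({ name: 1 })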

Schema Validation

While MongoDB is a schemaless database, it provides schema validation capabilities that allow you to enforce data structure and integrity rules on your collections. This can be useful for ensuring data consistency and preventing invalid or malformed data from being inserted into your database.

db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "email"],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
email: {
bsonType: "string",
pattern: "@mongodb\.com$",
description: "must be a string matching the regular expression pattern and is required"
},
age: {
bsonType: "int",
minimum: 18,
description: "must be an integer and greater than or equal to 18"
}
}
}
}
})

In this example, we create a new collection named users with schema validation rules defined using the validator option. The $jsonSchema document specifies the structure and constraints for the documents in the collection.

The schema requires the name and email fields to be present, defines their data types (string), and sets additional constraints like a regular expression pattern for the email field (@mongodb\.com$) and a minimum value for the age field (18).

When inserting or updating documents in the users collection, MongoDB will validate the data against the specified schema rules and reject any documents that violate the constraints.
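
To see the validator in action, here is a sketch of one insert that passes and one that is rejected under the rules above:

// Passes: both required fields are present and the email matches the pattern
db.users.insertOne({ name: "Alice", email: "alice@mongodb.com" })

// Rejected with a "Document failed validation" error: the required email field is missing
db.users.insertOne({ name: "Bob" })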

GridFS

GridFS is a specification in MongoDB for storing and retrieving large files and binary data. It divides files into smaller chunks and stores them across multiple documents, enabling efficient storage and retrieval of large files while maintaining the benefits of MongoDB’s horizontal scaling and replication features.

const { GridFSBucket } = require("mongodb")
const fs = require("fs")

// db is an existing Db handle obtained from a connected MongoClient
const bucket = new GridFSBucket(db, { bucketName: "photos" })

const metadata = {
  contentType: "image/jpeg",
  metadata: { description: "Photo of a sunset" }
}

const uploadStream = bucket.openUploadStream("sunset.jpg", metadata)

fs.createReadStream("/path/to/sunset.jpg").pipe(uploadStream)

uploadStream.on("finish", () => {
  console.log("File uploaded successfully!")
})

In this example, we first create a GridFSBucket instance named photos using new GridFSBucket(db, { bucketName: "photos" }). This bucket will be used to store and retrieve files.

We then define metadata for the file we want to upload, including the contentType and any additional metadata properties.

Next, we open an upload stream for the file using bucket.openUploadStream("sunset.jpg", metadata). This creates a writable stream that we can use to upload the file data.

We read the file data from the local file system using fs.createReadStream("/path/to/sunset.jpg") and pipe it into the upload stream using .pipe(uploadStream).

Finally, we listen for the finish event on the upload stream to know when the file upload is complete.
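
Downloading works the same way in reverse: open a download stream from the bucket and pipe it wherever the bytes should go. A sketch using the same bucket and filename (the destination path is just an illustration):

// Stream the stored file back out of GridFS by filename
bucket.openDownloadStreamByName("sunset.jpg")
  .pipe(fs.createWriteStream("/path/to/sunset-copy.jpg"))
  .on("finish", () => console.log("File downloaded successfully!"))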

Replica Sets

Replica sets are a way to achieve high availability and redundancy in MongoDB by maintaining multiple copies (replicas) of your data across different servers or machines. In the event of a primary server failure, one of the secondary replicas can automatically be elected as the new primary, ensuring uninterrupted access to your data.

// Configure replica set members
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "host1:27017" },
    { _id: 1, host: "host2:27017" },
    { _id: 2, host: "host3:27017", arbiterOnly: true }
  ]
})

// Check replica set status
rs.status()

In this example, we initialize a new replica set named myReplicaSet using rs.initiate({ ... }). The members array specifies the configuration of the replica set, including the host and port for each member.

In this case, we have three members: host1:27017, host2:27017, and host3:27017. The third member (host3:27017) is configured as an arbiter (arbiterOnly: true), which is a non-data-bearing member that participates in elections but does not store data itself.

After initializing the replica set, we can check its status using rs.status(), which provides information about the current primary, secondary members, and the overall health of the replica set.

Read Preferences

Read preferences in MongoDB allow you to control how queries are distributed and processed across the members of a replica set. This can be useful for optimizing read performance, ensuring data consistency, or distributing read loads across multiple servers.

const { MongoClient } = require("mongodb")

// A default read preference can be set in the connection string
const client = new MongoClient(
  "mongodb://host1,host2,host3/?replicaSet=myReplicaSet&readPreference=secondaryPreferred"
)

// It can also be set (or overridden) per database, collection, or operation
const primaryDb = client.db("mydb", { readPreference: "primary" })
const secondaryOrders = client.db("mydb", { readPreference: "secondary" }).collection("orders")
const nearestOrders = client.db("mydb").collection("orders", { readPreference: "nearest" })

In this example, we create a MongoDB client using new MongoClient(...). The connection string lists the replica set hosts, the replicaSet=myReplicaSet option tells the driver which replica set to connect to, and a default read preference can be appended as a readPreference option (here secondaryPreferred, which favors secondaries but falls back to the primary). A read preference can also be passed as an option when obtaining a database or collection handle, or on an individual operation:

  • primary reads only from the primary member of the replica set. This ensures that you always read the latest acknowledged data, but it concentrates read load on the primary.
  • secondary reads from a secondary member. This helps distribute read traffic across multiple servers and reduces the load on the primary, but the data might be slightly stale.
  • nearest reads from whichever member, primary or secondary, has the lowest network latency. This can improve read performance by reducing network overhead.

By setting the appropriate read preference, you can optimize your application’s behavior based on your specific requirements for data consistency, performance, and load distribution.

Elasticsearch Integration

MongoDB does not ship with an embedded Elasticsearch engine, but the two are frequently used together: Elasticsearch is a popular open-source search and analytics engine, and a common pattern is to stream changes from MongoDB into an Elasticsearch cluster (for example, via change streams or a third-party sync connector) so that you get Elasticsearch's search capabilities while MongoDB remains the primary data store. For many search needs, though, MongoDB's own weighted text indexes go a long way on their own:

// Create a weighted wildcard text index on the products collection
db.products.createIndex({ "$**": "text" }, { weights: { name: 10, description: 5 } })

// Run a relevance-scored text search against the index
db.products.find(
  { $text: { $search: "mongodb database" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

In this example, we create a text index on the products collection using db.products.createIndex({ "$**": "text" }, { weights: { name: 10, description: 5 } }). The "$**": "text" syntax indexes every string field, and the weights option gives the name field more influence than the description field when relevance scores are computed.

We then perform a text search with the $text operator, searching for the terms "mongodb database". The score projection and sort include and order the results by the textScore relevance score calculated by MongoDB. When you need capabilities beyond this, such as fuzzy matching, faceting, or heavy analytics over search results, that is where syncing the data into Elasticsearch (or using Atlas Search on MongoDB Atlas) becomes worthwhile.

Online Archive Storage

MongoDB Atlas, the fully-managed cloud service for MongoDB, provides an online archive storage feature that allows you to offload historical data from your operational database to a separate, cost-effective storage layer. This can help reduce storage costs and improve performance for your live database while still allowing you to access and query the archived data when needed.

// Online archives are configured in the Atlas UI or through the Atlas
// Administration API rather than with a database command: an archiving
// rule names a date field (here order_date) and how old a document must
// be before it moves to archive storage.

// Query archived data through the connection string Atlas provides for
// the archive, using the same queries and pipelines as for live data:
db.orders.aggregate([
  { $match: { order_date: { $gte: new ISODate("2022-01-01"), $lt: new ISODate("2023-01-01") } } },
  // ... additional pipeline stages
])

In this example, the online archive for the orders collection in the mydb database is defined by an archiving rule on the order_date field, which tells Atlas how old an order must be before it is moved out of the live cluster and into archive storage.

Once the online archive is active, you can query the archived data using the same MongoDB queries and aggregation pipeline stages as you would for the live data, connecting through the read-only archive connection string (or the combined connection string that federates the archive and the cluster). The $match stage in the example filters on the order_date field, limiting the query to the archived date range.

Online archive storage can be particularly useful for maintaining historical data for compliance or reporting purposes while keeping your operational database lean and performant.

MongoDB Charts

MongoDB Charts is a fully-managed data visualization service provided by MongoDB Atlas. It allows you to create rich, interactive charts and dashboards directly from your MongoDB data, without the need for a separate data warehouse or business intelligence tool.

// Charts are built in the MongoDB Charts UI (pick a data source, chart
// type, and field encodings) and can then be embedded in an application
// with the Charts Embedding SDK. The baseUrl and chartId below are
// placeholders for your own Charts instance and chart.
import ChartsEmbedSDK from "@mongodb-js/charts-embed-dom"

const sdk = new ChartsEmbedSDK({
  baseUrl: "https://charts.mongodb.com/charts-myproject-abcde"
})

const chart = sdk.createChart({
  chartId: "daily-revenue-chart-id",   // ID of a "Daily Revenue" line chart
  filter: { status: "completed" }      // optional filter applied to the chart's data
})

// Render the chart into a DOM element
await chart.render(document.getElementById("daily-revenue"))

In this example, the chart itself, say a line chart of daily revenue built on the orders collection, is defined in the Charts UI, where you choose the data source (database and collection), the chart type, the fields mapped to the axes, and any grouping or date range.

The application then uses the Charts Embedding SDK: ChartsEmbedSDK is pointed at the base URL of your Charts instance, createChart() references the chart by its ID and can apply an optional filter (here, only completed orders, provided the chart's embedding settings allow filtering on that field), and render() draws the chart into a DOM element on the page.

MongoDB Charts provides a wide range of chart types and customization options, allowing you to create rich visualizations directly from your MongoDB data without the need for complex data pipelines or external tools.

MongoDB Realm

MongoDB Realm is a serverless platform that allows you to build modern, data-driven applications using MongoDB Atlas as the data layer. It provides a suite of services and tools for building, deploying, and managing cloud-native applications, including functions, triggers, user authentication, and data synchronization.

// Define a Realm function
exports = function(arg) {
  const orders = context.services.get("mongodb-atlas").db("mydb").collection("orders");
  const customer = arg.customer;

  return orders.find({ customer_id: customer }).toArray();
};

// Define a Realm trigger
exports = function(changeEvent) {
  const updatedFields = changeEvent.updateDescription.updatedFields;

  if (updatedFields.status === "completed") {
    // Send a notification or perform additional actions
    console.log(`Order ${changeEvent.documentKey._id} has been completed.`);
  }
};

In the first example, we define a Realm function that retrieves orders for a specific customer from the orders collection in the mydb database. The function takes an argument arg that contains the customer value, and it uses the MongoDB Atlas service to execute a find query and return the results as an array.

In the second example, we define a Realm trigger that listens for changes in the orders collection. When an order is updated and its status field is set to "completed", the trigger logs a message to the console. Triggers can be used to perform additional actions like sending notifications, updating other data sources, or invoking external services.

MongoDB Realm simplifies the development and deployment of modern, data-driven applications by providing a serverless platform with built-in services for data access, function execution, and event handling.

MongoDB Atlas Data Lake

MongoDB Atlas Data Lake is a fully-managed service that allows you to create a data lake on top of your MongoDB Atlas cluster. It provides a centralized repository for storing and analyzing structured and unstructured data, enabling advanced analytics, machine learning, and data processing workloads.

// Create a time series collection for the log data. (Data Lake storage
// itself, such as an ingestion pipeline that extracts this collection into
// low-cost analytic storage, is configured through the Atlas UI or
// Administration API, not via createCollection options.)
db.createCollection("logs", {
  timeseries: {
    timeField: "timestamp",
    metaField: "metadata",
    granularity: "seconds"
  }
})

// Insert data into the collection
db.logs.insertMany([
  { timestamp: new Date(), message: "Log entry 1", metadata: { source: "app1" } },
  { timestamp: new Date(), message: "Log entry 2", metadata: { source: "app2" } },
  // ...
])

// Query the collection
db.logs.find({
  "metadata.source": "app1",
  timestamp: { $gte: ISODate("2023-01-01T00:00:00Z"), $lt: ISODate("2023-02-01T00:00:00Z") }
})

In this example, we create a time series collection named logs using db.createCollection(). The timeseries option designates timestamp as the time field and metadata as the field holding per-source metadata, and sets the granularity to seconds. The Data Lake layer itself, such as a pipeline that periodically extracts this collection into analytic storage, is configured in Atlas rather than in the shell.

We then insert sample log data into the logs collection using db.logs.insertMany(), including a timestamp, message, and metadata fields.

Finally, we perform a query on the logs collection, filtering the data based on the metadata.source field and a specific date range using the timestamp field.

MongoDB Atlas Data Lake provides a scalable and cost-effective way to store and analyze large volumes of data, including structured data, time series data, and unstructured data like logs and sensor data. It integrates seamlessly with MongoDB Atlas, allowing you to leverage the full power of MongoDB’s query language and analytics capabilities on your data lake.

MongoDB VPC Peering

VPC Peering is a feature offered by MongoDB Atlas that allows you to establish a private, secure network connection between your MongoDB Atlas cluster and your Virtual Private Cloud (VPC) on a cloud provider like AWS, Azure, or Google Cloud. This enables you to access your MongoDB Atlas cluster from within your VPC without exposing it to the public internet, enhancing security and reducing network latency.

// Connect to MongoDB Atlas from within your VPC
const { MongoClient } = require("mongodb")

const client = new MongoClient("mongodb://atlas-cluster.example.com:27017/mydb?tls=true&tlsCAFile=path/to/ca-cert.pem")
await client.connect()

// Perform database operations
const db = client.db("mydb")
const orders = db.collection("orders")

// Insert a new order
await orders.insertOne({ customer_id: 123, items: [...] })

In this example, we establish a connection to a MongoDB Atlas cluster from within our VPC using the MongoClient constructor. The connection string includes the Atlas cluster hostname (atlas-cluster.example.com), port (27017), and additional options like tls=true for enabling TLS/SSL encryption and tlsCAFile=path/to/ca-cert.pem to specify the path to the Certificate Authority (CA) file for validating the server certificate.

Once the connection is established, we can interact with the MongoDB Atlas cluster just like we would with a locally hosted MongoDB instance. In the example, we perform operations on the orders collection, such as inserting a new order document.

By leveraging VPC Peering, you can ensure that all network traffic between your VPC and MongoDB Atlas is securely routed over a private network connection, reducing the risk of unauthorized access and enhancing overall security. Additionally, VPC Peering can improve network performance by minimizing network hops and reducing latency, making it suitable for latency-sensitive applications or workloads that require high throughput.

Conclusion

MongoDB is a powerful and versatile database that offers a wide range of advanced techniques and features to meet the diverse needs of modern applications. From aggregation pipelines and indexing strategies to sharding, replica sets, and advanced data integration capabilities, MongoDB provides a comprehensive set of tools for managing and analyzing data at scale.

By mastering these 20 advanced MongoDB techniques and applying them in your projects, you can unlock the full potential of this database, ensuring optimal performance, scalability, and data integrity. Whether you’re building real-time applications with change streams, implementing complex data transformations with aggregation pipelines, or integrating MongoDB with external tools like Elasticsearch or MongoDB Charts, these techniques will empower you to tackle even the most challenging data processing and analysis tasks.

Remember, learning and applying advanced MongoDB techniques is an ongoing journey. As new features and capabilities are introduced, it’s essential to stay up-to-date with the latest developments and best practices. Embrace the power of MongoDB, experiment with these advanced techniques, and continue to explore new ways to harness the full potential of this versatile database.
