Tuesday, 15 September, 2020 UTC


Introduction

NoSQL brought flexibility to the tabular world of databases. MongoDB in particular became an excellent option for storing unstructured JSON documents. Data starts as JSON in the UI and undergoes very few transformations before being stored, so we benefit from increased performance and decreased processing time.
But NoSQL does not mean a complete lack of structure. We still need to validate and cast our data before storing it, and we may still need to apply some business logic to it. That is the gap Mongoose fills.
In this article, we'll use an example application to learn how Mongoose can model our data and validate it before storing it in MongoDB.
We will write the model for a genealogy app: a Person with a few personal properties, including who their parents are. We'll also see how to use this model to create and modify persons and save them to MongoDB.

What is Mongoose?

How MongoDB Works

To understand what Mongoose is, we first need a general idea of how MongoDB works. The basic unit of data we can save in MongoDB is a document. Although documents are stored in a binary format (BSON), when we query the database we get them back represented as JSON objects.
Related documents can be stored in collections, similar to tables in relational databases. This is where the analogy ends, though, because it's up to us to decide which documents count as "related".
MongoDB won't enforce a structure on the documents. For example, we could save this document to the Person collection:
{
  "name": "Alice"
}
And then in the same collection, we could save a seemingly unrelated document with no shared properties or structure:
{
  "latitude": 53.3498,
  "longitude": 6.2603
}
Here lies the novelty of NoSQL databases. We create meaning for our data and store it the way we consider best. The database won't impose any limitation.

Mongoose Purpose

Although MongoDB won't impose a structure, applications usually manage data with one. We receive data and need to validate it to make sure it is what we expect. We may also need to process the data in some way before saving it. This is where Mongoose kicks in.
Mongoose is an NPM package for NodeJS applications. It lets us define schemas that our data must fit, while also abstracting access to MongoDB. This way we can ensure that all saved documents share a structure and contain the required properties.
Let's now see how to define a schema.

Installing Mongoose and Creating the Person Schema

Let's start a Node project with default properties:
$ npm init -y
With the project initialized, let's go ahead and install mongoose using npm:
$ npm install --save mongoose
Mongoose automatically installs the mongodb NPM package (the official MongoDB driver) as a dependency. We won't use it directly; Mongoose handles it for us.
To work with Mongoose, we'll want to import it into our scripts:
const mongoose = require('mongoose');
And then connect to the database with:
mongoose.connect('mongodb://localhost:27017/genealogy', {useNewUrlParser: true, useUnifiedTopology: true});
MongoDB creates the genealogy database the first time we write data to it, so we don't need to create it beforehand. Setting useNewUrlParser to true makes the driver use its newer connection string parser, and setting useUnifiedTopology to true enables its newer server discovery and monitoring engine. (From Mongoose 6 onwards both behaviors are the default, and these options are no longer needed.)
The connection string above assumes the MongoDB server is running locally on the default port (27017) and without credentials. One easy way to have MongoDB running that way is Docker:
$ docker run -p 27017:27017 mongo
The container created will be enough for us to try Mongoose, although the data saved to MongoDB won't be persistent.

Person Schema and Model

With the setup done, we can now focus on writing our Person schema and compiling a model from it.
A schema in Mongoose maps to a MongoDB collection and defines the format of all documents in that collection. Every property in the schema must have an assigned SchemaType. For example, the name of our Person can be defined this way:
const PersonSchema = new mongoose.Schema({
    name: { type: String },
});
Or even simpler, like this:
const PersonSchema = new mongoose.Schema({
    name: String,
});
String is one of several SchemaTypes defined by Mongoose. You can find the rest in the Mongoose documentation.

Reference to Other Schemas

Most medium-sized applications have more than one schema, and those schemas are often linked in some way.
In our example, to represent a family tree we need to add two attributes to our schema:
const PersonSchema = new mongoose.Schema({
    // ...
    mother: { type: mongoose.Schema.Types.ObjectId, ref: 'Person' },
    father: { type: mongoose.Schema.Types.ObjectId, ref: 'Person' },
});
A person can have a mother and a father. In Mongoose, we represent these relations by storing the ID of the referenced document, of type mongoose.Schema.Types.ObjectId, rather than the document itself.
The ref property must be the name of the model we are referencing. We will see more about models later; for now, it's enough to know that we will compile PersonSchema into a model named 'Person'.
Our case is a bit special because the references point back to the same model, since mothers and fathers are persons too. Still, relations between different models are defined the same way.

Built-In Validation

All SchemaTypes come with default built-in validation. We can define limits and other requirements depending on the selected SchemaType. To see some examples, let's add a surname, yearBorn, and notes to our Person:
const PersonSchema = new mongoose.Schema({
    name: { type: String, index: true, required: true },
    surname: { type: String, index: true },
    yearBorn: { type: Number, min: -5000, max: (new Date).getFullYear() },
    notes: { type: String, minlength: 5 },
});
Any built-in SchemaType can be marked as required. In our case, we want every person to have at least a name. The Number type lets us set min and max values, which can even be computed, as the max for yearBorn is above. Similarly, minlength sets the minimum length of a String, so notes must be at least 5 characters long.
The index property makes Mongoose create an index for that field in the database, which makes queries on it more efficient. Above, we indexed the person's name and surname because we expect to search for persons by their names.
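One subtlety of the computed max: (new Date).getFullYear() is evaluated once, when the schema object is created, so a process that keeps running into a new year would keep using the old bound until it restarts. The sketch below shows a check that recomputes the bound on every call instead. isPlausibleBirthYear is a hypothetical helper, not part of Mongoose; it is plain JavaScript so it runs without a database, and the commented lines show one possible way to plug it into the schema as a custom validator.

```javascript
// Hypothetical replacement for `max: (new Date).getFullYear()`: the upper
// bound is recomputed on each call instead of once at schema creation.
// `now` is a parameter only so the check can be tested with a fixed date.
function isPlausibleBirthYear(value, now = new Date()) {
  return typeof value === 'number' &&
    value >= -5000 &&              // same lower bound as `min` in PersonSchema
    value <= now.getFullYear();    // no births in the future
}

// Possible use inside PersonSchema (requires Mongoose, not run here):
// yearBorn: {
//   type: Number,
//   validate: {
//     validator: value => isPlausibleBirthYear(value),
//     message: props => `${props.value} is not a plausible birth year`,
//   },
// },

console.log(isPlausibleBirthYear(1922));                       // true
console.log(isPlausibleBirthYear(-6000));                      // false
console.log(isPlausibleBirthYear(2031, new Date(2030, 0, 1))); // false
```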

Custom Validation

Built-in SchemaTypes also accept custom validators. This is especially useful when a property can hold only certain values. Let's add a photosURLs property to our Person, an array with the URLs of their photos:
const PersonSchema = new mongoose.Schema({
    // ...
    photosURLs: [
      {
        type: String,
        validate: {
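          validator: function(value) {
            // ^ anchors the match, so the value must start with http:// or https://
            const urlPattern = /^(http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-/]))?/;
            // test() returns a boolean, which is what Mongoose expects from a validator
            return urlPattern.test(value);
          },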
          message: props => `${props.value} is not a valid URL`
        }
      }
    ],
});
At its core, photosURLs is just an array of strings, which could be declared as photosURLs: [String]. What makes this property special is that we need custom validation to confirm that each value added has the format of an internet URL.
The validator function above uses a regular expression that matches typical internet URLs, which must begin with http:// or https://.
If we need a more complex SchemaType, we can create our own, but it's worth checking first whether one is already available.
For example, the mongoose-type-url package adds a custom SchemaType, mongoose.SchemaTypes.Url, that we could have used instead.
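If the regular expression feels hard to maintain, one alternative (a swapped-in technique, not what the sample project uses) is to let the WHATWG URL class, available globally in Node.js, do the parsing and then accept only the http: and https: protocols. isHttpUrl and urlValidator are hypothetical names; urlValidator has the same { validator, message } shape as the validate option above, so it could be passed as validate: urlValidator.

```javascript
// Hypothetical validator: delegates parsing to the global URL class
// and accepts only web links, like the http(s):// rule of the regex.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const url = new URL(value); // throws a TypeError on malformed input
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch (err) {
    return false;
  }
}

// Same shape as the `validate` object in PersonSchema:
// photosURLs: [{ type: String, validate: urlValidator }]
const urlValidator = {
  validator: isHttpUrl,
  message: props => `${props.value} is not a valid URL`,
};

console.log(isHttpUrl('https://bit.ly/34Kvbsh'));  // true
console.log(isHttpUrl('wrong_url'));               // false
console.log(isHttpUrl('ftp://example.com/a.jpg')); // false
```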

Virtual Properties

Virtuals are document properties that are not saved to the database; their values are computed. In our example, it would be useful to get and set a person's full name as a single string instead of separately as name and surname.
Let's see how to accomplish this after our initial schema definition:
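PersonSchema.virtual('fullName')
    .get(function() {
      // Regular functions (not arrows) are needed so `this` is the document
      if (this.surname)
        return this.name + ' ' + this.surname;
      return this.name;
    })
    .set(function(fullName) {
      // Assumes "Name Surname": first word is the name, second the surname
      const parts = fullName.split(' ');
      this.name = parts[0];
      this.surname = parts[1];
    });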
The virtual property fullName above makes some assumptions for the sake of simplicity: every person has a name, and at most one surname. We would face problems if a person had a middle name or a compound name or surname. All those limitations could be fixed inside the get() and set() functions defined above.
Because virtuals are not saved to the database, we cannot use them as filters when searching for persons. In our case, we would need to filter by name and surname instead.
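One way to relax the setter's assumptions is sketched below with two hypothetical helpers, splitFullName and joinFullName. They treat the first word as the name and everything after it as the surname, so compound surnames survive, although middle names would still end up in the surname. They are plain functions so they can be tried without a database; the commented lines suggest how the virtual could call them.

```javascript
// Split a full name into { name, surname }: the first word is the name and
// the remaining words, if any, form the surname.
function splitFullName(fullName) {
  const parts = fullName.trim().split(/\s+/); // also collapses repeated spaces
  return {
    name: parts[0],
    surname: parts.length > 1 ? parts.slice(1).join(' ') : undefined,
  };
}

// Inverse operation, matching the getter: omit the surname when it's missing.
function joinFullName({ name, surname }) {
  return surname ? `${name} ${surname}` : name;
}

// Possible wiring into the virtual (requires Mongoose, not run here):
// PersonSchema.virtual('fullName')
//   .get(function () { return joinFullName(this); })
//   .set(function (fullName) {
//     const { name, surname } = splitFullName(fullName);
//     this.name = name;
//     this.surname = surname;
//   });

console.log(splitFullName('Gabriel García Márquez').surname); // García Márquez
```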

Middleware

Middleware are functions, also called hooks, that run before or after certain Mongoose operations, such as save() or find().
A person can have a mother and a father. As we said before, we store these relationships by saving the IDs of the parents as properties of the person, not the parent documents themselves. It would be nice to fill both properties with the actual documents instead of just the IDs.
We can achieve this with a pre() hook on the findOne() query:
PersonSchema.pre('findOne', function(next) {
    this.populate('mother').populate('father');
    next();
});
The hook above must call next(), the function it receives as a parameter, so that Mongoose keeps processing the remaining hooks.
populate() is a Mongoose method that replaces IDs with the documents they reference. We use it here to get the parents whenever we search for a single person.
We could add this hook to other query methods, like find(). We could even populate parents recursively if we wanted to. But we should use populate() with care, as each populated path means an additional query to the database.
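To make that cost concrete, here is an in-memory analogy, not Mongoose's implementation: documents live in a Map keyed by _id, parents are stored as IDs as in PersonSchema, and a hypothetical populateParents() function swaps IDs for documents up to a given number of generations while counting lookups. Every extra generation adds more reads, which against MongoDB would mean more queries.

```javascript
// In-memory stand-in for the people collection, keyed by _id. The mother and
// father fields hold IDs, just as they do in PersonSchema.
const people = new Map([
  ['d1', { _id: 'd1', name: 'Ada' }],
  ['a1', { _id: 'a1', name: 'Alice', mother: 'd1' }],
  ['b1', { _id: 'b1', name: 'Bob' }],
  ['c1', { _id: 'c1', name: 'Charles', mother: 'a1', father: 'b1' }],
]);

let lookups = 0;

// Each call stands for one read; against MongoDB it would be a query.
function fetchById(id) {
  lookups += 1;
  return people.get(id);
}

// Return a copy of `person` whose mother/father IDs are replaced by the
// referenced documents, walking up `depth` generations.
function populateParents(person, depth) {
  if (!person || depth === 0) return person;
  const copy = { ...person };
  for (const field of ['mother', 'father']) {
    if (copy[field] !== undefined) {
      copy[field] = populateParents(fetchById(copy[field]), depth - 1);
    }
  }
  return copy;
}

// The initial people.get('c1') plays the role of findOne() and isn't counted.
const charles = populateParents(people.get('c1'), 1);
console.log(charles.mother.name, charles.father.name, lookups); // Alice Bob 2

lookups = 0;
const charlesWithGrandparents = populateParents(people.get('c1'), 2);
console.log(charlesWithGrandparents.mother.mother.name, lookups); // Ada 3
```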

Create the Model for a Schema

In order to start creating documents based on our Person schema, the last step is to compile a model based on the schema:
const Person = mongoose.model('Person', PersonSchema);
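The first argument is the model name. Mongoose derives the collection name from it by lowercasing and pluralizing it, so Person documents are stored in the people collection. This is also the value we gave to the ref property of the mother and father fields. The second argument is the schema we defined before.
mongoose.model() compiles the schema into a model: a class whose instances are documents with the structure, validation, virtuals and middleware we defined. The model also provides all the Mongoose methods we will use to interact with the database.
The model is the only thing we need from now on. Instead of keeping it in a local constant, we could export it with module.exports to make it available in other modules of our app, for example from a module named persistence: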
module.exports.Person = mongoose.model('Person', PersonSchema);
module.exports.db = mongoose;
We also exported the mongoose module. We will need it to disconnect from the database before the application ends.
We can import the module this way:
const {db, Person} = require('./persistence');

How to Use the Model

The model we compiled in the last section contains everything we need to interact with the collection in the database.
Let's now see how we would use our model for all CRUD operations.

Create Persons

We can create a person by simply doing:
let alice = new Person({name: 'Alice'});
The name is the only required property. Let's create another person but using the virtual property this time:
let bob = new Person({fullName: 'Bob Brown'});
Now that we have our first two persons, we can create a new one with all properties filled, including parents:
let charles = new Person({
  fullName: 'Charles Brown',
  photosURLs: ['https://bit.ly/34Kvbsh'],
  yearBorn: 1922,
  notes: 'Famous blues singer and pianist. Parents not real.',
  mother: alice._id,
  father: bob._id,
});
All values for this last person are valid. Mongoose runs validation when a document is saved (or when we call its validate() method), so if we had set the first photo URL to something other than a link, saving would fail with this error:
ValidationError: Person validation failed: photosURLs.0: wrong_url is not a valid URL
As explained before, the parents are set to the IDs of the first two persons, not to the documents themselves.
We have created three persons, but they are not stored in the database yet. Operations that involve the database are asynchronous: save() returns a promise, so we use async/await to wait for each write to complete:
await alice.save();
await bob.save();
await charles.save();
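In a CommonJS script, await is only allowed inside an async function (top-level await requires an ES module), and the script should also disconnect when it finishes. The sketch below assumes a hypothetical saveAll() helper that starts several save() calls at once with Promise.all(). Saving the three persons concurrently is safe here because Mongoose assigns each document its _id when it is constructed, and MongoDB does not check that the referenced parents exist. The stub objects only stand in for Mongoose documents so the helper can run without a database.

```javascript
// Hypothetical helper: starts every save() at once and resolves with the
// saved documents once all writes have finished. If any save() rejects,
// the returned promise rejects with that first error.
async function saveAll(docs) {
  return Promise.all(docs.map(doc => doc.save()));
}

// Possible use with the objects from this article (requires Mongoose and a
// running MongoDB, not run here):
// async function main() {
//   await saveAll([alice, bob, charles]);
//   const dbCharles = await Person.findOne({ name: 'Charles' }).exec();
//   console.log(dbCharles.fullName);
// }
// main()
//   .catch(err => console.error(err))
//   .finally(() => db.disconnect());

// Stubs standing in for documents: save() resolves with the document itself,
// as Mongoose's save() does.
const stub = name => ({ name, save() { return Promise.resolve(this); } });
saveAll([stub('Alice'), stub('Bob')]).then(saved => {
  console.log(saved.map(doc => doc.name)); // [ 'Alice', 'Bob' ]
});
```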
Now that all persons are saved to the database, we can retrieve them back with the find() and findOne() methods.

Retrieve One or More Persons

Mongoose's find methods take a filter object that describes the documents we want. Let's get back the last person we created:
let dbCharles = await Person.findOne({name: 'Charles', surname: 'Brown'}).exec();
findOne() returns a Query. Calling exec() runs it and returns a promise, which we then wait for with await.
Because we attached a hook to the findOne() method to populate the person's parents, we can now access them directly:
console.log(dbCharles.mother.fullName);
In our case we know the query will return only one result, but even if more than one person matches the filter, only the first result will be returned.
We can get more than one result if we use the find() method:
let all = await Person.find({}).exec();
We will get back an array we can iterate over.

Update Persons

If we already have a person, either because we just created it or retrieved it, we can update and save changes by doing:
alice.surname = 'Adams';
charles.photosURLs.push('https://bit.ly/2QJCnMV');
await alice.save();
await charles.save();
Because both persons already exist in the database, Mongoose sends an update containing only the changed fields, not the whole document.

Delete Persons

Like retrieval, deletion can be done for one or many persons. Let's do that next:
await Person.deleteOne({name: 'Alice'}).exec();
await Person.deleteMany({}).exec();
After executing these two commands the collection will be empty.

Conclusion

In this article, we have seen how Mongoose can be very useful in our NodeJS and MongoDB projects.
In most projects with MongoDB, we need to store data in a well-defined format. It's good to know that Mongoose provides an easy way to model and validate that data.
The complete sample project can be found on GitHub.