UPDATES: added a stab at many-to-many associations, 3/2/2018; revised associations section, 10/31/2017.
- SQL databases
- How does an ORM (object relational manager) work?
- Models and Migrations
- Structuring Data
- Tools & Set-Up
- Using the Sequelize CLI
- Sequelize CLI Part II: Generating Models
- Editing Models and Migrations (Not Including Associations/Foreign Keys)
- Model Associations (for Foreign Keys): hasMany, hasOne, belongsTo
- Creating Foreign Keys in Migrations
- Guesses at Many-to-Many Associations: belongsToMany
- Pluralization: Weird or Intuitive?
- Telling the migrations where to run (configuring database environments)
- Calls from within your app
- Testing with Mocha: beforeEach Hooks
- Production: A note on environmental variables
SQL stands for “structured query language”. SQL databases are unified by a common parent language that builds databases that function a lot like Excel spreadsheets. SQL databases are made up of tables, and each table has columns and rows. The shape of the data makes a “grid” of “cells”. SQL DBs are often thought of as “relational” because each table can store references to other tables through numeric coding. Every row of each table has an “id” that can be used as a reference for that specific row in a “cell” of another row, possibly in another table. So SQL DBs are a natural choice for data that has clear relationships and not a lot of “blank cells”, because even if nothing is stored in a “cell”, that space takes up memory in the database. The shape of a SQL DB’s tables determines the possible kinds of data that the DB can store, and that shape must exist before you can store any data. This is what makes migrations so valuable. They can store a set of instructions for determining that shape nearly instantaneously.
How does an ORM (object relational manager) work?
Models and Migrations
A model is a set of criteria that determines what counts as a valid database entry. In a SQL object relational manager, each candidate table row flows through the model, which determines whether the row has the required column-cells and whether those cells are well-formed based on the model’s rules. If the row passes all the model’s checks, it is entered into the database.
A migration is a set of database operations that has an “up” function and a “down” function. The “up” function changes the database and the “down” function attempts to restore the database to how it was before the “up” function ran. You can use migrations for structural database operations, like adding tables, columns, and whatnot, but you can also use migrations to add data (rows) to tables. However, what most people mean when they talk about migrations is the former, structural database operations.
Although migrations are an elegant solution to track series of changes to a database, you’ve got to be just as careful with them as you would be with one-off SQL operations. The reason is this: you can write a “up” migration to drop a table, and a “down” migration to add it back, but unless you include every row that was in the table to start with in your “down” migration, you still lose that table’s data even when the table itself is added back to the database. The situation gets even more complicated when you consider dropping and adding parts of tables.
Despite these limitations, which are inherent to all structural database operations, migrations are an awesome tool to quickly configure and replicate a database environment across a project. For most projects, you’d like to have at least three separate databases. One for messing around with in development, one for testing operations, and one for your finished, deployed app. Why use three? It may seem obvious why separating these concerns makes sense for a large app with customer data, but even in a small project with no important production data, it’s a great idea. It makes it easier for you, the developer, to work your magic. The testing environment will be designed to completely scrap your entire database on each operation, and you’ll run 20-50 of these operations per automated test. Testing plays fast and loose with your data, and you won’t want those tests to interfere with the flow you’ve got going on in your development database. As for production, that database will be on a cloud server somewhere far away from your development database. Simply put, database environments are relatively cheap when you consider the benefits of separating development, testing, and production.
Models are used every time you attempt send data (table rows) through your app and into your database tables. Migrations are used to set up those tables’ columns or to bulk add rows. So while a model and a migration for the same database table may seem repetitive, remember that the model controls what data can enter the database, and migrations actually themselves make changes to the database.
Think long and hard before you generate your models. Because SQL tables determine the shape of possibility for your data, and are relatively cumbersome to manipulate once they contain data, you’ll want to have a clear idea of what you need to record and how it relates. We do this by determining which tables’ data belongs to which other tables’ data, and whether the relationship is one-to-many or many-to-many.
In this project, our database has four tables: Users, Contexts, Tasks, and Sessions. For the moment, don’t worry about the Sessions. If you’re ready to take the plunge and configure express sessions, check out my blog post here.
User is the highest level table in our database. Users don’t belong to contexts or tasks. Contexts belong to users. Think about the view inside the application. A User should only see the Contexts that they’ve added (and the default Contexts added for them upon registration). Imagine one User added a Context of “Grandma’s House”. It would be apropos of nothing in another User’s view. One User can have many Contexts. And one Context can have many Tasks. We should be able to change the Context of a Task, but Tasks are organized into Contexts to make them easier to retrieve. Each Context can have many Tasks, but each Task has only one Context.
So we’ve come up with the following data structure: Many Tasks belong to one Context, and many Contexts belong to one User. Another way to say this is that one Context has many Tasks and one User has many Contexts. We’ll nail down exactly which one applies when we look at defining associations for our Sequelize tables.
Tools and Set-Up
To do this project, you’ll need to be brave, moderately handy with the terminal, and have Git, Node, and a MySQL graphical user interface installed, such as Sequel Pro for Mac or MySQL workbench for Windows. Install Node via your favorite package manager or download it from Nodejs.org. I like to use Node Version Manager, which allows you to switch between versions. Installing Git is outside my scope; get acquainted with Git.
Create a new repository with GitHub or BitBucket and
git clone it to your computer.
cd into your repo and run
npm init to create a new Node project. Answer the questions as best you can while node generates your
package.json file, which will keep a record of all your packages and node configuration. Run the following to add packages:
In addition, install the Sequelize CLI (command line interface) globally:
Finally, we want to add a couple files to our
.gitignore file to prevent saving them in the repo. Run
touch .gitignore and open it in your favorite text editor. Add the following.
The node_modules are large and can be generated in deployment from the package.json file. The config files contain sensitive passwords and such which will be replaced by environment variables on deployment, and so SHOULDN’T be part of the repo.
Using the Sequelize CLI
One of the great things about Sequelize is that it comes with a command line interface that generates skeletal models and migrations for us to expand and customize. Sequelize adds some convention to the configuration-heavy Node environment to make our lives much easier.
Start by running
sequelize init in your project’s root directory. This will add a bunch of directories and files to your project. For now, we’re interested in the
models directory. Open
models. You should find an
index.js file that looks something like this:
This model is the root model that will allow Sequelize to read all the other models you make. It connects your app to the database by way of the lowercase
sequelize instance of the
Sequelize constructor, which, for now, reads the connection configuration set in the config file, places it in the
db object, and exports that connection along with all of your models to be read by other files in your app (
module.exports = db) That db variable also includes the Sequelize constructor itself.
Go ahead and commit to your repo.
Sequelize CLI Part II: Generating Models
We’ve got our index file connecting our connection and our models. Now, to generate models and migrations: on the CLI, type:
This will create an incomplete User model and an incomplete User migration that we will flesh out. Let’s follow suit with the Context and Task models:
And, like magic, you should have several files that weren’t there before in your
migrations directories. The default for generated Sequelize files is camelCase. However, you can specify snake_case by adding the
--underscored flag, and you can keep Sequelize from automatically pluralizing the table name by adding
--freeze-table-names. Read more about auto-pluralization below.
Commit your work. Now we’re going to edit these files so that they fully reflect the relationships between the data that we intend and validate our data as it’s saved (or not saved if invalid).
Editing Models and Migrations (Not Including Associations/Foreign Keys)
Let’s start with the User model file. The generated file reads as follows:
We want to make a few changes here. What are they? Well, first it’s important to understand that Sequelize automatically adds three columns to each table: a unique id that acts as the primary key for the table, and createdAt and updatedAt columns, which are self-explanatory. By default, the id column will start at 1 and assign numbers in ascending order. This pattern is okay for small projects, but it lacks sophistication. Deleted rows will leave obvious gaps in the numbering, rows in different tables will have the same ids, and the unique ids won’t all have the same number of digits. So we’re going to alter the id column to assign a UUID to each row. Read more about primary keys and UUIDs. Sequelize has a UUID datatype that will make this a snap.
Additionally, we want to specify that each email must be unique to the database and each User must have a password.
So, make the following changes:
Now we need to edit the boilerplate User migration to reflect the changes we’ve made to the model in the table’s structure. Notice that while the id, createdAt, and updatedAt columns weren’t present in the original boilerplate User model, they are in the boilerplate User migration. They’re not in the model because the Sequelize node package specifies them as a default part of each model, but they have to be in the migration so that the columns are created in the table. Remember, SQL databases can’t save data unless you’ve previously created columns for that data. Here is the generated boilerplate User migration:
And here is the migration edited to reflect the changes in the User model.
Commit your work. Now we must make a similar set of changes to the Context and Task models and migrations. Here’s our Context model once we’ve expanded the boilerplate.
And the migration:
Commit your work. On to Tasks!
In the Task model, note that we’ve added some extra validation for the Task name. There are all kinds of extra validations you can perform with Sequelize models, but keep in mind that if an entry doesn’t meet every criterion in the model, the row will not be saved at all. It’s important to prevent save failure by using front end validation and back end sanitization to ensure that most data passed to your model can be saved. Writing code to deal with failed database updates is a minefield since there are many reasons an update can fail: at least one for every model validation criterion in addition to actual database errors. However, any mature codebase (which this project is not) will handle failed database updates.
Now for the Task migration.
Commit your work.
Model Associations (for Foreign Keys): hasMany, hasOne, belongsTo
The section on associations in the Sequelize docs is confusing. When should you use hasMany, hasOne, belongsTo, and belongsToMany? The Sequelize docs talk about the source and target models and tables, but doesn’t clearly explain which is which. I THINK that the source table is the table whose rows store the foreign keys corresponding to the target rows’ primary keys.
In Sequelize, an association must be defined on both the target and the source models. For clarity, let’s call them the “model containing the primary key” and the “model containing the foreign key”. The model containing the primary key uses hasMany (1:many) and hasOne (1:1) while the model containing the foreign key uses belongsTo (1:1, 1:many). The name of the foreign key column must be specified on both models, in the associations section. Do not put the foreign key column in the list of columns on the model containing the foreign key. You will get the error that the column already exists, because Sequelize will try to create it twice. Instead, only define the foreign key in the associations section. This project only uses one-to-many relationships, so check out the Sequelize docs section on associations for one-to-one and many-to-many associations. As of this writing, the docs text seems to be right, if not clear or laden with examples. I’ve added a short summary on many-to-many associations in models and migrations a few sections down.
Let’s define the model associations for the User. We’ll handle migrations separately in the section below. We want to associate the User model with the Context model by specifying that a User has many Contexts, and pass the name of the foreign key in the Context model (model containing the foreign key) to the User model (model containing the primary key). We’ve also specified
onDelete: CASCADE, which tells Sequelize that if we delete a User, it should delete all that User’s Contexts. I’m not sure if this needs to be specified on the model containing the primary key, but I’ve included it to be future-proof. The other option is
onDelete: 'SET NULL'.
In the associations section of the Context model, we’ll specify that each Context belongs to a User and each Task has many Contexts. Wait, didn’t we tell Sequelize that each User has many Contexts? Why do we also need to tell it that each Context belongs to a User? Well, not all database relations are one to many. What if we designed our database so that each User had many Contexts and each Context could have many Users? It would be a logistical mess for this particular database (since Tasks belong to Contexts and not Users), but you can start to see why we need to define the data relationship on both sides of the relation for Sequelize to understand how to store foreign keys. Note that we’ve specified the name of the User-Context foreign key again here on Context, the model containing the foreign key, and the name of the Context-Task foreign key here on Context, the model containing the primary key.
Finally on the Task model, we’ll specify the name of the Context-Task foreign key on Task, the model containing the foreign key.
Commit your work. On to migrations!
Creating Foreign Keys in Migrations
Just like you include the id, createdAt, and updatedAt columns in the migration, but not the model, you must explicitly create the foreign key column on the migration containing the foreign key, along with info about which table the foreign key references. Where the association implies the connection on the model, you must tell the SQL DB itself how to constrain the columns you create in the migration. Read more about foreign keys. There are no changes to the User migration because it does not include any foreign keys.
The UserId column on the Context model stores the foreign key, or the id from the row in the Users table that the particular Context row belongs to. We’ve specified a couple of options for our UserId column: it’s a UUID (no surprise there), it references the id in the corresponding row from the Users table, and there’s that
onDelete: CASCADE again. Putting this in the model defines the rules of the association between Context and User for Sequelize, and repeating it in the migration tells the SQL table what to do. If we had somehow specified the association in the CLI boilerplate generation (which may be possible, check out the Sequelize docs), I suspect that the UserId column definition would have been added to the migration exactly as we’ve added it here.
onDelete: CASCADE tells Sequelize that if we delete a User, it should delete all that User’s Contexts.
And we’ll do the same on the Task model.
Commit your work.
Guesses at Many-to-Many Associations: belongsToMany
Many:many relationships break the mold and belongsToMany is defined on both of the models with no association on the join table. Each association must give the name of the join table in a “through” property, with foreignKey and onDelete, etc. I am not sure whether the join table even needs a model (unless you want to define just its primary key to make it a UUID, since foreign key columns are not in the model), but it seems like you should definitely create a migration for it to create the foreign key columns. I haven’t tried it and I think it makes the nomenclature for associations even more weird and counter-intuitive. Here’s a example of how I think it works unrelated to this project.
Pluralization: Weird or Intuitive?
You might also ask why the model name is given as “Users” but the user model filename is “user.js” and the actual text of the User model says “User”, not “Users” in several places. Take a look at the User migration. The actual table in the database is called “Users”. This convention of pluralizing the table name while the model definition is singular comes from Ruby on Rails and is intended to express that the model applies to individual database entries as they enter the database and the table is a collection of these entries. Some find this confusing and it’s possible to “freeze table names” in Sequelize so that this doesn’t happen (see Generating Models above). Take a look at the docs for more info. However, since this article is about how Sequelize brings a little organization/sanity to the Node environment via the Rails “convention over configuration” ethos, I’ve left the dynamic and pluralized while the models are singular.
Telling migrations where to run (configuring database environments)
Now we have a bunch of lovely models and a great database structure with our migrations, but we don’t have a database yet. Let’s fix that. We’re going to create development and test databases, update our config file to reflect the databases we’ve made so we can connect to them, and migrate the structure we’ve created into both databases.
Install MySQL, PostgreSQL, MariaDB, SQLite or MSSQL on your machine. Sequelize works with all of these. I’m going to give instructions using MySQL and the Mac GUI Sequel Pro to manipulate the DB. Here are instructions to install MySQL on a Mac.
Start the MySQL server by heading to the Terminal and typing
mysql.server start. Open up Sequel Pro and enter your connection credentials. Select “Add Database” from the dropdown on the upper left and enter a development database name. Repeat the step and enter a test database name. The production database will be on a server somewhere like Heroku, so don’t worry about it for now.
Now to update our
config.json file. Remember, DON’T save this file in your repo or it’s totally your fault if you get hacked. An example can be found in my repo under
config-example.json. As you can see, it creates several different environments with different databases that your
index.js model can connect to. You can create more with different names and credentials if you have need.
Let’s run our migrations to configure our database in both the development and test environments. Go to the Terminal, cd into your project repo, and type:
This will run your migrations in the development environment. Again for the test environment:
Undo a migration with
sequelize db:migrate:undo. If you need to change a model or add/subtract something to/from a whose migration you’ve already run, use
sequelize migration:create to generate a fresh migration skeleton. They’re prefixed with the date so they’ll stack up in the order you create them, and they’ll run alpha filename order. Beware that if you have multiple migrations in the directory when you hit
db:migrate, they’ll all run immediately. For more fine-grained control, move the later migrations out of the directory and run them one at a time by dropping each one back in the directory when the previous one is complete. See a fuller list of commands over at the docs’ bit on migrations.
In development, don’t be afraid to run and revert migrations over and over and over until you get them right. You’ll sweat and swear, but you won’t break anything. The screenshot below looks super neat and great but in practice you’ll be doing and undoing them over and over until you get the syntax and the desired DB changes exactly right, especially on the development database.
When successful, your terminal should look something like this:
We don’t have to specify the development environment when we run the development migration because development is configured as the default environment in the
Now when you view your databases in Sequel Pro, you should see all the new columns when you select a table.
Calls from within your app
Testing with Mocha: beforeEach hooks
Mocha is a great test suite for Node, which in combination with other utilities like Chai and Nightmare can help you write great automated unit tests for your code. Once you’ve got tests, you can run them every time you change the app to see if you’ve broken anything. I can hardly convey how awesome this is for you as a developer.
One important part of writing automated tests is configuring a perfectly controlled vanilla test environment that is exactly the same for each test. Since each test changes the database, you need to reset it before you run each test. Mocha has a great tool for doing this called a beforeEach hook. Here’s what a good sequelize database reset in a Mocha
beforeEach hook should look like for this Sequelize app.
Production: A note on environmental variables
Even though your
config.json file has an entry for a production database, in practice, you’ll never use it. This is because in production you want to configure your database connection through an environment variable, rather than in plain text. Some people prefer to upload and unzip tarballs of json configuration files and reference them instead of using env vars. I prefer env vars. I think it’s more secure and the only way to go in production. This will ultimately require some changes to your
index.js model. Here’s a snippet from the
index.js model configured to deploy to Heroku using a JawsDB MySQL database:
And that’s pretty much it. Check out the repo at https://github.com/slsriehl/seq-wp. Here are some articles I used to learn this stuff.
- The Sequelize docs
- Alex Booker’s general videos (Code Cast)
- Greg Trowbridge on migrations
- Chloe Echikson’s talk on migrations
- Michael Herman’s ‘Node, Postgres, and Sequelize’
- J. Muturi on Node, Express and Postgres Using Sequelize (scotch.io)
- Loren Stewart on associations
- Hari Das for Codementor
- Bulkan Evcimen on using Sequelize with an existing database