After explaining how to automate data quality tests and how to test primary key constraint in your database, in this article we’ll focus on tests related to foreign key constraints.
Foreign keys (FKs) are used to establish relations between data stored in different tables in our database. A foreign key is usually one attribute (sometimes a set of attributes) in the table that is related to the primary key of another table. Meaning, that the values this attribute could hold are limited to the set of values of that primary key attribute + NULL value. The foreign key value will uniquely define a row in the referenced table.
In our example, attributes user_id and role_id are foreign keys and they reference the user.id and the role.id attributes.
Why Are Foreign Keys Important in a Databases?
Having the database without FKs is a bad practice and one should have a really good reason not having them. There are many advantages of using FKs, but the two main ones are:
- Foreign key constraints takes care of data integrity: You can’t insert a FK value that doesn’t exist in the PK column of the referenced table. Also, when you make changes to that primary key (inserts/updates/deletions) these changes will have impact on the table with the FK constraint. For example, you won’t be able to delete a primary key value that is used as a FK in other table (unless you’ve set it up that way in the database).
- Performance: Foreign key constraints reference PKs that are indexed and indexes make search for records much faster.
A Few Prerequisites Before We Can Establish Our Controls:
- All PK constraints in our database should be defined on a single column called id. The column should be of an integer data type with auto increment property set to yes. In cases when PKs are composed of more than one value, we would also have FKs composed of more than one value and that is usually not the best option (indexes can’t be created over multiple columns due to performance issues). In case you are willing to define these few columns in the table hold unique combination, don’t use the PK to denote that, but the UNIQUE constrain instead. All FKs should use a naming convention such as<referenced_table_name>_id. This way, we’ll be able to identify all FK attributes by their name and also increase the readability of our model.
- Attributes that are not FKs shouldn’t use the naming convention form the 2nd
What FK Related Tests Will Be Performed in Our Database?
- For each PK, test if it used at least once as a FK. Usually, there is not much sense to have a table (with PK defined) that is not referenced by other tables in our database. Still, this could be the case in many-to-many relations, e.g. user_has_role table in our model. One other option is if we have tables that we use as reporting tables, e.g. we make daily business summary and store the result in that table.
- For all FK attributes, test if they are actually named <referenced_table_name>_id.
- For all attributes that are named <any_table_from_the_database>_id, test they have defined the FK constraint that references the id column in that table.
I won’t check that every attribute that contains the table name (e.g. user) or is named <something>_id (where <something> could be anything, not just the table name) must be the FK at the same time. Maybe we want to use these “words” as well as in other attributes. For instance, the column named unique_user_id could be used to store values of a unique card id number for users and these values could even be of a varchar data type which is completely fine.
We can list all Foreign key constraints in the database (variable v_database_name) using a query on the INFORMATION_SCHEMA database (in MySQL) as this one:
SELECT INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME, INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME, INFORMATION_SCHEMA.KEY_COLUMN_USAGE.TABLE_NAME, INFORMATION_SCHEMA.KEY_COLUMN_USAGE.COLUMN_NAME, INFORMATION_SCHEMA.KEY_COLUMN_USAGE.CONSTRAINT_NAME FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE WHERE INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME IS NOT NULL AND INFORMATION_SCHEMA.KEY_COLUMN_USAGE.TABLE_SCHEMA = v_database_name ORDER BY INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME;
You’ll notice that this query provides us with all the information we need: name of the FK column and the table that contains it, as well names of the referenced column and its table. The CONSTRAINT_NAME is nothing more than a name the system generates automatically or we have chosen for this constraint.
Having this in mind we’ll create the stored procedure that will perform tests on the FK we want:
When we run a procedure on our example database we’ll get the following output:
This tells us that all tests turned out well, except the check on the user_has_role.id column. Still, this is just a WARNING because this column is not used as a reference to any other FK. That is acceptable, because this is many-to-many relation between tables user and role.
Now, moving on to making two modifications to our model in order to test if our procedure returns errors when it should. The model now looks like this:
You’ll notice that I’ve deleted the FK constraint between the user_has_role.role_id and the role.id. Besides that, I’ve added the unique_user_id attribute to the user_has_role table. The procedure should return an error that there is a missing FK and a warning that the role.id is not used in any foreign key. After the procedure call, we’ll get the following output:
The list contains:
- WARNING that the id is not referenced anywhere in our model and that is expected behavior.
- WARINING we already had in the previous step for the id column.
- ERROR that the FK is not defined on the rold_id column. This is also an expected behavior and the procedure generates this error based on the name of the column (role_id) as well as the fact that the table with the corresponding name exists (role + “_id” gives exactly the name of the tested column). Please notice that the column unique_user_id doesn’t generate an ERROR because there is not a table named unique_user in our model.Today I’ve described one approach we could use to test foreign key constraints in our database. This approach is closely related to the naming convention and database design rules described in the two previous articles of this series.
From my experience, establishing these controls can prove to be very useful and spare a lot of time. What are your thoughts? Feel free to share your opinion as well as experience on this subject!