
Database Testing with Great Expectations


“Take nothing on its looks; take everything on evidence. There’s no better rule.”

— Great Expectations, Charles Dickens

Back in April we ran our first East Anglia Quality Assurance meetup, where Technical Architect Brendon Bezuidenhout-Green gave a great talk on the importance of database testing. The recording can be found here. Since then, I have been on the lookout for tools that can assist with database testing, and I came across Great Expectations after listening to a podcast.

Great Expectations is a tool that I think testers can use to help automate the testing of databases and data. What is it, I hear you ask? Well, shall we take a closer look? 🧐

Table of Contents – Database Testing with Great Expectations

  1. What is Great Expectations?
  2. How does Great Expectations work?
  3. How can we use Great Expectations?
  4. Getting started with Great Expectations
  5. Adding & Editing Expectations
  6. Summary

What is Great Expectations?

Great Expectations is an open source Python framework for writing automated data pipeline tests. It integrates with many commonly used data sources, including MySQL, Postgres, Pandas, SQLAlchemy and many others (check out their website for the full list).

As testers, we’re used to writing assertions for our APIs and UIs to check that the outcome aligns with what we expect. Well, Great Expectations lets us do the same for our data. Cool, yeah? 😎

How does Great Expectations work?

Great Expectations houses an impressive list of inbuilt assertions, known as expectations (hence the name). Expectations can be run against the structure of the data source or against specific columns of data. For example, running the expectation

expect_column_to_exist("UserName")

will check if the UserName column exists in the selected data source.

Whereas

expect_column_values_to_not_be_null("UserName")

checks that there are no null values in the UserName column.

You can also validate string patterns using regex, check whether values fall inside a range or set, confirm that all values are unique, and much more. Check out the full list of expectations here.

Not only that, but you can also use the ‘mostly’ parameter to set tolerance levels on your expectations. This is really handy if you know some of your data won’t meet the expectation but you don’t want your test to fail. For example:

expect_column_value_lengths_to_be_between("AccountNumber", min_value=5, max_value=10, mostly=0.80)

The above expectation asserts that AccountNumber values in the data source should be between 5 and 10 characters in length at least 80% of the time. If the data falls within this tolerance then the expectation will pass.
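
To give a feel for how these read in practice, here is a minimal sketch using the Pandas-backed API; the file name and column names below are just illustrative, not taken from a real project:

import great_expectations as ge

# Load a CSV as a Great Expectations dataset so expectations can be called on it directly.
df = ge.read_csv("users.csv")

# Structural check: the column should exist at all.
df.expect_column_to_exist("UserName")

# Content checks: no nulls, and account numbers of a sensible length at least 80% of the time.
df.expect_column_values_to_not_be_null("UserName")
result = df.expect_column_value_lengths_to_be_between(
    "AccountNumber", min_value=5, max_value=10, mostly=0.80
)

# Each call returns a result describing whether the expectation passed and why.
print(result)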

The great thing is, once you’ve set up your expectations, you can run them against new data sources.

How can we use Great Expectations?

Now, I’m a huge advocate of making automation testing more accessible to all members of the team, which is why I think Great Expectations is so, err, well, great!

Following installation (through pip), team members can write, manage and execute expectations directly from a Jupyter notebook. This is a nice feature, as it removes the need for an external IDE and gives less technical team members a platform to explore the data, write expectations and measure data quality.


Once an expectation suite has been run, an HTML report known as a data doc gets generated. Data docs list the available expectation suites and previous runs, and allow team members to drill down into passed and failed expectations.

Great Expectations data docs


Looks good, right? We hear a lot about the ‘whole team approach’ to testing these days, and it’s tools like this that really make it possible. And for those who want a bit more, no need to worry: Great Expectations can be run from the command line or through Python and can be integrated into your data pipeline.
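
For example, here is a rough sketch of how a previously saved suite might be run against a fresh data file from plain Python as part of a pipeline step; the file name and suite path below are placeholders rather than anything Great Expectations mandates:

import json
import great_expectations as ge

# Load an expectation suite that was saved earlier (path is illustrative).
with open("great_expectations/expectations/users/warning.json") as f:
    suite = json.load(f)

# Validate a new batch of data against the suite and inspect the outcome.
new_batch = ge.read_csv("new_users.csv")
results = new_batch.validate(expectation_suite=suite)
print(results)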

Want to find out more? Keep reading and we’ll look at how to set up and run our first expectation suite on a small dataset.

Getting Started with Great Expectations

Right, now that we know a bit about Great Expectations, let’s get started and create our first set of expectations. I’m using a Windows machine, so some of the commands may be slightly different if you’re on Linux or a Mac.

To get started, let’s create a directory from the command line.

getting started with Great Expectations

Now let’s set up a virtual environment and install Great Expectations by running pipenv install great_expectations from inside our new directory.

Install Great Expectations

Cool! Now our virtual environment has been created and Great Expectations is installed. To start our virtual environment, we just need to run pipenv shell. Now that we’re inside our environment, we can initialise Great Expectations. The initialisation process will allow us to specify a data source. Once selected, Great Expectations will profile some of the data and produce a sample data doc. That’s where we want to get to. It’s going to involve a lot of entering ‘y’ into the command line, but bear with it, you’re nearly there.

We’ll start by running great_expectations init in our command line.

First, Great Expectations will ask for permission to create a new structure inside your directory, so just enter ‘y’ to proceed.

Great Expectations permissions

Next, we will be asked if we want to configure a data source. We’ll answer ‘y’ to this too.

Configure Data Source

We’re going to be using a simple .csv file which contains a few users, but you can set your data source to be a database such as MySQL, or anything else accessible through SQLAlchemy.
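
If you do want to point Great Expectations at a database instead, the same style of expectations works there too. Here is a rough sketch using the classic SqlAlchemyDataset, where the connection string and table name are purely placeholders:

from sqlalchemy import create_engine
from great_expectations.dataset import SqlAlchemyDataset

# Connect to a database table rather than a file (connection details are made up).
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
users_table = SqlAlchemyDataset(table_name="users", engine=engine)

# The same expectations we'd run against a CSV work against the table.
users_table.expect_column_values_to_not_be_null("UserName")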

Our data source looks like this:

AccNumber,UserName,EmailAddress,PhoneNumber,DoB
12345,'User 1','[email protected]','07784521225',2000-01-01
21254,'User 2','[email protected]','01254123210',1990-12-23
96541,'User 3','[email protected]','01254778963',1987-05-16
66332,'User 4','[email protected]','07985451236',1975-02-21
12354,'User 5','[email protected]','01236321456',1978-11-16
45215,'User 6','[email protected]','01478522114',1964-06-14

Given that we’re using a .csv file, we can enter 1 to select our data.

Data Source - Great Expectations

Now we can select 1 for Pandas (we don’t need to worry too much about this at the moment) and then provide the path and filename for our file.

Data Source - Great Expectations

Getting close now. Enter ‘y’ to add your data source to your configuration. A configuration file can contain multiple data sources.

Data Source - Great Expectations

Now that we’ve added our data source, Great Expectations will profile our data in order to create an example data document with some sample expectations.

Data Source - Great Expectations

We’ll choose our data asset by entering 1 and select our user file by entering 1 again.

Data Source - Great Expectations

Feel free to change the name of the suite, but we’ll hit enter to use the default name users.warning.

Data Source - Great Expectations


Done! Great Expectations has created our suite. Let’s create our data document now by, yep, you’ve guessed it, entering ‘y’ into our command line.

Data Doc - Great Expectations


Great, we’re all set up. We have a suite of expectations and a data document which gives us a nice HTML output of all our suites and a history of previous runs. Open the data doc by entering ‘y’.

Data Doc - Great Expectations

The data doc will then open in your browser.

Data Doc - Great Expectations

Adding & Editing Expectations

Before we go on, let’s just recap. So far, we have installed Great Expectations, added our data source to our project, and allowed Great Expectations to profile our data and create some example expectations.

We can edit our expectations directly from Jupyter notebooks. To open our notebook, just enter great_expectations suite edit users.warning into the command line.

This will open our notebook. If we scroll down, we’ll see the auto-generated expectations.

Auto Generated Expectations

Let’s add a new expectation to the AccNumber section by inserting a new cell via the Insert menu and adding the expectation:

batch.expect_column_values_to_be_unique("AccNumber")

This will check that each value in our AccNumber column is unique. Cool, now that it’s added, we can run the entire notebook by selecting ‘Run All’ from the Cell menu.

Awesome! We now have an output for each of our expectations. Let’s take a closer look at the expectation we added.

Adding expectations

The output has returned success. That’s great! And our data doc has been updated to show our latest run, which includes our new expectation. Awesome! 💪

Data Doc with new expectation
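
If you want to take the notebook a step further before moving on, you could add a couple more cells in the same way and then save the updated suite. Here is a rough sketch, where the phone number regex is just an illustrative example and not part of the generated suite:

# Another content check, plus a simple format check on phone numbers.
batch.expect_column_values_to_not_be_null("EmailAddress")
batch.expect_column_values_to_match_regex("PhoneNumber", r"^0\d{9,10}$")

# Persist the edited suite so future runs (and the data docs) pick up the new expectations.
batch.save_expectation_suite(discard_failed_expectations=False)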

Summary

Well, that’s it! 🎉 You’ve made it to the end of the database testing demo, but in reality we’ve only just scratched the surface: we’ve only used a small dataset and the auto-generated suite. We’ve covered the basics, but now it’s time to go and explore… Perhaps add some more data to your dataset and rerun your suite, or generate the Python code so you can incorporate it into your existing test process. Most of all, play around and have some fun ✨

To find out more, visit the Great Expectations docs.


About the author

Ryan Howard

Ryan is a Test Engineer and founder of How QA Test consultancy, with 10 years’ experience in the software industry. Ryan loves to talk testing and is the co-organiser of East Anglia Quality Assurance meetups, where people get together to talk all things ‘testing’.

Ryan believes strongly in making automation testing more accessible to all members of the team and, to this end, is a strong advocate of ‘codeless’ automation tools. Ryan also blogs about testing tools and has a strong interest in finding great Python packages to assist in his testing efforts.

Happy Testing!
