If you have ever written code that tests database interactions, such as data access objects, you have very probably run up against one of the most perennial annoyances in testing: to test these interactions accurately, a database is required.

For the sake of this article, let’s consider an application that uses PostgreSQL as part of its environment, because that is what the examples will use. Also, although H2 is mentioned extensively, this is in no way meant to denigrate it – used in the right place, it’s a great tool.

The problem

Various approaches to solve this problem have been put forward, but there always seems to be some drawback.

One testing approach would be to use an in-memory database such as H2.

Pros:

  • The database is local to the virtual machine
  • The database lifecycle is managed by the build process
  • The initial state is managed by either the build process or the test

Cons:

  • You’re not accurately modelling the environment
  • Not all features of the production database are supported
  • Different datatypes mean different column definitions
  • Multiple tests touching the same tables can’t be run in parallel without conflicts

If you consider these constraints to be unacceptable, you may consider having a well-known instance of the PostgreSQL database running that is set aside for testing.

Pros:

  • 100% compatibility with the production database

Cons:

  • No guarantee of initial data state
  • Multiple tests within the same build that touch the same tables can’t be run in parallel without conflicts
  • Concurrent builds can lead to inconsistent results
  • Continuous integration builds can be broken by developers running local tests

A further refinement of this approach would be for each developer to have their own instance of the PostgreSQL database.

Pros:

  • 100% compatibility with the production database
  • Developer builds do not interfere with continuous integration builds

Cons:

  • No guarantee of initial data state
  • Multiple tests within the same build that touch the same tables can’t be run in parallel without conflicts
  • Concurrent builds can lead to inconsistent results
  • Developers have to keep their database instance up-to-date (or tooling must be added to manage this)

With each of these approaches, I see the cons as being detrimental enough to partially or completely cancel out the pros.

The take-away

Breaking down the three approaches above, we can see that the following features are desirable:

  • the database should be tied to the test (not the virtual machine)
    • an implication of this is that test parallelization is now possible
  • the database lifecycle should be managed by the build
  • the database should be identical to that used in production

My new favourite solution

Using TestContainers, we can tick off each of these features. Using a JUnit @Rule, TestContainers will start a per-test Docker container that provides a database living exactly as long as the test. Because each container is totally isolated, tests can be run in parallel to speed up builds.
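
As a minimal sketch, using the TestContainers PostgreSQL module (the class and test names here are invented for illustration):

    import org.junit.Rule;
    import org.junit.Test;
    import org.testcontainers.containers.PostgreSQLContainer;

    public class PerTestDatabaseTest {

        // a fresh PostgreSQL container is started before each test method
        // and torn down afterwards, so every test sees a pristine database
        @Rule
        public PostgreSQLContainer postgres = new PostgreSQLContainer();

        @Test
        public void databaseIsAvailable() {
            // by the time the test body runs, the container is up
            System.out.println(postgres.getJdbcUrl());
        }
    }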

This last point is very important because, as noted above, there always seems to be some drawback. In this case, the overhead of starting the Docker container and everything it contains will increase your overall build time. I would (and do) argue that the increased test time comes nowhere close to outweighing the benefit of having all of our desirable features.

Each database supported out of the box by TestContainers has a specific rule, and this rule can be used to obtain all the details needed to connect to the database.
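
For example (another sketch with invented names), the PostgreSQL rule exposes the JDBC URL and credentials directly:

    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.junit.Rule;
    import org.junit.Test;
    import org.testcontainers.containers.PostgreSQLContainer;

    import static org.junit.Assert.assertTrue;

    public class ConnectionDetailsTest {

        @Rule
        public PostgreSQLContainer postgres = new PostgreSQLContainer();

        @Test
        public void connectsToTheContainer() throws Exception {
            // the rule provides everything a JDBC connection needs
            try (Connection connection = DriverManager.getConnection(postgres.getJdbcUrl(),
                                                                     postgres.getUsername(),
                                                                     postgres.getPassword())) {
                assertTrue(connection.isValid(1));
            }
        }
    }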

Alternatively…

According to the documentation, it’s possible to have a new container start up by altering the JDBC URL to contain tc:, for example jdbc:tc:postgresql://hostname/databasename. However, this failed in my application due to this check in the PostgreSQL driver:

if (!url.startsWith("jdbc:postgresql:")) {

An anecdote

To throw an anecdote in here, I switched an application from using H2 to using Dockerized PostgreSQL in 10 minutes, and it has made my life considerably simpler. We’re using jOOQ for our database interactions, and we had found ourselves faced with removing some very nice jOOQ features because H2 didn’t support them.

Let me repeat that. We were faced with changing production code due to limitations in the test environment.

That is not and never will be an acceptable situation, so the discovery of TestContainers was both fortuitous and time-saving. Fortuitous because it gave us exactly what we needed, but time-saving? How can I say that when I just said it increases test time? Simple – I don’t need to spend time checking whether there is an H2 compatibility mode that supports the feature I’m using; I don’t find myself writing code that must later be removed because H2 won’t allow it; I can write my tests and DB-related code and I’m done.

Wow, an entire blog post where you don’t mention Play?

Nope. Here’s an easy way to use it with Play, based on the application I just mentioned.

To start, create a mixin that combines the TestContainers rule with Play’s database support.
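
A minimal sketch might look like this, assuming Play’s play.db.Databases helper; the interface name DatabaseBacked and its methods are my own stand-ins:

    import java.util.HashMap;
    import java.util.Map;

    import org.testcontainers.containers.PostgreSQLContainer;

    import play.db.Database;
    import play.db.Databases;

    // a sketch of the mixin: implementing tests provide the container
    // (via a JUnit @Rule) and get a Play Database wired to it in return
    public interface DatabaseBacked {

        PostgreSQLContainer getContainer();

        default Database getDatabase() {
            final PostgreSQLContainer container = getContainer();
            final Map<String, Object> config = new HashMap<>();
            config.put("username", container.getUsername());
            config.put("password", container.getPassword());
            return Databases.createFrom(container.getDriverClassName(),
                                        container.getJdbcUrl(),
                                        config);
        }
    }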

The reason I use a mixin here is that I tend to define DAO tests alongside the interfaces – see my previous post on this approach. It would be nicer if the tests themselves could be defined as mixins, because the common DB setup code could then be placed into a common class that is extended to implement the test mixins, but JUnit doesn’t recognise tests defined in this way.
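
To make that concrete, a contract test might be sketched as follows; UserDao and findById are hypothetical stand-ins for the real interface:

    import org.junit.Test;

    import static org.junit.Assert.assertNotNull;

    // tests the contract of the DAO interface, with no database awareness
    public abstract class AbstractUserDaoTest {

        // implementations supply the DAO variant under test
        protected abstract UserDao dao();

        @Test
        public void findById_knownUser() {
            // relies on the test data documented in the class-level comments
            assertNotNull(dao().findById("some-known-id"));
        }
    }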

So, the abstract test class has no knowledge that its implementations require a database – it purely tests the contract of the interface.

Back over in the database-specific code, we can now make sure that our implementation behaves in the way the contract requires.
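
Here’s a sketch, assuming the pieces above and a JooqUserDao constructor that takes a Play Database (that constructor is an assumption on my part):

    import org.junit.Before;
    import org.junit.Rule;
    import org.testcontainers.containers.PostgreSQLContainer;

    public class JooqUserDaoTest extends AbstractUserDaoTest
                                 implements DatabaseBacked, TestData {

        // one container per test, so tests stay isolated and parallelizable
        @Rule
        public PostgreSQLContainer postgres = new PostgreSQLContainer();

        @Override
        public PostgreSQLContainer getContainer() {
            return postgres;
        }

        @Before
        public void setUp() {
            // seed the fresh database before each test
            loadData(getDatabase());
        }

        @Override
        protected UserDao dao() {
            // wire the jOOQ-backed implementation to the containerized database
            return new JooqUserDao(getDatabase());
        }
    }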

Our JooqUserDao implementation will now run against a real instance of the database type used in production.

The TestData interface used in JooqUserDaoTest is just another mixin that loads some data into the database. The implementation isn’t particularly important because it very much depends on your own requirements, but it may look something like this.
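
Purely as a hypothetical sketch, with invented table and column names:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    import play.db.Database;

    // a data-loading mixin; the schema here is invented for illustration
    public interface TestData {

        default void loadData(final Database database) {
            try (Connection connection = database.getConnection();
                 Statement statement = connection.createStatement()) {
                // each test starts from a known, documented state
                statement.execute("DELETE FROM users");
                statement.execute("INSERT INTO users (id, name) VALUES ('some-known-id', 'Alice')");
            } catch (SQLException e) {
                throw new RuntimeException("Could not load test data", e);
            }
        }
    }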

6 thoughts on “Database testing with TestContainers”

  1. Very interesting idea! I did not know TestContainers. Thanks for sharing 🙂
    Would you tell us how you handled the test data? Most DB-based tests rely on an initial data set, and expect a transformed data state after the test has run. And it should be possible to run any test anew without having to take care of making it idempotent.
    Thanks

    1. After the DAO interfaces are written, I write tests for them as described above. In the class-level comments of those tests, I’ll make notes such as “This test expects the following data to be available…”, along with a description of the data. This data matches the domain model used by the DAOs and is unrelated to any specific persistence mechanism such as a database.

      I then use one or more classes such as TestData (see the end of the post) to fulfill these requirements. Because the data is loaded in the setUp method of the tests, a fresh data set is available for each individual test.

    2. I suggest you use DbUnit to clean up all tables before each test. It has been working for me for the last 7 years. One tip: DbUnit has a very old and ugly API, so I’ve been using this project to help my team when working with it: https://github.com/rponte/dbunitmanager

      This example may help you to understand what I mean: https://github.com/triadworks/vraptor-blank-project/blob/master/src/test/java/br/com/triadworks/issuetracker/dao/impl/ProjetoDaoImplTest.java#L23

  2. Very interesting approach, especially if you need to run tests in parallel. But as you said, the main drawback is the time spent running the whole suite of tests. In the majority of projects I’ve worked on, this isn’t acceptable. I mean, this may be good for a CI server but not during day-to-day development, where every developer needs to run the whole suite of tests as fast as possible! You know, the faster the tests are, the better the feedback my team gets!

    I’ve always used the same database as production in my tests through Vagrant (as you recommend), and I’ve cleaned up the tables with DbUnit before each test. It works pretty well and runs very, very fast! So using DbUnit seems to me the best solution so far in all my projects.

    One tip: instead of starting up a container per test (scenario), I suggest starting up one per suite of tests and using DbUnit (or another script) to clean up all tables before each test. What do you think?

    1. This is also a completely valid approach – it puts you back at the performance level of something like H2, but with the correct database type. The loss of parallelization in the tests is offset (and very likely superseded) by the time saved by not starting up a container per test.

      To achieve this, it would be a case of altering the code samples above to remove the JUnit Rule and instead set up the database in either a static block or the test constructor.
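
      For example (a sketch), JUnit’s @ClassRule starts the container once before the first test in the class and stops it after the last:

          import org.junit.ClassRule;
          import org.testcontainers.containers.PostgreSQLContainer;

          // one container shared by every test in the class; per-test
          // cleanup (e.g. with DbUnit) then restores a known data state
          public class SharedDatabaseTest {

              @ClassRule
              public static final PostgreSQLContainer POSTGRES = new PostgreSQLContainer();
          }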
