
Practical Guidelines For Writing Great Tests In Imperative-Styled Programming Languages (e.g. Go, Python, Rust, JavaScript)

Common Testing Issues

All common issues with tests, from flaky tests to brittle ones, can be traced back to either coupling or dangling side-effects.

Several types of coupling can affect tests; data coupling and control coupling, in particular, come up later in this piece.

Testing At The Right Time

Did you know that there's a right time to write tests for any given codebase and a wrong time? From my experience, and having collected the experience of others, whenever tests are deployed at the wrong time, bad things happen and, more than that, those tests end up worse than useless. I have often maintained that TDD has its place too. It shouldn't be used every time and for every occasion where a test might be needed. Hence, there's a right time to use TDD and a wrong time to use it (more about this in a later section).

I really don't think this is emphasized enough. More than that, tests should be written in phases and not all at once. TDD should be used to build the specific parts of a large piece of software that are already well understood, while tests for other parts that are less well understood should come much later.

Impulsively aiming for 100 percent coverage on your tests (whether unit or integration) is never a good strategy for establishing the long-term utility of said tests (Goodhart's Law comes to mind here).

Did you know that Facebook (Meta) had no automated unit/integration tests in 2011 because tests were very difficult to write? Now, more than 14 years later, Facebook (Meta) has invested in automated testing. This is according to an account by a former Facebook/Meta employee, Jérôme Cukier, who was there between 2013 and 2014.

As a software engineer of 9+ years, I can't count how many times I've seen highly-tangled and highly-coupled codebases with perfect coverage, leading to blind trust in a bunch of fickle automated tests! Furthermore, if much of the coupling in the codebase hasn't been untangled, such that the setup for each test case requires a lot of upfront code, it simply means there's a problem with said codebase (i.e. with the codebase design/structure), and someone needs to go fix that first before writing even a single non-useless automated test.

Therefore, the 3 times not to write tests for any codebase are as follows:

  1. When the codebase has existed without tests for a long time (a.k.a legacy code) and is currently a tangled, coupled mess (i.e. changing something in one place breaks it in another place in unpredictable ways).
  2. When you don't know how to completely achieve and build what you want to build. Hence, when you are still figuring out what the sub-components of the logic should look like or how each sub-component should work (i.e. alone and together). More importantly, there's a gap in your knowledge of how to implement the logic.
  3. When the feedback loop from the test is slow AF (i.e. slow As Fuck!) compared to how necessary it is for the test to exist.

For instance, say you want to build a parser for a custom domain-specific language (DSL) at work but you don't even know how to start. You don't know what the sub-components of the parser should be or what they should look like. You have no idea of the big-picture environments where the parser is supposed to fit in and/or function.

Do you really think writing tests is going to help you at that point?

Maybe your answer is "I don't think so" and you'd be right.

It's not, and I wonder why people think otherwise. I blame dogma!

You see, writing tests is an investment and has to be strictly viewed as such. Before anyone invests in a business or asset on the financial market, it isn't abnormal to check a few things about that business or asset first. This check saves you from investing in something that won't bring returns. This is how you should view testing as well.

When a piece of software is going through an initial phase of development and release (i.e. the pre-alpha phase, to be precise), it is possible to find that there are hardly any automated tests present within the project folders yet, and there's usually a disclaimer that reads like this:

This project is still in active development and so not suitable for use in production systems.

This isn't accidental or uncommon. It certainly isn't strange either. It is perfectly normal to find that tests are missing from a piece of software that has only had one or more pre-alpha release(s). This is not to say that software in its pre-alpha phase is never tested - quite the opposite. But the tests are not automated. All tests in a pre-alpha phase are ad hoc (i.e. manual - by a human) and done either by QA engineers or by the software engineers in a REPL.

A REPL (or a playground/sandbox) is a testing and debugging environment that offers a very fast feedback loop. It is perfect for the initial stages of software development and release, when a lot of things (around data structure, logic design and domain knowledge) are still in flux and decisions are still very volatile (i.e. frequently changing). Lots of REPLs exist, but I prefer online REPLs like JSBin, CodeSandbox and PHPSandBox. These days, REPLs are very powerful.

Another option, where there are no automated tests yet, is upstream and intense dogfooding. Dogfooding has a lot of benefits if done right. It can foster a sense of ownership among the software developers and quality-assurance folks working on the software.

Another great option is to rely on monitoring and error tracking, as well as alerting, in place of automated tests. I usually advise that you start with auto instrumentation of the software codebase and then gradually move to manual instrumentation.
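For instance, a minimal Node.js auto-instrumentation bootstrap (a sketch assuming the OpenTelemetry `@opentelemetry/sdk-node` and `@opentelemetry/auto-instrumentations-node` packages, and a service name of your choosing) could look like this:

/* tracing.js - start auto instrumentation before the rest of the app loads */
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'my-service',
  /* Auto-instruments well-known libraries (http, express, pg, etc.) with no code changes */
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Later on, manual instrumentation (custom spans around domain logic) can be layered on top of this baseline.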

Creating automated tests before you have had a chance to properly figure out the structure of the data and logic in the UI, database or domain, or to understand the quirks of your third-party dependencies, is a surefire way to paint yourself into a corner after wasting lots of time and effort investing in software tests.

Please, don't do it 🙏🏾. I beg you.

Mocking For Mere Mortals

In one of my earlier articles on software testing (Part 1, Part 2 and Part 3), I wrote about certain guidelines that should aid anyone approaching mocking for automated tests.

  • Mock only what you control and own (i.e. first-party logic and dependencies).
  • If you don't control and own it (i.e. third-party dependencies), wrap it in something you own and control and mock that (see the sketch after this list).
  • If you cannot wrap it in something you own and control (e.g. the filesystem), separate your code from it.
  • When writing integration tests, make use of either the real thing or a mock that contains logic similar to the real thing.
  • When writing integration tests, using the real thing is a tradeoff that should depend on how fast your tests run.
  • When writing unit tests, you should have zero need for mocks at all; otherwise your code is tightly coupled and you should fix that.
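To illustrate the second guideline above, here is a minimal sketch (the file names and the `httpClient` wrapper are hypothetical) of wrapping a third-party HTTP library in something you own, so that the wrapper is what gets mocked:

/* http-client.js - a thin first-party wrapper around a third-party library (axios) */
import axios from 'axios';

export const httpClient = {
  async getJSON (url) {
    const response = await axios.get(url, { headers: { Accept: 'application/json' } });
    return response.data;
  }
};

/* user-profile.test.js - mock the wrapper you own and control, never axios itself */
jest.mock('./http-client', () => ({
  httpClient: {
    /* A fake with just enough working logic to stay deterministic */
    getJSON: jest.fn().mockResolvedValue({ id: 1, first_name: 'Helen' })
  }
}));

Because your code only ever sees `httpClient`, a breaking change in axios's API is absorbed inside the wrapper and your tests never notice.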

Relational database mocks that contain logic:

I find that whenever mocking doesn't work well, it is down to a failure of loose coupling and proper encapsulation.

In UI testing and CRUD-related testing, it is very common to test side-effects (e.g. text appearing on the UI, UI elements being disabled, or whether a piece of state was written to a database). The only reason you would want to assert on side-effects is that you have business logic coupled to the UI logic or the DB logic.

There are 2 ways to deal with this.

  1. Structure your code such that, when it is being tested, the assertions happen on variables or items that are explicitly created within the context of the test case.
  2. Make use of a cheaper way (usually a mock that contains working logic - a fake) to interact with the side-effect in a deterministic manner.

As such, it will never make sense to run assertions against a hollow mock lacking working logic when running integration tests. The costs are just too high. However, using a fake (a mock that does contain working logic) can help, though it has its drawbacks too.

See a made-up example below (the test case is based on both Jest and Vitest but still imaginary): 👇🏾👇🏾

/* @HINT: Domain-Driven Design artifacts */
import DataModel from '---';
import DomainModel from '...';

/* @HINT: Test Helpers */
import { transformSQLQueryUsing, mapToValuesClause, mapToWhereClause } from '__';

/* @HINT: Test "imaginary-test-framework" Utilities */
import { expect, test, act, waitFor } from 'imaginary-test-framework';

/* @HINT: This is an import of `pg-mem` NPM library - for in-memory database emulation */
import { newDb } from '../../db';

test('[@kyc]: ensure domain logic can register a new user to the postgresql database', () => {

  /* "Arrange" Step for this test case */
  let dataModel = new DataModel(); /* @NOTE: `let` (not `const`) so it can be nulled for GC at the end */
  const domainModel = new DomainModel(dataModel);

  
  /* @HINT: Test Fixtures */
  const columns = ['*'];
  const values = ['Helen', 'Agba', 23, false];
  const testCaseValues = {
    first_name: values[0],
    last_name: values[1],
    age: values[2],
    verified: values[3]
  };
  const expectedResultOnCheck = [{
    ...testCaseValues,
    id: 1
  }];
  
  
  /* @HINT: initialize proxy to the in-memory DB */
  let postgreSQLMemoryDB = newDb(); /* @NOTE: `let` (not `const`) so it can be nulled for GC at the end */


  const { any, an } = expect.utils;
  let result = null;
  let selectSQLStatement = '';

  const sqlQueryTestTask = async () => {
    /* @HINT: This is making use of the in-memory DB under-the-hood (swapped based on the value of `process.env.NODE_ENV`) */
    result = await domainModel.registerNewUser(testCaseValues);
  };

  /* @HINT: These First 2 assertions are testing the error-handling - Very Important Yet Hardly Contemplated! */
  expect(result).toBeNull();
  const promiseForSQLQueryTestTaskAssertion = act(() => {
    return expect(sqlQueryTestTask).toResolve.andNot.toHaveRaised(any.Error);
  });
  
  
  return waitFor(promiseForSQLQueryTestTaskAssertion).then(() => {

    expect(result).not.toBeNullOrUndefined();
    expect(Array.isArray(result)).toBe(true);
    
    /* @HINT: Using an explicit dependency `dataModel` to assert as opposed to an implicit dependency */
    expect(
      dataModel.getLastExecutedSQLQueryAsString()
    ).toContainAllOfSubStrings([
      dataModel.tableName,
      (columns.length === 1 && columns[0] === '*' ? [] : columns).join(', '),
      mapToValuesClause(values)
    ]);

    try {
      /* @HINT: Transform one SQL statement to another by extracting the table name ONLY from the initial SQL statement */
      /* @NOTE: Only an `INSERT` statement can be transformed into a `SELECT` statement using `transformSQLQueryUsing()` */
      selectSQLStatement = transformSQLQueryUsing(
        dataModel.getLastExecutedSQLQueryAsString()
      )
      .fromContext(columns.slice(0), mapToWhereClause(values.slice(0, 2)))
      .toAnSQLStatementOfType("Select")
    } catch ($e) {
      /* @HINT: Ensure the SQL query statement from `dataModel.getLastExecutedSQLQueryAsString()` is an `INSERT` statement */
      /* @NOTE: This assertion here 👇🏾 is conditionally executed - only when `transformSQLQueryUsing()` throws an error */
      const partialMessage = "Expected an `INSERT` query";
      expect($e.message.substr(0, partialMessage.length)).not.toBe(partialMessage);
    }

    /* @HINT: Execute the `SELECT` statement from above and assert on its result */
    expect(
      postgreSQLMemoryDB.public.many(selectSQLStatement)
    ).toEqual(
      an.objectContaining(
        expectedResultOnCheck
      )
    );
    
    /* @HINT: Release retained references (just in case) so GC can claim memory back */
    postgreSQLMemoryDB = null;
    dataModel = null;

  }).end();
});

Usually, you don't need to worry so much about data coupling (e.g. message passing) and control coupling (flag/switch passing), unless the message or flag being passed frequently changes shape or structure (e.g. a flag can be boolean or bitwise, so switching between boolean and bitwise structure over time creates coupling around the flag being passed). A hypothetical sketch of this follows below.
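The moment the flag's shape changes from boolean to bitwise, every call site (and every test that passes the flag) breaks; the functions below are made up purely for illustration:

/* Version 1: callers (and tests) pass a boolean flag */
function renderWidget (compactMode) {
  return compactMode ? '<small-widget/>' : '<large-widget/>';
}
renderWidget(true);

/* Version 2: the flag becomes a bitwise options mask - every call site above is now wrong */
const COMPACT = 1 << 0;
const DARK_THEME = 1 << 1;

function renderWidgetV2 (optionsMask) {
  return (optionsMask & COMPACT) !== 0 ? '<small-widget/>' : '<large-widget/>';
}
renderWidgetV2(COMPACT | DARK_THEME);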

There are several types of dangling side-effects as follows:

  • Memory leaks (e.g. not clearing memory used by previously run tests)
  • State leaks (e.g. not clearing data from centralized storage like a database or localStorage, leading to corrupted data)

4 Ways To Reduce/Eradicate Brittle/Flaky Tests

  1. Decouple your test code from your application code.
  2. Decouple your test cases/suites from one another.
  3. Decouple your framework/library logic (e.g. ReactJS/Laravel/Rails/Tauri) from your custom logic (e.g. ui/business).
  4. Decouple the different parts of the application code, especially parts that don't go together or change together.

Test code should be tightly coupled to the behavior of the application code but not to its implementation (i.e. Tight Coupling Is Desirable In Writing Tests), so that the only way a test breaks is when the behavior or public interface changes. Also, when writing tests, we might experience some pain or friction, and this friction might be pointing us to ways to fix the design/structure of the code (i.e. Fixing Design With Tests) to make testing the code easier.

How To Quarantine & Fix Tests

One thing to keep in mind as you fix broken or brittle/flaky tests is that you cannot have deterministic outputs or side-effects if any of the inputs/dependencies of your test cases aren't deterministic (i.e. are hidden or uncontrolled).

  • First Step: Skip the test(s) temporarily (see the sketch after this list).
  • Second Step: Ensure you are only using string/number literals INDIRECTLY in assertions.
  • Third Step: Ensure that assertions do not point to non-deterministic side-effects and/or implicit implementation details (e.g. message chains).
  • Fourth Step: Fix the test by correcting any form of deep coupling in the codebase (move most of the implicit dependencies closer to the entry point of the application software, and hence to the top of the call stack).
  • Fifth Step: Improve assertions by asserting only on deterministic outputs/side-effects (if side-effects are expensive to assert on - e.g. slow-running tests due to a test database setup and teardown on each test case that runs - then consider testing other deterministic side-effects that lead to the expensive one).
  • Sixth Step: Eliminate any hidden/uncontrolled inputs from the testing flow & execution.
  • Seventh Step: Ensure assertions are made for every deterministic side-effect (state transformation).
  • Final Step: Unskip the test(s).
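For the first and final steps, most test runners ship a skip facility. A minimal Jest/Vitest-style sketch (the `createCart` helper is hypothetical):

/* First Step: quarantine the flaky test with `.skip` so CI stays green while you fix it */
test.skip('[flaky]: recalculates cart totals', () => {
  /* ...assertions pointing at non-deterministic side-effects live here... */
});

/* Final Step: with inputs now explicit and deterministic, drop `.skip` to re-enable it */
test('[fixed]: recalculates cart totals', () => {
  const cart = createCart({ items: [] }); // explicit input created within the test case
  cart.add({ price: 200, quantity: 2 });
  expect(cart.total()).toBe(400); // deterministic output
});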

Intentionally Break Your Tests To Reveal Edge Cases

We are often wired as humans to seek what is easy and comfortable. Therefore, while writing tests, we test the happy paths in the codebase far more often than the unhappy paths. This approach leaves an opening for edge cases to linger within the codebase.

Where do edge cases come from? Great question. I am glad you asked.

The more generic stuff a piece of code contains, the more likely it is to contain one or more edge cases. If a function handles all sorts of strings (i.e. strings that can contain a wide range of characters), it is more likely to have one or more edge cases too. If another function handles a specific set of strings in a specific way, it is less likely to have one or more edge cases.

Generic stuff leads to "endless possibilities". Specific stuff leads to "limited possibilities". The difference in possibilities is where edge cases come from.

Have you ever noticed that a simple finite state machine with 2 possible states has fewer bugs than one with 100 possible states? It's not accidental. It is the difference between "limited possibilities" and "endless possibilities". The more states you tack onto a finite state machine, the more endless the "possibilities" become. A quick sketch follows below.
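The entire transition table of a 2-state machine can be enumerated and tested exhaustively; that is precisely what stops being feasible as states multiply (a sketch, with made-up states):

/* A 2-state toggle: every (state, event) pair fits in one tiny table */
const transitions = {
  off: { toggle: 'on' },
  on: { toggle: 'off' }
};

const next = (state, event) => transitions[state]?.[event] ?? state;

test('the 2-state machine can be verified exhaustively', () => {
  /* 2 states x 1 event = 2 cases; at 100 states this table explodes */
  expect(next('off', 'toggle')).toBe('on');
  expect(next('on', 'toggle')).toBe('off');
});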

Mid-level & low-level APIs/modules or dependencies in a software codebase are usually the ones with the most edge cases, since they usually contain more generic code (a.k.a non-domain-specific logic) and hence more "endless possibilities".

High-level APIs/modules are usually more specific (i.e. specific to the needs of the domain - domain specific logic) and hence usually have less edge cases.

Breaking automated tests should be approached the same way a software engineer would typically approach stress testing. A stress test can only be considered successful if and when the software under stress buckles or comes apart under the stress.

If a function in one of your test cases receives a string and processes it, pass a very, very, very lengthy string to it and see how it does under the "stress". Keep increasing the length until the function comes apart, as sketched below.
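A sketch of that idea (the `parseCSVLine` function here is hypothetical):

test('parseCSVLine holds up under increasingly long inputs', () => {
  for (const length of [10, 1_000, 100_000, 1_000_000]) {
    const longLine = 'a,'.repeat(length / 2);
    /* If the function buckles (throws, hangs or truncates), you've found an edge */
    expect(() => parseCSVLine(longLine)).not.toThrow();
  }
});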

You can always trust your neighbourhood QA engineer to treat your freshly checked-in code like a black box and break it silly.

So again, how do you squash edge-case bugs?

Answer: By breaking your tests intentionally (😏 duh!!).

How does one break tests ?

Answer: By using the System Under Test (SUT) in ways it wasn't intended to be used (e.g. passing invalid arguments or setting up dependencies incorrectly).

Okay, if you have 5 test cases in total (within a single test suite) for your tiny todo app, well sure - try breaking all 5.

But what if you have 1500 test cases or more? Do you go trying to break them all? Errrm, no.

From among all 1500 test cases, it is important to focus only on the test cases whose System Under Test (SUT) contains generic stuff, as stated earlier, and try to break those.

Still, this is usually not enough to identify all edge cases. While isolating a System Under Test (SUT) from hidden inputs/effects is usually a win for ensuring deterministic results from each automated test case, this isolation is also the main reason why so many edge cases can go unnoticed.

This happens because automated test cases are set up so the System Under Test (SUT) doesn't interact with anything other than the data it needs in order to demonstrate the correctness of its function (in the test/dev environment).

Yet, deep edge cases often lie dormant in the interaction gap between one self-contained piece of logic and another. Hence, these edge cases are only often activated in the production environment where one self-contained piece of logic now interacts with a dozen others.

Testing in Production

What made Docker so popular was that it shrank the difference between a dev/staging/test environment and a prod environment down to a few steps. Prior to Docker, it took a lot of effort to achieve this. Yet, for a long time, there wasn't much tooling for test environments specifically in this regard (until recently, with Testcontainers, which can sometimes make running tests on a CI slower).

Yet, even with technologies like Testcontainers, production systems are a whole different beast entirely. Why? Well, there's no isolation like with test cases anymore. Everything and anything can happen now. Different pieces of logic from different modules are running at around the same time and perhaps clashing into each other.

Moreover, in the face of multi-threading, the number of possibilities explodes, and the number of combinatorial outcomes that could potentially become a bug or a fatal and expensive error balloons in size.

This is where methods like Chaos Engineering and also Stress Testing can improve the odds of unearthing more edge cases.

Use Monitoring/Observability Tools To Help Create New Test Cases

Just as online REPLs can be used to test and debug in the absence of automated tests, so too can monitoring/observability be a great ancillary and/or forerunner to setting up automated tests.

This is because monitoring/observability offers opportunities to learn more about the system as a whole. Each time something goes wrong (as users make their way through the system(s)), it is a good opportunity to collect data and details on what happened and on what should become a novel automated test case.

In my view, monitoring/observability should be first-class considerations while starting on any greenfield software project.

This approach makes testing in production much more approachable too.

Testing At The Right Level


The major issue is that we don't have a good or great definition of what a unit is. In fact, the error of writing a unit test for something that isn't a unit is widespread. The truth, from my experience, is that integration tests offer the best value of the 3 types of tests. Still, unit tests have their place as edge-case detectors (sometimes).

Any software codebase is mostly composed of what I call "integrations", no matter how simple or complex it gets.

The lowest-level dependencies in our application software pass as true units, especially if you were to inspect the (directed) dependency graph of said codebase (i.e. the nodes in a codebase's dependency graph that don't have any edge leading away from them are the lowest-level dependencies and hence true units).

Upon inspecting the dependency graph, you may find that the number of true units in any codebase is very small, yet the number of integrations is very high. The entry point (or root source file) of any software codebase is one giant integration too. Therefore, it makes sense that true integrations outnumber true units; hence, integration tests will be greater in number. It's just math. E2E tests are actually integration tests at a higher level of integration - think about it.

Many software engineers expect to get a lot of confidence about a codebase from unit tests, but are then sad to find out that unit tests offer the least confidence while integration tests offer the most. Also, unit tests are really the low-hanging fruit of testing, offering the least benefit, and should be treated as such.

The reason why unit tests don't work out as well as software engineers hope, in terms of confidence and utility (not being so brittle), is that every software component or module is treated as a unit, so the assertions end up being made on the internals of what are really integrations rather than on true units.

This makes testing at the right level very important. Especially, if you need your tests to be highly beneficial to you now and into the future.

  • Only integration tests should be written for true integrations.

  • Only unit tests should be written for true units.

  • Yet unit and integration tests should never be viewed as mutually exclusive.

What Is A Unit Really?

The definition of a unit of software has been very loose for many years. Even greats like Uncle Bob admitted to this in 2023. See below:

[Screenshot: Uncle Bob's 2023 comments conceding that the definition of a unit has been loose]

The existing definition is that a unit is any self-contained function or class. Yet, this is not a very accurate definition, as we now know. So, can we ever have a proper definition for a unit? Yes, I believe so. It would be a standard definition that can apply to any software codebase of any shape, size or environment.

A true unit contains no software state that it explicitly controls and no dependencies that explicitly belong to the software codebase in question (i.e. it contains third-party dependencies only, if any).

Therefore, a true pure function (e.g. in a functional programming language) is a true unit, but a custom ReactJS hook is not a true unit. Also, a leaf component (or presentational component) in ReactJS is a true unit, but any other component that has state and controls it is not a true unit.

A unit maps directly to the leaves of a dependency tree/graph for any software codebase where the entry point (i.e. main function/method) of said software codebase maps to the root of the dependency tree/graph.
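To illustrate the distinction (a sketch; the names are made up): the pure function below is a true unit, while the hook is not, because it controls state and pulls in first-party logic.

import { useState } from 'react';
import { validateName } from './validators'; // first-party (i.e. same-codebase) dependency

/* A true unit: no state it controls and no first-party dependencies - a pure function */
export function toFullName (firstName, lastName) {
  return `${firstName} ${lastName}`.trim();
}

/* NOT a true unit: it owns state (via useState) and leans on first-party logic */
export function useFullNameForm () {
  const [firstName, setFirstName] = useState('');
  const [lastName, setLastName] = useState('');

  return {
    fullName: toFullName(firstName, lastName),
    isValid: validateName(firstName) && validateName(lastName),
    setFirstName,
    setLastName
  };
}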

Testing Pains Arising From Too Many Implicit Dependencies

How many times have you added a wrapper <div> to the markup of a web UI only to break your UI integration tests?

It's important to make your tests resilient to change. Kent C. Dodds mentioned in his article from 2019 that the semantic structure (i.e. the accessibility tree) of any web UI is more resilient to change because it tends to change less frequently than, say, class attributes.

If the dependency tree (or dependency graph) for a given software codebase is generated, it can tell a story about how well said codebase is designed. When a codebase makes excessive use of implicit dependencies farther away from the entry point of the software codebase (or, at runtime, farther away from the base of the call stack), it creates a lot of problems for readability as well as testability.

The reason why many pieces of code are hard to test is because they depend on other stuff implicitly rather than explicitly. Also, implicit dependencies can leak from within a seemingly self-contained piece of code due to a failure of proper ownership and tight encapsulation. This further creates problems when trying to interact with said piece of code via a test case.

For instance, take a look at the code below

test('test me', () => {
  /* Act: 👇🏾👇🏾 */
  const result = systemUnderTest();
  
  /* Assert: 👇🏾👇🏾 */
  expect(result.internal.implicit.stuff).toEqual(null);
});

Let's assume that the assertion in the test case above (written in JavaScript) passed the last time it was run.

As you can see, the assertion is being done on an implicit dependency tucked inside the object (reference type) returned by calling the systemUnderTest(..) function. The message chain is obviously lengthy and messy.

What if, for some reason, the function (i.e. systemUnderTest(..)) is deeply rewritten (not refactored) and no longer returns an object (the one assigned to result above) whose internal property points to some other object carrying an implicit property? What do you think will happen to the assertion above? Will it still pass?

Your guess is as good as mine. In fact, your guess should be the same as mine, which is: the test fails!

It is important to note that this sort of thing can happen whether you use TDD (a.k.a. write the test first, then the code) or not.

It is also important to know that there's a difference between a refactor and a rewrite. I have met software engineers who insist that tests don't survive refactoring. Well, that's usually because they had mistaken a rewrite for a refactor.

Refactoring doesn't change publicly observed behavior of a self-contained piece of code. If ever it does, then, it isn't a refactor.

What we did above (i.e. to the systemUnderTest(..) test case) is a rewrite because it changed publicly observed behavior. Now, is there something we can do to remedy this test case failing?

Yes, there is: fix the poor encapsulation leaking out of the systemUnderTest(..) function. How? See below:

test('test me', () => {
  /* Act: 👇🏾👇🏾 */
  const result = systemUnderTest();
  
  /* Assert: 👇🏾👇🏾 */
  expect(result.getInternalImplicitStuff()).toEqual(null);
});

So now we can simply call result.getInternalImplicitStuff(...) and get what we need to assert on. Yipeee! We did it!!

But if what result.getInternalImplicitStuff(...) returns changes again later, are we back to square one?

No, we are not. The fix above is a great fix.

Great tests are very sensitive to breaking changes in predictable ways (i.e. breaking changes around the public interface only). However, horrible tests are very sensitive to breaking changes in unpredictable ways.

As long as what result.getInternalImplicitStuff(...) returns doesn't change, our test case doesn't fail on that assertion.

Remember, I said earlier that: "Your tests should be tightly coupled to external behavior, not internal implementation". Still, life is not always that perfect.

There are times when we can make exceptions and assert on implicit dependencies. Take a look at the test case below:

import api, { HttpStatuses, MimeTypes } from '...';

test('http request for adding to cart', async () => {
  // Test Fixture
  const items = [{
    product_id: "siI940nzzJjd",
    category: "furniture",
    name: "flexible armchair (ergonomic)",
    price: { amount: 230_000, currency: "kobo" }
  }];

  const response = await api.addToCart({
    items,
  });
  
  expect(response.status).toBe(HttpStatuses.CREATED);
  expect(response.headers.contentType).toBe(MimeTypes.JSON);
  expect(response.body).toEqual({
    message: "Cart Item Created Successfully",
    data: { items, stage: "pre-checkout", sharable: true, id: "Jsn4097jNsn" },
    meta: [{ apiVersion: 1749115903182, deprecations: 0 }]
  });
});

Based on everything we have said so far, the HTTP test case above ought to be an example of a bad test, right? We are clearly asserting on an implicit dependency (implementation detail) that is likely to change, right? Yes, all correct. However, how exactly does it change? This question is important, as we will see soon enough.

The implementation detail here (i.e. response) is such that its structure doesn't change much over very long periods of time compared to what we looked at earlier (i.e. result.internal.implicit.stuff). You see, result.internal.implicit.stuff can change in more unpredictable ways than response.status or response.body. Why? Well, for one, response is based on a well-known standard protocol called HTTP, and for over 10+ years (from v1 through v1.1 to v2), the concept and structure of an HTTP response has not changed. Secondly, even though the contents of response.body can change more frequently than, say, the content of response.status, it doesn't change by a lot (i.e. the structure and data types usually don't change much over long periods of time). Yet, the structure of result.internal.implicit.stuff can change within a week to result.inner.internal.implicit.stuff and keep changing every other week after that.

Therefore, even though response is an implicit dependency, it can be asserted on just fine. Hence, the HTTP test case above is a good test (not a bad one). Yet, it is not a great test.

Firstly, looking at the last assertion, it is too specific in what it specifies. It specifies the entire structure of the JSON response body, which is risky because this can change quite often. The assertion should be a little looser and less specific.

Secondly, response.status refers to a single idea (i.e. HTTP status codes) which has remained the same for 10+ years to date (i.e. little to no change). Yet, the status property name might be decided by an MVC framework/library (e.g. Laravel, Django, ExpressJS, Phoenix) and not by us. The same applies to response.headers and response.body.

To move the test from good to great, we have to decouple our test from the MVC framework/library by using macros provided by the MVC framework/library to define custom accessors (i.e. custom methods that return the values from these properties). The main outcome of this is to give us a public API that we own and control. Remember, I said earlier that "If you don't control and own it (i.e. third-party dependencies), wrap it around something you own and control". This ensures that when the framework decides to release a breaking change (e.g. changing response.status to response.statusCode), our test case is unaffected by it. A sketch of such a wrapper follows below.
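Here is one way that wrapper could look (the names are hypothetical and the framework's response shape will vary); the accessors delegate to whatever the framework exposes today, so a breaking rename is absorbed in exactly one place:

/* response-wrapper.js - a first-party facade over the framework's response object */
export function wrapResponse (frameworkResponse) {
  return {
    /* If the framework ever renames `status` to `statusCode`, only this line changes */
    getStatusCode: () => frameworkResponse.status,
    getContentType: () => frameworkResponse.headers['content-type'],
    getEntityBody: () => frameworkResponse.body
  };
}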

Now, take a look at the same HTTP test case modified:

import api, { HttpStatuses, MimeTypes } from '...';
import { successMessage, apiType } from "constants/contract-copy";

test('http request for adding to cart', async () => {
  // Test Fixture
  const cartEntry = {
    items: [{
      product_id: "siI940nzzJjd",
      category: "furniture",
      name: "flexible armchair (ergonomic)",
      price: { amount: 230_000, currency: "kobo" }
    }],
  };
  const response = await api.addToCart(cartEntry);
  
  expect(response.getStatusCode()).toBe(HttpStatuses.CREATED);
  expect(response.getContentType()).toBe(MimeTypes.JSON);
  expect(response.getEntityBody()).toMatchObjectStrictly({
    message: successMessage(apiType.ADD_TO_CART),
    data: expect.objectContaining({
      items: expect.any(Array).ofLength(cartEntry.items.length).withContentsOfType(Object),
      id: expect.any(String).withPattern(/^[A-Za-z0-9]{13}$/),
      stage: expect.any(String).withAnyOfOptions(
        [
          "pre-checkout",
          "in-checkout",
          "post-checkout",
          "pre-transit",
          "in-transit",
          "delivered"
        ]
      ),
      sharable: expect.any(Boolean)
    }),
    meta: expect.arrayContaining([
      expect.objectContaining({
        apiVersion: expect.anyOf('UnixTimestamp'),
        deprecations: expect.anyOf('UnsignedInteger')
      })
    ])
  });
});

Notice that the updated assertion (above) no longer asserts in a very specific manner but in a less specific one. We can leave the more specific assertions to end-to-end UI tests.

Therefore, while this article is partially correct about using UI tests, it is incorrect in insisting that assertions should never be made on HTTP responses.

RELATED SIDE NOTE: The 2 things which the functional paradigm of programming got absolutely right are the use of function composition and the total ban on implicit dependencies.

Implicit dependencies are nasty stuff! Apart from the fact that they can come with the issue of context dependence, they can also cause all sorts of problems, from readability to testability. However, they can be tamed, especially in any imperative-styled language (e.g. JavaScript, C#, Java, PHP, Go, etc.). The way you tame them is to, well, make them explicit and push them further up and closer to the entry point of your software codebase.

In case you haven't caught on yet, every import in every source file for every imperative-styled language is an implicit dependency. Below are examples of a few:

Ruby

require 'time'
require 'date' # implicit dependencies

JavaScript

import React from "react"; // implicit dependency

PHP

use Ramsey\Uuid; // implicit dependency

Golang

import (
  "fmt"
  "sync"
) // implicit dependencies

Python

from threading import Lock
import sys
import json # implicit dependencies

These implicit dependencies make our source code hard-wired to the implementation they provide, irrespective of whether dependency inversion is deployed or not. Hence, we should have guidelines for how we manage and tame them.

The guidelines are as follows:

  1. The farther the imports in your source file(s) (e.g. Java, PHP, JavaScript, Go, Ruby, Python, etc.) are from the entry point of your app's codebase, the less you should employ them as imports (or implicit dependencies); make them explicit dependencies instead.
  2. Use dependency injection as a way to manage the explicit requirements for dependencies, making source code more transparent (see the dependency-injection sketch after the example below).
  3. For custom-made implicit dependencies that cannot be made explicit (especially in environments where dependency injection is not a feasible strategy), at the point of export, ensure you export a fake/mock version for testing purposes.

See examples below:

import axios from 'axios';

const envProd = "PROD";
const envDev = "DEV";

/* @NOTE: `exportMockAxios()` and `exportRealAxios()` are illustrative factory helpers */
export const axiosHttpClient = (process.env.NODE_ENV !== envProd && process.env.NODE_ENV !== envDev)
  ? exportMockAxios()
  : exportRealAxios();
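And for the second guideline, a minimal dependency-injection sketch (the names are illustrative): the dependency arrives through the constructor, so a test can hand in a fake explicitly instead of the module reaching for its own import.

/* user-registration.js - the dependency is explicit and visible in the signature */
export class UserRegistration {
  constructor (httpClient) {
    this.httpClient = httpClient; // no hard-wired import of axios in here
  }

  register (user) {
    return this.httpClient.post('/users', user);
  }
}

/* In a test, the fake is created explicitly within the test case's context */
const fakeHttpClient = { post: async (url, body) => ({ status: 201, body }) };
const registration = new UserRegistration(fakeHttpClient);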

Best Practices In Testing

Testing Library (DOM Queries)

/* constants/contract-copy.js - using indirection to share data between your tests and markup */
export const buttonAriaLabel = "Service apartment drive";
export const roles = {
  heading: "heading",
  button: "button",
  presentation: "presentation",
  application: "application",
  paragraph: "paragraph"
};

/* Button.jsx */
import { roles, buttonAriaLabel } from '...';

export const Button = ({ children, type = "submit", ...props }) => {
  return (
    <button type={type} role={roles.button} aria-label={buttonAriaLabel} {...props}>
      {children}
    </button>
  );
};

/* Button.test.jsx */
import { render } from "@testing-library/react";
import { Button } from '---';
import { roles, buttonAriaLabel } from '...';

test('testing a button', () => {
  const { getByRole } = render(<Button>Click Me</Button>);
  /* Using shared variables to make DOM queries */
  const button = getByRole(
    roles.button,
    { name: buttonAriaLabel }
  );
  expect(button).toBeTruthy();
});

Jest (Assertions)

/* Using variable rather than literals in assertions */
expect(x).toBe(y);

Cypress (Setup/Teardown)

/* Using the `beforeEach` hook to clean up or reset initial state */
let customer = {};

beforeEach(() => {
    // Set application to clean state
    cy.setInitialState()
      .then(() => {
        // Create test data for this test specifically
        return cy.setFixture('customer');
      });
});

Jest (Mocking)

/* Using proper fakes (mocks that do contain logic/implementation) in tests */
import { createMemoryHistory } from "history";

const history = createMemoryHistory();

jest.mock("react-router-dom", () => ({
   ...jest.requireActual("react-router-dom"),
   useHistory: () => history,
}));

jest.mock("react-router", () => ({
   ...jest.requireActual("react-router-dom"),
   useHistory: () => history,
}));

Jest (Setup/Teardown)

/* Using `beforeEach` and `beforeAll` hook(s) to clean up or reset initial state */
beforeAll(() => {
  stubCallback.mockClear();
  stubOnceCallback.mockReset();
});

/* Using `afterEach` or `afterAll` hook to release memory */
afterEach(() => {
  server.close();
  // release retained reference so Garbage Collector (GC) can collect
  server = null;
});

Antipatterns In Testing

Testing Library (DOM Queries)

/* Trying to use string literals (which are likely to change) to directly query the DOM */
const incrementButton = getByText('Increment');
const counterDisplay = getByText('0');

Jest (Assertions)

/* Trying to manipulate timers for a number of seconds/milliseconds */
jest.advanceTimersByTime(500);

/* Trying to use strings/number literals DIRECTLY in assertions on either expected & actual values */
expect(1).toBe('1'); // (e.g. 1 and '1')

Jest (Setup/Teardown)

/* Trying to clean up or reset initial state within a `afterEach` or `afterAll` hook */
afterAll(() => {
  stubCallback.mockClear();
  stubOnceCallback.resetAllMocks();
});

/* Trying to release memory within a `beforeEach` or `beforeAll` hook */
beforeEach(() => {
  server.close();
  // release retained reference so Garbage Collector (GC) can collect
  server = null;
});

Cypress (Assertions)

/* Trying to start a web server from within Cypress scripts using `cy.exec()` or `cy.task()` */
cy.exec();
cy.task();

/* Trying to pause for manual intervention or wait for a number of seconds/milliseconds */
cy.pause();
cy.wait(500);

/* Trying to intercept and wait with a timeout (or without a timeout - https://docs.cypress.io/api/commands/wait) */
cy.intercept("GET", "http://localhost:7771/api/agent/report/copy-style").as("getStyle");
cy.wait('@getStyle', { timeout: 3000 });

// Playwright has a better API that waits without timeouts but using async/await (https://playwright.dev/docs/api/class-page#page-wait-for-response)
const responsePromise = page.waitForResponse('https://example.com/resource');
await page.getByText('trigger response').click();
const response = await responsePromise;

// Playwright has a better API using async/await (https://playwright.dev/docs/api/class-page#page-wait-for-url)
await page.click('a[href][role=button]');
await page.waitForURL('**/target.html');

/* Trying to get a reference to a DOM node using a static/unique selector to click() */
cy.get('.btn.btn-large').click()

/* Trying to use strings/number literals DIRECTLY in assertions on either expected & actual values */
cy.get('#main_title').should('have.text', 'Main Section'); // (e.g. 'Main Section' and '#main_title')

Playwright (Assertions)

/* Trying to get a reference to a DOM node using a static/unique selector to click() */
await page.click('.btn.navigation');

JUnit (Mocking)

/* Trying excessively to use the sleep API for threads */
Thread.sleep(3000);

Jest (Mocking)

/* Trying to use an empty mock (i.e. a mock that doesn't contain logic/implementation) in tests */
jest.mock("react-router-dom", () => ({
   ...jest.requireActual("react-router-dom"),
   useHistory: () => ({
     location: {
       key: 'default',
       search: '',
       hash: '',
       pathname: '/'
     },
     state: null,
     back: jest.fn(),
     push: jest.fn(),
   }),
}));

jest.mock("react-router", () => ({
   ...jest.requireActual("react-router-dom"),
   useHistory: () => ({
     location: {
       key: 'default',
       search: '',
       hash: '',
       pathname: '/'
     },
     state: null,
     back: jest.fn(),
     push: jest.fn(),
   }),
}));

TDD Is Not The Same As Test-First

I think the dogma surrounding TDD can put people off most times. TDD has a learning curve; however, much of that learning curve is learning how to write good/great tests and learning about good/great software design. In my experience, if a software engineer doesn't know how to write good/great tests, he/she/they will hate TDD. It's destiny.

Yet, TDD has been thoroughly misrepresented over the years as test-first, especially with the completely context-dependent 3 laws of TDD. I believe this is why DHH wrote his quite famous article in 2014 titled: TDD is dead, long live testing!.

TDD is touted as very misunderstood by those who strictly swear by it. Well, it is misunderstood for good reason. Most TDD eggheads (I'm one of them, by the way) have a blind spot in the way they explain TDD to others. They start off by stating that TDD helps you write and create executable specifications of observable behavior for any piece of code, with no additional context.

Then, the TDD eggheads go further to explain the importance of writing a test first, before writing the code, because it enables more focus on designing the external interface of a piece of code before worrying about the internal implementation. TDD makes us intentional about the programming interface, rather than writing code any way we want and getting what we get (which may not be very testable).

Now, as much as I agree (in part) with everything other TDD eggheads say, here's the blind spot: TDD is not the same thing as test-first. TDD is not about writing a test first devoid of context. It requires an additional set of skills on top of being very skillful at writing good/great tests.

Here's the punchline (i.e. the crucial stuff) which Kent Beck left out of his book on TDD:

  1. If you don't know how to write great tests, you won't be able to TDD effectively.
  2. TDD requires a specific approach - without it, you'll be frustrated. It also requires that you know what proper software design entails. Without knowledge of software design thinking and principles, doing TDD will still lead to nothing valuable.
  3. TDD gives you fast feedback about design, not about testing. Writing tests alone doesn't fix your codebase's design issues.
  4. Testing alone doesn't care whether you write the tests before or after you write the code. TDD, however, cares only that the tests inform on design issues.
  5. TDD doesn't survive rewrites; it survives refactors. This means that if you refactor without changing the public interface or the means of collaboration between public interfaces, then your TDD tests are still useful after the refactor. But if you rewrite and change the public interface, then you'd have to delete the TDD tests and start over.
  6. TDD can be used with integration tests too, not only unit tests.
  7. Test-first (not TDD) can be used in situations where feedback is non-existent.

TDD should be used only when these 2 pre-conditions are met:

  1. You know enough about how to create the behavior you need. This means that there are no gaps in your knowledge about how to create the desired logical behavior.
  2. You are employing outside-in reasoning, which is an aspect of software design. This means that you are not preoccupied with the implementation details at first. You are focused on the public interface and how it interacts with other artifacts/interfaces before considering implementation details.

In my experience, whenever these 2 pre-conditions weren't met, TDD instantly became a pain to use.

I would often find myself putting TDD aside during spikes or when building a PoC (proof-of-concept), where I don't yet fully know how to create the desired behavior. This means I don't write tests at all while creating a PoC. The act of building the PoC helps me fill the gaps in my knowledge so that, later, I can use TDD effectively and employ better software design than when I was writing the PoC.

Furthermore, there's a reason why TDD doesn't work for some engineers yet for others it works. The reason is the difference in approach to writing code between two types of engineers.

  • The top-down engineer.
  • The bottom-up engineer.

The top-down engineer is very conversant with and/or mindful of the big picture while designing or building software. This type of engineer starts by designing the interface for the software logic and looking at the use-cases where the API would work or is meant to fit.

The bottom-up engineer doesn't see the need to be conversant with and/or mindful of the big picture while designing or building software. This type of engineer starts by focusing too much on the implementation for the software logic rather than the interface. This type of engineer prefers to think with their hands.

TDD provides the opportunity to think about how our code will present itself to the rest of the software program, and how we can decide what to encapsulate and what to expose. TDD enables software design in this way. TDD can help us tell when implicit dependencies are getting in the way of good design.

TDD is less about writing tests first because sometimes writing a test first might not be the best move in a given instance. In order to write a test first and write it well, you have to deeply understand the function of the code you are writing the test for. Otherwise, you risk writing a test that is rigid and full of wrong assumptions and/or deductions.

This is why writing a test at the right time is very crucial.
