Validating examples in documentation using JUnit 5 and JShell
The documentation on the evitaDB site is getting bigger and bigger. The more examples we add, the more we fear they
will become obsolete or broken. How do we tame this beast of hundreds of examples?
Because evitaDB is built on the Java platform, it has all the advantages and disadvantages of Java. Java is a statically
typed and compiled language: before you can execute a piece of code, you have to compile it and load it into a class
loader. Our code examples are scattered across various MarkDown files - sometimes directly embedded, sometimes
referencing a separate file in the repository. At first, we thought there was no easy way to verify the validity and
consistency of the documentation code examples. Our initial idea was to write a custom Maven plugin that would generate
test code, wrap the examples in a Java template file, compile them with javac, and then run them as part of our test
suite.
Fortunately, there is an easier and more dynamic approach. Java provides the JShell REPL (since version 9), which lets
you interactively enter, compile, and run Java source code. JShell can also be driven programmatically, although that
is not its primary use case (which is why information about this approach is scarce and hard to find). We suspected
that JShell might be a way to overcome the difficulties of the compilation step and were determined to at least try it.
The next piece of the puzzle is JUnit 5 and its great support for dynamic tests, which we learned about along the way.
So let's dive in.
The complete working code is available in a test class in our repository.
The examples are usually placed in separate files that are referenced from the MarkDown by a special Next.JS component
named SourceCodeTabs. The notation is optimized to also work with plain MarkDown rendering on GitHub, where Next.JS
components are not interpreted.
The referenced file usually has a .java or .evitaql extension, and there are often multiple files with the same name
that differ only in extension. These files are rendered on the documentation portal in an interactive widget that lets
you switch between different language versions of the example:
The file extensions are the key to recognizing which language an example was written for.
Extraction of the code samples from the MarkDown
Extracting the code to verify is the easiest part of the process. It consists of a deep traversal of the folder
containing the documentation files using Java File Walker:
... reading the file contents to a string, and extracting the code blocks either directly from the MarkDown file itself
or from the externally referenced file (see the createTests method body). Extraction involves only a little bit of
RegEx pattern magic.
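For illustration, a condensed sketch of both steps might look like this - the class name and the regular expression are illustrative, not the actual implementation from our repository:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ExampleExtractor {
	// illustrative pattern for fenced MarkDown code blocks with a language tag
	private static final Pattern CODE_BLOCK = Pattern.compile(
		"```(\\w+)\\s*\\n(.*?)\\n```", Pattern.DOTALL
	);

	// deep traversal of the documentation folder, collecting all MarkDown files
	static Stream<Path> listDocumentationFiles(Path docRoot) throws IOException {
		return Files.walk(docRoot)
			.filter(Files::isRegularFile)
			.filter(path -> path.toString().endsWith(".md"));
	}

	// extracts the language and body of every code block embedded in the file
	static List<String[]> extractCodeBlocks(Path markdownFile) throws IOException {
		final String content = Files.readString(markdownFile);
		final Matcher matcher = CODE_BLOCK.matcher(content);
		final List<String[]> blocks = new ArrayList<>();
		while (matcher.find()) {
			blocks.add(new String[]{matcher.group(1), matcher.group(2)});
		}
		return blocks;
	}
}
```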
Generating JUnit 5 dynamic tests
The next piece of the puzzle is the dynamic creation of JUnit tests - ideally with a single separate test per block of
code example. Fortunately, the authors of the JUnit 5 framework have already thought of this and prepared support for
dynamic tests.
Basically we need to wrap a lambda that does the test itself in a
DynamicTest
wrapper. The DynamicTest.dynamicTest(String, URI, Executable) method accepts three arguments:
the name of the test to display (equivalent to @DisplayName)
a file locator (URI) that lets you navigate to the source when you click the test name in the IDE
a lambda in the form of an Executable, which represents the test itself
Creating a single stream of all code snippets is not practical in our case - there are too many of them, and listing
the tests in the IDE quickly becomes cluttered. That's why we're using another JUnit 5 invention -
DynamicContainer,
which is designed to aggregate multiple related tests into a single "node". The node, in our case, represents
a particular source markdown file where the code blocks are placed (either directly or by reference). This way, we can quickly identify the proper document to which the failed example belongs.
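Putting these pieces together, a sketch of the dynamic test factory might look roughly like the following - the CodeExample record and the two helper stubs are placeholders for the real extraction and JShell logic described in this article:

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

import org.junit.jupiter.api.DynamicContainer;
import org.junit.jupiter.api.DynamicNode;
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;

public class DocumentationTest {

	/** Hypothetical holder for a single extracted example. */
	record CodeExample(String name, String code) {}

	@TestFactory
	Stream<DynamicNode> documentation() {
		// hypothetical extraction result: markdown file -> code blocks found in it
		final Map<Path, List<CodeExample>> examples = extractAllExamples();

		// one DynamicContainer per markdown file, one DynamicTest per code block
		return examples.entrySet().stream()
			.map(entry -> DynamicContainer.dynamicContainer(
				entry.getKey().getFileName().toString(),   // container name = document name
				entry.getKey().toUri(),                    // locator for IDE navigation
				entry.getValue().stream()
					.map(example -> DynamicTest.dynamicTest(
						example.name(),                     // displayed test name
						entry.getKey().toUri(),             // click-through to the source file
						() -> runInJShell(example.code())   // the test body itself
					))
			));
	}

	// stubs standing in for the real extraction and JShell execution logic
	private Map<Path, List<CodeExample>> extractAllExamples() { return Map.of(); }
	private void runInJShell(String code) { /* see the JShell sections below */ }
}
```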
The executed tests look like this:
IntelliJ Idea test execution tab
Compilation and execution of code snippets via JShell
The Java source code snippets need to be parsed and compiled, and for this we use the JShell REPL.
First, we need to prepare the JShell instance using JShell.builder():
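A minimal sketch of that setup might look like this (the real configuration in our test class has a few more options):

```java
import java.util.Collections;

import jdk.jshell.JShell;
import jdk.jshell.execution.LocalExecutionControlProvider;

public class JShellFactory {

	static JShell createJShell() {
		// evaluate snippets inside the current JVM (no forked process) so that
		// breakpoints placed in our own classes are hit while the examples run
		return JShell.builder()
			.out(System.out)
			.err(System.err)
			.executionEngine(new LocalExecutionControlProvider(), Collections.emptyMap())
			.build();
	}
}
```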
Note that we explicitly inject LocalExecutionControlProvider. By default, JShell creates a separate process (fork)
to evaluate the Java snippets. While this is good for isolation and security, it is inconvenient for us because we
couldn't set debug breakpoints on the code executed from the examples. The debugging capability lets us fix bugs in our
source code examples much faster and more easily.
Next, we initialize the classpath for the JShell instance by copying the full classpath of the JVM that the JUnit test
suite runs in, and finally we initialize all the imports that our Java examples need from a dedicated imports file.
For initializing the imports, we use the same logic as for executing the source code itself.
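A rough sketch of that initialization - the importsFile parameter stands for the imports file mentioned above:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

import jdk.jshell.JShell;

public class JShellInitializer {

	static void initClasspathAndImports(JShell jShell, Path importsFile) throws Exception {
		// copy the classpath of the JVM running the test suite into the JShell instance
		for (String classpathEntry : System.getProperty("java.class.path").split(File.pathSeparator)) {
			jShell.addToClasspath(classpathEntry);
		}
		// evaluate the import statements one by one, just like any other snippet
		for (String importStatement : Files.readAllLines(importsFile)) {
			if (!importStatement.isBlank()) {
				jShell.eval(importStatement);
			}
		}
	}
}
```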
Source code preparation and execution
The source code needs to be split into separate statements that can be executed in the JShell instance. JShell doesn't
accept the full source code at once - it behaves more like an IDE's evaluate-expression dialog. For this operation, we
ask the JShell instance itself to split the code appropriately:
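A minimal sketch of that splitting, built on the SourceCodeAnalysis API (the method name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

import jdk.jshell.JShell;
import jdk.jshell.SourceCodeAnalysis;

public class SnippetSplitter {

	static List<String> toJavaSnippets(JShell jShell, String sourceCode) {
		final SourceCodeAnalysis analysis = jShell.sourceCodeAnalysis();
		final List<String> snippets = new ArrayList<>();
		String remaining = sourceCode;
		// repeatedly cut off the first complete statement until nothing is left
		while (!remaining.isBlank()) {
			final SourceCodeAnalysis.CompletionInfo info = analysis.analyzeCompletion(remaining);
			if (info.source() == null) {
				// incomplete trailing code - shouldn't happen for valid examples
				break;
			}
			snippets.add(info.source());
			remaining = info.remaining();
		}
		return snippets;
	}
}
```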
The method returns a list of separate Java expressions that together make up the original sample code. The next step
is to evaluate/execute them in the JShell instance.
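A condensed sketch of that evaluation loop follows; the SnippetExecutionException is a hypothetical helper that carries the already-applied snippets so the teardown can still drop them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import jdk.jshell.JShell;
import jdk.jshell.Snippet;
import jdk.jshell.SnippetEvent;

public class SnippetExecutor {

	/** Hypothetical exception carrying the snippets applied before the failure occurred. */
	static class SnippetExecutionException extends RuntimeException {
		final List<Snippet> executedSnippets;

		SnippetExecutionException(String message, Throwable cause, List<Snippet> executedSnippets) {
			super(message, cause);
			this.executedSnippets = executedSnippets;
		}
	}

	static List<Snippet> executeSnippets(JShell jShell, List<String> javaSnippets) {
		final List<Snippet> executed = new ArrayList<>();
		for (String javaSnippet : javaSnippets) {
			for (SnippetEvent event : jShell.eval(javaSnippet)) {
				if (!event.status().isActive()) {
					// the snippet didn't compile - use JShell diagnostics to explain why
					final String problems = jShell.diagnostics(event.snippet())
						.map(diag -> diag.getMessage(null))
						.collect(Collectors.joining("\n"));
					throw new SnippetExecutionException(
						"Example failed to compile:\n" + problems, null, executed
					);
				}
				// the snippet is part of the JShell state - remember it for the teardown
				executed.add(event.snippet());
				if (event.exception() != null) {
					// compiled fine, but its evaluation threw an exception
					throw new SnippetExecutionException(
						"Example threw an exception.", event.exception(), executed
					);
				}
			}
		}
		return executed;
	}
}
```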
JShell generates a list of events for each evaluated Java snippet (expression) describing what happened. The most
important information for us is whether the snippet was applied successfully and is part of the JShell state - this can
be verified by calling event.status().isActive(). If the event is not active, a serious error has occurred and we will
use the JShell diagnostic to distill the root cause of the problem.
Next, we check if an exception was thrown during the expression evaluation by checking the event.exception() method.
If the exception is found and the event state is VALID, we add it to the list of snippets that have been applied by
JShell and are part of its state.
The same logic applies to the case where neither a compilation error nor an evaluation exception occurred and
the snippet was successfully evaluated. The snippet is added to the list of snippets that have affected the state of
JShell and are returned as a result of this method.
Tear down
Initializing and preparing the JShell instance is costly, so we reuse a single instance for all examples in the same
documentation file. We don't want to share the same JShell instance across multiple documentation files, because our
ultimate goal is to run the documentation tests in parallel once JUnit 5 issue #2497 is resolved. Reusing the same
JShell instance for multiple tests in a single documentation file raises the question of proper state cleanup, so that
the relics of one example don't affect the examples that run after it.
When a test is finished, we need to clean up the JShell instance that will be reused for another example in the same
documentation file. Fortunately, JShell provides a way to drop an evaluated snippet, which effectively eliminates its
impact on the state of the JShell instance.
Actually there are two phases of tearing down the context:
The first phase iterates over all snippets, finds all VarSnippets - i.e. expressions that declare variables in
the JShell context - and tries to detect whether these variables implement the java.lang.AutoCloseable interface and
need to be closed properly. There is a catch, though: JShell doesn't provide a facility to access the variables by
reference and work with them directly in the code that invoked the snippets. JShell can only return the toString
outputs of these variables. The reason, we guess, is that the JShell interpreter can (and usually does) run as a forked
process, so the Java objects live in the context (classpath, process memory boundaries, etc.) of a different JVM
instance. Therefore, we need to run the close logic as another JShell expression.
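A sketch of this first phase, assuming the executed snippets were collected as shown earlier:

```java
import java.util.List;

import jdk.jshell.JShell;
import jdk.jshell.Snippet;
import jdk.jshell.SnippetEvent;
import jdk.jshell.VarSnippet;

public class JShellTearDown {

	static void closeAutoCloseableVariables(JShell jShell, Iterable<Snippet> executedSnippets) {
		for (Snippet snippet : executedSnippets) {
			if (snippet instanceof VarSnippet varSnippet) {
				// the check has to run inside JShell - we can't reach the object directly
				final List<SnippetEvent> check = jShell.eval(
					"AutoCloseable.class.isInstance(" + varSnippet.name() + ")"
				);
				if (!check.isEmpty() && "true".equals(check.get(0).value())) {
					// the close() call has to be evaluated inside JShell as well
					jShell.eval("((AutoCloseable) " + varSnippet.name() + ").close();");
				}
			}
		}
	}
}
```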
Interestingly, if you try to replace AutoCloseable.class.isInstance(" + varSnippet.name() + ") with
varSnippet.name() + " instanceof AutoCloseable", you will end up with a compilation error for variables that are
not instances of AutoCloseable. Although this is correct source code in normal Java, the JShell interpreter/compiler
behaves differently, and we have to work around the problem by calling Class.isInstance instead. See JDK issue
#8211697.
Finally, we need to drop both the AutoCloseable-closing snippets and all the sample source code snippets that were
evaluated and affected the state of the JShell instance. The drop operation behaves similarly to a database rollback:
it undoes all operations that affected the JShell instance (but not the side effects associated with network or file
system calls).
Test prerequisites, chaining
Some examples build on the context of other examples in the same document - naturally as the described use case
unfolds. We certainly don't want to clutter up the example code by repeating the statements that set up the environment
for the example itself - the examples should be as short as possible.
That's why we introduced the requires attribute for our SourceCodeTabs component, which allows you to specify one
or more other files that must be executed before the code in the example itself is executed.
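On the test side this might translate into something like the following sketch, which reuses the helper sketches from the previous sections (the method and parameter names are hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import jdk.jshell.JShell;
import jdk.jshell.Snippet;

public class PrerequisiteRunner {

	// hypothetical: evaluate all files listed in the `requires` attribute first,
	// then the example itself, all within the same JShell instance
	static List<Snippet> executeWithPrerequisites(
		JShell jShell, List<Path> requiredFiles, String exampleCode
	) throws Exception {
		final List<Snippet> executed = new ArrayList<>();
		for (Path requiredFile : requiredFiles) {
			final String prerequisiteCode = Files.readString(requiredFile);
			executed.addAll(SnippetExecutor.executeSnippets(
				jShell, SnippetSplitter.toJavaSnippets(jShell, prerequisiteCode)
			));
		}
		executed.addAll(SnippetExecutor.executeSnippets(
			jShell, SnippetSplitter.toJavaSnippets(jShell, exampleCode)
		));
		return executed;
	}
}
```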
Example translation
Another special feature of our documentation is the ability to switch between different flavors and even content of
the documentation by selecting a different preferred language to view the documentation in:
Preferred language selector
Preparing examples for 4-5 different languages is a tedious and time-consuming task that we want to avoid. Therefore,
we have implemented a special mode that allows us to generate examples for different languages (Java, REST, GraphQL)
from the base EvitaQL query statement.
Similarly, we can automatically generate the MarkDown snippets documenting the expected output for the queries applied
to our demo dataset. These MarkDown snippets play another important role
as you will read in the next chapter.
The sample translation is a manual operation that is not invoked by our CI workflows. The generated files need to be
reviewed by a human and committed to Git along with the documentation. However, once the examples/files are generated,
they are compiled and executed by our test suite, so their correctness is automatically verified. See for yourself
what the generated snippets look like:
Verification, assertions
The examples that query our demo dataset usually contain the preview
of the result that the reader can expect if they run the query themselves. Since our dataset is not static, but evolves
over time (new data is added, the structure changes, etc.), the results may differ over time. We are also working on
the evitaDB query engine and the results may differ as we change the internal logic of the engine. In any case, we want
to be aware when this happens and examine the difference between previous results and the results produced by
the current version of the dataset and the evitaDB engine. We then need to either correct the documentation (albeit
semi-automatically, by re-running the example translation), fix the evitaDB engine, or document the backward
incompatible behavior.
That's why we automatically compare the expected result MarkDown snippets with the actual results of the evitaDB query
given in the example, and if these results differ, we mark the tests as failed and force human intervention.
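The comparison itself can be as simple as an assertion against the stored MarkDown snippet - a sketch, assuming a hypothetical expectedOutputFile stored next to the example:

```java
import java.nio.file.Files;
import java.nio.file.Path;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class OutputVerifier {

	// hypothetical sketch: compare the stored expected-output MarkDown snippet with
	// the MarkDown rendered from the actual query result and fail on any difference
	static void assertOutputUnchanged(Path expectedOutputFile, String actualOutputMarkdown) throws Exception {
		final String expected = Files.readString(expectedOutputFile);
		assertEquals(
			expected.trim(),
			actualOutputMarkdown.trim(),
			"The query result differs from the documented output in " + expectedOutputFile
		);
	}
}
```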
There is a special GitHub workflow associated with documentation verification. The tests, which check the
executability of our source code examples and compare their output to the last checked output, run automatically
whenever there is a change in the documentation folder. They also run every Sunday night, so that we can detect
possible changes in the demo dataset even if the documentation itself hasn't changed.
Summary
So far we've written hundreds of pages of documentation and executable examples. The documentation keeps growing,
and having a process that covers our backs and watches for broken examples or deviations in the documented outputs
is pretty important. There is nothing worse than outdated or incorrect documentation.
A pleasant side effect of the described process is that it saves a considerable amount of tedious work in defining
the examples for all the languages we support, and it adds another layer of integration testing that has already
uncovered some bugs in the evitaDB core / query parser.
The queries run against our demo dataset and verify that our demo server API is working correctly and that we haven't
broken it with some update. That's also pretty important - we want our demo server to be available to anyone who wants
to play with it.
The work has already paid off, and we're sure it's going to pay off even more in the long run.