RESTful pagination with cucumber-jvm, jersey and elasticsearch

Hi all,

It’s been a while since my last post, and there are good reasons for that. I had several topics in mind:
– RESTful pagination. This one already hides 2 wide concepts: pagination and HATEOAS.
– Cucumber. I already know 2 Cucumber implementations: Java and Ruby. It’s not trivial to choose when offered that many options.
I ended up building a pagination system with cucumber-jvm as the acceptance test framework, jersey as the REST framework, jetty as the container, an in-memory h2 database and an in-memory elasticsearch search engine.

1 – Pagination

Pagination is a well known challenge. It is implemented by many infrastructure components (elasticsearch, mysql, etc) or frameworks (hibernate, etc).
There are 2 levels of pagination : data level and presentation level.

The data pagination first selects ids (defined by query and sort specs), computes a subset of those ids (defined by pageIndex and itemsPerPage) and then returns the corresponding data. It is fully stateless: the client must provide all the required information on every request. When something is not provided, the application falls back to defaults.
Running a search that yields 7 pages of results, then asking for “next”, then asking for page 6 results in 3 data queries and 3 http queries.
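A minimal sketch of the data-level arithmetic, assuming a 0-based pageIndex, a positive itemsPerPage and an in-memory list standing in for the data store. The class DataPagination and its methods are hypothetical names used for illustration; they are not part of the project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written for illustration only.
final class DataPagination {

  // Offset of the first item of a 0-based page. This is the value one would
  // typically hand to the data store as its "from"/OFFSET parameter.
  static int offset(int pageIndex, int itemsPerPage) {
    return pageIndex * itemsPerPage;
  }

  // Returns only the ids that belong to the requested page.
  // Assumes pageIndex >= 0 and itemsPerPage > 0.
  static List<Long> pageOf(List<Long> sortedIds, int pageIndex, int itemsPerPage) {
    int from = Math.min(offset(pageIndex, itemsPerPage), sortedIds.size());
    int to = Math.min(from + itemsPerPage, sortedIds.size());
    return new ArrayList<>(sortedIds.subList(from, to));
  }

  public static void main(String[] args) {
    List<Long> ids = new ArrayList<>();
    for (long id = 1; id <= 65; id++) {
      ids.add(id);
    }
    // 65 ids at 10 per page make 7 pages; page 6 is the last, partial one.
    System.out.println(pageOf(ids, 6, 10)); // prints [61, 62, 63, 64, 65]
  }
}
```

Every call recomputes the subset from scratch, which is why each of the 3 http queries costs 1 data query.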

The presentation pagination first selects all matching results, then returns the subset corresponding to the desired pageIndex if the query hasn’t changed. The “if the query hasn’t changed” part supposes that the system maintains a “query history”. That query history is not stateless.
Running the same search (7 pages of results), then asking for “next”, then asking for page 6 results in 1 data query but still 3 http queries.
It was the preferred way of running queries when heavy load was an issue because of RDBMS bottlenecks. Nowadays, many applications take advantage of NoSQL stores and distributed search engines.
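The “query history” idea can be sketched as follows. This is only an illustration under simplifying assumptions: a single remembered query, an in-memory result list, and a hypothetical CachedSearch class that does not exist in the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of presentation-level pagination.
final class CachedSearch {
  private final Function<String, List<String>> dataStore;
  private String lastQuery;                        // the "query history" (state!)
  private List<String> lastResults = new ArrayList<>();
  private int dataQueries;

  CachedSearch(Function<String, List<String>> dataStore) {
    this.dataStore = dataStore;
  }

  List<String> page(String query, int pageIndex, int itemsPerPage) {
    // Hit the data store only when the query has changed.
    if (!query.equals(lastQuery)) {
      lastResults = dataStore.apply(query);
      lastQuery = query;
      dataQueries++;
    }
    int from = Math.min(pageIndex * itemsPerPage, lastResults.size());
    int to = Math.min(from + itemsPerPage, lastResults.size());
    return new ArrayList<>(lastResults.subList(from, to));
  }

  int getDataQueries() {
    return dataQueries;
  }

  public static void main(String[] args) {
    List<String> all = new ArrayList<>();
    for (int i = 0; i < 65; i++) {
      all.add("classified-" + i);
    }
    CachedSearch search = new CachedSearch(query -> all);
    search.page("xbox", 0, 10); // initial search
    search.page("xbox", 1, 10); // "next"
    search.page("xbox", 6, 10); // page 6
    System.out.println(search.getDataQueries()); // prints 1
  }
}
```

The lastQuery and lastResults fields are exactly the state that the data-level approach avoids.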

So much for the why. Let’s see how one could implement pagination.
To paginate we need a total, a pageIndex and an itemsPerPage. We also need to know what is considered “invalid input” (negative values, division by zero, etc) and what to do with it. That’s the difficult part: what to do with negative/zero values.
Below is an excerpt of SearchResults, which holds the knowledge to calculate page indices:

  private int countPages() {
    int itemsPerPage = (getQuery() == null) ? 0 : getQuery().getItemsPerPage();
    // Guard against zero/negative page sizes (and the division by zero below).
    if (itemsPerPage <= 0) {
      return 0;
    }
    int totalItems = getTotalItems();
    int moduloResult = totalItems % itemsPerPage;
    int divideResult = totalItems / itemsPerPage;
    // Round up: a partially filled last page still counts as a page.
    return moduloResult == 0 ? divideResult : divideResult + 1;
  }

  public int getPageIndex() {
    // No pages at all: -1 signals "no valid index".
    if (countPages() <= 0) {
      return -1;
    }
    final int userRequestedPageIndex = getQuery().getPageIndex();
    // Out-of-range requests fall back to the first page.
    if (userRequestedPageIndex < getMin() || userRequestedPageIndex > getMax()) {
      return 0;
    }
    return userRequestedPageIndex;
  }

  public int getFirstPageIndex() {
    if (getPageIndex() == -1) {
      return -1;
    }
    return 0;
  }
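The excerpt stops at the first page. As a sketch only, the remaining indices could follow the same “-1 means no such page” convention. PageIndices below is a hypothetical, simplified stand-in for SearchResults (it takes plain ints instead of a query object) and is not the project’s actual code.

```java
// Hypothetical, simplified stand-in for SearchResults.
final class PageIndices {
  private final int totalItems;
  private final int itemsPerPage;
  private final int requestedPageIndex;

  PageIndices(int totalItems, int itemsPerPage, int requestedPageIndex) {
    this.totalItems = totalItems;
    this.itemsPerPage = itemsPerPage;
    this.requestedPageIndex = requestedPageIndex;
  }

  int countPages() {
    if (itemsPerPage <= 0 || totalItems <= 0) {
      return 0;
    }
    int pages = totalItems / itemsPerPage;
    return totalItems % itemsPerPage == 0 ? pages : pages + 1;
  }

  int getLastPageIndex() {
    return countPages() <= 0 ? -1 : countPages() - 1;
  }

  int getPageIndex() {
    if (countPages() <= 0) {
      return -1;
    }
    // Out-of-range requests fall back to the first page.
    if (requestedPageIndex < 0 || requestedPageIndex > getLastPageIndex()) {
      return 0;
    }
    return requestedPageIndex;
  }

  int getPreviousPageIndex() {
    int current = getPageIndex();
    return current <= 0 ? -1 : current - 1; // no page before the first one
  }

  int getNextPageIndex() {
    int current = getPageIndex();
    // no page after the last one
    return (current == -1 || current >= getLastPageIndex()) ? -1 : current + 1;
  }
}
```

With 65 items and 10 items per page, page 6 has 5 as its previous index and -1 as its next index, while an empty result set yields -1 everywhere.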

You can dig deeper into the code on github.

2 – RESTful pagination

We now know why and how to paginate in a classic architecture. In a REST architecture the results representation (be it json, xml, html or any other one) should include links to the other pages (first, previous, self, next, last) and links to the items.
This can be done in a hundred ways. This post used aop to generate links. Jersey and RESTEasy both have a native mechanism based on annotations to generate links in the output. I chose to build links by hand because search representations are a bit more complex than entities.
The builder is responsible for building a SearchRepresentation from SearchResults. Below is the main method:

  private static Link buildPaginationLink(Relations relation, SearchResult searchResult,
                                          UriInfo uriInfo) {
    int pageIndex = -1;

    switch (relation) {
      case first:
        pageIndex = searchResult.getFirstPageIndex();
        break;
      case previous:
        pageIndex = searchResult.getPreviousPageIndex();
        break;
      case self:
        pageIndex = searchResult.getPageIndex();
        break;
      case next:
        pageIndex = searchResult.getNextPageIndex();
        break;
      case last:
        pageIndex = searchResult.getLastPageIndex();
        break;
    }

    if (pageIndex == -1) {
      return null;
    }

    Link link = new Link();
    link.setRel(relation);
    UriBuilder builder =
        uriInfo.getBaseUriBuilder().queryParam("q", searchResult.getQuery().getQuery());
    if (CollectionUtils.isNotEmpty(searchResult.getQuery().getSort())) {
      for (OrderBy item : searchResult.getQuery().getSort()) {
        builder.queryParam("sort", item.toString());
      }
    } else {
      builder.queryParam("sort", OrderBy.DEFAULT.toString());
    }

    builder.queryParam("pageIndex", pageIndex);
    builder.queryParam("itemsPerPage", searchResult.getQuery().getItemsPerPage());
    String href = builder.build().toString();
    link.setHref(href);
    return link;
  }
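To see which links end up in a representation, here is a self-contained sketch that mimics the method above with plain string building instead of UriBuilder. Everything in it is an assumption made for illustration: the PaginationLinks class, the base URI, the sort value and the way indices are derived from current and last.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; it only mirrors the shape of buildPaginationLink.
final class PaginationLinks {

  private static String encode(String value) {
    try {
      // Form-style encoding: spaces become '+', ':' becomes %3A.
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always available", e);
    }
  }

  static String href(String baseUri, String q, String sort, int pageIndex, int itemsPerPage) {
    return baseUri + "?q=" + encode(q) + "&sort=" + encode(sort)
        + "&pageIndex=" + pageIndex + "&itemsPerPage=" + itemsPerPage;
  }

  // current and last are 0-based page indices; -1 means "no such page".
  static Map<String, String> build(String baseUri, String q, String sort,
                                   int current, int last, int itemsPerPage) {
    Map<String, Integer> indices = new LinkedHashMap<>();
    indices.put("first", current == -1 ? -1 : 0);
    indices.put("previous", current <= 0 ? -1 : current - 1);
    indices.put("self", current);
    indices.put("next", (current == -1 || current >= last) ? -1 : current + 1);
    indices.put("last", current == -1 ? -1 : last);

    Map<String, String> links = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> entry : indices.entrySet()) {
      // Same rule as above: a relation without a valid index produces no link.
      if (entry.getValue() != -1) {
        links.put(entry.getKey(), href(baseUri, q, sort, entry.getValue(), itemsPerPage));
      }
    }
    return links;
  }
}
```

On the first of 7 pages the result contains first, self, next and last but no previous, so a client never has to compute page numbers itself: it only follows the relations it is given.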

3 – Cucumber ecosystem

Cucumber is a Ruby framework that allows one to edit and describe scenarios and features in plain text and implement them in Ruby.
Cucumber uses the Gherkin (Ragel-based) parser for textual scenarios, and Cucumber takes it from there:
– loads the configuration (features location, implementations locations, etc)
– matches textual steps against steps implementation
– suggests implementation for pending steps
– runs implemented stories
– gathers results and outputs them
I won’t dive into the Cucumber features; I’d rather present the java parts.
The bridge between Java and Ruby is JRuby. This is how cuke4duke was born. It is an addon to Cucumber, making it possible to write step definitions in several different JVM languages including Ruby (thanks to JRuby).
While Cucumber and Cuke4Duke are written in Ruby, Cucumber JVM is written in java, removing the need for a Ruby environment (dependencies, runtime, etc).
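To make the “matches textual steps against steps implementation” part more concrete, here is a toy sketch of the idea using only java.util.regex. It is not how Cucumber is implemented; TinyStepMatcher is a made-up class that just shows the principle: each step definition is a regular expression, and its capture groups become the arguments of the step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of regex-based step matching (hypothetical class).
final class TinyStepMatcher {
  private final Map<Pattern, Consumer<List<String>>> definitions = new LinkedHashMap<>();

  // Registers a step definition: a regex plus the code to run on a match.
  void define(String regex, Consumer<List<String>> body) {
    definitions.put(Pattern.compile(regex), body);
  }

  // Runs the first definition matching the step text.
  // Returns false when nothing matches, i.e. the step is still pending.
  boolean run(String stepText) {
    for (Map.Entry<Pattern, Consumer<List<String>>> definition : definitions.entrySet()) {
      Matcher matcher = definition.getKey().matcher(stepText);
      if (matcher.matches()) {
        List<String> arguments = new ArrayList<>();
        for (int group = 1; group <= matcher.groupCount(); group++) {
          arguments.add(matcher.group(group));
        }
        definition.getValue().accept(arguments);
        return true;
      }
    }
    return false;
  }
}
```

A step that returns false here corresponds to the “pending” steps for which Cucumber suggests an implementation.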

3.1 – Cuke4duke

Cuke4Duke is a very nice piece of software because it makes it possible to implement features in Java or Ruby.
However, to me, the killer feature remains the maven plugin because it integrates nicely with our CI platform.
Despite these nice features, cuke4duke has major flaws:
– from a user point of view: I’m a heavy maven user and I just don’t understand how gem calls itself a dependency manager when it breaks almost every time. It’s the main reason I won’t share a cuke4duke example with you. My app broke between the time I decided to share it with you and the time I’m writing these lines. It’s also the main reason I decided to explore Cucumber JVM.
– from a developer point of view: it’s a nightmare to test compatibility between components (cucumber, ruby, jruby, cuke4duke, rvm, gem, gherkin, maven) and the team stopped doing it. Your app can easily break because of too many dependencies. Reducing layers improves stability.

For those very reasons (and surely for other ones), the cucumber team decided to create Cucumber JVM.

3.2 – Cucumber-jvm

Cucumber JVM’s goals are exactly the same as cuke4duke’s, without the noise and the layers.
The Gherkin part is generated in java from the Ruby version. The Cucumber part is written by hand, which is the main reason Cucumber JVM is not at Cucumber-Ruby’s level, and I guess it never will be. Contributions are always more than welcome.
Below are the required steps to add Cucumber JVM support to your project:
– setup maven: dependencies and maven plugin

            <dependency>
                <groupId>info.cukes</groupId>
                <artifactId>cucumber-jvm</artifactId>
                <version>${cucumber-jvm.version}</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>info.cukes</groupId>
                <artifactId>cucumber-core</artifactId>
                <version>${cucumber-jvm.version}</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>info.cukes</groupId>
                <artifactId>cucumber-java</artifactId>
                <version>${cucumber-jvm.version}</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>info.cukes</groupId>
                <artifactId>cucumber-html</artifactId>
                <version>${cucumber-html.version}</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>info.cukes</groupId>
                <artifactId>gherkin</artifactId>
                <version>${gherkin.version}</version>
                <scope>test</scope>
            </dependency>

– create features: src/test/resources/features is a good location as long as you tell the runner where to find them.

...
  @done
  @#1
  Scenario Outline: A customer can create a classified
    Given I am a customer
    And a valid classified
    And I send "<format>"
    When I try to create the classified
    Then the creation is successful
    When I load the classified as "<format>"
    Then its status is "draft"
  Examples:
    | format           |
    | application/json |
    | application/xml  |
...

– setup maven plugin: Cucumber JVM has a client you can launch with various arguments (report formats, features location, steps implementations locations, tags). This is a perfect use case for the exec-maven-plugin.

            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>integration-test</phase>
                        <goals>
                            <goal>java</goal>
                        </goals>
                        <configuration>
                            <classpathScope>test</classpathScope>
                            <includeProjectDependencies>true</includeProjectDependencies>
                            <includePluginDependencies>true</includePluginDependencies>
                            <mainClass>cucumber.cli.Main</mainClass>
                            <arguments>
                                <argument>--format</argument>
                                <argument>html:target/reports</argument>
                                <argument>--format</argument>
                                <argument>pretty</argument>
                                <argument>--glue</argument>
                                <argument>org.diveintojee.poc.cucumberjvm.steps</argument>
                                <argument>target/test-classes</argument>
                                <argument>--tags</argument>
                                <argument>${cucumber.tags}</argument>
                            </arguments>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

Well, you’re done with the setup: you can now write a feature and run the maven exec plugin. In the above example the cucumber cli will:
– look for features in target/test-classes/ (including subdirectories)
– look for steps in package org.diveintojee.poc.cucumberjvm.steps
– match them
– suggest an implementation for every step that is not implemented yet (example below)

You can implement missing steps with the snippets below:

@When("^I search classifieds for which \"([^\"]*)\" contains \"([^\"]*)\"$")
public void I_search_classifieds_for_which_contains(String arg1, String arg2) {
    // Express the Regexp above with the code you wish you had
}

There you go, your tdd cycle can begin : test, code, refactor.

4 – Feedback

Cucumber is a very good piece of software, especially for Ruby developers.
Cuke4Duke was a very nice glue, although it was never pleasant to come back 4 months later and not be able to build your project from scratch because dependencies were broken.

In Ruby, the ability to set values in a shared context really eases the task. There is no such thing in Java because it is considered a bad design: shared objects are not thread-safe.
Everyone should be aware of that “context sharing” particularity before using Cucumber. Why is it important? Well because step definitions are unique among all scenarios. The statements below operate on the same object. You can only run them one after the other if you want to avoid side-effects.

Feature:

Scenario:
....
When I set "title" to "XBox 360, 85$, Never used, still packed, 8$ shipping"

Scenario:
....
When I set "title" to "Sega Genesis, 250$, Collector version. Includes Sonic The Hedgehog, 8$ shipping"

The same step

@When("^I set \"([^\"]*)\" to \"([^\"]*)\"$")
public void I_value_field_to(String fieldName, String value) {
    ShareContext.advert.setTitle(value);
}

will be executed in both scenarios.
As you can see, all scenarios share the same global state. You end up splitting the implementation into classes that refer to a collection of static members. With this limitation I strongly suggest you reset members before every scenario or you might get impacted by the values of the previous scenario. I guess that’s not the idea of “test in isolation” you had in mind.
In Ruby you just refer to a “global” variable with “@my_variable”. It automatically gets added to the “World”.
In Java you have to create your “World”, add public static members, and reference them in step implementations. At least that’s the way I see it.
The assertions above are false, as Aslak Hellesoy pointed out to me. One can avoid context sharing issues by using one of the available DI technologies: picocontainer (which is the default one), spring, guice, weld, etc. Because the whole test context is re-created before every scenario, you don’t have to worry about values not being reset when running the next scenario.
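The principle can be sketched without any container. In the illustration below, ScenarioContext, ClassifiedSteps and ScenarioRunner are hypothetical names, and ScenarioRunner.newScenario() merely stands in for what a DI container such as picocontainer would do: build a fresh object graph for each scenario and pass the shared state through the constructor. In a real project the step method would carry the @When annotation shown earlier.

```java
// Hypothetical per-scenario state holder: no static members.
final class ScenarioContext {
  private String title;

  String getTitle() {
    return title;
  }

  void setTitle(String title) {
    this.title = title;
  }
}

// Step class receiving its context through the constructor.
final class ClassifiedSteps {
  private final ScenarioContext context;

  ClassifiedSteps(ScenarioContext context) {
    this.context = context;
  }

  // Would be annotated with @When("^I set \"([^\"]*)\" to \"([^\"]*)\"$").
  void iSetFieldTo(String fieldName, String value) {
    if ("title".equals(fieldName)) {
      context.setTitle(value);
    }
  }

  String currentTitle() {
    return context.getTitle();
  }
}

// Stands in for the DI container: one fresh context per scenario.
final class ScenarioRunner {
  static ClassifiedSteps newScenario() {
    return new ClassifiedSteps(new ScenarioContext());
  }
}
```

Because each scenario gets its own ScenarioContext, the title set in the “XBox 360” scenario is simply not visible from the “Sega Genesis” one, and nothing has to be reset by hand.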

If you absolutely need finer control on your execution you should consider using JBehave. But be aware that with great power comes … a little complexity :).

I won’t detail the search engine part because I already did it in previous posts. I still hope you learned something about RESTful pagination: HATEOAS is really not easy to reach, but suitable tools may help you.

The full source code is available on github. There really is a lot in that poc. Feel free to pick the parts you want to dig into.

Cheers.

Louis.

3 thoughts on “RESTful pagination with cucumber-jvm, jersey and elasticsearch”

  1. You don’t have to use static variables and you shouldn’t.

    Search the cucumber mailing list for picocontainer and dependency injection for details.

    Aslak

    1. Hi Aslak,

      Thanks for bringing that up. I did not want to make false assertions but it reflected my “bad” understanding of how cucumber works.

      I’ll have to rethink my design on steps … and correct my post.

      Cheers
