Add search features to your application, try Elasticsearch part 4: search

Elasticsearch relies on the Lucene engine.
The good news is that Lucene is really fast and powerful. Yet it’s not a good idea to expose such power directly to the user. Elasticsearch already acts as a first layer on top of Lucene, but its API remains very complete.
When you haven’t mastered an API, it is good practice to control what you expose to the user. But this comes at a cost: you’ll have to:
– implement a query language
– implement a language parser
– implement a query translator (translate into something Elasticsearch understands)
– run the search
– translate Elasticsearch results into a custom structure

The task seems daunting, but don’t worry: we’re going to look at each step.
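
To make the split concrete, here is a minimal Java 8+ sketch that treats each step as a plain function and chains them. The `SearchPipeline` class and its type parameters are hypothetical, not part of the project or of Elasticsearch; the query language itself is simply the input string.

```java
import java.util.function.Function;

/**
 * Hypothetical skeleton of the steps above, each stage being a plain function.
 * Q = parsed query, T = translated (engine-specific) query,
 * R = raw engine response, S = custom result structure.
 */
final class SearchPipeline<Q, T, R, S> {
    private final Function<String, Q> parser;   // query language -> parsed clauses
    private final Function<Q, T> translator;    // parsed clauses -> Elasticsearch query
    private final Function<T, R> executor;      // runs the search
    private final Function<R, S> converter;     // engine response -> custom structure

    SearchPipeline(Function<String, Q> parser, Function<Q, T> translator,
                   Function<T, R> executor, Function<R, S> converter) {
        this.parser = parser;
        this.translator = translator;
        this.executor = executor;
        this.converter = converter;
    }

    S search(String rawQuery) {
        // Stages run in order: parse, translate, execute, convert.
        return parser.andThen(translator).andThen(executor).andThen(converter).apply(rawQuery);
    }
}
```

Keeping the stages separate means each one can be unit-tested without the others.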

1 – The query language

Once you’ve defined the scope, it gets simpler. I imagined something like:

http://domain/search/adverts?query=reference:REF-TTTT111gg4!description~condition legal!created lt 2009&from=2&itemsPerPage=10&sort=created+desc

 

query    := [ Clause { "!" Clause } ] ;
Clause   := Field Operator Value | Value ;
Field    := ? field name without spaces ? ;
Operator := ":" | "~" | "lt" | "gt" | "lte" | "gte" ;
Value    := ? anything, form-url-encoded ? ;

The “query” parameter is optional; if it is not specified, the search returns all elements.
The “from” parameter (page number) is optional; if it is not specified, the first page is assumed.
The “itemsPerPage” parameter is optional; if it is not specified, a page contains 10 results.
The “sort” parameter is optional; if it is not specified, results are sorted by id, descending.
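
A minimal sketch of how those parameters and their defaults could be read, assuming the values have already been URL-decoded into a `Map` (so `sort=created+desc` arrives as `created desc`). `SearchParameters` and `fromQueryParams` are hypothetical names; when a sort field is given without a direction, the sketch assumes descending order, which the rules above do not specify.

```java
import java.util.Map;

/** Hypothetical holder for the search parameters and their defaults. */
final class SearchParameters {
    final String query;          // null or empty: return all elements
    final int page;              // "from": 1-based page number, defaults to 1
    final int itemsPerPage;      // defaults to 10
    final String sortField;      // defaults to "id"
    final boolean sortAscending; // defaults to false (descending)

    private SearchParameters(String query, int page, int itemsPerPage,
                             String sortField, boolean sortAscending) {
        this.query = query;
        this.page = page;
        this.itemsPerPage = itemsPerPage;
        this.sortField = sortField;
        this.sortAscending = sortAscending;
    }

    static SearchParameters fromQueryParams(Map<String, String> params) {
        int page = parsePositive(params.get("from"), 1);
        int itemsPerPage = parsePositive(params.get("itemsPerPage"), 10);
        String sortField = "id";
        boolean ascending = false;
        String sort = params.get("sort");
        if (sort != null && !sort.trim().isEmpty()) {
            String[] parts = sort.trim().split("\\s+");
            sortField = parts[0];
            // Assumption: a field given without a direction is sorted descending.
            ascending = parts.length > 1 && parts[1].equalsIgnoreCase("asc");
        }
        return new SearchParameters(params.get("query"), page, itemsPerPage, sortField, ascending);
    }

    // Missing or blank values fall back to the default; values below 1 are rejected.
    private static int parsePositive(String value, int defaultValue) {
        if (value == null || value.trim().isEmpty()) {
            return defaultValue;
        }
        int parsed = Integer.parseInt(value.trim());
        if (parsed < 1) {
            throw new IllegalArgumentException("expected a value >= 1 but got " + value);
        }
        return parsed;
    }
}
```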

Even so, it is not trivial. For the purpose of the PoC I simplified my requirements:
– I did not use an EBNF parser generator like ANTLR: parsers deserve their own post.
– I did not implement all the operators.

Below is the piece of code used to split the query into clauses:

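// Splits the raw "query" parameter into clauses (CLAUSES_SEPARATOR is presumably "!", as in the URL above)
// and drops empty clauses as well as clauses that are only an operator or end with one (no value).
// Relies on Guava (Predicate, Collections2) and Apache Commons Lang (StringUtils).
List<String> extractSearchClauses(final String queryString) {
	// No query means "return all elements": an empty list expresses that without forcing null checks.
	if (StringUtils.isEmpty(queryString)) return Collections.emptyList();
	final List<String> clauses = Arrays.asList(queryString.split(CLAUSES_SEPARATOR));
	final Collection<String> cleanClauses = Collections2.filter(clauses, new Predicate<String>() {

		/**
		 * @see com.google.common.base.Predicate#apply(java.lang.Object)
		 */
		@Override
		public boolean apply(final String input) {
			return StringUtils.isNotEmpty(input)
					&& !input.trim().equals(SearchOperator.EXACT_MATCH_OPERATOR.toString())
					&& !input.trim().equals(SearchOperator.FULL_TEXT_OPERATOR.toString())
					&& !input.trim().endsWith(SearchOperator.EXACT_MATCH_OPERATOR.toString())
					&& !input.trim().endsWith(SearchOperator.FULL_TEXT_OPERATOR.toString());
		}

	});

	// Collections2.filter returns a live view; copy it into an independent list.
	return new ArrayList<String>(cleanClauses);
}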
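
Once the query is split, each clause still has to be broken into field, operator and value. The sketch below does it with a regular expression instead of a generated parser. `ParsedClause` is a hypothetical name, and requiring whitespace around `lt`, `gt`, `lte` and `gte` (as in `created lt 2009`) is an assumption made so that a free-text value such as `salted` is not split into `sa`, `lt`, `ed`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical representation of one parsed clause. */
final class ParsedClause {
    // Field = leading run of non-space characters, then ":" or "~",
    // or a comparison operator surrounded by whitespace, then the value.
    private static final Pattern CLAUSE =
            Pattern.compile("^(\\S+?)(:|~|\\s+(?:lte|gte|lt|gt)\\s+)(.+)$");

    final String field;    // null for a free-text clause
    final String operator; // ":", "~", "lt", "gt", "lte", "gte", or null
    final String value;

    private ParsedClause(String field, String operator, String value) {
        this.field = field;
        this.operator = operator;
        this.value = value;
    }

    static ParsedClause parse(String clause) {
        String trimmed = clause.trim();
        Matcher m = CLAUSE.matcher(trimmed);
        if (m.matches()) {
            return new ParsedClause(m.group(1), m.group(2).trim(), m.group(3).trim());
        }
        // No field and no operator: the whole clause is a full-text value.
        return new ParsedClause(null, null, trimmed);
    }

    boolean isFullText() {
        return field == null;
    }
}
```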

2 – Translate to Elasticsearch language

First, let’s establish a few rules:
– an empty clause list means returning all the elements (…/adverts?)
– a single clause with no field and no operator means a full-text search on all searchable fields (…/adverts?query=condition+legal)
– multiple clauses mean a boolean AND query between the clauses (…/adverts?query=reference:REF-TTTT111gg4!description~condition legal)

Elasticsearch comes with a rich search API that encapsulates query building in a collection of QueryBuilders.
Below is an example of search instructions:

...
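// queryString(...) is statically imported from QueryBuilders; adding each clause
// as a "must" sub-query of the bool query gives the AND semantics.
((BoolQueryBuilder) queryBuilder).must(queryString(clause.getValue())
	.defaultField(clause.getField()));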
...
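
The builders ultimately produce Elasticsearch’s JSON query DSL. The sketch below builds that JSON by hand, without the Elasticsearch client, to show how the three rules map onto it: `match_all` when there is no clause, a `query_string` query for a lone free-text clause, and a `bool` query whose `must` array ANDs the clauses, each field clause using `default_field` as in the snippet above. Mapping `lt`/`gt`/`lte`/`gte` to a `range` query is an assumption (not all operators were implemented in the PoC), and `QueryDslSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryDslSketch {

    /** Hypothetical clause: field is null for free text. */
    static final class Clause {
        final String field;
        final String operator; // ":", "~", "lt", "gt", "lte", "gte" or null
        final String value;

        Clause(String field, String operator, String value) {
            this.field = field;
            this.operator = operator;
            this.value = value;
        }
    }

    static String toQuery(List<Clause> clauses) {
        if (clauses == null || clauses.isEmpty()) {
            return "{\"match_all\":{}}";                      // rule 1: everything
        }
        if (clauses.size() == 1 && clauses.get(0).field == null) {
            return queryString(null, clauses.get(0).value);   // rule 2: full text
        }
        List<String> musts = new ArrayList<>();               // rule 3: AND
        for (Clause c : clauses) {
            musts.add(toSubQuery(c));
        }
        return "{\"bool\":{\"must\":[" + String.join(",", musts) + "]}}";
    }

    private static String toSubQuery(Clause c) {
        if (c.field != null && isComparison(c.operator)) {
            // Assumption: comparison operators become a range query.
            return "{\"range\":{" + quote(c.field) + ":{" + quote(c.operator) + ":" + quote(c.value) + "}}}";
        }
        return queryString(c.field, c.value);
    }

    private static boolean isComparison(String op) {
        return "lt".equals(op) || "gt".equals(op) || "lte".equals(op) || "gte".equals(op);
    }

    private static String queryString(String defaultField, String query) {
        String field = defaultField == null ? "" : "\"default_field\":" + quote(defaultField) + ",";
        return "{\"query_string\":{" + field + "\"query\":" + quote(query) + "}}";
    }

    // Escapes only quotes and backslashes: enough for a sketch, not for control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            if (ch == '"' || ch == '\\') {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.append('"').toString();
    }
}
```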

3 – Running the search

The SearchRequestBuilder is an abstraction that encapsulates the well-known parts of a search request:
– pagination properties (items per page, page number),
– sort specification (fields, sort direction),
– query (built by chaining QueryBuilders).

Once you’ve configured your SearchRequestBuilder, you can run the actual search:

...
final SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
...
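
For reference, the parts listed above correspond to the `from`, `size`, `sort` and `query` fields of a search request body (in the Java API, `setFrom`, `setSize`, `addSort` and `setQuery` on `SearchRequestBuilder`). One detail worth a sketch: in the URL, `from` is a page number, whereas Elasticsearch’s `from` is an offset, so it has to be converted. `SearchRequestBodySketch` is a hypothetical helper that only builds the JSON string; the sort field is assumed to be a plain field name that needs no escaping.

```java
/** Hypothetical helper building a search request body as JSON. */
final class SearchRequestBodySketch {

    // Converts a 1-based page number into Elasticsearch's 0-based "from" offset.
    static int offset(int page, int itemsPerPage) {
        if (page < 1 || itemsPerPage < 1) {
            throw new IllegalArgumentException("page and itemsPerPage must be >= 1");
        }
        return (page - 1) * itemsPerPage;
    }

    static String body(String queryJson, int page, int itemsPerPage,
                       String sortField, boolean ascending) {
        return "{\"from\":" + offset(page, itemsPerPage)
                + ",\"size\":" + itemsPerPage
                + ",\"sort\":[{\"" + sortField + "\":{\"order\":\"" + (ascending ? "asc" : "desc") + "\"}}]"
                + ",\"query\":" + queryJson + "}";
    }
}
```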

4 – Transfer results to a custom structure

Ideally, we should return a search result that contains the total number of hits and pagination links (previous, next, first, last). That is all the information the user needs.
But remember: the index stores a JSON byte array (not mandatory, but I chose it because I build RESTful services), not an object. We have to rebuild our object from its JSON representation.
Again, writing a Converter really helps.
I did not implement pagination, as it’s a whole other concern: building a RESTful search response that respects HATEOAS principles. I’ll blog about that later.
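
Even without HATEOAS links, the page numbers themselves are simple arithmetic on the total hit count. A minimal sketch, assuming 1-based page numbers as in the `from` parameter; `PageNavigation` is a hypothetical class, and treating an empty result as a single empty page is a design choice.

```java
/** Hypothetical first/previous/next/last page numbers for a result set. */
final class PageNavigation {
    final int first;
    final int last;
    final Integer previous; // null on the first page
    final Integer next;     // null on the last page

    PageNavigation(long totalHits, int page, int itemsPerPage) {
        if (page < 1 || itemsPerPage < 1) {
            throw new IllegalArgumentException("page and itemsPerPage must be >= 1");
        }
        this.first = 1;
        // Ceiling division; an empty result still has one (empty) page.
        long pages = Math.max(1, (totalHits + itemsPerPage - 1) / itemsPerPage);
        this.last = (int) pages;
        this.previous = page > 1 ? page - 1 : null;
        this.next = page < last ? page + 1 : null;
    }
}
```

For example, 25 hits with 10 items per page give pages 1 to 3; on page 2, previous is 1 and next is 3.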

Example of Converter invocation:

...
final SearchResult result = this.searchResponseToSearchResultConverter.convert(searchResponse);
return result;
...

And below is the Converter source (I could have used a transform function…):

...
    public SearchResult convert(final SearchResponse source) {
        final SearchResult result = new SearchResult();
        final SearchHits hits = source.getHits();
        result.setTotalHits(hits.getTotalHits());
        for (final SearchHit searchHit : hits.getHits()) {
            final Advert advert = jsonByteArrayToAdvertConverter.convert(searchHit.source());
            result.addItem(advert);
        }
        return result;
    }
...
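
Here is what the transform-function alternative could look like: the conversion is parameterised by a function that turns one hit’s JSON source into a domain object. To keep the sketch self-contained, hit sources are passed as plain byte arrays rather than Elasticsearch `SearchHit` objects; `GenericSearchResult` is a hypothetical class, and `java.util.function.Function` stands in for the transform function (Guava’s `Lists.transform` would be another option).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical generic result: total hit count plus the converted items of the current page. */
final class GenericSearchResult<T> {
    final long totalHits; // total across all pages, as reported by the search
    final List<T> items;  // only the hits returned for the requested page

    private GenericSearchResult(long totalHits, List<T> items) {
        this.totalHits = totalHits;
        this.items = items;
    }

    static <T> GenericSearchResult<T> from(long totalHits, List<byte[]> sources,
                                           Function<byte[], T> toItem) {
        List<T> items = new ArrayList<>(sources.size());
        for (byte[] source : sources) {
            items.add(toItem.apply(source)); // e.g. JSON bytes -> Advert
        }
        return new GenericSearchResult<>(totalHits, Collections.unmodifiableList(items));
    }
}
```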

This post closes a four-part series on a first contact with Elasticsearch.
We started with the concepts: before designing anything, we wanted to get familiar with our new tool.
Once more comfortable with Elasticsearch, we started serious work: attaching indexing tasks to application events first, then building a simple search endpoint that uses Elasticsearch under the hood.

I can’t say I have fully adopted the tool yet, because there is still a lot to validate:
– searches: I don’t know all the specifics, semantics and differences of the predefined QueryBuilders.
– facets: how do they work in Elasticsearch?
– I have always heard it is insanely fast with high volumes. I want to see it with my own eyes.
– JPA was disappointing (not an Elasticsearch problem): maybe I could use CDI…
– I still have to figure out how to cleanly set up the different client instantiation modes: in-memory and transport clients. Using Spring profiles is a solution, but I’m not a big fan of profiles…
– I wish I could test a MySQL river. I’d like to compare the river to the events mechanism.

I try not to be too enthusiastic, but I have to say it’s a real pleasure once you’ve got past the first pitfalls, mostly related to:
– node/client management: as with JDBC connections, you’re responsible for opening/closing your resources, otherwise you may get unexpected side effects,
– mapping design: analyzed and not_analyzed properties have a huge impact on your searches,
– and blindness: in-memory testing is for experienced users who already know the API. I would suggest a real time-saver: Elasticsearch Head. This tool helped us understand how data was organized/stored, what data was currently in the index, whether it was correctly deleted, etc. The price to pay: it only works with the transport configuration, not in-memory.

Anyway, I hope you enjoyed reading this. If so, feel free to share. If not, let me know why (some of my information may be inaccurate) so that we can both learn something.

The full source is on GitHub.
Run the following command to launch the JBehave search stories:

mvn clean verify -Psearch

4 thoughts on “Add search features to your application, try Elasticsearch part 4: search”

    1. Hi Peter,

      Nothing prevents you from doing so.
      It’s only a matter of responsibilities.
      JPA will manage your database data lifecycle (CRUD).
      Elasticsearch will manage your index data lifecycle (CRUD).
      One note though: perform the index operations outside the DB transactions and after the DB operations. It will also help keep your index consistent: if the DB operation fails, the index should not perform its operation.
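
A minimal sketch of that ordering, with hypothetical `AdvertRepository` and `AdvertIndexer` interfaces standing in for the JPA and Elasticsearch sides: the index is only touched once the database call has returned normally, so a failed DB operation never reaches the index.

```java
/** Hypothetical DB side: assumed to commit its own transaction before returning. */
interface AdvertRepository {
    void saveInTransaction(String id, String json);
}

/** Hypothetical index side, called outside the DB transaction. */
interface AdvertIndexer {
    void index(String id, String json);
}

final class AdvertService {
    private final AdvertRepository repository;
    private final AdvertIndexer indexer;

    AdvertService(AdvertRepository repository, AdvertIndexer indexer) {
        this.repository = repository;
        this.indexer = indexer;
    }

    void save(String id, String json) {
        // 1. DB first: if this throws, the exception propagates and the index is untouched.
        repository.saveInTransaction(id, json);
        // 2. Index afterwards, once the DB operation has succeeded.
        indexer.index(id, json);
    }
}
```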
