Safely re-index with Elasticsearch

I’ve been using Elasticsearch in production for about a year. It does a very good job at indexing and searching but lacks a built-in solution for continuity of service: you can’t hold incoming write requests and resume them at will.

In my team, we started thinking about a home-made solution. We really wanted to keep our webapp as robust as possible and avoid too many sub-system dependencies such as queues, webservices, etc. Elasticsearch was already a big one from an ops perspective (mainly because it’s young). We also really needed the interruption-free feature because we could not afford even a minute of service interruption.

After some time we had to face it: with no built-in solution around, we had to build one, and guess what? We could no longer avoid queues, because they really are robust components and a perfect match for a {pause/resume | producer/consumer} paradigm.

We had to transform this workflow:
– write to database
– index to elasticsearch on commit
– read from elasticsearch

Into this one:
– write to database
– produce write orders to a queue
– consume write orders from a queue
– index to physical index
– read from alias
More complex, but it meets the key requirement of stopping/resuming queue consumption in order to go into maintenance (update config, reindex, etc.).
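The write orders flowing through the queue can be modeled as simple command objects. Below is a minimal, self-contained sketch of the `WriteClassifiedCommand` consistent with how it is constructed and consumed later in this post (the `Classified` stub and field names are assumptions for illustration):

```java
// Minimal stub of the indexed entity, just enough to be self-contained.
class Classified {
    final String title;
    Classified(String title) { this.title = title; }
}

// The kind of write operation carried by a command.
enum Operation { write, delete }

// Sketch of the command object carried by the queue, matching the
// constructor and accessors used in the listener/consumer snippets below.
class WriteClassifiedCommand {
    private final Classified classified;
    private final Operation operation;

    WriteClassifiedCommand(Classified classified, Operation operation) {
        this.classified = classified;
        this.operation = operation;
    }

    Classified getClassified() { return classified; }
    Operation getOperation() { return operation; }
}
```

Keeping the command immutable means it can sit in the queue as long as needed while consumption is paused.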

1- The scenario

When I face a vast problem like this one, I always write JBehave stories first. They give me guidance and bound my DoD (Definition of Done):

Scenario: search classified by title should succeed
Given I create the following classifieds:
| title           | description         |
| awesome title 1 | great description 1 |
| awesome title 2 | great description 2 |
| great title     | awesome description |
When I search for classifieds which "title" matches "awesome"
Then I should get the following classifieds:
| title           | description         |
| awesome title 2 | great description 2 |
| awesome title 1 | great description 1 |
When I create a valid classified:
| title           | description          |
| awesome title 3 | whatever description |
When I search for classifieds which "title" matches "awesome"
Then I should get the following classifieds:
| title           | description          |
| awesome title 3 | whatever description |
| awesome title 2 | great description 2  |
| awesome title 1 | great description 1  |
When the system stops consuming messages
And I create a valid classified:
| title          | description          |
| whatever title | whatever description |
When I search for classifieds which "title" matches "whatever"
Then I should get no results
When I trigger a reindex operation
And the system starts consuming messages
And I search for classifieds which "title" matches "whatever"
Then I should get the following classifieds:
| title           | description          |
| whatever title  | whatever description |

Let’s take a look at the solution.

2- The database events listeners

The example uses the Hibernate event system to get notified of write operations (create, delete, update) and consequently send information to the producer.

First, write the listeners:

public class PostDeleteEventListener implements org.hibernate.event.spi.PostDeleteEventListener {
...
    @Autowired
    private ClassifiedsProducer classifiedsProducer;

    @Override
    public void onPostDelete(PostDeleteEvent event) {
        classifiedsProducer.write(
                new WriteClassifiedCommand((Classified) event.getEntity(), Operation.delete));
    }
}

Then register them:

public class HibernateListenersConfigurer {
...
    @Autowired
    private PostDeleteEventListener postDeleteEventListener;

    @PostConstruct
    public void registerListeners() {
        HibernateEntityManagerFactory hibernateEntityManagerFactory = (HibernateEntityManagerFactory) this.entityManagerFactory;
        SessionFactoryImpl sessionFactoryImpl = (SessionFactoryImpl) hibernateEntityManagerFactory.getSessionFactory();
        EventListenerRegistry registry = sessionFactoryImpl.getServiceRegistry()
                .getService(EventListenerRegistry.class);
...
        registry.getEventListenerGroup(EventType.POST_DELETE).appendListener(this.postDeleteEventListener);
    }
}

You’re set for database events.

3- The queuing system

The producer registers consumers and notifies them when writing to the queue (observer pattern):

....
    @Override
    public void registerListener(WriteClassifiedEventListener writeClassifiedEventListener) {
        this.writeClassifiedEventListeners.add(writeClassifiedEventListener);
    }

    public void write(WriteClassifiedCommand writeClassifiedCommand) {
        writeClassifiedsQueue.add(writeClassifiedCommand);
        notifyListeners();
    }
....

On the other side, the consumer starts/stops consuming and reacts to messages.

....
    public void stopConsumingWriteCommands() {
        classifiedsProducer.unregisterListener(this);
    }
    public void startConsumingWriteCommands() {
        classifiedsProducer.registerListener(this);
        onMessage();
    }
    public void onMessage() {
        WriteClassifiedCommand command = classifiedsProducer.consume();
        if (command == null) return;
        final Operation operation = command.getOperation();
        Classified classified = command.getClassified();
        switch (operation) {
            case delete:
                searchEngine.removeFromIndex(classified);
                break;
            case write:
                searchEngine.index(classified);
                break;
        }
    }
....
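Stripped of Spring and Elasticsearch, the pause/resume mechanics boil down to very little code. Here is a self-contained, in-memory sketch (class and method names are simplified assumptions, and `String` stands in for the command object): the producer buffers commands and notifies registered listeners, while the consumer only receives messages while registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// In-memory sketch of the producer/consumer pair. Real code would back
// the queue with a broker such as RabbitMQ or ActiveMQ.
class Producer {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final List<Consumer> listeners = new CopyOnWriteArrayList<>();

    void registerListener(Consumer c)   { listeners.add(c); }
    void unregisterListener(Consumer c) { listeners.remove(c); }
    String consume()                    { return queue.poll(); }

    void write(String command) {
        queue.add(command);                         // buffer the write order
        for (Consumer c : listeners) c.onMessage(); // notify (observer pattern)
    }
}

class Consumer {
    private final Producer producer;
    final List<String> indexed = new ArrayList<>(); // stands in for the ES index

    Consumer(Producer producer) { this.producer = producer; }

    void stopConsuming() { producer.unregisterListener(this); }

    void startConsuming() {
        producer.registerListener(this);
        // drain everything that was buffered while we were stopped
        String cmd;
        while ((cmd = producer.consume()) != null) indexed.add(cmd);
    }

    void onMessage() {
        String cmd = producer.consume();
        if (cmd != null) indexed.add(cmd);
    }
}
```

While the consumer is stopped, writes simply pile up in the queue; calling `startConsuming()` drains the backlog, which is exactly the behaviour exercised by the JBehave scenario above.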

4- The search engine and the full re-index problem

Being able to start/stop message consumption is nice but not enough; used as-is, it only adds complexity. The true value of the feature lies in the ability to take Elasticsearch down for maintenance: create/drop indices, etc.
Re-indexing transparently implies creating a new index with up-to-date settings/mappings, filling it with fresh data, switching to the new index and dropping the old one. It is the only viable solution, as stated here: “Also, it is not recommended to delete ‘large chunks of the data in an index’, many times, it’s better to simply reindex into a new index.”
For the read part, this is made possible by Elasticsearch aliases. An alias abstracts users from knowing physical index names.
For example, given an alias ‘estate’ that references ‘estate-201201’, ‘estate-201202’ and ‘estate-201203’, a query on ‘estate’ returns matches from all three indices. That’s really a killer feature to me, even if it comes with its little pitfalls: in such configurations, beware of duplicates! The data in the indices behind the alias must be disjoint.
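For reference, the same index swap can be done in one atomic call through the REST API, with a POST to `/_aliases` and a body like the following (index names follow the ‘estate’ example above):

```json
{
  "actions": [
    { "add":    { "index": "estate-201203", "alias": "estate" } },
    { "remove": { "index": "estate-201201", "alias": "estate" } }
  ]
}
```

Because both actions are applied atomically, readers never see an empty or doubled alias during the switch.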
When I’m done with my fresh index creation, it just has to join the alias and the old index just has to leave it.

        final IndicesAliasesResponse indicesAddAliasesResponse =
              indicesAliasesRequestBuilder
                  .addAlias(newIndexName, indexRootName)
                  .execute()
                  .actionGet();

        if (!indicesAddAliasesResponse.acknowledged()) {
            throw new RuntimeException("Failed to add index '" + newIndexName + "' to alias '" + indexRootName + "'");
        }
...
            final IndicesAliasesResponse indicesAliasesRemoveResponse =
                    indicesAliasesRequestBuilder
                        .removeAlias(oldIndexName, indexRootName)
                        .execute()
                        .actionGet();

            if (!indicesAliasesRemoveResponse.acknowledged()) {
                throw new RuntimeException("Failed to remove index '" + oldIndexName + "' from alias '" + indexRootName + "'");
            }

            final DeleteIndexResponse deleteIndexResponse =
                    indicesAdminClient.prepareDelete(oldIndexName).execute().actionGet();

            if (!deleteIndexResponse.acknowledged()) {
                throw new RuntimeException("Failed to delete index '" + oldIndexName + "'");
            }
...

Unfortunately, aliases won’t work for the write part: we must know the real index name when writing to an index. While this is perfectly logical, it is quite inconvenient, because when introducing a new index the system must stop writing to the old index and start writing to the new one. We chose to use a config property that holds the name of the current write index, updated just after the old/new index switch.

indexConfig.put("write-index", newIndexName);

We were then able to use that index name for write operations (index one entity, remove one entity from the index, and full reindex):

    public void index(AbstractEntity entity) {
...
            Map<String, Object> config = elasticSearchConfigResolver.getConfig();
            Map<String, Object> index = (Map<String, Object>) config.get(CLASSIFIEDS_ALIAS);
            String writeIndex = (String) index.get("write-index");
...
    }

    public void removeFromIndex(AbstractEntity entity) {
...
            Map<String, Object> config = elasticSearchConfigResolver.getConfig();
            Map<String, Object> index = (Map<String, Object>) config.get(CLASSIFIEDS_ALIAS);
            String writeIndex = (String) index.get("write-index");
...
    }

    @Override
    @Transactional
    public void reIndexClassifieds() throws IOException {
...
        Map<String, Object> config = elasticSearchConfigResolver.getConfig();
        Map<String, Object> index = (Map<String, Object>) config.get(CLASSIFIEDS_ALIAS);
        dropCreateIndexCommand.execute(indicesAdminClient, CLASSIFIEDS_ALIAS, index);
        String writeIndex = (String) index.get("write-index");
...
    }

We ended up with this winning re-index workflow:
* stop consuming
* resolve current index name
* resolve new index name
* create new index
* apply index settings+mappings
* populate new index
* add new index to alias
* remove old index from alias
* drop old index
* start consuming
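The two name-resolution steps deserve a word: since every physical index is the alias root plus a timestamp suffix (as in the ‘estate-201201’ example above), resolving the new index name is just string work. A minimal sketch, with hypothetical helper names:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical helper deriving timestamped physical index names from an
// alias root, following the 'estate-201201' naming scheme used above.
class IndexNameResolver {
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMM");

    // e.g. "estate" + February 2012 -> "estate-201202"
    static String resolveNewIndexName(String indexRootName, YearMonth month) {
        return indexRootName + "-" + month.format(SUFFIX);
    }

    // The alias root is everything before the trailing timestamp suffix.
    static String resolveRootName(String physicalIndexName) {
        return physicalIndexName.substring(0, physicalIndexName.lastIndexOf('-'));
    }
}
```

The current index name itself is read back from the `write-index` config property shown earlier, so both ends of the switch are known before consumption stops.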

Finding that solution took us time, but we’re pretty satisfied with it. The example uses an in-memory Queue rather than an external queuing component like RabbitMQ or ActiveMQ, but the principles are the same.
There is still room for improvement and we will work on that in the next iterations.
I’d like to thank Nicolas, one of our information system architects, who is damn brilliant and always available for advice.

I hope this post will give a hint to anyone confronted with an Elasticsearch interruption-free reindex requirement.
