Hibernate Search with Lucene

This article is an extract from my book Hibernate and Java
Persistence by Example.

The book is available in English as an eBook (PDF document) and in
German as a paper book. The eBook is continuously updated and covers
the newest features of Hibernate. You will find a complete table of
contents on my website http://www.laliluna.de

Source code: http://www.laliluna.de/download/hibernate-search.zip

Introduction

Lucene is a popular full text search
engine. You can index documents, websites or arbitrary other data,
and search the index through an API. Hibernate Search integrates
Lucene with Hibernate. Entities can be indexed easily, and with
a special session you can perform a full text search for your
entities. Many databases already provide their own mechanism for
full text search, but those solutions are not portable across
databases, and Lucene is probably more powerful and flexible than
the proprietary solutions.

Let’s have a first look at a code sample
before we go into more detail. You can find the full source code
in the project LuceneSearch.

First, we need to configure the Lucene
search. If we use the AnnotationConfiguration to build the
Hibernate session factory, only one property has to be defined
in hibernate.cfg.xml: the location where the Lucene
index should be stored.

<property name="hibernate.search.default.indexBase">
    /tmp
</property>

If you use a normal Configuration to build the session factory, you
have to configure a couple of event listeners. Have a look at the
reference documentation of Hibernate Search for more details. As it
is possible to use XML mappings with an AnnotationConfiguration as
well, I suggest using this kind of configuration even with XML-only
mappings.


SessionFactory factory = new AnnotationConfiguration().configure().buildSessionFactory();


In the next step, the entities which should be searched have to be
annotated.


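import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Store;

@Entity
@Indexed
public class Article {
    @Id
    @DocumentId
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Integer id;

    // indexed as one un-analysed value and stored in the index
    @Field(index = Index.UN_TOKENIZED, store = Store.YES)
    private String title;

    // indexed with the default settings
    @Field
    private String content;
    // getter and setter methods are omitted
}

@Indexed marks an entity to be indexed. @DocumentId is required and
defines how a document is identified in the Lucene index. @Field
specifies that a field should be indexed. In the sample you can find
two different settings for @Field.

Field with default values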


The content is indexed using the
standard analyser. This analyser splits the text into words,
transforms them to lower case, removes characters like ;.’ and removes
a couple of very frequent English words like ‘a’, ‘is’ and ‘in’.
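
To make these steps concrete, here is a minimal sketch in plain Java.
It is only an illustration and not the implementation of Lucene's
StandardAnalyzer: the class AnalyzerSketch, the method analyze and the
three-word stop list are assumptions made up for this example, and
Lucene's real stop word list is longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a simplified stand-in, not Lucene's StandardAnalyzer.
class AnalyzerSketch {

    // Hypothetical stop word list, limited to the words named in the text.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "is", "in"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Splitting on everything that is not a letter or digit
        // separates the words and drops punctuation at the same time.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [hibernate, framework, written, java]
        System.out.println(analyze("Hibernate is a Framework; written in Java."));
    }
}
```

The same transformation has to be applied to the search string,
otherwise a search for Hibernate would not match the indexed term
hibernate.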


Only the analysed terms will be stored
in the Lucene index, not the original content itself. If you use a
tool like Luke to have a look at your Lucene index, you cannot see the
original content.

The title is not tokenized or
transformed.

@Field(index = Index.UN_TOKENIZED, store = Store.YES)

As a consequence, we cannot search for
individual words of the title, but we can search for the precise
title or do a wildcard search, for example find all titles starting
with Foo.

In contrast to the field content,
the title is stored in the index (store = Store.YES) and
we can see it if we browse the index using Luke. I will tell you more
about Luke at the end.
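
The difference between indexing a value and storing it can be
sketched in plain Java as follows. This is a strongly simplified
model and not how Lucene organises its index: the class
IndexStoreSketch and its methods are hypothetical, and the text is
only split on whitespace and lower-cased.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: a toy index that separates searchable terms from stored values.
class IndexStoreSketch {

    // term -> ids of the documents containing it (used for searching)
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // id -> original value, kept only when the field is stored
    private final Map<Integer, String> stored = new HashMap<>();

    void add(int id, String text, boolean store) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        if (store) {
            stored.put(id, text);
        }
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySet());
    }

    // Returns null when the original value was not stored.
    String storedValue(int id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        IndexStoreSketch index = new IndexStoreSketch();
        index.add(1, "Hibernate Search with Lucene", false); // indexed only
        index.add(2, "About Hibernate", true);               // indexed and stored
        System.out.println(index.search("hibernate"));
        System.out.println(index.storedValue(1));
        System.out.println(index.storedValue(2));
    }
}
```

Both documents can be found by a search, but only the second one can
show its original text when the index is inspected.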

So, our entity is indexed and we can
start to do full text searches in Hibernate. A Lucene search consists
of three steps:

  • Creating a search session

  • Creating a Lucene query

  • Executing the query

Here is a code sample:

Session session = SessionFactoryUtil.getFactory().getCurrentSession();
// create a full text session
FullTextSession fSession = Search.getFullTextSession(session);
fSession.beginTransaction();
// create a Lucene query with a parser; "title" is the default field
QueryParser parser = new QueryParser("title", new StandardAnalyzer());
Query luceneQuery = null;
try {
    // the prefix "content:" searches the field content instead of the default field
    luceneQuery = parser.parse("content:hibernate");
} catch (ParseException e) {
    throw new RuntimeException("Cannot search with query string", e);
}
// execute the query
List<Article> articles = fSession.createFullTextQuery(luceneQuery, Article.class).list();
for (Article article : articles) {
    System.out.println(article);
}
fSession.getTransaction().commit();

A search session is created from an open Hibernate session. Basically
it is just a wrapper adding the search-specific methods to the
session. We use a StandardAnalyzer to analyse the search
string, the same analyser that was used to index the content field.
Finally, we execute the full text query.


The field title was not
tokenized. A search for a title needs to use a different approach. You
can use a precise search:


List<Article> articles = fSession.createFullTextQuery(
    new TermQuery(new Term("title", "About Hibernate")), Article.class).list();
for (Article article : articles) {
    System.out.println(article);
}


or a wildcard search:


List<Article> articles = fSession.createFullTextQuery(
    new WildcardQuery(new Term("title", "About*")), Article.class).list();
for (Article article : articles) {
    System.out.println(article);
}
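
Why an untokenized title matches only as a whole value or by its
beginning can be shown with a small sketch in plain Java. It does not
use Lucene at all; the class UntokenizedMatchSketch and its methods
are hypothetical and only imitate the behaviour of the two queries
above on a list of titles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: imitates exact and prefix matching on whole, un-analysed values.
class UntokenizedMatchSketch {

    // Comparable to a term query: the complete value must be equal, including case.
    static List<String> exact(List<String> titles, String term) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (title.equals(term)) {
                hits.add(title);
            }
        }
        return hits;
    }

    // Comparable to a wildcard pattern with a trailing *, such as About*.
    static List<String> prefix(List<String> titles, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (title.startsWith(prefix)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles =
                Arrays.asList("About Hibernate", "About Lucene", "Hibernate Search");
        System.out.println(exact(titles, "About Hibernate")); // one hit
        System.out.println(exact(titles, "Hibernate"));       // no hit, a single word is not enough
        System.out.println(prefix(titles, "About"));          // two hits
    }
}
```

Note that the comparison is case-sensitive, because the title was
indexed without any transformation.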


You can adapt the indexing and the analysis of the search string to
your needs. For example, we could specify that the indexing of the
field title goes through a toLowerCase filter. Emmanuel Bernard
demonstrated a couple of new features at the Devoxx conference. You
can use word stemming (run, runner, running) to find words with
the same stem, phonetic searches with the Soundex or Metaphone
algorithm to find words with a similar sound, or approximate searches
with n-grams. I will cover more of these approaches in the next
updates.
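
To give an idea of how an approximate search with n-grams works, here
is a minimal sketch in plain Java. It is not the Hibernate Search or
Lucene implementation, which would be set up through the analyser
configuration; the class NGramSketch and its methods are hypothetical.
A word is cut into overlapping pieces of n characters, and two words
are considered close when they share many of these pieces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: character n-grams as a crude similarity measure.
class NGramSketch {

    // Returns all overlapping substrings of length n, in order.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Counts the distinct n-grams that both words have in common.
    static int shared(String a, String b, int n) {
        Set<String> common = new HashSet<>(ngrams(a, n));
        common.retainAll(new HashSet<>(ngrams(b, n)));
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("hibernate", 3));
        // the misspelled word still shares some trigrams with the correct one
        System.out.println(shared("hibernate", "hibrenate", 3));
        System.out.println(shared("hibernate", "lucene", 3));
    }
}
```

A misspelled search word still shares a few trigrams with the correct
word, whereas an unrelated word shares none.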

Luke lets you browse your index and run searches on it. It is
very useful for testing your searches or debugging a problem.



http://www.getopt.org/luke/