Google App Engine Search API Tutorial

Ever since Google put up its famous search page in the 1990s, the world has not been the same again. Search is an integral part of most web applications and needs to be implemented in a fashion that allows for flexible search, delivers results quickly and can keep handling an ever increasing amount of data.

Implementing search in your application is not an easy task and, like me, you might have been rolling your own code in App Engine to do it.

In this tutorial, we are going to take a look at the App Engine Search API, which has been in General Availability (GA) since the SDK 1.8.5 release. The API is fairly detailed, and we shall cover just the basics needed to get started. Once this is in place, you can go ahead and adapt it to your requirements.

Sounds good? Let’s move forward.

Prerequisites

  • Basic understanding of Java Web Development, which includes Servlets, JSP, WAR file structure, etc.
  • You have a working development environment for Google App Engine. This includes the Google Eclipse plugin.
  • We will be writing a REST-based web service that returns JSON-encoded data. You do not need to be an expert in either, but you should know what REST and JSON are about. We are using GSON, a Java library that converts Java objects to JSON.

What this Episode covers

  • Overview of App Engine Search API
  • Web Application that saves Employee Records in a Search Index and provides a UI to search them (all via the Search API)

Please note that the focus is on demonstrating the functionality; production-grade code will need to address a lot more.

My Development Environment

  • Eclipse Juno
  • Google Eclipse plugin with App Engine SDK 1.8.7
  • Mac machine (but Windows will do too!)

Attention: You may have another version of Eclipse or an older SDK, but make sure that the App Engine SDK version is 1.8.5 or higher.

Search Application – Employee Directory in Action

Let us check out the application in action first. This will help to understand the code much better.

The application is hosted for you at http://searchapi123.appspot.com/

Visit the above link. It should bring up the screen shown below:

[Screenshot: the Employee Directory search page]

As you can see, we are going to write a web application that implements search. It lets us query Employee records that are populated in the Search index. The Employee records are not stored in the datastore (although you could do that too); for the purpose of this blog post, they are simply inserted into an index via the App Engine Search API.

To search for records, enter a search term in the input field and it will be matched against all the attributes of the Employee, i.e. UserId, First Name, Last Name, etc. The match is on whole words within a value, so one word out of a multi-word value is enough but, as far as I can tell, a fragment of a word is not.

I have about 3 records (you should definitely put in 1000s) and when I provide a search term, say “CA” for region, it shows me the matching records as below:

[Screenshot: searching for “CA”]

The search results come up next:

[Screenshot: the search results]

You can try it out by typing in any search text, and it will be matched across all the attributes of an Employee record. Note that the Filter input field that you see on the right filters the results locally in the browser; it does not submit the form to the Search API.

Let’s get on with the code.

Download Full Source Code

I suggest that you begin with a full download of the project source code.

Go ahead and download the code from:

https://github.com/rominirani/EmployeeDirectory

This is an Eclipse project that you can import directly. For the sake of reducing the code size, I have removed the App Engine SDK Jars from the WEB-INF\lib folder. So depending on the version of your App Engine SDK in the Eclipse development environment on your machine, please link to the appropriate App Engine SDK that you have for the project to build successfully. I have included the gson.jar in the WEB-INF\lib directory though.

If you are successful with importing the project into your Eclipse setup, the project directory should look something like this:

[Screenshot: the project directory structure in Eclipse]

Search API – An overview

Let us first discuss the Search API. As per the official documentation “The Search API provides a model for indexing documents that contain structured data. You can search an index, and organize and present search results.”

I have intentionally put key concepts that you need to know in bold. If you understand them, you are all set to get a grasp of the API.

  • A document is any object (in our case an Employee record is the document) that contains a unique id and user defined fields. Each field has a type and a value. A Text field is one type. The Search API supports other types too like HTML, GeoPoint, Number, Date, etc.
  • Once you have a list of documents (Employee records), you need to build the index. The Index will contain these documents. We will see in our code how to add the documents to an Index.

This covers one part of the puzzle, i.e. building the index. Once the index is built, you will typically do the following:

  • Construct queries to search the index. You can search by unique Id or by various criteria.
  • The search will finally return results, which you can then navigate through, picking up each document record (and its attributes) to present the results. Remember that you are searching an index, and the attributes of each record in the index might be IDs that you can then use to look up other data.

Coming back to our example, the document in our case is going to be an Employee Record. We are going to model the Employee as having several fields which we have identified like:

  • userId
  • jobTitleName
  • firstName
  • lastName

and so on.

Building the Search Index

Now that we are clear on the concept, let us look at the first part, i.e. building the Search index or, in other words, building the documents and adding them to the Search Index.

To help create Employee documents, we are going to use a JSON data structure to populate about 3 Employee records. The JSON format is shown below for the 3 records; notice that it is an array of 3 records, each carrying the attributes of one Employee.

Now, take a look at the http://searchapi123.appspot.com/addemployee.jsp page. You will notice that it allows you to enter the JSON data as given below:

[Screenshot: the addemployee.jsp page with the JSON data entered]

So all I have done is paste the JSON data into that and submitted the form to a servlet shown below:

You will notice that it simply extracts out the JSON text and calls a method as given below:

ImportEmployeesIntoIndex.processEmployees(employeeJSONData);

Now, let us look at the magic that happens inside the ImportEmployeesIntoIndex.java class.

Let us take a look at the important parts of the code:

  • The processEmployees method parses the JSON array and iterates through each record.
  • For each record, it simply extracts out the relevant attributes for each Employee record.
  • As mentioned earlier, for each record that we want to put in the Search index, we need to build a Document object. So we build a new Document object and set its unique ID as the user ID.
  • Then, in a chained manner, we add the remaining fields to the Document. We add a Text field for each of the attributes and set its value too. Remember, each field has a type (text) and a value.
  • Now that we have built the Document object, we need to add the document to the Index. To do that, we are invoking a utility class SearchIndexManager, which we shall see in a moment. The method takes two parameters, an INDEX name (because we could build multiple indices) and the document to add to that index.
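To make the document-building steps above concrete, here is a minimal sketch in plain Java. It is not the code from the project and it does not use the Search API classes: `EmployeeDocument` and `EmployeeDocuments` are hypothetical names made up for this illustration, and the records are assumed to have been parsed from JSON into maps already.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search document: a unique id plus named text fields.
// This is NOT the App Engine Document class; it only mirrors the shape described above.
final class EmployeeDocument {
    final String id;
    final Map<String, String> textFields = new LinkedHashMap<>();

    EmployeeDocument(String id) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("a document needs a unique id");
        }
        this.id = id;
    }

    // Adds a text field and returns this, so calls can be chained.
    EmployeeDocument addText(String name, String value) {
        textFields.put(name, value);
        return this;
    }
}

final class EmployeeDocuments {
    // Converts already-parsed employee records into documents whose id is the userId.
    static List<EmployeeDocument> fromRecords(List<Map<String, String>> records) {
        List<EmployeeDocument> docs = new ArrayList<>();
        for (Map<String, String> record : records) {
            EmployeeDocument doc = new EmployeeDocument(record.get("userId"));
            for (Map.Entry<String, String> attribute : record.entrySet()) {
                doc.addText(attribute.getKey(), attribute.getValue());
            }
            docs.add(doc);
        }
        return docs;
    }
}
```

Using the userId as the document id means that importing the same employee again replaces the earlier entry rather than duplicating it, provided the index treats ids as unique, as described above.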

Let us take a look at the SearchIndexManager class since that is a wrapper class that I have written around the App Engine Search API. The source code is shown below:

Given below are details for each of the methods:

  • indexDocument: This method is invoked to insert each Document into the Index. The usual pattern with the App Engine Search API is to first build the IndexSpec, which identifies the Index we want to work with; the name is enough here. We then retrieve the Index from the Search Service and invoke the put method of the Index object, passing in the document.
  • retrieveDocument: This method retrieves a single document from the index given its unique Id. We create the Index object and then invoke the get method.
  • retrieveDocuments: This method retrieves a list of documents from the index given some search criteria. We use it to find documents that match the search term entered on the search page. We create the Index object and then invoke the search method.
  • deleteDocumentFromIndex: This method removes a document from the Index. We create the Index object and then invoke the delete method, passing in the document Id.

Note that the put, delete, get and search methods that I have demonstrated are just a fraction of all the methods that are available. The search criteria are also very rich, and you can specify one or more conditions to search for specific documents. Check out the Queries section of the documentation.
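The four wrapper methods map onto put, get, search and delete on the Index object. To show just that shape without needing the SDK on the classpath, here is a toy in-memory index in plain Java. `ToyIndex` is a hypothetical class made up for this illustration; it is not the Search API, and its whole-word, case-insensitive matching is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory index that mirrors the put / get / search / delete shape.
// It is a teaching aid only and does not call the App Engine Search API.
final class ToyIndex {
    // Document id -> (field name -> text value), kept in insertion order.
    private final Map<String, Map<String, String>> documents = new LinkedHashMap<>();

    // Adds a document, replacing any existing document with the same id.
    void put(String id, Map<String, String> fields) {
        documents.put(id, new LinkedHashMap<>(fields));
    }

    // Returns the document with the given id, or null if there is none.
    Map<String, String> get(String id) {
        return documents.get(id);
    }

    // Removes the document with the given id, if present.
    void delete(String id) {
        documents.remove(id);
    }

    // Returns the ids of documents in which any field contains the term as a whole token.
    List<String> search(String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> doc : documents.entrySet()) {
            for (String value : doc.getValue().values()) {
                // Split on anything that is not a letter or a digit.
                String[] tokens = value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
                if (Arrays.asList(tokens).contains(wanted)) {
                    hits.add(doc.getKey());
                    break; // one matching field is enough for this document
                }
            }
        }
        return hits;
    }
}
```

In the real wrapper, each of these calls is preceded by building the IndexSpec and fetching the Index from the Search Service, as described in the list above.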

App Engine Administration Console

If you access the App Engine Admin Console for your application and navigate to the Data -> Text Search option, you will notice that it shows the Index as given below:

[Screenshot: the index listed under Text Search in the Admin Console]

If you click on the Index, you will see the records present over there:

[Screenshot: the records in the index]

The best part is that the Text Search option is available in the Local Development Server also. So it is a great way to test out your code locally. Shown below is my Admin console from local development server:

[Screenshot: Text Search in the local development server Admin console]

And if you click on the EmployeeDirectoryIndex, you will see the records:

[Screenshot: the records in EmployeeDirectoryIndex on the local development server]

Searching the Index for Employee Records

Let us now look at the final pieces of the puzzle, which perform the search and display the records in the UI. This part has little to do with the Search API and is more about writing a simple REST service that the front-end (employees.jsp) can use to display the results. Keep in mind that all the core Search API work, i.e. building the index and searching for records (by ID or by search criteria), was covered earlier in the SearchIndexManager class.

First up, let us look at the EmployeesDirectoryIndexService class (a Servlet) that accepts a searchText request parameter and performs the search. The GIST is given below:

The key points are mentioned below:

  • We extract out the searchText request parameter.
  • If the parameter is provided, we invoke the retrieveDocuments method of our SearchIndexManager utility class, which simply invokes the search method on the Index object.
  • The return value that we get back is a list of ScoredDocument objects. We simply use the methods provided by ScoredDocument to retrieve the values. We populate a Data Transfer Object (DTO), i.e. the Employee.java class, by invoking its setter methods, and add it to a list.
  • Finally, we use the GSON library to marshal the list into a JSON representation and write that to the response stream for the browser.
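The mapping step above (search hits, then DTOs, then JSON) can be sketched in plain Java as follows. This is only an approximation under stated assumptions: the original code uses GSON for the last step, whereas this sketch hand-builds the JSON to stay dependency-free; plain maps stand in for ScoredDocument; and `EmployeeDto` and `SearchResponse` are hypothetical names covering just a subset of the employee attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical DTO with a subset of the employee attributes named in the article.
final class EmployeeDto {
    private String userId;
    private String firstName;
    private String lastName;

    void setUserId(String value) { userId = value; }
    void setFirstName(String value) { firstName = value; }
    void setLastName(String value) { lastName = value; }

    // Serializes this DTO as a JSON object (hand-rolled stand-in for GSON).
    String toJson() {
        return "{\"userId\":" + quote(userId)
            + ",\"firstName\":" + quote(firstName)
            + ",\"lastName\":" + quote(lastName) + "}";
    }

    // Quotes a string for JSON, escaping quotes, backslashes and control characters.
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }
}

final class SearchResponse {
    // Maps each hit (field name -> value) to a DTO and joins the DTOs into a JSON array.
    static String toJsonArray(List<Map<String, String>> hits) {
        List<String> objects = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            EmployeeDto dto = new EmployeeDto();
            dto.setUserId(hit.get("userId"));
            dto.setFirstName(hit.get("firstName"));
            dto.setLastName(hit.get("lastName"));
            objects.add(dto.toJson());
        }
        return "[" + String.join(",", objects) + "]";
    }
}
```

In real code a library such as GSON is the better choice, since it handles nesting, numbers and the other attributes without any hand-written escaping.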

The User Interface (and pardon me for the colors!) – employees.jsp – is straightforward jQuery-based front-end code that uses the jQuery Data Tables plugin to show a grid of the search results. Notice that it simply invokes the REST service:

"sAjaxSource":
"/employeesdirectoryindexservice?searchText=<%=searchText%>"

In reality, you can even skip the UI and simply expose this REST service as a search service over your Employee records. Sounds cool, doesn’t it?

If you don’t believe me, try the following in your browser:

http://searchapi123.appspot.com/employeesdirectoryindexservice?searchText=CA

Voila! We have a live search web service too! Try it with some different searchText values.
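The same service can also be called from code instead of the browser. Here is a minimal sketch of a Java client, assuming Java 11 or later (for java.net.http) and assuming the hosted application is still reachable. `EmployeeSearchClient` and its method names are hypothetical; the path and the searchText parameter are the ones shown above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the search service; not part of the original project.
final class EmployeeSearchClient {
    // Builds the service URL, encoding the search text as a query parameter value.
    static URI searchUri(String baseUrl, String searchText) {
        String encoded = URLEncoder.encode(searchText, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/employeesdirectoryindexservice?searchText=" + encoded);
    }

    // Sends a GET request and returns the raw JSON body; needs network access.
    static String search(String baseUrl, String searchText)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(searchUri(baseUrl, searchText)).GET().build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call such as `EmployeeSearchClient.search("http://searchapi123.appspot.com", "CA")` should then return the same JSON that the browser shows for the URL above. Encoding the search text matters as soon as the term contains spaces or characters like `&`.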

The Search API is governed by quotas, and there is a free quota too, so do understand the limits before trying this out. There is a Best Practices page as well.

In fact, this makes for a nice segue into future articles that I have planned around Google Cloud Endpoints, which will demonstrate that you do not have to do too much heavy lifting to write your own REST layer. But we will keep that for another episode.

Hope you liked this App Engine episode.

Till next time, stay tuned and give me feedback. If you run into issues, do drop a note and I will do my best to get it working for you.

14 thoughts on “Google App Engine Search API Tutorial”

  1. Hello Romin,

    Very nice tutorial, much more informative than the documentation.
    However, I noted that in the demo, partial search is not working. Could you help me fix that?

    1. Thank you for the feedback.

      I do believe that the word partial text search causes some confusion. Do look at the documentation at : https://developers.google.com/appengine/docs/java/search/#Java_Documents_and_fields and specifically go to the section titled “Special treatment of string and date fields” where it discusses how the string fields are tokenized.

      I take the following snippet from their actual documentation: “The string is split into tokens wherever whitespace or special characters (punctuation marks, hash sign, etc.) appear. The index will include an entry for each token. This enables you to search for keywords and phrases comprising only part of a field’s value. For instance, a search for “dark” will match a document with a text field containing the string “it was a dark and stormy night”, and a search for “time” will match a document with a text field containing the string “this is a real-time system”.”

      What this means is that it does not do partial text search within a word. For example, if the title is “Developer” and you have indexed that, I don’t believe you can enter just “Dev” and expect a match. However, if you have “Java Developer” in the title, then entering either “Java” or “Developer” should match.

      People have raised this point, and one approach is to tokenize the values yourself into all possible combinations and index those as well. You can search the web for these approaches and see if they work for you.
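As a rough illustration of that do-it-yourself approach, here is a sketch in plain Java that expands a value into the prefixes of each of its words, so that they could be stored in an extra text field next to the original value. This is one possible variant, not a recommended or official technique; `PrefixTokens` is a hypothetical helper, and the splitting rule (anything that is not a letter or digit) is a simplifying assumption about how the real tokenizer behaves.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper that generates word prefixes for indexing.
final class PrefixTokens {
    // Splits text on anything that is not a letter or digit, lowercases each token,
    // and returns every prefix of every token that has at least minLength characters.
    static Set<String> expand(String text, int minLength) {
        Set<String> prefixes = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            for (int end = Math.max(minLength, 1); end <= token.length(); end++) {
                prefixes.add(token.substring(0, end));
            }
        }
        return prefixes;
    }

    // Joins the prefixes with spaces so they can be stored in one extra text field.
    static String asFieldValue(String text, int minLength) {
        return String.join(" ", expand(text, minLength));
    }
}
```

With a minimum length of 3, the value “Java Developer” expands to prefixes that include “jav” and “dev”, so a search for “Dev” would find a whole-word match in the extra field. The cost is a larger document, which counts against the size limits of the index.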

    1. Hi Marcos – thanks for the feedback and sharing your library. It looks super easy and boosts productivity to a high level while using the Search API. I am going to try it out and/or might plan a parallel post soon using your library.

  2. Hi Romin Irani,

    Thank you so much for this excellent article. I would like to ask you whether it’s possible to search the contents of an attachment using the same API. I have a requirement where users can upload different kinds of files and they can search for a particular attachment by specifying some text present in the files or documents. Wondering if that’s possible.

    1. That will be difficult and I am not sure if the API is suited to do that. There are several constraints to the API at this point i.e. :
      1) The Documents that you put in the Index can be of 1MB size only.
      2) The fields for a document can be of a certain type only. And the only ones that might apply to your case are : Text Field and / or HTML Field. Again these fields have limits on their sizes.

      Given that, I am not sure how you could use the API in its current form to search within documents. Some thoughts that come to my mind are if you can associate a set of tags or keywords for each kind of document, then you can put those in a field for the document, which can then be searched. Another option is to see if you can use Text Summarization to condense your file contents into some smaller text content that contains the summary and then add that as a field in the document.

      Hope this helps.

    1. Yes – definitely you can test the same functionality locally too. I have mentioned that in this blog post.

      Just start the development server, build your index and go to the Admin console at http://localhost:/_ah/admin. There you will find a link in the left side menu for “Full Text Search”. Click that and if you have added documents to your index, you will see the indexes present over there. Give it a try.

      Just one thing to note : certain functions do not work locally. Check out : https://developers.google.com/appengine/docs/java/search/devserver for more details.

      1. By testing locally, I mean: is it possible to write JUnit tests for search? Sorry for the confusion. Testing manually is certainly possible, but I couldn’t find any equivalent of LocalDatastoreServiceTestConfig for search.

        I tried running Junits with just local datastore config and my index is returning extra/incorrect results.

  3. First of all, thank you very much for providing a very nice and clear explanation of GAE and REST apps. I wanted to learn how to create a RESTful app using Google App Engine; thanks for providing a nice tutorial.

    1. Thanks for the comments.

      I suggest that you look at Google Cloud Endpoints – which is a great way to build out RESTful APIs powered by App Engine. I have written detailed tutorials on that. You can start at :
