Episode 8: Using Memcache in your GAEJ applications

Welcome to Episode 8. In this episode, we shall cover an important service provided by Google App Engine. The service is called Memcache, and it allows applications to manage their data in a cache.

What is a Cache and why do we need one?

As per the Wikipedia definition, a cache is a temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, it can be used in the future by accessing the cached copy rather than re-fetching or recomputing the original data.

The decision to use a cache in your application should come after carefully determining which operations would benefit from it. Consider the following scenarios:

1. If you have stored information in a database and it is not updated frequently, then it might make sense to put some of that information in a cache. If multiple requests come in for the same data, you can simply look it up in the memory cache rather than make repeated, expensive database calls that return the same data.

2. If you invoke external web services and you determine that the same data could be requested repeatedly, then a cache helps here too by avoiding expensive network calls.

There is a lot of material available on the use of caches in software applications, and I suggest reading those sources to arrive at a good caching design for your applications. We shall limit our discussion here to a simple use case for Memcache, the caching service provided by Google App Engine, and how to implement it quickly in your application.

Before we begin (Important!)

We will be introducing Memcache into an existing application: the Dictionary Service application that we implemented in Episode 4. I strongly urge you to read that episode, understand what the application does, and have that project ready so that you can make the changes described here.

To recap, our GAEJ Dictionary application is shown below:

1. Navigate to http://gaejexperiments.appspot.com/dictionary.html. This will show a page as shown below:

[Screenshot s2: the dictionary lookup page]

2. Enter the word whose definition you wish to look up, and the application will return the result as shown below:

[Screenshot s3: the definition returned for the word]

Introducing a Cache

The request/response flow for the above Dictionary application is explained below via the diagram shown:

[Diagram CacheEpisode1: request/response flow without a cache]

The steps were as follows:

1. The user makes a request to the GAEJ Experiments application by sending a request for the word.

2. The GAEJ Dictionary Service Servlet receives the Request and makes a call to an External Dictionary Service hosted at (http://services.aonaware.com/DictService/DictService.asmx).

3. The Response from the Dictionary Service i.e. the definition of the word is received by the GAEJ Dictionary Service Servlet.

4. The Response is sent back to the user application.

The flow is straightforward and typical of most applications. Now, what happens if several users request the same word? As per the application flow, every request results in a call to the external service for the definition, even if the same word was looked up moments ago. This wastes network resources and slows down the application's response. Since the definition of a word is not going to change overnight :-), it would be nice to return the definition from the GAEJExperiments application itself if the word has already been looked up. Enter a cache!

So, what we are introducing in our application flow now is a cache called GAEJ Dictionary Cache and the modified application flow is shown below:

[Diagram CacheEpisode2: request/response flow with the GAEJ Dictionary Cache]

The steps now would be as follows:

1. The user makes a request to the GAEJ Experiments application by sending a request for the word.

2. The GAEJ Dictionary Service Servlet receives the Request and checks whether the word and its definition are already present in the Cache. If they are, we short-circuit and go to Step 6.

Optional Steps (3,4,5)
3. If the Servlet does not find the definition in the Cache, then it makes a call to the External Dictionary Service hosted at (http://services.aonaware.com/DictService/DictService.asmx).

4. The Response from the Dictionary Service i.e. the definition of the word is received by the GAEJ Dictionary Service Servlet.

5. The Servlet puts the word and its definition in the Cache, so that future requests for the same word can be served from the Cache.

6. The Response is sent back to the user application.

To summarize, we introduced the Cache that functions as follows:

  • All word definitions looked up from the External Service are placed in the Cache.
  • If a word is already present in the Cache, then it is returned from the Cache itself and an external network call is saved.
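
The modified flow can be sketched in plain Java as follows. This is only an illustration under stated assumptions: a HashMap stands in for Memcache, and the names CachedLookup, define and externalService are hypothetical stand-ins for the servlet and the external dictionary call, not part of the application.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of the cached request flow; not the App Engine API.
final class CachedLookup {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Memcache
    private final Function<String, String> externalService;    // stand-in for the dictionary web service
    private int externalCalls = 0;                              // counts how often we went to the service

    CachedLookup(Function<String, String> externalService) {
        this.externalService = externalService;
    }

    String define(String word) {
        // Step 2: check the cache first.
        String definition = cache.get(word);
        if (definition != null) {
            return definition; // cache hit: go straight to Step 6
        }
        // Steps 3 and 4: cache miss, so call the external service.
        externalCalls++;
        definition = externalService.apply(word);
        // Step 5: remember the result for future requests.
        cache.put(word, definition);
        // Step 6: return the response.
        return definition;
    }

    int externalCalls() {
        return externalCalls;
    }
}
```

With this sketch, calling define("engine") twice invokes the external function only once; the second call is answered from the map.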

The Memcache Service API

A cache is typically implemented as a Map. A Map is a generic data structure that associates each key with a value. You look up the Cache by specifying the key, and if it is found, the value associated with that key is returned. This is an oversimplification of what a Cache is. There is a lot more to a Cache implementation than just putting a value and getting a value, but we shall keep it simple here.
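
To give a flavour of what "a lot more" can mean, here is a minimal sketch of a Map that also limits its own size by discarding the least recently used entry. It relies only on the standard java.util.LinkedHashMap; the class name BoundedCache is hypothetical, and this is not how Memcache is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory cache that evicts the least recently used entry when full.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the least recently used entry
        return size() > maxEntries;
    }
}
```

A BoundedCache created with a limit of 2 keeps at most two entries; adding a third removes whichever entry was read or written least recently.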

The Memcache Service API is simple to use and is well documented in the App Engine documentation. The Memcache Service implements JSR-107 (the JCache interface). The JCache classes are present in the javax.cache package, and that is what you will use.

At a high level, all you need to do is follow these steps:

1. Get a handle to the Cache implementation:

The snippet of code to do that (reproduced from the Documentation) is shown here:

 

import java.util.Collections;
import javax.cache.Cache;
import javax.cache.CacheException;
import javax.cache.CacheFactory;
import javax.cache.CacheManager;

Cache cache;

try
{
    CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
    cache = cacheFactory.createCache(Collections.emptyMap());
}
catch (CacheException e)
{
    // ...
}

 

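The code is simple. We get a handle to the CacheFactory instance and then create a Cache. Notice that we pass an empty Map to createCache(). This Map holds the configuration properties for the new Cache, and an empty one means that we accept the defaults. The Cache itself behaves like a Map, so once we have it, all we need to do is play around with (key, value) pairs.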

2. Put a value or get a value from the Cache

Shown below is how we would use the cache in a simple manner. We invoke the put(key, value) method on the javax.cache.Cache instance. Similarly, to retrieve a value, we invoke the cache.get(key) method. It returns the value if found, which we can then typecast to the appropriate class.

 

String key;      // The Word
String value;    // The Definition of the Word

// Put the value into the cache.
cache.put(key, value);

// Get the value from the cache.
value = (String) cache.get(key);

 

GAEJDictionaryCache.java

Let us first discuss a utility class that I have written that encapsulates the Caching API. I have made this class a singleton and it is called GAEJDictionaryCache. The source code is shown below:

 

package com.gaejexperiments.networking;

import java.util.Collections;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.cache.Cache;
import javax.cache.CacheException;
import javax.cache.CacheFactory;
import javax.cache.CacheManager;

public class GAEJDictionaryCache {
    public static final Logger _log = Logger.getLogger(GAEJDictionaryCache.class.getName());

    private static GAEJDictionaryCache _instance;
    private Cache cache;

    private GAEJDictionaryCache() {
        try {
            CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
            cache = cacheFactory.createCache(Collections.emptyMap());
        }
        catch (CacheException e) {
            //Log the failure; the cache stays null and every lookup is treated as a miss
            _log.log(Level.WARNING, "Error in creating the Cache", e);
        }
    }

    public static synchronized GAEJDictionaryCache getInstance() {
        if (_instance==null) {
            _instance = new GAEJDictionaryCache();
        }
        return _instance;
    }

    public String findInCache(String word) {
        //Guard against a Cache that could not be created
        if (cache != null && cache.containsKey(word)) {
            return (String)cache.get(word);
        }
        else {
            return null;
        }
    }

    public void putInCache(String word, String definition) {
        //Guard against a Cache that could not be created
        if (cache != null) {
            cache.put(word,definition);
        }
    }
}

 

Let us discuss the key parts of the code:

  1. The Singleton design pattern should be obvious here. The application uses the getInstance() method to obtain a handle to the singleton.
  2. The constructor of this class is private and is called only once. Inside it, an instance of the Cache is created.
  3. There are two utility methods: findInCache and putInCache.
  4. The application invokes findInCache by providing a key. If the key is found, the value is returned via the cache.get(key) method. For our dictionary application, the key is the word whose definition you wish to look up.
  5. If the application wants to put a (key, value) record into the Cache, it invokes the putInCache(…) method, which takes the key and the value. For our dictionary application, the key is the word and the value is the definition of the word.

By encapsulating the Memcache Service API in this fashion, you create a reusable class that takes care of the Cache API details. You could then improve upon this class, provide advanced Cache features and APIs, and reuse it in all your future GAEJ applications.
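
As one possible refinement of the singleton part, the sketch below uses the initialization-on-demand holder idiom, which relies on Java's class initialization rules instead of a synchronized getInstance(). The class name CacheSingleton and the timesCreated() counter are hypothetical and exist only to make the behaviour visible; the real class would create the Cache in its constructor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical singleton using the holder idiom instead of a synchronized accessor.
final class CacheSingleton {
    // Counts constructor calls so the "created only once" claim can be checked
    private static final AtomicInteger CREATED = new AtomicInteger();

    private CacheSingleton() {
        CREATED.incrementAndGet();
        // A real implementation would create the Cache here.
    }

    // The JVM initializes Holder, and therefore INSTANCE, on first access only
    private static final class Holder {
        static final CacheSingleton INSTANCE = new CacheSingleton();
    }

    static CacheSingleton getInstance() {
        return Holder.INSTANCE;
    }

    static int timesCreated() {
        return CREATED.get();
    }
}
```

Every call to getInstance() returns the same object, and the constructor runs a single time no matter how many threads ask for the instance.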

Modifying the Original GAEJDictionaryService.java class

All that remains now is to modify the existing GAEJDictionaryService.java class by introducing the appropriate cache usage. The modified code is shown below. I suggest you look for the //MEMCACHE comments to see what I have changed.

 

package com.gaejexperiments.networking;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.URL;
import java.net.URLEncoder;
import java.util.logging.Logger;

import javax.servlet.ServletException;
import javax.servlet.http.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

@SuppressWarnings("serial")
public class GAEJDictionaryService extends HttpServlet {
    public static final Logger _log = Logger.getLogger(GAEJDictionaryService.class.getName());

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {

        String strCallResult = "";
        resp.setContentType("text/plain");
        try {

            //Extract out the word that needs to be looked up in the Dictionary Service
            String strWord = req.getParameter("word");

            //Do validations here. Only basic ones i.e. cannot be null/empty
            if (strWord == null) throw new Exception("Word field cannot be empty.");

            //Trim the stuff
            strWord = strWord.trim();
            if (strWord.length() == 0) throw new Exception("Word field cannot be empty.");

            //MEMCACHE
            //First get a handle to the Cache
            GAEJDictionaryCache _cache = GAEJDictionaryCache.getInstance();

            //Determine if the value is present in the Cache
            String strWordDefinition = _cache.findInCache(strWord);

            //If the word/definition is present in the Cache, return that straightaway, no need for external network call
            if (strWordDefinition != null) {
                //Return the definition
                _log.info("Returning the Definition for ["+strWord+"]"+" from Cache.");
                strCallResult = strWordDefinition;
            }
            else {
                _log.info("Invoking the External Dictionary Service to get Definition for ["+strWord+"]");
                //Make the Network Call
                String strDictionaryServiceCall = "http://services.aonaware.com/DictService/DictService.asmx/Define?word=";
                //URL-encode the word so that spaces and special characters do not break the request
                strDictionaryServiceCall += URLEncoder.encode(strWord, "UTF-8");
                URL url = new URL(strDictionaryServiceCall);
                BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
                StringBuffer response = new StringBuffer();
                String line;

                while ((line = reader.readLine()) != null) {
                    response.append(line);
                }
                reader.close();

                strCallResult = response.toString();

                DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = builderFactory.newDocumentBuilder();
                Document doc = builder.parse(new InputSource(new StringReader(strCallResult)));

                XPathFactory factory = XPathFactory.newInstance();
                XPath xpath = factory.newXPath();
                XPathExpression expr = xpath.compile("//Definition[Dictionary[Id='wn']]/WordDefinition/text()");

                Object result = expr.evaluate(doc, XPathConstants.NODESET);
                NodeList nodes = (NodeList) result;
                for (int i = 0; i < nodes.getLength(); i++) {
                    strCallResult = nodes.item(i).getNodeValue();
                }

                //MEMCACHE
                //Need to check depending on your logic if the values are good
                //Currently we will assume they are and put it in the cache
                //For e.g. if the word is not found, the Dictionary Service gets the word back.
                //So you could use that logic if you want.
                _cache.putInCache(strWord, strCallResult);
            }

            resp.getWriter().println(strCallResult);

        }
        catch (Exception ex) {
            strCallResult = "Fail: " + ex.getMessage();
            resp.getWriter().println(strCallResult);
        }
    }

    @Override
    public void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        doGet(req, resp);
    }

}

 

Let us discuss the modified flow in brief:

  1. We first extract out the word request parameter and do some basic validation to make sure that it is not empty.
  2. We get the handle to the Cache by calling our utility class that we just wrote.
    //First get a handle to the Cache
    GAEJDictionaryCache _cache = GAEJDictionaryCache.getInstance();
  3. We check if the word is present in the cache. The findInCache method will return null if not found.
    //Determine if the value is present in the Cache
    String strWordDefinition = _cache.findInCache(strWord);
  4. If the definition is returned from the cache, then we simply return that and no network call is made.
  5. If the definition is not found, the network call is made and the response stream is parsed for the definition. The most important step is then to put this definition in the cache, so that the findInCache(…) method returns it the next time a request for the same word is made.

    _cache.putInCache(strWord, strCallResult);
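
The //MEMCACHE comment in the servlet hints that you may want to check a result before caching it. The sketch below shows one way such checks could look. It is an assumption-laden illustration: the class CacheGuards and both method names are hypothetical, the lower-casing of keys is my own suggestion rather than something the application does, and the rule "a result equal to the word means not found" is only one reading of the comment in the code.

```java
import java.util.Locale;

// Hypothetical helpers for deciding what to cache and under which key.
final class CacheGuards {
    private CacheGuards() {
    }

    // Assumption: "Engine" and " engine " should share one cache entry.
    static String cacheKey(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }

    // Assumption: an empty result, or the word echoed back, means no definition was found.
    static boolean worthCaching(String word, String result) {
        if (result == null) {
            return false;
        }
        String trimmed = result.trim();
        return !trimmed.isEmpty() && !trimmed.equalsIgnoreCase(word.trim());
    }
}
```

In the servlet, the call to putInCache could then be wrapped in a worthCaching check, with cacheKey applied to the word for both findInCache and putInCache.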

Try it out

You can try out the modified Dictionary Service application at: http://gaejexperiments.appspot.com/dictionary.html. The first time that you search for a word, it might take some time to get back the definition, but once the definition is in the cache, it is returned much faster. Do note that if the same word has already been looked up by another user, the word and its definition will already be present in the cache.

Cache Design considerations

A cache, if implemented well, will save your application from repeated resource-intensive operations and improve application response times significantly. Of course, simply putting in a cache comes with risks. Some of the factors you need to take into consideration are:

1. Determine precisely which operations in your application would benefit from a cache.

2. Analyse if the data that you are putting in a cache changes frequently or not. If it changes frequently, then it might negate the benefits of a cache.

3. Control the size of the cache and prevent it from becoming too large or unmanageable. The way we have implemented it, the cache will keep growing, and we are leaving it to the App Engine runtime to evict entries from our cache when memory limits are reached.

4. Take into consideration several other factors, such as the interval at which you want to refresh the cache, cache expiration policies, etc.

In short, monitoring the cache and tuning its parameters is required of any production-worthy application.
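
To make the idea of an expiration policy concrete, here is a minimal plain-Java sketch of a cache whose entries become stale after a fixed time-to-live. The class ExpiringCache and its injectable clock are hypothetical and written only for illustration. App Engine's Memcache has its own expiration settings, which this sketch does not use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical in-memory cache whose entries expire after a fixed time-to-live.
final class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached value, or null if it is missing or has expired.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key); // stale entry: drop it and report a miss
            return null;
        }
        return entry.value;
    }
}
```

In normal use you would pass System::currentTimeMillis as the clock. A miss caused by expiry then triggers a fresh lookup, which is one simple way to refresh cached data at a known interval.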

Moving on

This brings Episode 8 to an end. Hope you enjoyed reading it.


About rominirani

Google Developer Expert Cloud 2014. Harnessing the power of software by learning, teaching and developing simple solutions. I love learning about new technologies and teaching it to others.
This entry was posted in Cloud Computing, Google App Engine.

8 Responses to Episode 8: Using Memcache in your GAEJ applications

  1. Shponter says:

    Great stuff! I find your blog very usefull and interesting. Waiting for more :)

  2. Pingback: Tweets that mention Episode 8: Using Memcache in your GAEJ applications « Google App Engine Java Experiments -- Topsy.com

  3. Wadael says:

    Another ep. to makes eager to read the next one.
    You’ll kill the business of GAEJ books, do you know ?

    Thanks :)

    • rominirani says:

      Thanks for the feedback.

      I hope what I am writing here will make forthcoming GAEJ Book authors write a little more, so that we can all learn more :-)

      I plan some more episodes down the line and combine all the episodes into a book and give it away for free!

      Cheers
      Romin

      • Wadael says:

        This is generous.
        Congratulations.

        I bet all books will miss chapters like :
        -“easy offline testing (with storage)”
        -“load evaluation”
        -“how to detect GAEJ regressions ?”
        -“keeping up to date with corrections”

        This is a few of the cons I’ve heard (and share) about GAEJ at a local barcamp.

      • rominirani says:

        You have raised the bar with the chapters that you have listed. The pressure is now on me to write about them :-)

      • Wadael says:

        Oh no, do not think that, pressure is close to low.
        Why ?

        Because, for
        -”easy offline testing (with storage)”
        Impossible because offline and online behaviour is different :(
        JDO is a PITA when you think ‘relationnal’ (or want to query on several fields)

        -”load evaluation”
        Or how to evaluate when the free quota will be over (in terms of users served) ? Which limit will be reached first (bandwith, cpu, db, requests ?)
        Ok, it may be possible but result will only be an evaluation as online and offline behaviours are different.

        -”how to detect GAEJ regressions ?”
        I have heard people with more experience on GAEJ than me that some features ceased working from a version to another, or their behaviour was different (sorry, no more details).
        Should we write tests to verify the API we use are working as we expect ?

        -”keeping up to date with corrections”
        Unlike others, this chapter is possible.
        As you have noticed too, your robot (and mine) introduced themselves twice. I spent time looking in the code. I tried to set a workaround.
        It has been corrected before I knew it was a bug of GAEJ and not of my code.
        As you have pointed, there is a bug tracker for GAEJ.
        It’s just that I (like many?) did not know where it is.

        Other chapter ideas :
        – best practices :)
        – making GAEJ relational (look for Gilead)
        – how to get unlimited quota
        – …

        Best Regards,

        Jérôme

  4. Pingback: uberVU - social comments
