First published on IBM developerWorks.
With Google's current and ever-expanding index of over 3 billion Web pages, it is easy to understand why it is one of the best tools for conducting industrial-strength research on the Internet. In addition to the quantity of Web pages indexed, the quality of the search results is high because of its proprietary search algorithm, which is based on relevancy and popularity. Recent reports indicate that Google answers 200 million search requests daily (an average of 2,300 per second!) in 88 different languages. And according to SearchEngineWatch.com, Google handles 75 percent of all Web-based search queries.
Obviously, many Web users are familiar with Google and its search features. What may not be as well known is the fact that you can incorporate Google's search functionality into your own Domino applications using Google's Web API service. (You can download this API free as we explain in the next section.) This article describes how to add Google's search functionality to a Domino application. We briefly introduce the Google APIs and how they work. Then we examine a sample application that integrates Google search functionality with a Domino application. We conclude by offering some ideas on how you can expand and customize our simple example. (All examples shown in this article are contained in a Domino database you can download from the Sandbox.)
This article assumes that you're an experienced Domino developer familiar with writing Domino agents. And although advanced Java experience is not required to understand our example, a basic knowledge of Java is highly recommended.
The Google APIs
To use Google's APIs, download the Google Web APIs developer's kit (googleapi.zip) from Google's Web APIs download page and create a free Google account for a license key. The key allows you a maximum of 1,000 searches daily. (You must include this license key in your query when making the search request. You do this programmatically, so users don't have to type in the code each time they perform a query.) The developer's kit comes with documentation and examples that give you a basic understanding of how Google's automated searches work.
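Because the license key is limited to 1,000 searches a day, an application that issues queries automatically may want to keep its own count. The following is a minimal sketch of such a counter, not part of the Google developer's kit: the class name DailyQuota and its methods are hypothetical, and the code assumes a JDK with java.time (Java 8 or later), which may be newer than the JVM bundled with your Domino release.

```java
import java.time.LocalDate;

/** Hypothetical helper: counts searches per calendar day against a fixed limit. */
final class DailyQuota {
    private final int limit;
    private LocalDate day;   // day the current count belongs to (null before first use)
    private int used;

    DailyQuota(int limit) {
        this.limit = limit;
    }

    /** Records one search and returns true if the limit for 'today' is not yet reached. */
    boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) {   // first call, or a new day: reset the counter
            day = today;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }

    /** Number of searches still available for 'today'. */
    int remaining(LocalDate today) {
        return today.equals(day) ? limit - used : limit;
    }

    public static void main(String[] args) {
        DailyQuota quota = new DailyQuota(1000);   // daily limit stated above
        LocalDate today = LocalDate.now();
        if (quota.tryAcquire(today)) {
            // This is where the agent would send one search request.
            System.out.println("Remaining today: " + quota.remaining(today));
        }
    }
}
```

An agent would call tryAcquire() before each search and skip or postpone the query when it returns false. The counter lives in memory only, so a real agent would also need to persist it (for example, in a Notes document) between runs.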
After you download googleapi.zip, unpack it to the c:\googleapi directory. You can then quickly invoke the Web APIs by typing the following command at the MS-DOS prompt:
java -cp googleapi.jar com.google.soap.search.GoogleAPIDemo <key> search Lotus Domino > result.html
where <key> is your registration key, "Lotus Domino" is the item you want to search for, and result.html is the name of the file into which we want the search results to be written. For instance, the following example:
Listing 1. Search results
shows the first 10 items in the display results from an estimated 663,000 items.
However, if you try the same query on the Google Web site, you may get a different result. For example, the same query for "Lotus Domino" at Google results in an estimated 596,000 items. (The Google API Support team has confirmed that both the Web APIs and the Google site itself use the same search engine and index pages, but slight differences in results may be caused by the many data centers that Google operates.)
A quick look at GoogleAPIDemo.java in the googleapi directory:
Listing 2. Snippet from GoogleAPIDemo.java
reveals that the doSearch() method invokes the Google search after the license key and the query string attributes are set.
Before we move on to our example, let's take a quick look at several methods in Google's Web API that you will use:
- setKey() sets the user license key.
- setQueryString() sets the query string.
- doSearch() invokes the Google search.
- toString() returns a formatted representation of a Google search result.
- getResultElements() returns an array of result elements.
- getSnippet() returns a snippet that shows the query in context on the URL where it appears.
- getSummary() returns the ODP summary if this document is contained in the ODP directory.
- getTitle() returns the title of the search result, formatted as HTML.
- getURL() returns the absolute URL of the search result.
- setStartResult() sets the index of the first result to be returned. For instance, if there are 500 results, you may want to start at 100.
- setMaxResults() sets the maximum number of results to be returned per query. The maximum value per query is 10. If you perform a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.
- setFilter() enables or disables the "related-queries" filter. This filter eliminates results that are very similar.
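Because setMaxResults() is capped at 10 per query, collecting more than 10 results means issuing several queries with different setStartResult() values. The following sketch only works out the arithmetic and does not call the Google API. The class PagingPlan and its methods are hypothetical, and the code assumes the start index is zero-based.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that plans the start index and page size for each query. */
final class PagingPlan {
    static final int MAX_PER_QUERY = 10;   // per-query cap described above

    /** Start indexes to pass to setStartResult() to collect 'wanted' results from 'first' on. */
    static List<Integer> startIndexes(int first, int wanted) {
        if (first < 0 || wanted < 0) {
            throw new IllegalArgumentException("first and wanted must not be negative");
        }
        List<Integer> starts = new ArrayList<>();
        for (int offset = 0; offset < wanted; offset += MAX_PER_QUERY) {
            starts.add(first + offset);
        }
        return starts;
    }

    /** Value to pass to setMaxResults() for the page that begins at 'start'. */
    static int pageSize(int first, int wanted, int start) {
        return Math.min(MAX_PER_QUERY, first + wanted - start);
    }

    public static void main(String[] args) {
        // Collect 25 results beginning at result 100: three queries are needed.
        for (int start : startIndexes(100, 25)) {
            System.out.println("start=" + start + " max=" + pageSize(100, 25, start));
        }
    }
}
```

Each planned page costs one search against the daily limit, and as noted above, a query with few matches may return fewer results than requested.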
Let us begin our example by creating a Notes agent called simpleSearch in a new database named GoogleSe.nsf. You can download this database from the Sandbox. This database contains the simpleSearch agent, as well as the goSearch agent and the forms and view discussed later in this article. We built this application on Windows 2000, but as a Java application it can run on any platform. No special proxy setting is required to run the Google APIs from Lotus Domino. When the APIs are invoked, they use whatever configuration Domino has.
First, copy the following lines of code from the file GoogleAPIDemo.java in the developer's kit (note that clientKey is initialized by the license key sent to you after you register with Google):
Listing 3. simpleSearch agent code
In Domino Designer, create the simpleSearch agent and paste in the preceding code. Then execute the agent. (The search string in this example is initialized to "Lotus Domino.") To view the results of this query, open the Java Debug Console after simpleSearch finishes running. You should see results similar to the following:
Figure 1. simpleSearch results
Our simpleSearch agent provides a quick way to view your search results, showing the first 10 of the estimated 720,000 items found. For example, the first item contains the following:
Listing 4. Example search result
If you run the agent locally, make sure googleapi.jar is in the Notes program directory and is assigned to JavaUserClasses in your local Notes.ini file. (JavaUserClasses has a limitation of 256 characters, so if adding googleapi.jar to it exceeds this limit, it won't be recognized by Domino.)
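Before editing Notes.ini, you can check whether the new JavaUserClasses value would stay within the limit. The sketch below is only an illustration: the class JavaUserClassesCheck is hypothetical, the semicolon separator and the sample path are assumptions for a Windows installation, and the code assumes that a value of exactly 256 characters is still accepted.

```java
/** Hypothetical helper for checking the length of a JavaUserClasses value. */
final class JavaUserClassesCheck {
    static final int MAX_LENGTH = 256;   // limit stated above; treated as inclusive (assumption)

    /** Returns 'current' with 'jarPath' appended, using 'separator' between entries. */
    static String append(String current, String jarPath, String separator) {
        if (current == null || current.isEmpty()) {
            return jarPath;
        }
        return current + separator + jarPath;
    }

    /** True if the value is short enough to be recognized. */
    static boolean fits(String value) {
        return value.length() <= MAX_LENGTH;
    }

    public static void main(String[] args) {
        String current = "c:\\lotus\\notes\\other.jar";        // assumed existing value
        String updated = append(current, "c:\\lotus\\notes\\googleapi.jar", ";");
        System.out.println(updated.length() + " characters, fits: " + fits(updated));
    }
}
```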
Adding a Google-like interface
The search results returned by simpleSearch are informative, but are obviously unsophisticated in terms of presentation. However, you can greatly improve the usability of these results by presenting them in a format similar to the one used by Google. To do this, we create a new agent called goSearch, using the code in simpleSearch as the starting point. We also add two forms, Search and Result, to the GoogleSe.nsf database. (The complete GoogleSe.nsf database, containing the forms as well as the goSearch and simpleSearch agents, is available for download from the Sandbox.)
Search and Result forms
First, add a query form (which we call Search) to your GoogleSe.nsf database. This allows users to enter and save their query strings:
Figure 2. Search form
Then add an output form named Result to display the results of the queries:
Figure 3. Result form
The goSearch agent
Now let's build an agent that offers more sophisticated functionality than the simpleSearch agent, including automation to process a list of queries and a more "Google-like" look and feel. To begin, copy the code in simpleSearch and paste it into goSearch. Then add the following code to have the results title appear in the same text color used by Google and to render HTML in rich text fields in Notes:
Listing 5. Code for displaying results in Google format
Next, add the following code to the goSearch agent. This snippet incorporates the query results into the Result document:
Listing 6. Code for incorporating query results into Result documents
Finally, add the following to execute all query strings saved as Search documents and generate corresponding results as Result documents.
Listing 7. Code to execute queries and generate Result documents
The preceding example allows for batch queries in Google's search engine and batch results saved for later analysis. Because results are saved, it also lets you analyze trends by monitoring the frequency with which certain topics are searched. You can do this by sorting and displaying Result documents by query string and date, as shown in the following AllResultsByTitleDate view:
Figure 4. AllResultsByTitleDate view
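The same ordering and a simple frequency count can also be expressed in plain Java, for example if you export the saved results for analysis outside Notes. This is a sketch under assumptions, not code from the sample database: SavedResult is a hypothetical in-memory stand-in for a Result document, and the code assumes Java 8 or later.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ResultTrends {
    /** Hypothetical in-memory stand-in for one saved Result document. */
    static final class SavedResult {
        final String query;
        final LocalDate date;
        final String title;

        SavedResult(String query, LocalDate date, String title) {
            this.query = query;
            this.date = date;
            this.title = title;
        }
    }

    /** Returns a copy of the list ordered by query string, then by date. */
    static List<SavedResult> sortByQueryThenDate(List<SavedResult> results) {
        List<SavedResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparing((SavedResult r) -> r.query)
                .thenComparing((SavedResult r) -> r.date));
        return sorted;
    }

    /** Counts how many saved results exist for each query string. */
    static Map<String, Integer> countByQuery(List<SavedResult> results) {
        Map<String, Integer> counts = new TreeMap<>();
        for (SavedResult r : results) {
            counts.merge(r.query, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<SavedResult> saved = new ArrayList<>();
        saved.add(new SavedResult("Lotus Notes", LocalDate.of(2003, 5, 2), "sample title B"));
        saved.add(new SavedResult("Lotus Domino", LocalDate.of(2003, 5, 1), "sample title A"));
        for (SavedResult r : sortByQueryThenDate(saved)) {
            System.out.println(r.query + " | " + r.date + " | " + r.title);
        }
        System.out.println(countByQuery(saved));
    }
}
```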
The Title field is a rich text field. Therefore, you need to create the new text field TitleText to store the unformatted text value in Result documents, so you can use it as a column value in the preceding view:
ndoc.appendItemValue("TitleText", title.getFormattedText(false, 0, 0));
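If you prefer to derive the plain text directly from the HTML string returned by getTitle() rather than from the rich text item, a small helper can strip the markup. The following is a rough sketch, not the approach used in the sample database: the class PlainTitle is hypothetical, and it handles only simple tags and a handful of common character entities.

```java
/** Hypothetical helper that reduces an HTML-formatted title to plain text. */
final class PlainTitle {
    static String toPlainText(String html) {
        // Remove anything that looks like a tag, such as <b> or </b>.
        String text = html.replaceAll("<[^>]*>", "");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&");
        return text.trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<b>Lotus</b> <b>Domino</b> &amp; Notes"));
    }
}
```

The resulting string could then be stored in TitleText with appendItemValue() in the same way as shown above.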
Other ideas to consider
The goSearch agent can be scheduled to run at an off-peak hour to reduce traffic and improve efficiency. Another idea to improve the goSearch agent is to call the parser whenever an HTML formatted string is encountered:
Listing 8. Code to call parser in goSearch
This results in a more compact agent class.
You can further extend the concept of rendering HTML input strings into Notes rich text fields by expanding the set of HTML tags that this parser can handle (for example, to <h1>…</h1> or <p>) through the use of the delim1 and delim2 parameters.
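To illustrate the idea of a delimiter-driven parser, here is a minimal sketch that splits a string into plain segments and segments enclosed by a pair of delimiters. It is not the parser from the sample database: the class DelimParser, the Segment type, and the parse() signature are hypothetical, and only the parameter names delim1 and delim2 are borrowed from the text above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser that separates delimited text from the text around it. */
final class DelimParser {
    /** One piece of the input; 'marked' is true if it was enclosed by the delimiters. */
    static final class Segment {
        final String text;
        final boolean marked;

        Segment(String text, boolean marked) {
            this.text = text;
            this.marked = marked;
        }
    }

    /** Splits 'input' into segments; text between delim1 and delim2 is flagged as marked. */
    static List<Segment> parse(String input, String delim1, String delim2) {
        if (delim1.isEmpty() || delim2.isEmpty()) {
            throw new IllegalArgumentException("delimiters must not be empty");
        }
        List<Segment> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf(delim1, pos);
            int close = (open < 0) ? -1 : input.indexOf(delim2, open + delim1.length());
            if (open < 0 || close < 0) {          // no complete pair left: keep the rest as is
                out.add(new Segment(input.substring(pos), false));
                break;
            }
            if (open > pos) {                     // plain text before the opening delimiter
                out.add(new Segment(input.substring(pos, open), false));
            }
            out.add(new Segment(input.substring(open + delim1.length(), close), true));
            pos = close + delim2.length();
        }
        return out;
    }

    public static void main(String[] args) {
        for (Segment s : parse("<b>Lotus</b> <b>Domino</b> home", "<b>", "</b>")) {
            System.out.println((s.marked ? "[bold] " : "[plain] ") + "\"" + s.text + "\"");
        }
    }
}
```

In an agent, each marked segment would be appended to the rich text field with the style that corresponds to the tag (bold for <b>, for example), and each plain segment with the default style. Supporting another tag then amounts to calling the parser with a different delimiter pair.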
You can harness the power of Google in Domino to run large numbers of search queries. The results are stored as documents and, over time, can be used for trend analysis. In addition, you can monitor when new Web pages on specific topics become available.
Equally important, your Domino applications can now offer Google search functionality to your users. This provides search features with which your users are likely to be very familiar, so they should find these features easy to use. This functionality is also known to be highly stable and reliable. You should consider using Google's API in any Domino application in which fast, large-volume Web searches are a requirement.
- Download the Google Web APIs developer's kit (googleapi.zip) from Google's Web APIs download page.
- Download the sample code used in this article.
- "Integrating Amazon Web Services with your Domino applications" is the first of a three-part article series explaining how you can use Amazon.com's Web functionality in your Domino applications.
Sui-Man Chan has worked in the transportation and banking industries since the early 1990s and has been involved in various stages of application and systems development. Sui comes from a background of mainframe, client-server, and Web-based platform systems. She has been involved with IBM Lotus Domino application development since Release 3 in 1995.