Making the Windows Server 2003 Indexing Service Useful
Before I Begin
Before I get started, I feel obligated to mention that creating a Web-based query interface isn’t exactly my idea. Microsoft published a Knowledge Base article (http://support.microsoft.com/kb/q238791/) that explains how you can use the ixsso.Query object from an ASP page to query an index. Many developers have designed Web pages similar to the one that I am about to show you. Although I wrote the code found in this article myself, some of the techniques used were borrowed from Microsoft and from various other programming Web sites.
Another thing that I want to mention before I get started is that this article assumes that you have read my indexing service article from last month. If you haven’t read that article yet, please do so before attempting the techniques shown in this article.
Preparing the Web Server
Even if you aren’t planning on indexing Web content, this is a Web-based application, and it will have to run on an IIS server. Hopefully, your IIS server is already up and running and you can add the Web pages that we will be creating to the default Web site. If not, though, you may have to install and configure IIS.
To install IIS, open the Server’s Control Panel and select the Add / Remove Programs option. When the Add or Remove Programs dialog box appears, click the Add / Remove Windows Components button. After a brief delay, Windows will open the Windows Component Wizard.
Select the Application Server option and click the Details button. At this point, select the ASP.NET check box, as shown in Figure A. Now, highlight the Internet Information Services (IIS) option and click the Details button. As you can see in Figure B, IIS has a lot of components. At a minimum, you will need to select the Common Files, Internet Information Services Manager, and the World Wide Web Service. Because the results page that we will be building is a classic ASP file, also highlight the World Wide Web Service, click Details, and make sure that the Active Server Pages check box is selected.
Figure A: ASP.NET must be installed in order for this application to run
Figure B: At a minimum, you will need to select the Common Files, Internet Information Services Manager, and the World Wide Web Service
Now, just click OK twice, click Next, and follow the prompts. You may be prompted to insert your Windows Server 2003 installation CD to complete the installation process.
After the installation process completes, you will have to prepare IIS to run the application that we are building. Setting up Web sites is an art in and of itself. Since I’ve got a lot of material to cover, we are just going to make the new application a part of the default Web site and use a minimal configuration to get the application up and running rather than using an elaborate, high security / high performance configuration.
To configure IIS, select the Internet Information Services (IIS) Manager command from the server’s Administrative Tools menu. When you do, the IIS Manager console will open. Navigate through the console tree to your server | Web Sites | Default Web Site, as shown in Figure C.
Figure C: Navigate through the console tree to the default Web site
The first thing that we have to do is to make sure that the default Web site is running. The easiest way to do this is to right click on it and make sure that the Start option on the shortcut menu is grayed out. If the Start option is displayed in black then click Start to start the site.
Next, we have to assign the default Web site an IP address (technically, we don’t have to, but doing so makes life easier on everyone). To do so, right-click on the default Web site and select the Properties command from the resulting shortcut menu. When you do, you will see the Web site’s properties sheet. Select the server’s private IP address from the IP Address drop-down list found on the properties sheet’s General tab, as shown in Figure D. It is important to use the private IP address because you don’t want to accidentally allow outsiders to search your server’s index. Now just click OK, close the IIS Manager, and you are ready to start setting up the Web application.
Figure D: Assign the server’s private IP address to the Default Web site
The Query Form
The Web application that we are creating consists of two separate files. Both files should be saved to the server’s C:\Inetpub\wwwroot folder. By default, the IUSR_servername account has read access to this folder. When a user connects to the site anonymously, Windows Server uses the permissions associated with the IUSR_servername account to determine what the user can and can’t access.
The first file that you will need to create is a simple HTML file named QUERY.HTM. This file allows the user to input a query string and then passes that string to an ASP file, which I will discuss in a moment, for processing. As you can see in the source code below, there is really nothing fancy about this file. It simply allows the user to enter a text string in a field named searchstring. When the user submits the form, the form’s action attribute posts the contents of the searchstring field to the results.asp file.
<html><head><title>Index Service Query Tool</title></head><body>
<form action="results.asp" method="post">
<p>Enter the text that you want to search for<br>
<input type="text" name="searchstring" size="50" maxlength="100" value=""><br>
<button type="submit">Search</button>
<button type="reset">Clear Form</button></p>
</form>
</body></html>
The Results Page
The Results page is where all of the magic happens. The Results page is coded in ASP (Active Server Pages). I don’t want to turn this article into a crash course in ASP, but I will tell you that ASP pages are processed on the server and the output is sent to the user in HTML format. If you look at the code below, you will see that it contains a mixture of HTML code and ASP code. I used as much HTML as I could in an effort to simplify the page for those who may not be familiar with ASP. Blocks of ASP code are separated from HTML code by the <% and %> markers. ASP files should be saved with the .ASP extension rather than the .HTM extension so that the server knows to process them as ASP. The following file should be named RESULTS.ASP.
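A minimal sketch of RESULTS.ASP, assembled from the three sections described below, might look like this. Treat it as an approximation rather than a definitive listing: the catalog name MyCatalog is a placeholder for your own index name, and the recordsize value, the empty-query check, and the "deep" scope setting are my assumptions. The object calls are the standard ixsso.Query and ixsso.Util ones that ship with the Indexing Service.

```asp
<%
' RESULTS.ASP (sketch; see the assumptions noted in the text above)

' This section sets the various configuration variables
formscope = "/"                               ' start at the top of the index
pagesize = 5000                               ' maximum results per page
recordsize = 5000                             ' assumed value: maximum results returned
searchstring = request.form("searchstring")   ' posted by QUERY.HTM
catalogtosearch = "MyCatalog"                 ' placeholder: use your own index name
searchrankorder = "rank[d]"                   ' descending rank, most relevant first
origsearch = searchstring                     ' remember what the user typed

' Assumed guard: an empty query would make the Indexing Service raise an error
if trim(searchstring) = "" then
  response.write "<p>Please enter some text to search for.</p>"
  response.end
end if

'This section performs the query
set q = server.createobject("ixsso.Query")
set util = server.createobject("ixsso.Util")
q.query = searchstring
q.catalog = catalogtosearch
q.sortby = searchrankorder
q.columns = "doctitle, filename, size, write, rank, directory, path"
q.maxrecords = recordsize
util.addscopetoquery q, formscope, "deep"     ' assumed: search subdirectories too

'This section displays the results
set rs = q.createrecordset("nonsequential")   ' nonsequential so recordcount is available
rs.pagesize = pagesize
response.write "<p>Your search for <b>" & origsearch & "</b> produced "
if rs.recordcount = 0 then response.write "no results"
if rs.recordcount = 1 then response.write "1 result: "
if rs.recordcount > 1 then response.write rs.recordcount & " results: "
%>
<table border=1><tr><td><b>Title</b></td><td><b>Filename</b></td><td><b>Date / Time</b></td><td><b>Size</b></td><td><b>Relevance</b></td><td><b>Directory</b></td></tr>
<%
do while not rs.eof
  response.write "<tr><td>" & rs("doctitle") & "</td><td>" & "<a href=" & "'" & rs("path") & "'" & ">" & rs("filename") & "</a>" & "</td><td>" & rs("write") & "</td><td>" & rs("size") & "</td><td>" & rs("rank") & "</td><td>" & rs("directory") & "</td></tr>"
  rs.movenext                                 ' advance, or the loop never ends
loop
%>
</table>
<%
' Release the objects before the script ends
set rs = nothing
set q = nothing
set util = nothing
%>
```

Save the file alongside QUERY.HTM in C:\Inetpub\wwwroot and change the catalogtosearch value before you test it.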
RESULTS.ASP is broken into three sections: initialization, query, and results. The initialization section looks like this:
' This section sets the various configuration variables
pagesize = 5000
In this section, we are defining the variables that will be used in the query. The Formscope variable is set to / which tells the query to start at the top of the index. The pagesize and recordsize variables tell the application the maximum number of search results to return. The searchstring variable is inherited from our QUERY.HTM file that I talked about earlier. The catalogtosearch variable specifies the name of the index that you want to search (you defined the index name in last month’s article). The searchrankorder variable is set to rank[d] which means that the rankings will be presented in a descending order with the most relevant results being displayed first. Finally, the origsearch variable keeps track of the user’s original search string.
Now, let’s talk about the query section, shown below:
'This section performs the query
q.columns="doctitle, filename, size, write, rank, directory, path"
In this section, there are two calls to the ixsso object. This is the indexing object that does all of the work. Notice that underneath these calls, there are a number of q.variables that are being defined (q.query, q.catalog, q.sortby, etc.). These are the variable names that the ixsso query is expecting to use. Technically, we could assign values to these variables directly. Instead, I chose to assign values to the variables that I talked about earlier in an effort to make the code easier to read. This section then sets the q. variables to reflect the values assigned to the alternate variables earlier.
One line that’s especially worth paying attention to is this one:
q.columns="doctitle, filename, size, write, rank, directory, path"
This line tells the query what information you want the Indexing Service to return. The doctitle option tells the Indexing Service that you want to know the document’s title. The filename and size options pull the document’s filename and byte count, respectively. The write option pulls the document’s date and time stamp. The rank option pulls the document’s relevance to the search query based on a score ranging from 0 to 1,000. The directory option pulls the directory that the file is found in, while the path option tells the query to pull the directory and filename as a whole. I will talk more later on about how I am using this information.
The last section, shown below, displays the query results:
'This section displays the results
set rs = q.createrecordset("nonsequential")
response.write "<p>Your search for <b>" & origsearch & "</b> produced "
if rs.recordcount=0 then response.write "no results"
if rs.recordcount=1 then response.write "1 result: "
if rs.recordcount>1 then response.write rs.recordcount & " results: "
%>
<table border=1><tr><td><b>Title</b></td><td><b>Filename</b></td><td><b>Date / Time</b></td><td><b>Size</b></td><td><b>Relevance</b></td><td><b>Directory</b></td></tr>
<%
do while not rs.EOF
response.write "<tr><td>" & rs("doctitle") & "</td><td>" & "<a href=" & "'" & rs("path") & "'" & ">" & rs("filename") & "</a>" & "</td><td>" & rs("write") & "</td><td>" & rs("size") & "</td><td>" & rs("rank") & "</td><td>" & rs("directory") & "</td></tr>"
rs.movenext
loop
%>
</table>
This section creates a record set based on the query results. It then reads the record set’s recordcount property and displays it as the number of results. Next, there is some simple HTML code that creates a table header. After that, a loop writes the contents of each record in the record set as a table row. When the loop completes, the variables are set to Nothing and the script ends.
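One thing the results loop does not do is escape the values that it writes, so a title or filename containing characters such as & or < could garble the table. Here is a rough sketch of how the hyperlinked filename cell could be built with the built-in Server.HTMLEncode method. The safe and linkcell helper names are hypothetical, and the path and filename in the example call are made up for illustration.

```asp
<%
' Hypothetical helper (not part of the original listing):
' HTML-encode a value, treating Null as an empty string
function safe(value)
  safe = server.htmlencode(value & "")
end function

' Hypothetical helper: build the hyperlinked filename cell from plain strings.
' The href is wrapped in double quotes because HTMLEncode escapes double
' quotes but not single quotes.
function linkcell(path, filename)
  linkcell = "<td><a href=""" & safe(path) & """>" & safe(filename) & "</a></td>"
end function

' Example call with made-up values
response.write "<table border=1><tr>" & linkcell("c:\docs\Q&A notes.doc", "Q&A notes.doc") & "</tr></table>"
%>
```

Inside the loop you would pass rs("path").value and rs("filename").value to linkcell instead of the literal strings, and you could wrap origsearch in safe as well before echoing it back to the user.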
So what does all of this code look like in action? Keep in mind that I designed this code to be simple, not to produce pretty output. You are of course free to customize the code in any way that you see fit. The query page looks like what you see in Figure E. The results screen is shown in Figure F.
Figure E: This is where the user enters the query
Figure F: These are the query results
When you run the query yourself, you may find that not all documents have titles. This is because some older versions of Microsoft Office did not automatically create meaningful titles.
You might also notice in the output that we are displaying the contents of the doctitle (Title), filename, write (date & time), size, rank (relevance), and directory columns. However, as you might recall, the code also asked the Indexing Service to return a column called path. The path column contains the full path and file name of the file result being returned. I had the code pull this column because I used it to hyperlink the filename in the output.
Normally, when you create an HTML hyperlink, the syntax looks something like this:
<a href="path and filename">text that is hyperlinked goes here</a>
I simply used the path variable to provide the path and filename for the hyperlink, and used the filename variable to display the filename. The code that accomplishes this is this line (this is an excerpt from a much longer line):
"<a href=" & "'" & rs("path") & "'" & ">" & rs("filename") & "</a>"
In this article, I have explained that although the Indexing Service can speed content queries, it is not easily accessible to users. I then went on to show you how you can create a Web based tool that allows users to run queries against the indexes that you create. Users can access the tool by opening a Web browser on a computer that’s connected to your corporate network and entering http://server’s IP address/query.htm