Earlier this week, I received a phone call from a reporter with one of the major IT-related Web sites. She was doing a story on Microsoft's decision to stick with the Jet database format in the next version of Exchange, and she wanted my thoughts on any limitations of the current Exchange database format. I will spare you the details of our conversation, but one of the things that came up was the database's size limit. I explained that Exchange Server 2003 Enterprise Edition is not subject to the 16 GB storage limit imposed by other versions of Exchange, but has a 16 TB store size limit instead. I went on to explain that although Exchange Server 2003 Enterprise Edition allows you to create a huge database, the current practical limit to an Exchange database is about 35 GB.
We ended our call and I didn't think much more about it until I started being bombarded with E-mails from people disputing my statement about a 35 GB practical limit to the Exchange information store. It seemed that everyone (including the site's editorial staff) wanted me to prove my statement. The problem was that I couldn't recall where I had read about the limitation. I could have read about it on the Microsoft Web site, or it could have been in one of the many E-mail-based newsletters that I receive. The point is that I had no good way of searching for the information. I did eventually find a reference to the disputed statistic at: http://support.microsoft.com/default.aspx?scid=kb;en-us;823144 but this still wasn't where I had originally found the information.
I am telling you all of this because Google and several other companies want to solve this particular problem. The idea is that you should be able to search your own computer just as easily as you search the Internet. There are several utilities available that will allow you to search your own computer (including a crude search tool built into Windows), but Google has recently released its own desktop search program.
Before you can really understand the risks associated with Google's Desktop Search, you need to know a little bit about how it works. Google Desktop Search is what you might call a distributed application. Part of the application gets installed onto your own computer, while part of it runs off of Google's Web site. The idea behind coding the application in this manner is that you can visit Google and search either the Internet or your own PC through a single interface.
Once you install Google Desktop Search, it builds an index of the contents of your PC. Included in the index are E-mail messages contained in Outlook and Outlook Express (messages in the Deleted Items folder are not indexed, nor are notes, contacts, journal entries, or to-do lists). The software also indexes Microsoft Word, Excel, and PowerPoint files, as well as plain text files and AOL Instant Messenger chats.
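To make the idea of an index concrete, here is a minimal sketch of an inverted index, the general technique that most search tools are built on. It is not Google's implementation, which has not been published; the function names and the sample documents are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample data standing in for E-mail, Office files, and chats.
documents = {
    "inbox/exchange-limits.eml": "Practical limit for an Exchange store is about 35 GB",
    "docs/budget.xls": "Quarterly budget for the Exchange migration",
    "chats/aim-2004-10-20.txt": "did you see the budget numbers",
}

index = build_index(documents)
print(sorted(search(index, "exchange limit")))
```

The point to notice is that once the index exists, anyone who can query it can locate a document without knowing where it is stored. That is exactly what makes the tool useful, and it is also what makes unauthorized access to the index so dangerous.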
More interesting, though, is the way that the software's index is updated and the way that documents are cached. Google Desktop Search caches at least one copy of anything that gets indexed. The caching is intended as a time saver because it allows you to preview a document without actually having to open it. The caching feature also maintains a document history. For example, suppose that you accidentally modified a spreadsheet incorrectly. If you needed to get the original data back, you could open a cached copy from a date prior to the modification. The data would probably look rather funky because it is being displayed outside of Excel, but it can help you rebuild your document.
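The document history described above can be sketched as a cache that keeps every snapshot rather than only the latest one. This is an illustration under stated assumptions, not a description of how Google Desktop Search stores its cache: the class name, method names, file name, and dates are all hypothetical, and the sketch assumes that snapshots are stored in chronological order.

```python
from datetime import datetime


class VersionedCache:
    """Keep every cached snapshot of a document, oldest first."""

    def __init__(self):
        self._snapshots = {}  # path -> list of (timestamp, text)

    def store(self, path, text, when):
        """Record a snapshot unless the text is unchanged since the last one.

        Assumes calls arrive in chronological order for each path.
        """
        history = self._snapshots.setdefault(path, [])
        if history and history[-1][1] == text:
            return
        history.append((when, text))

    def latest(self, path):
        """Return the most recent snapshot, or None if nothing is cached."""
        history = self._snapshots.get(path)
        return history[-1][1] if history else None

    def before(self, path, cutoff):
        """Return the newest snapshot taken strictly before cutoff, or None."""
        for when, text in reversed(self._snapshots.get(path, [])):
            if when < cutoff:
                return text
        return None


cache = VersionedCache()
cache.store("budget.xls", "Q1 total: 1200", datetime(2004, 11, 1, 9, 0))
# An accidental edit two days later overwrites the figure in the live file.
cache.store("budget.xls", "Q1 total: 12", datetime(2004, 11, 3, 14, 30))

# Recover the text as it stood before the day of the bad edit.
print(cache.before("budget.xls", datetime(2004, 11, 3)))
```

Note that the cache holds plain text rather than the original Excel file, which is why a recovered copy would look odd but would still contain the data.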
Documents aren't the only thing that Google Desktop Search caches. Any time that you receive an E-mail or visit a Web page, the information is automatically cached. This allows you to view previous versions of rapidly changing Web pages. For example, if you wanted to see what MSN's headline was yesterday, you could just look at a cached copy.
As you can see, Google Desktop Search can index a wealth of information and place it all at your fingertips for easy access. The problem is that a major security hole has already been discovered.
The security hole (which has supposedly been fixed) was based on the fact that part of Google Desktop Search was Web-based. A recent study at Rice University analyzed packets as they were transmitted between Google and a machine running Google Desktop Search. The researchers soon determined that it was fairly easy to spoof such packets and trick a machine into delivering desktop search results to a remote machine over the Internet.
To prove the concept, Rice University developed a Java applet that could theoretically be run on a malicious Web site. If someone who was running Google Desktop Search were lured to the Web site, the site's owner would be able to collect enough information to be able to remotely search the victim's computer.
Although Google claims to have fixed the problem, this particular security hole could have proved disastrous. Imagine the consequences if someone with malicious intent were able to remotely search your hard disk. Even if you didn't have any sensitive documents on your hard disk or any confidential E-mail messages, the cache contains a full history of all of the Web sites that you have visited.
At first, someone being able to search your Web history probably sounds harmless. Sure, it could be a little embarrassing if you are in the habit of visiting a lot of adult Web sites, but that's the worst that can happen, right? Wrong!
Any time that you shop online, check your bank account, or log into a site requiring a membership, the transmission of anything sensitive is usually encrypted with SSL. The problem is that SSL only protects the page while it is in transit. Your browser decrypts the page in order to display it, and the decrypted page is what Google Desktop Search caches. In other words, Google Desktop Search can cache any Web page, even if it was originally SSL encrypted. This means that if someone can query your local machine, they can gain full access to such pages.
OK, so there have been some security issues with Google's Desktop Search software, but Google has supposedly fixed all of those issues, so what's the problem? The problem is that the security holes which have already been demonstrated raise future concerns for privacy, especially when you consider that other companies are working on similar products.
At the risk of sounding paranoid, I want to take a moment to discuss some of the future risks that I perceive with this type of indexing. Surely, no one would dispute that spyware has become a huge problem over the last couple of years. Spyware modules exist that are able to log keystrokes and send them to a database somewhere on the Internet. There are also spyware modules that are able to manipulate Internet Explorer or other operating system files.
I'm not picking on Google, because several companies are currently developing similar global desktop search products. However, it stands to reason that sooner or later, one of those search products will become dominant, just as Google has become the dominant search engine for the Internet. Once that happens, and the indexing program becomes widespread, I think that it will only be a matter of time before someone figures out how to read the contents of its index. If someone can create a spyware module that can read this database, and is then able to infect a large number of computers with the module, then the person who created the spyware module could theoretically search millions of PCs simultaneously.
Sure, the person who is doing the search would have to have a specific thing that they were searching for, but there are all kinds of useful things that a hacker could search for. For example, Google allows you to search a number range. Several months ago, it was possible to search a range of 16-digit numbers beginning with the first four digits of a Visa card number, and so steal credit card numbers through the search engine. While Google has taken steps to remove that vulnerability, it may be possible to perform that type of search against your own local machine, or against someone else's local machine.
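To show how little effort that kind of search takes, and how you might audit your own cached text for exposed card numbers, here is a minimal sketch. It does not use Google's number range syntax or any Google interface. It simply scans a block of text for 16-digit numbers that begin with a given four-digit prefix and pass the Luhn checksum used by payment cards. The function names and the sample text are hypothetical, and 4111 1111 1111 1111 is a commonly used test number rather than a real account.

```python
import re


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_like_numbers(text, prefix):
    """Find 16-digit numbers that start with prefix and pass the Luhn check.

    A single space or dash is allowed between groups of four digits.
    """
    pattern = re.compile(r"(?<!\d)\d{4}(?:[ -]?\d{4}){3}(?!\d)")
    hits = []
    for match in pattern.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if digits.startswith(prefix) and luhn_valid(digits):
            hits.append(digits)
    return hits


# Hypothetical cached page text; the second number fails the checksum.
cached_text = (
    "Order confirmed. Card: 4111-1111-1111-1111. "
    "Reference 4111111111111112."
)

print(find_card_like_numbers(cached_text, "4111"))
```

If a few dozen lines can pick card numbers out of cached text, a spyware author with access to a desktop search index would not need anything more sophisticated.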
Although I am glad to see companies coming out with innovative search solutions for local machines, I believe that these companies will have to exercise extreme caution to avoid creating a widespread security nightmare. Companies and individuals should avoid loading such software onto any machine containing sensitive data until the kinks have been worked out.