While the primary job of an IT professional who works for a company is to keep the technology side of things running as smoothly as possible, it's also a good idea to keep an eye on the company's bottom line. A healthy business keeps costs under control and uses its financial resources efficiently. When the spending of one individual or department gets out of hand, everyone employed by the enterprise can suffer. I recently talked with Bob Collins, an IT pro with a story about how he helped a business he was contracting for avoid an unnecessary half-million dollar expenditure on a software upgrade. Yes, you read that right: he saved the company HALF A MILLION DOLLARS. He accomplished this first through careful technical sleuthing into an upgrade that seemed to have gone wrong, and second by recognizing that you will occasionally run into dishonest vendors who can't be trusted to deliver what they promise.
Bob grew up in small-town USA and has four degrees: AS-Electronics, BA-Religion, BS-Psychology, and BS-Electrical Engineering. Bob’s work history has all been IT related, and over the years he has planned, installed, and managed everything from the datacenter to the desktop from the ground up. He has specialized in UNIX system administration (UX, Solaris, AIX, and Linux) and database administration (Oracle up to 12c and SQL Server up to 2014). Bob has also worked on the hardware for all of these operating systems and database products, so he’s pretty much a jack-of-all-trades when it comes to both hardware and software in the OS and database areas.
Bob begins his story by providing a bit of background info about himself and then moves into the nitty-gritty sort of troubleshooting stuff one commonly finds down in the mucky trenches of IT. I hope you enjoy Bob’s story and learn from not just his technical diligence but also his “Doveryai, no proveryai!” (“Trust, but verify!”) attitude toward the vendors your organization works with — and that includes the usually honorable and the occasionally dishonest vendors. Let’s turn the floor over to Bob.
Job changes are not always just job changes
My IT career began when the engineer I was working with walked in and told me to "have a seat!" He proceeded to tell me he was taking a job in California, leaving the next week. That was back when you could shut a system down, remove the main boards, clean the contacts with an eraser, get rid of any residue on the board, re-insert it into the backplane, and start the system up again, and it would run like a champ. Plus, I had a degree in electronics, which was very helpful. I did a lot of board-level work as well as OS-level and database work. That experience has been a blessing more times than I can count.
Prior to my contracting days, I worked for a cellular provider. I left a public utility as a Sr. IS Analyst, working as an Oracle DBA. I moved to the cellular provider as a database administrator (multiple products) and UNIX systems administrator (multiple OS). The attraction was immediate — much better pay, better benefits, less stress, and a terrific working environment.
At the first staff meeting at the cellular provider, I was introduced and received quite a list of problem areas that needed some serious attention. They ranged from disk arrays to OS to database problems, more than I have room to mention. Most folks said, "We're glad you're now the owner of these issues!" I wondered more than once whether I had made the right decision!
Reconciling RAID firmware levels
Cellular providers have to defend against the fraudulent use of their phones and systems. There are third-party companies that specialize in this. The problem that rose to the top for us was the Fraud Detection System (FDS). It had its own server and RAID disk array. (A brief aside — this cellular provider had been chosen “best of breed” for their systems and software. This meant a variety of different systems and many different variants of software and OS.) For weeks/months, there had been all sorts of errors showing up in this system. The error rate was terrible, so bad that it caused the disk array to actually lose some data.
Experience had taught me that drives in a RAID set needed their firmware at the same level, ideally the most current release for that drive model. (A little tidbit I had learned from an EMC technician.) A check of firmware levels showed they were all over the place! Some were nearly 24 months out of date, others just a few months, and still others were very close to current. With the help of the server manufacturer, we got all the drives close to where they needed to be. After a dutiful reboot of the system, I followed the errors for about two weeks. The error rate dropped to nearly zero. I thought the FDS was done, but it would come back to haunt me.
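The firmware reconciliation Bob describes boils down to grouping drives by model and flagging any drive that lags the newest revision seen for its model. Here is a minimal sketch of that audit; the device names, models, and firmware strings are made up, and in practice you would collect them with a vendor tool or something like smartctl rather than hard-code them:

```python
# Sketch: flag RAID member drives whose firmware lags the newest
# revision seen for their drive model. All drive data below is
# hypothetical; gather it from your array's management tooling.
from collections import defaultdict

def firmware_audit(drives):
    """drives: list of (device, model, firmware) tuples.
    Returns {device: (current, newest_seen)} for drives that lag."""
    newest = defaultdict(str)
    for _, model, fw in drives:
        # Lexical comparison; real firmware schemes may need smarter parsing.
        if fw > newest[model]:
            newest[model] = fw
    return {dev: (fw, newest[model])
            for dev, model, fw in drives
            if fw != newest[model]}

drives = [
    ("sda", "ST39173WC", "5764"),
    ("sdb", "ST39173WC", "6246"),   # newest for this model
    ("sdc", "ST39173WC", "5764"),
    ("sdd", "DDRS-39130", "S97B"),  # only drive of its model
]

for dev, (cur, want) in firmware_audit(drives).items():
    print(f"{dev}: firmware {cur}, newest seen for model is {want}")
```

The audit only compares drives against each other, so it catches mismatches within the array; checking against the vendor's latest published release would still be a manual step.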
Mysterious problems from an upgrade
We maintained the server/disk array, but the FDS was maintained by the company that wrote the FDS software. The FDS folks contacted us, stating they needed to install a major upgrade to their software. No problem. They sent us a DVD and I loaded it when they were ready.
The installation took place over a weekend when the FDS was usually running relatively little data. Over the next several weeks, I noted the server CPU was maxed out most of the time. The disk array was also showing heavy use, close to 90 percent usage. After about three weeks of this, the FDS folks contacted us, stating they thought we needed to upgrade our server and disk array. It was a pricey resolution, on the order of $500,000.
Troubleshooting a database without looking inside
Since the FDS was a “protected” system, we had no access to the data. This was done to remove any possibility of the data being changed from our end. This also limited my ability to verify that the hardware had reached the end of its useful life. I would not “bless” an upgrade of $500,000 without verifying the need. I needed a way to confirm what was going on in the system.
I knew the data was still the same, i.e., they were using the same data items and types. I had records from prior to the product upgrade showing:
1. System throughput
2. CPU utilization
3. Memory utilization
4. Disk utilization (including paging)
5. Network utilization
I used these items to set a baseline, a point to evaluate the product upgrade. I started watching the system, comparing the upgrade utilization vs. the “original” utilization. Inside the FDS software, data was being analyzed the same way. We were told that the FDS software had been rewritten, using different algorithms to get the same results. The new algorithms were “more efficient.”
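The comparison Bob ran can be sketched as a simple percent-change calculation of each metric against its pre-upgrade baseline. The metric names mirror the list above, but the numbers are illustrative, not Bob's actual figures:

```python
# Sketch: compare post-upgrade utilization against a pre-upgrade
# baseline. Values are hypothetical percentages, not real data.

baseline = {"cpu_pct": 35, "memory_pct": 48, "disk_pct": 40, "network_pct": 22}
post_upgrade = {"cpu_pct": 97, "memory_pct": 71, "disk_pct": 90, "network_pct": 25}

def pct_change(before, after):
    """Percent change of each metric relative to its baseline value."""
    return {k: round(100.0 * (after[k] - before[k]) / before[k], 1)
            for k in before}

for metric, delta in pct_change(baseline, post_upgrade).items():
    print(f"{metric}: {delta:+.1f}% vs. baseline")
```

A report in this shape makes the "before vs. after" argument hard for a vendor to wave away, because every number traces back to a measurement taken before their upgrade touched the system.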
After watching items 1 to 5 above, I had solid evidence showing what the system was doing. This along with the data I had prior to the FDS upgrade gave a good view of what was happening. You could tell exactly when FDS began analyzing a “load” of data. Using some additional software, I was able to “peek” at processes and see what they were doing.
I found that queries were poorly coded. In some cases, tables needed indices: full-table scans were being done on tables with 500,000 rows and, in some cases, many more. In others, the indices were not being refreshed. I also found that the tables were poorly designed, forcing multiple joins; they were not even close to being normalized. Since I was only looking at processes, I could not change anything, which kept us from violating the software agreement.
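The full-table-scan problem Bob describes is easy to reproduce. Bob's systems ran Oracle and SQL Server; the sketch below uses SQLite only because it is self-contained, and the table and column names are invented. The query plan shows a scan of every row until an index on the filtered column lets the engine seek directly:

```python
# Sketch: how a missing index forces a full-table scan.
# SQLite stands in for Oracle/SQL Server; table/column names are made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (phone TEXT, minutes INT)")
con.executemany("INSERT INTO calls VALUES (?, ?)",
                [(f"555-{i:07d}", i % 60) for i in range(1000)])

query = "SELECT minutes FROM calls WHERE phone = '555-0000042'"

# Without an index, the planner must scan every row.
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(plan)   # e.g. "SCAN calls"

# With an index on the filtered column, the same query becomes a seek.
con.execute("CREATE INDEX idx_calls_phone ON calls(phone)")
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(plan)   # e.g. "SEARCH calls USING INDEX idx_calls_phone (phone=?)"
```

On a 1,000-row table the difference is invisible; on the 500,000-plus-row tables Bob was watching, a scan on every lookup is exactly the kind of thing that pins a CPU and a disk array.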
Root of the problem
When I added all the problems together, performance was awful. I created a report comparing the results from before and after the FDS upgrade, and I showed my supervisor what was going on. He was not too happy with the software vendor: performance had degraded by roughly 400 percent. When we sent the report to the software vendor, they realized they had to make some serious changes. As I remember, some "strong" words were exchanged.
The bottom line is this: The vendor tried to pull one on a customer (us), thinking we would not find the problems. It was painfully obvious that the software was poorly coded. The software vendor had to rewrite their software. We (the cellular provider) were not left holding the bag for a $500,000 equipment upgrade that was unnecessary!
The lesson for all of us is to enter any situation with an open mind and clear vision. Use every resource you have available to check things out. Do not just believe everything you are told — yes, you will run into dishonest vendors. Stop and verify along the way. Finally, never fail to ask questions if you do not know and educate yourself as much as you can. Then, proceed with a bit of caution. Things have been known to bite us when we are not looking.