The Data Skills Server Admins Should Master
Administrators are often shielded from the underlying complexity of business data, with more specialized business intelligence analysts handling the majority of data integration projects. However, as time goes on and “big data” eventually morphs into “routine data”, it will become increasingly important for systems administrators to understand how best to support what is poised to become routine operations, even at small organizations. Moreover, I believe that all IT pros should occasionally leave their comfort zone and explore topics outside their traditional areas of expertise. Finally, I also believe that we need to tear down the walls that divide IT. Even if an organization doesn’t go haywire on DevOps, adopting some DevOps principles is important to ensuring that projects get done on time and on budget and, more importantly, that they meet user needs.
I’m working on a significant data integration project right now for a client, so this stuff is on my mind.
Think about your company. What really makes it run? More than likely, some combination of enterprise applications and data, right? As a server or virtualization administrator, it’s your job to make sure that the systems on which these business-critical applications operate are running in tip-top shape.
But should you stop there? Many server administrators draw the line between the system and the data and treat it as the end of their responsibility. But maybe it’s time to jump in and learn some basics about the database function, if you haven’t already. If you’re an administrator who hasn’t gotten into the data guts at all, you’re missing out on a lot of opportunities, and databases are just the tip of the iceberg.
PowerShell and PowerShell-based tools like PowerCLI are incredibly powerful scripting tools that enable administrators to create custom automation that perfectly fits their environment. But these tools can also dig into databases and pull information, or query systems and write the results into a database.
Using the database integration features, though, requires knowledge of database tables and structures, including how tables and databases relate to one another. Although I didn’t know it at the time, I was very fortunate in my very first IT job to be deeply exposed to data structures. Regardless of the area of IT I’ve found myself in throughout the years, database structures have been vitally important.
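To make “how tables relate to one another” concrete, here is a minimal sketch. It uses Python’s built-in sqlite3 module as a stand-in so that it runs anywhere without a database server; the same ideas carry over to whatever database and scripting tool you actually use. The table names, column names, and sample rows are all hypothetical and invented purely for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE servers (
            id       INTEGER PRIMARY KEY,
            hostname TEXT NOT NULL UNIQUE
        );
        -- Each disk row points back at its server through server_id.
        CREATE TABLE disks (
            id        INTEGER PRIMARY KEY,
            server_id INTEGER NOT NULL REFERENCES servers(id),
            mount     TEXT NOT NULL,
            free_gb   REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO servers (id, hostname) VALUES (?, ?)",
        [(1, "app01"), (2, "db01")],
    )
    conn.executemany(
        "INSERT INTO disks (server_id, mount, free_gb) VALUES (?, ?, ?)",
        [(1, "C:", 42.5), (1, "D:", 310.0), (2, "C:", 8.0)],
    )
    conn.commit()
    return conn


def low_disks(conn, threshold_gb):
    """Join disks to servers and return volumes with less free space than the threshold."""
    cursor = conn.execute(
        """
        SELECT s.hostname, d.mount, d.free_gb
        FROM disks AS d
        JOIN servers AS s ON s.id = d.server_id
        WHERE d.free_gb < ?
        ORDER BY s.hostname, d.mount
        """,
        (threshold_gb,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for hostname, mount, free_gb in low_disks(conn, 50):
        print(f"{hostname} {mount} {free_gb:.1f} GB free")
```

The point is the JOIN: the disks table only stores a server_id, and the relationship to the servers table is what turns that number back into a hostname an administrator can act on.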
Even if you don’t care about the database-related features that PowerShell can use, start thinking about ways that you can use PowerShell to make your life easier. I’ve used it in the past to automate account provisioning, to automate the collection of Exchange mailbox data, and for much more. Pretty much every Microsoft product today – including Windows Azure and Office 365 – can be fully managed with PowerShell.
Solid state storage/hybrid storage
You might be wondering what solid state storage has to do with data tools. Well, the data has to be stored somewhere, right? In yesteryear, this storage target would consist of a series of regular hard drives and, if performance started to fall off, you might simply be called upon to add more spindles to the storage system in order to provide sufficient IOPS for work to get done.
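The “add more spindles” arithmetic can be sketched roughly as follows. This is a back-of-the-envelope illustration only: the per-disk IOPS figure is an assumed input rather than a measured value, and the sketch deliberately ignores RAID write penalties, caching, and read/write mix, all of which matter in real sizing exercises.

```python
import math


def spindles_needed(target_iops, iops_per_disk):
    """Return the minimum number of disks whose combined IOPS meets the target.

    Both inputs are assumptions supplied by the caller; this simple model
    treats aggregate IOPS as per-disk IOPS multiplied by the disk count.
    """
    if target_iops < 0 or iops_per_disk <= 0:
        raise ValueError("target_iops must be >= 0 and iops_per_disk must be > 0")
    return math.ceil(target_iops / iops_per_disk)


if __name__ == "__main__":
    # Illustrative numbers only: a 5,000 IOPS workload on disks assumed
    # to deliver 150 IOPS each.
    print(spindles_needed(5000, 150))  # prints 34
```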
Today, though, there is another weapon in the arsenal: solid state storage. Whereas hard disks are perfect when it comes to storing mass quantities of data, solid state disks really shine when raw I/O is needed. When it comes to data, that need might arise when the system has to perform complex computational analysis against a large data set. Such operations may require a whole lot of data reads and writes at a pace that would be impossible for traditional hard drives to match.
There is no end to the storage options at your disposal in the market today. It seems as if there are new vendors coming to the space every day. If you don’t feel that all-flash storage is necessary for an analytics project, consider using a hybrid array instead. Hybrid arrays have mostly traditional hard disks, but these are fronted by a series of solid state storage devices that help organizations massively accelerate their storage without having to spend a small fortune on an all-flash storage solution.
Data integration, reporting, and analytics tools
There are myriad ways to integrate data between systems. As a systems administrator, you’re often at the nexus of all of this, and you may not even know it. Think about it: All of the data that is consumed by your organization lives on the very servers that you manage. So, why not start to familiarize yourself with some of the tools that your users use so that you can further improve your value to the organization?
Here are some of the tools you can consider:
- Reporting. If you’re a SQL Server shop, adding reporting is a no-brainer since SQL Server has shipped with SQL Server Reporting Services (SSRS) for a number of versions now. While it may not be as full-featured as some of the standalone tools that are available out there, SSRS is more than capable of handling pretty complex reporting needs. I’ve used SSRS for a number of engagements and find it to be quite good, particularly given that it’s “free” with SQL Server.
- Extract, Transform, Load (ETL). Personally, I think that ETL is one of the most challenging parts of the data equation. It’s here where organizations work hard to integrate their various data systems into something resembling a cohesive whole. Data integration is hard work and requires careful planning, deep data analysis, and understanding of the various business processes that make the organization work the way it does. Again, SQL Server comes with SQL Server Integration Services (SSIS), which is a good entry level option. However, I’ve recently discovered a tool from Astera called Centerprise, which takes ETL capabilities to insane levels. Centerprise takes a fully GUI-based drag and drop approach to what is often an incredibly complex task. As you can see in the figure below, this GUI-based approach makes it very, very easy to see exactly where data is sourced and where it’s going. Centerprise also makes it easy to see data transformation opportunities all along the data flow path.
- Analytics. The pinnacle of data work is analytics. ETL tools and processes are necessary to enable real analytics to take place, so don’t discount ETL; further, ETL tools can be used to actively improve processes. Analytics is where people can really begin to gain real insight into what is happening in the environment and begin to make strategic decisions for the company. ETL tools enable operational decisions and improvements, whereas analytics enables the strategic side of the discussion to really take place. Tools in this arena include things like IBM’s SPSS.
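If ETL still feels abstract, the three stages can be sketched in a few lines. This is a toy illustration, not how SSIS or Centerprise work internally: it uses only the Python standard library, and the source data, column names, and cleanup rules are all hypothetical. The final query is a small stand-in for the kind of summary a reporting tool would produce.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source extract with the usual mess: stray spaces,
# inconsistent case, US-style dates, and one row missing a name.
SOURCE_CSV = """employee_id,full_name,department,hire_date
101, Ada Lovelace ,engineering,03/15/2019
102,Grace Hopper,ENGINEERING,11/02/2020
103,,finance,07/30/2021
"""


def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean each row, setting aside rows that fail validation."""
    clean, rejected = [], []
    for row in rows:
        name = row["full_name"].strip()
        if not name:
            rejected.append(row)
            continue
        hired = datetime.strptime(row["hire_date"].strip(), "%m/%d/%Y")
        clean.append(
            (
                int(row["employee_id"]),
                name,
                row["department"].strip().lower(),
                hired.date().isoformat(),
            )
        )
    return clean, rejected


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS employees (
            employee_id INTEGER PRIMARY KEY,
            full_name   TEXT NOT NULL,
            department  TEXT NOT NULL,
            hire_date   TEXT NOT NULL
        )
        """
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", records)
    conn.commit()


def headcount_by_department(conn):
    """A tiny report over the loaded data: employees per department."""
    cursor = conn.execute(
        "SELECT department, COUNT(*) FROM employees "
        "GROUP BY department ORDER BY department"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    clean, rejected = transform(extract(SOURCE_CSV))
    load(conn, clean)
    print(headcount_by_department(conn))
    print(f"{len(rejected)} row(s) rejected for review")
```

Keeping the rejected rows instead of silently dropping them is a deliberate choice here: in real integration work, the rows that fail validation are often where the interesting business process questions are hiding.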
Don’t stop expanding your skill set once you’ve mastered the server realm! Data is the lifeblood of the organization and an ever-changing IT landscape is making this realm more accessible than ever. You can even get started for free by downloading an evaluation copy of SQL Server from Microsoft and starting to learn about these critical functions. Consider also ways that you can kick start your learning journey through coursework from PluralSight.