Data warehousing is a longstanding IT practice of managing the data that an organization’s applications collect and generate. Its goal is to make this data readily accessible and usable for driving business decisions. Recently, data warehouse startup Snowflake announced a $263M funding round from big names like Sequoia Capital, which enabled it to join the ranks of elite startups in the unicorn club. Let’s look at the factors behind this big funding round and at the opportunity ahead of this promising startup.
Warehousing — An enterprise challenge
Any organization that comes of age has many terabytes of data about its customers, products, and applications, along with external datasets and internal data on employees and finances. Data has become a central focus for these enterprises because handling it well can make an organization hugely successful. Mishandling it, on the other hand, can lead to lost opportunities or, worse, real trouble for the business.
Although even startups accumulate large quantities of data, it is typically SMBs and enterprises that have been operating for a while, and at a larger scale, that feel the need to optimize their data storage and analysis. This is especially true of decades-old enterprises whose data, going back to the company’s origins, is still valuable to them. That data has often been managed the same way for decades, and any change to the system can disrupt the business, so it needs to be handled with care. This is why enterprises have a lot riding on how they approach the basics of data warehousing. Data warehousing is a challenge for most organizations, but large enterprises have the most hanging in the balance, and the most to lose.
Data warehousing the wrong way
Data warehousing is a complex undertaking with many aspects to consider, such as storage, compute resources, memory capacity, user interface, query language, data formats, and more. Every component of the system has to be built and optimized for it to succeed. Traditionally, organizations built this entire setup in-house on their own hardware. This meant buying servers and disks and over-provisioning resources in case demand spiked. They had to train internal teams to update and maintain the hardware, and a separate team had to tie together the many pieces of open source and proprietary software that run on top of the hardware and make up the service layer. On top of that, they had to structure and architect their databases for best use, integrate them with their applications, and make sure they were highly available, performed fast, and could handle any workload thrown at them as the business grew. All this is harder than it may sound. Above all, the hardware layer was the biggest constraint: it ate up much of the time spent on data warehousing and distracted from the real task at hand, data consumption and analysis.
Enter, the cloud
The cloud changes all this. First, it replaces the underlying hardware layer with cloud-based resources that are far easier to provision, scale, and maintain. To meet the resulting demand, cloud data warehousing solutions like Snowflake have emerged. A cloud warehouse offers many advantages, including easier management, rapid scaling, better resource utilization, simpler integration, and cost-effectiveness.
Rather than taking time away from data analysis, cloud data warehousing solutions let organizations focus on what they really want to do with their data and leave the management of the infrastructure and underlying layers to the vendor. This is the biggest draw of cloud-based warehousing. Beyond that, organizations no longer have to over-provision storage and compute. Instead of keeping servers idle most of the time, they can scale up and down dynamically based on workloads and pay only for the resources they use, which can cut costs right away. The cloud also makes it easier to integrate with other enterprise systems, since the modern enterprise application suite is made up of cloud solutions like Salesforce, Workday, and ServiceNow. These cloud SaaS platforms can integrate with on-premises warehouses as well, but integration is usually simpler with a cloud warehouse.
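To make the over-provisioning point concrete, here is a minimal arithmetic sketch. It assumes a hypothetical day of hourly compute demand and a hypothetical flat price per compute unit per hour (neither comes from Snowflake or any other vendor), and compares paying for peak capacity around the clock with paying only for the units actually used each hour.

```python
from typing import Sequence

# Hypothetical price per compute unit per hour; not any vendor's real rate.
PRICE_PER_UNIT_HOUR = 2.0


def fixed_capacity_cost(demand: Sequence[int], price: float = PRICE_PER_UNIT_HOUR) -> float:
    """On-premises style: capacity is sized for the peak hour and paid for every hour."""
    return max(demand) * len(demand) * price


def elastic_cost(demand: Sequence[int], price: float = PRICE_PER_UNIT_HOUR) -> float:
    """Cloud style: capacity tracks demand hour by hour, so only used units are billed."""
    return sum(demand) * price


if __name__ == "__main__":
    # A hypothetical day: quiet overnight, a two-hour reporting spike, moderate load after.
    hourly_demand = [1] * 8 + [8, 8] + [2] * 14
    print(f"fixed:   {fixed_capacity_cost(hourly_demand):.2f}")  # fixed:   384.00
    print(f"elastic: {elastic_cost(hourly_demand):.2f}")          # elastic: 104.00
```

The gap comes entirely from the idle hours: the fixed setup pays for 8 units all day to survive a two-hour spike. Real cloud pricing has more moving parts (minimum billing increments, per-second versus per-hour rates, storage charges), so treat this as an illustration of the shape of the saving rather than an estimate.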
Snowflake’s take on warehousing
Snowflake starts with these foundational advantages of the cloud and then adds benefits of its own to build a compelling solution. To begin with, Snowflake uses standard SQL. This may seem like a small factor, but SQL is the most widely used language for data warehousing, so an enterprise already steeped in SQL need not re-skill its teams to migrate to Snowflake. (Competitors use SQL too; AWS Redshift, for example, is built on PostgreSQL and is queried with SQL, so here Snowflake is matching the industry standard rather than standing alone.) Similarly, Snowflake supports popular semi-structured data formats such as JSON and XML. By supporting what is widely accepted and standard in the industry, Snowflake appeals to enterprises, which are the main segment of the warehouse market.
The next big advantage Snowflake has is its architecture. Most traditional warehouses couple storage and compute in a single layer. Snowflake instead separates the system into three layers: a storage layer where the data lives, a compute layer where queries are processed by independent “virtual warehouses,” and a cloud services layer that handles tasks such as authentication, metadata management, and query optimization. Because storage and compute have very different characteristics, they are better handled separately. This keeps storage cheap and delivers more compute per dollar, instead of driving up costs by forcing the two essential components of warehousing to scale together.
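Why separating storage from compute saves money can be shown with a little arithmetic. The sketch below assumes a hypothetical node that bundles 2 TB of storage with its compute, and hypothetical separate prices for storage and compute; all sizes and prices are made up for illustration and are not Snowflake’s or anyone else’s actual pricing.

```python
import math

# Hypothetical sizes and prices, chosen only to make the arithmetic easy to follow.
NODE_STORAGE_TB = 2.0                 # storage bundled with each coupled node
STORAGE_PRICE_PER_TB_MONTH = 25.0     # decoupled: storage billed on its own
COMPUTE_PRICE_PER_NODE_MONTH = 950.0  # decoupled: compute billed on its own
# A coupled node costs the same as its bundled storage plus one compute node.
NODE_PRICE_PER_MONTH = NODE_STORAGE_TB * STORAGE_PRICE_PER_TB_MONTH + COMPUTE_PRICE_PER_NODE_MONTH


def coupled_cost(data_tb: float, compute_nodes_needed: int) -> float:
    """Storage and compute come bundled, so you buy enough nodes to hold all the
    data even when the query load needs far fewer."""
    nodes = max(math.ceil(data_tb / NODE_STORAGE_TB), compute_nodes_needed)
    return nodes * NODE_PRICE_PER_MONTH


def decoupled_cost(data_tb: float, compute_nodes_needed: int) -> float:
    """Storage and compute scale independently and are billed separately."""
    return data_tb * STORAGE_PRICE_PER_TB_MONTH + compute_nodes_needed * COMPUTE_PRICE_PER_NODE_MONTH


if __name__ == "__main__":
    # 100 TB of mostly historical data, queried by a workload that needs 4 nodes.
    print(coupled_cost(100, 4))    # 50000.0
    print(decoupled_cost(100, 4))  # 6300.0
```

Because each coupled node is priced exactly as its parts would be bought separately, the whole difference comes from the 46 extra nodes the coupled design must buy just to hold the data. The gap shrinks when the workload is compute-heavy relative to data volume.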
Snowflake also serves two distinct kinds of users: data engineers and data analysts. Engineers load the data and work from the application side; they are the admins and owners of the system. Analysts consume the data and derive business insights from it once an engineer has loaded it. Here again, Snowflake keeps the two roles apart: an analyst can clone a database or table and modify the clone freely without affecting the original. Because a clone initially shares the original’s stored data instead of copying it, clones are quick and cheap to create. Users can also create any number of virtual warehouses, independent compute clusters that all query the same stored data, so one team’s workload does not slow down another’s. This frees analysts to get the most out of the data and keeps admins firmly in control, without worrying that routine analysis will mess up the underlying schemas.
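The cloning described above works because a clone does not copy the data up front. Below is a minimal copy-on-write sketch of that idea, assuming a simplified model in which a table is just a list of references to immutable partitions. The `Table` class and its methods are hypothetical illustrations, not Snowflake’s API or internal implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical model: a partition is an immutable tuple of rows, and a table is
# just an ordered list of references to partitions.
Row = Tuple[str, int]
Partition = Tuple[Row, ...]


@dataclass
class Table:
    partitions: List[Partition] = field(default_factory=list)

    def clone(self) -> "Table":
        # Zero-copy: copy only the list of references; the partitions are shared.
        return Table(list(self.partitions))

    def update_partition(self, index: int, rows: Partition) -> None:
        # Copy-on-write: point this table at a new partition. The old partition is
        # never modified, so any other table still referencing it is unaffected.
        self.partitions[index] = rows

    def rows(self) -> List[Row]:
        return [row for part in self.partitions for row in part]


if __name__ == "__main__":
    prod = Table([(("alice", 10),), (("bob", 20),)])
    sandbox = prod.clone()
    sandbox.update_partition(1, (("bob", 99),))
    print(prod.rows())     # [('alice', 10), ('bob', 20)]
    print(sandbox.rows())  # [('alice', 10), ('bob', 99)]
    print(prod.partitions[0] is sandbox.partitions[0])  # True
```

The analyst’s edit only creates a new partition for the clone; the untouched partition is still shared, and the production table never sees the change.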
Snowflake isn’t alone in this space. In fact, it faces competition from the biggest web companies and from other startups. AWS Redshift is one of the more popular cloud data warehouse solutions available today. Google BigQuery is another strong option, particularly for spiky workloads. Azure SQL Data Warehouse is the option from the Microsoft stable. Alibaba Cloud MaxCompute is a new entrant, and Oracle has its own solution coming soon. Then there are other innovative startups in the space, like Panoply, which also makes data warehousing easy and cloud-based. All this competition shows that data warehousing is high on the agenda of enterprises and cloud vendors alike, which helps explain the $263M infusion of funding into Snowflake. To stay competitive in a space like this, Snowflake will need to beat its rivals on many fronts, such as pricing, performance, usability, and support.
Can Snowflake become an avalanche?
Data warehousing is rapidly moving to the cloud. Traditional methods fall short of providing the kind of service that today’s rapidly changing businesses need. Snowflake has brought an attractive proposition to market with its cloud data warehousing solution. Will its new $263M in funding be enough to propel it to data warehousing stardom? Let’s wait and watch.