Open-source software was conceived as a strategy to take on tech giants like Oracle and Microsoft that had monopolized the software market at the time. Today, however, it sits firmly in the arsenal of those very same tech giants and is a key element of their engagement strategies. In fact, out of the top 10 open-source software projects on GitHub, eight are backed by big tech, while the two fastest-growing projects are by Microsoft and Google.
While some might see a downside to tech giants profiting from open-source software, as in the ongoing dispute between AWS and Elastic, in this article we’re going to look at the silver lining, which is what big tech is giving back to open source. To shed some light on how important these contributions can be, popular projects from the past include Facebook’s React Native, Google’s Android and Kubernetes, and Netflix’s Chaos Monkey and Hystrix.
1. Google Switch Transformer
To explain Google’s latest open-source software contribution, we need a quick recap of where we are with AI technology in general and NLP in particular. As we all know, machines have a hard time interpreting what we say because, much of the time, we say one thing and mean something else. To deal with these human nuances, the field has moved from plain RNN models, which were quite ineffective because they process text word by word in a long chain and lose context along the way, to RNN models that use attention techniques and CNN models that can process words in parallel.
As opposed to the traditional attention technique with RNNs, where each word was assigned a hidden state that was passed all the way to the decoding stage, transformers drop the word-by-word chain and relate every word in a sequence directly to every other word. This approach is called self-attention and is revolutionizing NLP, since it effectively creates an incredibly accurate backend that can then be reused across industries. This means future AI users may not need to train separate bots for separate use cases.
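To make the idea of self-attention a little more concrete, here is a minimal sketch in plain Python. It assumes the queries, keys, and values are simply the input vectors themselves (a real transformer learns separate projection matrices for each), and the two-dimensional "embeddings" are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys and values are the input vectors themselves;
    a real transformer learns a separate projection for each role.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this token against every token in the sequence, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new representation is a weighted mix of all token vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d embeddings
for row in self_attention(tokens):
    print([round(x, 3) for x in row])
```

Because each output row depends only on the full set of inputs, all rows can be computed independently of one another, which is what allows this kind of model to process words in parallel.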
Before Google’s latest contribution, OpenAI’s GPT-3 was the most powerful language model, featuring 175 billion parameters built on the transformer mechanism. Google’s Switch Transformer model, released last month, features 1.6 trillion parameters. This is roughly nine times more than GPT-3 and opens up a world of possibilities in artificial intelligence and machine learning. Switch Transformer achieves this by using a mixture-of-experts, or MoE, scheme. Instead of sending each input to several experts, as traditional MoE models do, Google simplifies the routing computation by sending each input to a single expert.
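The single-expert routing described above can be sketched in a few lines of Python. This is a toy illustration, not Google’s implementation: the experts are hypothetical plain functions standing in for feed-forward networks, the router weights are made up, and details a real model would need, such as gate scaling and load balancing across experts, are left out.

```python
def route_top1(token, router_weights):
    """Return the index of the expert with the highest router score."""
    scores = [
        sum(t * w for t, w in zip(token, weights))
        for weights in router_weights
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def switch_layer(tokens, router_weights, experts):
    """Send each token to exactly one expert (top-1 routing)."""
    outputs = []
    for token in tokens:
        chosen = route_top1(token, router_weights)
        outputs.append((chosen, experts[chosen](token)))
    return outputs


# Hypothetical experts: plain functions standing in for feed-forward networks.
experts = [
    lambda v: [2 * x for x in v],
    lambda v: [x + 1 for x in v],
]
# One made-up router weight vector per expert.
router_weights = [[1.0, 0.0], [0.0, 1.0]]

for expert_id, output in switch_layer(
    [[3.0, 1.0], [0.5, 2.0]], router_weights, experts
):
    print(expert_id, output)
```

The point of the design is that only one expert runs per token, so the total parameter count can grow with the number of experts while the computation per token stays roughly constant.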
2. Netflix DGS
The next big-tech open-source software contribution on our list comes from the Netflix stable. It is the Domain Graph Service Framework, or DGS Framework, used primarily to implement GraphQL in Spring Boot applications. For the uninitiated, Spring Boot helps users create applications that “just run” with minimum hassle and configuration, while GraphQL is a data query and manipulation language for APIs. As opposed to REST endpoints, which typically return a fixed, complete payload and therefore tend to “over-fetch,” a GraphQL schema lets clients request only the fields they need.
While the DGS framework is primarily written in Kotlin, it’s built on top of graphql-java and is primarily designed for Java users. This project is, in essence, Netflix’s homegrown solution to the problem of a “messy” aggregate API layer that, according to Netflix, was getting increasingly difficult to manage. An aggregate API layer increases security and simplifies operations considerably by avoiding the risks that come with exposing hundreds (or thousands) of microservices to developers. As the environment increases in complexity, however, so does the aggregate API layer.
Netflix’s solution here is to use a federated GraphQL platform that not only powers the aggregate API layer but also provides it with tools for tracing, logging, metrics, and a whole lot more. Key features include error handling, integration with Spring Security, integration with GraphQL Federation, an annotation-based Spring Boot programming model, and a test framework that allows users to write query tests as unit tests. There’s also a GraphQL client for Java, as well as pluggable instrumentation, a Gradle code generation plugin, and the ability to upload files.
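The difference between over-fetching and asking only for what you need can be illustrated with a toy field selector. This is a sketch of the idea only, not a GraphQL implementation and not part of the DGS Framework; the record, its field names, and the `select_fields` helper are all hypothetical.

```python
def select_fields(record, selection):
    """Return only the fields named in `selection`, recursing into nested dicts.

    `selection` maps a field name to None (a leaf field) or to a nested
    selection for a nested object.
    """
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value
        else:
            result[field] = select_fields(value, sub_selection)
    return result


# Hypothetical record that a REST endpoint might return in full.
show = {
    "id": 1,
    "title": "Example Show",
    "releaseYear": 2020,
    "cast": {"lead": "A. Actor", "count": 12},
}

# Roughly what the GraphQL query `{ title cast { lead } }` asks for.
wanted = {"title": None, "cast": {"lead": None}}
print(select_fields(show, wanted))
```

In a real GraphQL server, the schema defines which fields exist and resolvers fetch them, but the client-facing effect is similar: the response mirrors the shape of the request.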
3. Microsoft Dapr
Next on our list is tech giant Microsoft’s Dapr, which just hit production readiness this month. Dapr stands for Distributed Application Runtime and aims to make it easier for developers to build event-driven applications that run anywhere, including edge and serverless environments. While allowing users to create microservices in any language, with any framework, and on any platform, Dapr also codifies best practices, making it possible for people with little experience with cloud-native architecture to carry out complex tasks.
These capabilities stem from the fact that Dapr is made up of API abstractions that provide a number of common capabilities, like state management, secrets management, and service invocation, that can be used across platforms, public clouds, and environments. This is because each capability is built into a building block that uses a number of plugins to remain autonomous, or decoupled, from any underlying technology. The use of such building blocks also allows for smooth evolution of code without having to rewrite it to accommodate changes.
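The idea of a building block that stays decoupled from the technology behind it can be sketched as an interface with swappable backends. The names below (`StateStore`, `DictStore`, `LogStore`, `record_order`) are hypothetical and are not Dapr APIs; the sketch only illustrates why application code does not need to change when the backing component does.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class StateStore(Protocol):
    """Hypothetical building-block interface the application codes against."""

    def save(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class DictStore:
    """One pluggable backend: a plain in-memory dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class LogStore:
    """Another backend: an append-only log where the latest write wins."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []

    def save(self, key: str, value: str) -> None:
        self._log.append((key, value))

    def get(self, key: str) -> Optional[str]:
        for stored_key, stored_value in reversed(self._log):
            if stored_key == key:
                return stored_value
        return None


def record_order(store: StateStore, order_id: str, status: str) -> Optional[str]:
    """Application logic: depends on the interface, not on a backend."""
    store.save(order_id, status)
    return store.get(order_id)


for backend in (DictStore(), LogStore()):
    print(type(backend).__name__, record_order(backend, "order-1", "paid"))
```

Swapping `DictStore` for `LogStore` requires no change to `record_order`, which is the property the building-block approach is after.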
Dapr uses a sidecar method, similar to what we see in service mesh architecture, to expose its APIs, making it extremely convenient for developers to call on a function since a call to the Dapr sidecar is all that’s needed. While Dapr v1.0 is production-ready and focuses on using Kubernetes to run applications, future updates are expected to provide more diversified environment options. Currently, Dapr integrates with all major public cloud providers, including Azure, AWS, Alibaba, and Google Cloud.
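To give a feel for what "a call to the Dapr sidecar" looks like, here is a small sketch that builds requests against the sidecar’s HTTP API using only the Python standard library. It assumes the sidecar listens on localhost port 3500, which is Dapr’s usual default, and the app ID `orders`, method `checkout`, and store name `statestore` are hypothetical. The sketch only constructs the request; actually sending it requires a running sidecar.

```python
import json
import urllib.request

DAPR_HTTP_PORT = 3500  # assumption: the sidecar's default HTTP port


def invoke_url(app_id, method, port=DAPR_HTTP_PORT):
    """URL for calling `method` on another service through the sidecar."""
    return f"http://localhost:{port}/v1.0/invoke/{app_id}/method/{method}"


def state_url(store_name, port=DAPR_HTTP_PORT):
    """URL for the state management building block."""
    return f"http://localhost:{port}/v1.0/state/{store_name}"


def build_save_state_request(store_name, key, value):
    """Build (but do not send) a request that saves one key/value pair."""
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        state_url(store_name),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names, for illustration only.
request = build_save_state_request("statestore", "order-1", {"status": "paid"})
print(request.get_method(), request.full_url)
print(invoke_url("orders", "checkout"))
# urllib.request.urlopen(request) would send it; that needs a running sidecar.
```

Note that the application only ever talks to localhost. Which state store or which remote service sits behind the sidecar is a matter of configuration rather than code.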
4. Fair Change and Take Two
The next two open-source software projects on our list are a collaboration between tech giant IBM and the Linux Foundation and are unique in their quest to promote racial justice and equality. The Linux Foundation will host seven such projects in total, which are part of a drive called Call for Code for Racial Justice. The three main focus areas for this drive are police and judicial reform and accountability, diverse representation, and policy and legislation reform.
The first project is called Fair Change, and it is interesting because what we see here are DevOps-like strategies being used effectively in the public interest. Much like DevOps platforms that allow different teams to work together transparently, Fair Change allows the different parties involved in racially charged incidents to catalog, record, and access case information. This includes any recorded evidence, which helps ensure that legal proceedings are conducted in a transparent, fair, and unbiased manner.
The second project is called Take Two, which looks to mitigate bias in digital content by offering language recommendations whenever overt or subtle bias is encountered in headlines, news articles, web pages, blogs, and even within code. It is currently under trial by IBM for the IBM Developer website content and has already been used within existing IBM Developer tools. Take Two is built on Python, FastAPI, and Docker, and the API can be run with either a CouchDB or an IBM Cloudant database.
Battle-tested production readiness at scale
While there may be a number of downsides to big tech getting involved with open-source software on the scale that we see right now, the upsides are equally, if not more, valuable. When you get an open-source tool from the likes of Netflix or Google, especially something “homegrown” like Kubernetes, what you’re actually getting is encapsulated experience that has been earned the hard way. This means assurance that these tools have been “battle-tested” in production on a scale that would otherwise be quite difficult to replicate in a testing environment.