Why the service mesh is an improvement over earlier network topologies

Comparing a service mesh with more traditional network architecture means getting a few concepts straight first. When it comes to scaling an application, the most important ones are “when” and “why” the decision to scale is made. The “when” needs to be immediate, which gives you speed. The “why” can rest on any of several signals, and how well you choose among them gives you reliability. Together, they deliver the level of service people expect from an application today. Take a popular app like Uber as an example and assume scaling up has to happen during peak traffic hours. The traditional method would be an external script that scales up fixed resource pools during “predicted” peak hours. We say predicted because we’re basically guessing based on a daily average, and no real-time information can influence this “closed” system.
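
To make the difference between “predicted” and real-time scaling concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the peak windows, the replica counts, the per-replica capacity, and the function names are hypothetical and do not come from any real autoscaler.

```python
import math
from datetime import time

# Hypothetical peak windows, guessed from a daily average.
PEAK_WINDOWS = [(time(7, 0), time(10, 0)), (time(17, 0), time(20, 0))]


def scheduled_replicas(now, base=4, peak=12):
    """Closed system: the clock alone decides; live traffic is never consulted."""
    in_peak = any(start <= now < end for start, end in PEAK_WINDOWS)
    return peak if in_peak else base


def reactive_replicas(requests_per_second, capacity_per_replica=50.0, minimum=4):
    """Reactive system: size the pool from the load observed right now."""
    needed = math.ceil(requests_per_second / capacity_per_replica)
    return max(minimum, needed)
```

With these assumed numbers, an unexpected surge of 900 requests per second at 1 p.m. gets 4 replicas from the schedule but 18 from the reactive rule, which is exactly the gap a closed system cannot see.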

Reactive in real-time

This closed system is what’s called a push system: configuration changes have to be pushed to components through APIs and protocols. To base those changes on real-time events, and to be reactive rather than predictive, we need our APIs to “pull” the necessary information instead. The main reason is that the traditional model crumbles under a never-ending list of external factors, especially since the controller has to deal with each one “personally.” In stark contrast, a service mesh spreads out this operational burden and favors declarative over imperative configuration. Instead of spending time querying each component to check whether changes need to be made, the controller pulls the right answers directly from them.

Reacting isn’t just about pulling the necessary information on time, however. It’s also about acting on it quickly. A service mesh first pulls information about what the end state of a component should be and then has the component reach that state itself. This means that unlike the traditional step-by-step instructions on “how” to get somewhere, we are now only interested in “what” the final destination needs to look like. Shifting focus like this is key to reacting in the one fashion that’s considered competitive in today’s world: automatically. In addition to bypassing a major resource strain by asking for a declarative end state, as opposed to imperative configuration changes, service mesh technology also automates those changes. While the difference between the two may be subtle, the basic idea is that we’re relieving the controller of the hassles associated with implementation.
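
The “what, not how” idea can be sketched as a small reconcile step. This is a toy Python model under stated assumptions: the `Component` class, its fields, and the shape of the desired-state dictionary are hypothetical, and real meshes and orchestrators implement this far more elaborately.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    replicas: int = 0
    version: str = "v0"

    def reconcile(self, desired):
        """Compare the current state with the declared end state and close the gap.

        The caller only says what the end state is; the component works out
        which steps are needed and returns them for inspection.
        """
        actions = []
        if self.version != desired["version"]:
            actions.append(f"{self.name}: upgrade {self.version} -> {desired['version']}")
            self.version = desired["version"]
        if self.replicas != desired["replicas"]:
            actions.append(f"{self.name}: scale {self.replicas} -> {desired['replicas']}")
            self.replicas = desired["replicas"]
        return actions
```

Calling `reconcile` a second time with the same desired state returns an empty list, since nothing is left to change. That idempotence is what lets the controller restate the end state as often as it likes without tracking how each component got there.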

Unhinged from reality

As the number of services connected to your application continues to grow, handling service-to-service communication gets even more challenging. Service mesh technology helps by abstracting away the complexity of these communications and handing it to a service proxy that does the heavy lifting for you. Unlike traditional network topologies, which are tied to very real and very “fixed” resource pools, service mesh technology does not rely on fixed configurations, predefined routes, or directional regulations. In fact, a service mesh completely decouples your application from the underlying network infrastructure through a mesh of Layer 7 proxies. Typically, these proxies are “injected” into each service deployment as a sidecar, so that calls a service would otherwise make directly to other services over the network are routed to the sidecar proxy instead. These proxies then manage requests on behalf of their respective services.
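
As a rough illustration of what a sidecar does on behalf of its service, the Python sketch below resolves a logical service name to a concrete instance and rotates between instances. It is a toy under stated assumptions: the `SidecarProxy` class, the registry layout, and the addresses are all hypothetical, and a real Layer 7 proxy intercepts actual network traffic rather than returning URLs.

```python
import itertools


class SidecarProxy:
    """Toy sidecar: the service names its peer, the proxy picks the instance."""

    def __init__(self, registry):
        # registry maps a service name to a non-empty list of instance
        # addresses (hypothetical data, normally supplied by the mesh).
        self._cycles = {name: itertools.cycle(addrs) for name, addrs in registry.items()}
        self.log = []  # every proxied call is recorded here

    def call(self, service, path):
        if service not in self._cycles:
            raise LookupError(f"unknown service: {service}")
        target = next(self._cycles[service])  # round-robin load balancing
        self.log.append((service, target, path))
        return f"http://{target}{path}"
```

The application code only ever says `proxy.call("billing", "/invoice")`. Which instance answers, and how many instances exist, is the proxy’s concern, which is the decoupling described above.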

It is this ability to be agnostic to almost everything that gives service mesh technology the agility it needs to deal with containers and microservices architecture. Beyond managing and keeping track of inter-service communications, these proxies provide service discovery, load balancing, authentication, security policies, monitoring and more. Another important aspect of service mesh technology is the way it separates functionality into two planes: the data plane and the control plane. The data plane includes the code that delivers features like service discovery, monitoring and security, while the control plane is where users gather information, specify policies and make configuration changes to the data plane. This unassuming little feature is key to enabling Dev and Ops teams to work side by side. It is also key to centralizing control, so that a single interface can be used to manage services from any application across the network.
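
The split between the two planes can be sketched in a few lines of Python. This is a simplified model, not how any particular mesh implements it: the class names, the version counter, and the allow-list policy format are assumptions made for the example.

```python
class ControlPlane:
    """Where policy is declared; it never touches request traffic itself."""

    def __init__(self):
        self.policies = {}  # destination service -> set of allowed callers
        self.version = 0    # bumped on every policy change

    def allow(self, destination, caller):
        self.policies.setdefault(destination, set()).add(caller)
        self.version += 1


class DataPlaneProxy:
    """Where policy is enforced, request by request, next to one service."""

    def __init__(self, service):
        self.service = service
        self.allowed_callers = set()
        self.synced_version = -1

    def sync(self, control_plane):
        # Pull the latest policy only if something has changed.
        if self.synced_version != control_plane.version:
            self.allowed_callers = set(control_plane.policies.get(self.service, set()))
            self.synced_version = control_plane.version

    def handle(self, caller):
        # Return an HTTP-style status: 200 if the caller is allowed, else 403.
        return 200 if caller in self.allowed_callers else 403
```

In this sketch an operator changes policy in one place with `allow("orders", "cart")`, and every proxy for `orders` starts admitting `cart` after its next `sync`, without anyone editing the service itself.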

Unobstructed observability

The third and most important improvement over earlier network topologies is visibility. In fact, from a pure management and administrative point of view, there’s probably nothing more important than being able to see what’s going on under the hood, especially with containers and microservices. Debugging without such visibility is pretty much a nightmare, considering the distributed nature of the components. This is why the engineers responsible go to great lengths to retain any kind of “evidence” that can help them trace requests across remote services. Such evidence usually takes the form of logs or metrics like destination, source, protocol, URL, latency, status, duration, codes, etc. Together, these are used to gain insight into service-to-service communications, and the term most often associated with this kind of distributed debugging is “observability.” Better visibility into your traffic also increases reliability by helping you spot potential issues before they become major problems.
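
Here is a small Python sketch of how per-request “evidence” might be rolled up into something an engineer can read at a glance. The record fields (`source`, `destination`, `status`, `latency_ms`) and the choice of summary statistics are assumptions for illustration, not the schema of any specific mesh.

```python
from collections import defaultdict


def summarize(records):
    """Group request records by (source, destination) and summarize each edge."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["source"], record["destination"])].append(record)

    summary = {}
    for edge, edge_records in grouped.items():
        # Treat any 5xx status as a server-side error.
        errors = sum(1 for r in edge_records if r["status"] >= 500)
        summary[edge] = {
            "requests": len(edge_records),
            "error_rate": errors / len(edge_records),
            "max_latency_ms": max(r["latency_ms"] for r in edge_records),
        }
    return summary
```

Fed with records collected by the proxies, a summary like this shows which service-to-service edge is slow or failing, which is the starting point for tracing a request across remote services.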

Visibility often comes at a price, and most often that price is performance. That is far less of a problem with service mesh technology, however, since the sidecar proxies are built to be light and fast. This is why, as opposed to traditional networking, they give developers the visibility they need without having to sacrifice much performance. Service mesh technology also offers visibility into the runtime, as well as control over it, and it is perfectly positioned to monitor and regulate all incoming and outgoing traffic. Istio, a popular service mesh tool, features custom dashboards that show not just the performance of all your services but also the different ways they affect your other processes. This additional analysis is important for understanding how little changes in performance can have big impacts on users.

Microservices ‘native’ networking

Modern applications need to be “on-the-ball” all the time if they are to be competitive, and this means being reactive in real-time, agnostic of infrastructure and implementations, and fully observable. While earlier network topologies work great for monoliths and even API-driven web applications, they just can’t keep up with the unpredictability and sheer numbers that microservices throw at them. Service mesh technology, on the other hand, has the distinction of being built to deal with service-to-service communication in microservices architecture in the first place. That being said, many would argue that Kubernetes will eventually take care of this for you. While that is true to an extent, at the moment Kubernetes only offers a very basic service mesh, and you need Istio, Envoy, Conduit, or Linkerd2 for a feature-rich experience.
