Case Study: Tetrate Resolves the Complexities of NAV’s Transition to Istio

About the Norwegian Labour and Welfare Administration (NAV)

The Norwegian Labour and Welfare Administration (NAV) is the government directorate responsible for implementing Norway’s welfare model, with services ranging from child benefits to pensions. NAV is responsible for distributing a third of Norway’s national budget. In five years, it’s estimated that 95% of the Norwegian population will be covered by one of NAV’s services.

Figure 1. NAV’s Kubernetes platform, NAIS, is open source and available on GitHub (github.com/nais)

The Challenge

NAV is currently migrating workloads from its legacy systems to containers and the public cloud. On top of its legacy on-premise data centres, NAV’s application development team built NAIS, a Kubernetes-based platform optimized for development speed and continuous delivery. Because these systems are critical, NAV selected the Istio service mesh for its security and reliability.

NAV chose the Istio service mesh based on the following requirements:

  • Protecting sensitive personal data: NAV stores sensitive personal data on the vast majority of the Norwegian population. Secure communication between workloads is therefore vital to its migration from on-premise to the public cloud.
  • Enhanced observability into clusters: Istio’s enhanced observability functions would allow NAV’s application development team to maintain and debug applications in an efficient manner, with richer insights, across a multi-cloud environment.
  • Scale & flexibility: Istio service mesh enables each application to be secure in and of itself, which would allow NAV to safely modify or develop applications in response to new laws and regulations – without compromising runtime performance.

Figure 2. NAV’s transition from securing on-premise clusters to microservices and public cloud

NAV’s journey with Istio started in January 2018 with Istio’s 0.5 release. To test Istio’s telemetry capabilities, the NAV team installed Istio 0.5 on an on-premise cluster. During this test, NAV ran into several challenges around networking APIs, HTTPS handling, and modeling service interactions outside the Istio service mesh. After spending considerable time and resources, the team determined that Istio 0.5 was not production-ready. The deployment was shelved while the team awaited a more stable Istio release.

NAV + Tetrate: A Working Solution  

By summer 2018, after consulting with the Tetrate team, which had been involved in developing central components of Istio, NAV decided to make another attempt at deploying Istio, this time with version 0.8. Tetrate assured NAV that Istio 0.8 had resolved most of the issues NAV had faced with Istio 0.5. NAV installed Istio with mutual TLS on Google Cloud Platform, determined that Istio 0.8 was stable, and deployed it across all applications on the NAIS platform.

Tetrate’s involvement did not end there: it also helped resolve a complication that arose after deployment.

On NAV’s NAIS platform, the kubelet probes each application to check that it is alive and ready. However, once NAV deployed Istio with mutual TLS, the kubelet lacked the client certificate required to communicate with the applications, so its probes were rejected. As Figure 3 illustrates, the kubelet effectively sat outside Istio’s service mesh.

Figure 3. The kubelet on NAV’s NAIS platform was effectively “shut out” of the Istio service mesh.
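
To make the failure mode concrete, here is a minimal Python sketch. It is not Istio code; it only models the rule described above, in which a sidecar enforcing mutual TLS turns away any caller that cannot present a mesh-issued certificate. The names `Caller` and `sidecar_accepts` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """A client trying to reach an application through its sidecar (hypothetical model)."""
    name: str
    has_mesh_certificate: bool  # assumed True only for workloads inside the mesh


def sidecar_accepts(caller: Caller, mutual_tls_enforced: bool) -> bool:
    """Return True if the modelled sidecar would let the caller's request through."""
    if not mutual_tls_enforced:
        # Without mutual TLS, plain-text requests such as kubelet probes pass.
        return True
    # With mutual TLS enforced, only callers holding a mesh certificate pass.
    return caller.has_mesh_certificate


kubelet = Caller("kubelet", has_mesh_certificate=False)
peer_app = Caller("another-nais-app", has_mesh_certificate=True)

for enforced in (False, True):
    for caller in (kubelet, peer_app):
        verdict = "accepted" if sidecar_accepts(caller, enforced) else "rejected"
        print(f"mTLS enforced={enforced}: {caller.name} is {verdict}")
```

In this model the kubelet is the only caller whose outcome changes when mutual TLS is switched on, which mirrors the situation in Figure 3.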

To solve this issue, NAV and Tetrate considered two possible solutions:

  • Solution A: Bundle cURL within each application container image, then use exec probes so that the health check runs inside the container. The drawback was the need to embed cURL in every application image.
  • Solution B: Split the application into a business layer and a platform layer and expose these layers on two different ports (e.g. :8080 and :9001). NAV could tell Istio not to enforce mutual TLS on the platform layer (i.e. port :9001) by injecting an extra Envoy container (dubbed a ‘naiscar’) that’s dynamically configured to allow traffic only to that specific endpoint. As Figure 4 illustrates, this was the solution that NAV ultimately adopted.

Figure 4. The first Envoy proxy (‘istio-proxy’) is exposed to the business layer (port :8080) with mutual TLS configured, whereas a second Envoy proxy (‘naiscar’) allows for communication with the platform layer (port :9001).  
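
The two options differ mainly in how the kubelet's probes are declared. The Python sketch below builds the probe portion of a Kubernetes container spec for each option, using the standard `exec` and `httpGet` probe fields. It is an illustration under stated assumptions, not NAV's actual configuration: the health-check paths `/isalive` and `/isready`, the helper names, and the choice of port 8080 for Solution A are hypothetical, and the sketch does not cover the Istio policy or the injection of the ‘naiscar’ container.

```python
import json

BUSINESS_PORT = 8080  # from the text: business layer, mutual TLS enforced
PLATFORM_PORT = 9001  # from the text: platform layer, reachable by the kubelet

# Hypothetical health-check paths, chosen only for this example.
LIVENESS_PATH = "/isalive"
READINESS_PATH = "/isready"


def exec_probe(path: str, port: int) -> dict:
    """Solution A: the kubelet runs curl inside the application container."""
    url = f"http://localhost:{port}{path}"
    return {"exec": {"command": ["curl", "--fail", "--silent", url]}}


def http_probe(path: str, port: int) -> dict:
    """Solution B: the kubelet sends a plain HTTP request to the platform port."""
    return {"httpGet": {"path": path, "port": port}}


def container_probes(solution: str) -> dict:
    """Return the liveness/readiness probe fragment for solution 'A' or 'B'."""
    if solution == "A":
        make_probe, port = exec_probe, BUSINESS_PORT
    elif solution == "B":
        make_probe, port = http_probe, PLATFORM_PORT
    else:
        raise ValueError(f"unknown solution: {solution!r}")
    return {
        "livenessProbe": make_probe(LIVENESS_PATH, port),
        "readinessProbe": make_probe(READINESS_PATH, port),
    }


for name in ("A", "B"):
    print(f"Solution {name}:")
    print(json.dumps(container_probes(name), indent=2))
```

Solution A keeps a single port but makes every image depend on cURL. Solution B keeps images unchanged and moves the cost into the platform, in the form of the second Envoy proxy.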

Looking Forward

NAV has adopted Istio incrementally and is now successfully running it in production. Although it found a workaround for the probe problem described earlier, NAV continues to partner with Tetrate to find ways to eliminate the overhead of an extra Envoy proxy.

NAV plans to use Istio’s networking and access control capabilities to facilitate a uniform move of its applications from on-premise to multiple public cloud providers, such as Google Cloud and Microsoft Azure.

It takes a lot of time and resources to learn the intricacies of Istio, including its drawbacks and bugs, and to determine which use cases make sense in a customer’s environment. Tetrate helps companies achieve the benefits of service mesh without having to wrestle with its complexities.
