Lee Calcote at Tetrate's Service Mesh Day 2019: Service Meshes, but at What Cost?
Service mesh: Where do I get started? And what’s the overhead?
Speaking at service mesh workshops over the past year, these are the two questions that Lee Calcote, senior director of technology strategy at SolarWinds, heard over and over again.
How much latency are we talking about? What’s the CPU burn based on all this value I’m getting for free, with (almost) no code change?
We should help people answer these questions, said Calcote, speaking at Tetrate’s Service Mesh Day 2019 conference in San Francisco. People need a playground where they can easily spin up a mesh and gain familiarity with its functionality. So Calcote built Meshery, a young open source multi-mesh performance benchmark and playground. He demoed the tool to show how it would empower the “Joe Schmo” operator or engineer to point it at a sample app and persist the results to check performance over time.
It’s especially challenging to answer the question about overhead because the answer is that it depends. There are so many variables and container orchestrators use scales that don’t easily allow for apples-to-apples comparison, said Calcote. With Meshery, you can check on performance statistics and other metrics about your environment, like latency and throughput, and the resources that are being consumed in load tests.
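Even though “it depends,” the overhead question can still be quantified for any one environment: run the same load against the same app with and without the mesh and compare. As a rough illustration (not part of Meshery itself, and using invented numbers), the per-request overhead might be computed like this:

```python
from statistics import mean, median

# Hypothetical latencies (ms) from two identical load tests against
# the same app: one with no mesh, one with sidecars injected.
off_mesh_ms = [11.2, 12.0, 11.8, 13.1, 12.4, 11.9]
on_mesh_ms = [13.9, 14.6, 14.1, 16.0, 15.2, 14.4]

# Mesh overhead as the difference in mean response time.
overhead_ms = mean(on_mesh_ms) - mean(off_mesh_ms)
overhead_pct = 100.0 * overhead_ms / mean(off_mesh_ms)

print(f"median off/on mesh: {median(off_mesh_ms):.1f} / {median(on_mesh_ms):.1f} ms")
print(f"mean mesh overhead: {overhead_ms:.1f} ms ({overhead_pct:.0f}%)")
```

The same comparison only means anything if everything except the mesh is held constant, which is exactly the apples-to-apples problem Calcote describes.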
“This hopefully helps people self answer some of those questions about performance and what the cost is… [using] their own app, on or off the mesh,” said Calcote.
It might eventually yield something like a service mesh benchmark spec, or show that some meshes perform better than others depending on the environment and the workloads. But overall, Calcote hopes it will bolster confidence and understanding about service mesh in general and the price organizations will pay for its convenience.
Tetrate, the company that builds service mesh solutions for enterprise networking and observability, looks forward to the emergence of benchmarks and supports the idea that performance is an important characteristic of a mesh that needs to be measurable.
Follow #tetrateio for more on the bleeding edge of service mesh.
You know, there are a couple of takeaways for me today. One of those was from Larry Peterson earlier. I hadn’t seen Larry speak before, but I’ve had my noodle tickled at least once today, which is nice. Larry talked about the disruptor’s dilemma, and if I can paraphrase it: on one hand, you can take disaggregation as a path to innovation; on the other hand, you can do integration to facilitate adoption. I may be butchering it, but it’s the adoption bit that I want to focus on right now and talk to you about. This last year I gave a few different service mesh workshops, so I ended up speaking to a lot of people who are currently adopting, or considering adopting, a service mesh.
And there were lots of questions asked, but two were asked consistently, and they bugged me and some other members of the community enough to go do something about them.
One of those was this: people would sort of start off with, “Wow, this is fantastic. I see the value of the service mesh.” Right? Because what else would they be saying when I was done with the workshop? And then they’d say, “Hey, where do I get started? Which one? What do you recommend? There are a couple out there, and there are various ways to deploy them. What do you think?”
Okay, I’ll leave that cliffhanger there. I won’t answer that.
The other question, though, was also asked every time, and you could see the gears turning in the engineers’ minds. It was something like: wait a second, that’s not for free. What’s the catch? How much latency are we talking about here? What’s the CPU burn based on all this value I’m getting for free with no code change? There’s always that little asterisk by the “no code change” claim; it’s mostly true. And so I think we should collectively help people answer some of these questions. I think people need a playground where they can easily spin up a service mesh, whichever one it might be, play with it, gain some familiarity with its functionality, and have a sandbox.
I also think it’s incumbent upon us to provide some answer to that other question about overhead and what it looks like. And it’s not really a fun question to answer, because, as anyone who does performance engineering knows, there are just a lot of variables; the answer is always “it depends.” Anyway, I didn’t quite realize what we were stepping into when we set out to build this tool. I want to introduce that tool to you today: Meshery, a multi-mesh performance benchmark and playground, an open source tool. There’s a small collection of folks who’ve spent some of their extra cycles to come and do what’s hopefully a public service, like all open source is, right? [LAUGHTER] I’m not sure if there’s sarcasm floating around in here, what the laughs are about.
But speaking of wars, I think Chris talked about the container wars. I have to say I was much more focused on the container orchestrator wars. That was another topic I went around talking a lot about. I did a fair bit of analysis comparing and contrasting the four most prominent container orchestrators at the time: Mesos/Marathon, Kubernetes, Swarm, and Nomad. I felt I was doing a very structured, fair comparison. One of the categories we compared on was scale, alongside high availability, and the way the container orchestrators measured themselves on scale was around the scheduler: how many containers, across how many nodes, how quickly can you schedule them and get them up? Each of the orchestrator projects had stats; some had prettier visualizations than others, but all of them threw up some pretty impressive numbers.
Which was fantastic, although for my part it was almost done in a vacuum, without really a frame of reference, and it was certainly apples to oranges across them, not an apples-to-apples comparison. Those are some of the things we’re hoping to address with a tool like this, to the extent that you can get to an apples-to-apples comparison at all; that’s a hard thing to achieve. This is about a two-month-old project, so no doubt there are some bugs in there. Has anyone done a live demo yet today? Oh, okay. Oh shit. All right, there goes my... okay, well, thanks for that. I’m too late. I’m going to see if I can demo this, but it’s just that simple a tool.
You can deploy it locally, you can deploy it in your cluster, and you can deploy it on the mesh if you want to. Although, to the extent that you’re trying to do performance analysis, probably don’t deploy it on your mesh; rather, point it at either a sample app or your own application. It’ll generate some load, receive it back, and present the results to you. Hopefully this is a tool that empowers you, the Joe Schmo operator or the Joe Schmo engineer who’s going through this question about adoption. And it’ll persist the results, so you can come back release after release and check on whether things are improving or getting worse. For those that looked at the Istio 1.1 release, things improved in and around performance, so that working group is doing a great job.
Let me see if I can show you what this looks like. It looks like I’m going to drive from up here. So the tool itself is a couple of containers, and as you’ve seen, it’s set up to be multi-orchestrator. It’s maybe not well shown here, but you’ll deploy Meshery, the tool itself, and it’ll connect to any number of adapters. We’ll talk about the state of some of these adapters; I’m going to show you the Istio adapter right now, just a little adapter to communicate with the service mesh. Let’s go over and spin it up. We’re just using Docker Compose here, with the two containers: Meshery and an Istio adapter. This is running on my local machine right now; I’m VPNed back into the office, and I’m going to point at a small Kubernetes cluster running the canonical Istio sample app, Bookinfo. But first we’ll bring up Meshery itself.
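The two-container setup he describes might look something like the Compose sketch below. This is purely illustrative: the service names, image tags, ports, and environment variable are placeholders, not Meshery’s actual configuration.

```yaml
# Illustrative docker-compose.yml: Meshery plus one mesh adapter.
# All names, images, ports, and variables here are hypothetical.
version: "3"
services:
  meshery:
    image: meshery/meshery:latest          # hypothetical image name
    ports:
      - "9081:8080"                        # UI served on localhost
    environment:
      - ADAPTER_URLS=meshery-istio:10000   # hypothetical setting
  meshery-istio:
    image: meshery/meshery-istio:latest    # hypothetical adapter image
    ports:
      - "10000:10000"
```

The design point is that Meshery itself stays mesh-agnostic: each mesh gets its own small adapter service that Meshery talks to over the network.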
So again, running here on localhost, we sign in and configure the tool quickly. To configure it, we choose our kubeconfig, and then we point Meshery to the adapter in the environment. This is the Meshery Istio adapter, again running locally on this laptop. And since Meshery wants to show you information about your environment and performance statistics, you can connect it to a Grafana. If you’ve deployed Grafana as part of your Istio deployment, you can go ahead and connect; this cluster is running Grafana, so it connects. And if you’ve invested in some dashboards there, you can pull those in.
You can manipulate which specific metrics you want to display here, but it’s nice to have those metrics about your environment, your nodes, and the resources being consumed there as you go to generate your load tests. Let’s briefly stop in the playground, which really is an under-formed playground, but I wanted to highlight it because we’ve had a contribution here around Istio-vet. The Aspen Mesh folks have been kind enough to let us incorporate Istio-vet as a library to help confirm that you’ve got good config in your environment. So we went ahead and ran Istio-vet, and it came back with a number of notifications: each of these vetters has run well, but this one hasn’t, and it’ll tell you why. Again, it just lets you play around with the mesh and see what’s there.
But to the extent that you want to do a performance test, in this case let’s point to the Bookinfo app, its product page. On the inside we’re leveraging fortio as the load generator within the tool, so those familiar with the Istio project might be familiar with that particular tool. You can come in, configure a test, and run it. In this case, we’re essentially running one request a second for a minute. As that runs, we can entertain ourselves with our Grafana dashboards. We should see a pickup in some of the overhead we might expect, and we do: some overhead around the telemetry that Mixer is having to deal with as that load is generated on the sample app.
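A load generator like fortio reports its results as structured data that a tool can then persist and chart. The payload below is a simplified, fortio-like approximation (the field names are an assumption, not fortio’s exact schema), just to show how latency percentiles and throughput fall out of one run:

```python
import json

# A simplified, fortio-like result payload. Field names approximate
# fortio's JSON output; they are illustrative, not its exact schema.
raw = """{
  "ActualQPS": 0.98,
  "DurationHistogram": {
    "Count": 60,
    "Avg": 0.0142,
    "Percentiles": [
      {"Percentile": 50, "Value": 0.0131},
      {"Percentile": 99, "Value": 0.0287}
    ]
  }
}"""

result = json.loads(raw)
hist = result["DurationHistogram"]

# Convert seconds to milliseconds, keyed by percentile.
p = {int(x["Percentile"]): x["Value"] * 1000 for x in hist["Percentiles"]}

print(f"throughput: {result['ActualQPS']:.2f} req/s over {hist['Count']} requests")
print(f"latency: mean {hist['Avg'] * 1000:.1f} ms, p50 {p[50]:.1f} ms, p99 {p[99]:.1f} ms")
```

Persisting exactly this kind of record per run is what lets Meshery compare results release over release.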
And then if I tell a joke for 30 more seconds... all right. Hey, that was good; I didn’t even have to tell a joke. That’s great. But our hope is that, while right now this isn’t the most complex tool and there are many nerd knobs to tweak and configure about how you want to run a performance test, it helps people self-answer some of those questions around performance and what the cost is, running it against their own app, on or off the mesh. In this case, we’re displaying the results back. We can see where the median and mean response times come in, so this is measuring latency, and it’s measuring throughput, which is kind of nice. And to the extent that you do this over time, it will persist your results. You can come in and look at them, or maybe compare them. Like so. Or, if you’ve got... boy, it’s fun giving a demo. Or if you’ve got a few of them.
That’s not very pretty, but if you’ve got a few of them, you can compare those results over time. Someone needs to do something about this label, but more or less we’re leveraging open source software that’s out there to help empower people. I’m not here today to say, “Hey, we ran this against all of the other meshes, and here’s how they compare.” This particular tool is on the schedule for the next three upcoming conferences, so stay tuned; there will be some results. My hope is that generally all of the results are pretty good. We might see that some meshes perform better than others in certain environments or against certain types of workloads. But hopefully this bolsters and inspires confidence in people to go understand what’s happening in their mesh and the price that they’re paying for that convenience. Something to come out of this, then, is potentially a service mesh benchmark spec.
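The release-over-release comparison he describes reduces to diffing persisted runs. A minimal sketch, with invented run data and labels, might look like this:

```python
# Sketch: comparing persisted load-test results across releases.
# The run data and labels below are invented for illustration.
runs = [
    {"label": "istio-1.0", "p50_ms": 15.2, "p99_ms": 41.0},
    {"label": "istio-1.1", "p50_ms": 12.8, "p99_ms": 33.5},
]

baseline = runs[0]
for run in runs[1:]:
    # Negative delta means the newer release got faster at the tail.
    delta = run["p99_ms"] - baseline["p99_ms"]
    trend = "improved" if delta < 0 else "regressed"
    print(f"{run['label']} vs {baseline['label']}: "
          f"p99 {trend} by {abs(delta):.1f} ms")
```

This mirrors his Istio 1.1 observation: keep the workload fixed, re-run the test per release, and the tail-latency trend tells you whether performance work is paying off.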
So to the extent that you’re gathering up the results we were just seeing visualized, you can certainly store them. But, and this is back to the “it depends” of performance engineering, you also need to store: wait, what was the environment we were using? How many nodes did we have? How big were those nodes? What app were we using, and what version of it were we hitting? What service mesh, and what version of that mesh? How was it configured? Istio is very powerful; there are lots of things to configure and tweak. As part of that, we’re hoping to collaborate with the community here, with each of the prominent projects. The ones I just had listed here, all except for the one in italics, have committed to contributing. So no pressure, Tony, Nick, you’re out there. No pressure.
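The variables he lists are exactly what a benchmark-spec record would have to capture alongside the raw numbers. As a sketch only (these field names and values are illustrative, not a proposed standard):

```python
from dataclasses import dataclass, asdict

# Sketch of a benchmark-spec record, per the talk: results are only
# comparable if the environment, app, mesh, and config are recorded
# with them. Field names and values are illustrative.
@dataclass
class BenchmarkRecord:
    node_count: int
    node_size: str        # e.g. a machine type
    app: str
    app_version: str
    mesh: str
    mesh_version: str
    mesh_config: dict     # the knobs that affect performance
    p50_ms: float
    p99_ms: float
    qps: float

record = BenchmarkRecord(
    node_count=3, node_size="n1-standard-2",
    app="bookinfo", app_version="1.8",
    mesh="istio", mesh_version="1.1",
    mesh_config={"telemetry": "enabled"},
    p50_ms=13.1, p99_ms=28.7, qps=0.98,
)
print(asdict(record))
```

Two records are only apples-to-apples when everything except the field under test matches, which is the whole point of writing the context down.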
But I do want to thank these contributors. As a matter of fact, Vinu, who was just up here, has contributed some, so I want to embarrass him if he’s... very good. All right, he’s embarrassed. Good. Lastly, I’m going to embarrass another fellow ginger over here. I don’t know if you know this guy, but he’s halfway through writing one of probably only two potential Istio books out there. So if you’re looking for a source of authority, look for the red beard. One of us is more distinguished than the other. But go ahead and subscribe there, and you’ll be sure to be notified. With that, I want to invite Zach back up.
All righty. Yeah, we’re good. Oh, there we go. Okay, now we’ve made it. Thank you, Lee. So I want to thank all of you for coming. First off, how was it? Good? Bad? Good. All right, awesome. I want to thank all of our sponsors that made this possible: GCP, Juniper, Capital One, Pivotal, AWS (sorry, I have to sneak a glance at my signs as I go down the list), the CNCF, the ONF, and the OpenStack Foundation as well. Thank you all so much for making this possible. It really is cool.
It’s pretty special for me, being one of the early engineers working on service mesh, to see something like this. It really is very special. I also want to take some time out. Shriram, Chris, Twinee, come up here. Come on, onto the stage. Let’s go. I’m going to one-up Lee on the embarrassment front. These are our conference organizers, the people that have done so, so much work to make this whole thing happen, and we need to thank them for the awesome conference they put on. Really, thank you. It’s been awesome. Y’all have done such a good job.