Bootstrapping Microservices At WorkMarket

Drew Csillag
WorkMarket Engineering
5 min read · Dec 19, 2016


When I was first hired at WorkMarket, the plan was to kick-start a microservice architecture and migrate away from the existing monolith over time. The thing is, when you start the microservices journey, there’s a lot to consider. If you want to avoid a place where chaos reigns, there’s a whole constellation of things you want to consider. Some amount of chaos is to be expected when moving to microservices, but it’s better to choose your level of chaos than to have it happen to you by default.

Some things you want to make sure to consider: implementation languages, frameworks, logs, metrics, service discovery, configuration, build, test, request tracing, deployment, alerting/monitoring, what to do with external-facing API gateways, the overall architecture, and what you want the microservice contract to be (some of which are dictated by earlier items in this list). Maybe you decide on them, maybe you don’t, but it’s better if these aren’t decided by happenstance. There’s also what you plan to do short term versus long term, as well as those things you’re willing to take a stab at knowing you may well be choosing wrong (which you can then decide to hedge). While in theory you can make the perfect microservice architecture and system given enough time, ultimately you have to deliver value. The earlier you can deliver value, the happier everyone will be, and the sooner you can validate your assumptions and adjust as necessary.

The max-chaos approach is to just start writing services, have no contract, and let everyone do what is right in their own eyes; you hope for the best. If this goes on long enough, it doesn’t scale. As you decide things like API conventions, configuration standards, etc., code to implement these policies has to be written and maintained for each language you support; expertise in each of your backing stores needs to be kept up to date; ops has to deal with the zoo of choices that were made and with the various ways these are deployed, managed, and monitored. These costs only grow as you write more microservices and continue to nail down other parts of your microservice ecosystem after the fact. The end result: you have to start walking things back and limiting certain choices (e.g. language, storage systems). This does not come without strife when developers’ favorite tools are deprecated as “cannot be used for new microservices.”

If instead you limit choices from the get-go, you can focus on getting these integration and policy pieces in place. Or if you don’t get them in place, you can at least create the places where they’ll eventually plug in. This allows most of the work of standing up a service to focus on the service itself, with a smaller group of engineers working on the common pieces that will plug into the code already written.

At WorkMarket, we chose the lower end of the chaos scale, and it’s worked out really well so far. I’m very pleased with how much we got right, with about a year and about 30 microservices under our belt, many of them critical to the business. For most of those things I listed that you have to decide on (or not), we chose to decide on most of them and keep the choices fairly narrow, with the idea that we can always widen them if necessary. We hedged on a couple of choices, and we’ve tweaked a few others as we went along.

We hedged on deployment. We started with a Puppet-based deployment, but we have plans for a containerized one not too far down the road. We also hedged on metrics. We chose Graphite with Grafana as the front end; however, our microservices are metric-system agnostic. This was because we weren’t sure if we’d want to switch to something like Prometheus or InfluxDB in the future. Microservices simply publish their metrics to a Kafka topic; what happens from there is irrelevant to them. We also hedged on logs. Like metrics, we just publish them to a Kafka topic. We initially started with a Kafka consumer that just dumped the logs to files, one per service, so you could get a merged view of the multiple instances of a microservice. Later, we got them into a better tool, but we may change our mind again.
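As a rough sketch of what this decoupling looks like on the publishing side (the envelope fields and topic name here are hypothetical illustrations, not WorkMarket’s actual schema): a service serializes each metric reading into a small JSON envelope and hands it to a Kafka producer, never knowing which metric store sits downstream.

```python
import json
import time

# Hypothetical topic name; any metrics consumer subscribes here.
METRICS_TOPIC = "service-metrics"

def encode_metric(service, name, value, ts=None):
    """Serialize one metric reading into a JSON envelope.

    The envelope is what gets published to Kafka; downstream
    consumers decide whether it lands in Graphite, Prometheus,
    InfluxDB, or somewhere else entirely.
    """
    envelope = {
        "service": service,
        "metric": name,
        "value": value,
        "timestamp": ts if ts is not None else int(time.time()),
    }
    return json.dumps(envelope).encode("utf-8")

# With a real broker, a kafka-python producer would ship it:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="broker:9092")
#   producer.send(METRICS_TOPIC, encode_metric("payments", "requests.count", 1))
```

The point of the envelope is that the service’s only dependency is a Kafka client, which exists for every popular language.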

Contrary to where we hedged, we doubled down and won big with Kafka. For instance, all microservice metrics (about 3.7B/day) and logs flow through Kafka, precisely because we didn’t know what we would eventually use for analysis, alerting, etc. Rather than tying the microservices to a particular log or metric system, we just send everything to Kafka, then wrote consumers that do whatever needs doing to get it into the appropriate store. This turned out to be great in a number of ways, especially in the early days: we could fire up a console consumer to sniff what was going on when our metric or log consumers were acting weird, and we could decouple development of the client from deployment of the metric system. (I’ll have to look, but I honestly don’t recall which we got working first: the metric reporter, or the Kafka consumer that shoved metrics into Graphite.) We now use Kafka for a bunch of other things too, like reporting differences in succession as mentioned in a previous blog post, or as the transport for the feeds for our search indexers, and soon to feed our data warehouse. It also had the nice side effect that, because every popular language has libraries for Kafka, our choice of log and metric systems would not be constrained by the client libraries available for that system.
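The consumer side can be sketched the same way (again with hypothetical field names; WorkMarket’s actual consumer isn’t shown here): it reads envelopes off the topic and rewrites each one into Graphite’s plaintext line protocol, `path value timestamp`, before sending it on to Carbon.

```python
import json

def envelope_to_graphite(raw):
    """Convert one Kafka metric envelope (JSON bytes) into a
    Graphite plaintext-protocol line: "<path> <value> <timestamp>\n".
    """
    msg = json.loads(raw)
    path = f"{msg['service']}.{msg['metric']}"
    return f"{path} {msg['value']} {msg['timestamp']}\n"

# A kafka-python consumer loop would feed this, writing each line
# to Carbon's plaintext port (2003 by default):
#   from kafka import KafkaConsumer
#   import socket
#   consumer = KafkaConsumer("service-metrics", bootstrap_servers="broker:9092")
#   sock = socket.create_connection(("graphite-host", 2003))
#   for record in consumer:
#       sock.sendall(envelope_to_graphite(record.value).encode("utf-8"))
```

Swapping Graphite for another store means replacing only this consumer; the services publishing metrics never change.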

A fun anecdote: Kafka is built like a tank. In one particular instance, there was a broker whose AWS instance refused to boot. So we just killed it and spun up a new one in its place. We were waiting for stuff to happen, and it didn’t seem fast enough for us. We manually moved some topics around, and at one point it was as if Kafka was saying to us, “Stand back, I’ve got this.” Twenty minutes later, it was fully re-replicated. Its reputation for awesomeness is well-earned.

But as far as other things went, we went through a series of designs and evaluations to figure out what we wanted to choose. Over the next few weeks, we plan to go through the road to our first microservices (and first actual value delivered!), the architecture, initial integration, what decisions we made and why (we actually wrote a bunch of them down while we were making them), and lessons learned.
