kafka – Corps of Engineers

class Actor(object): def topics(self): """ Return a list of the topic(s) this actor cares about. """ raise NotImplemented def interval(self): """ Return the batching interval for this actor. This is the maximum interval. If another actor on the same stage has a shorter interval, then the batching interval will match that interval. """ return 30 def process(self, topic, messages): """ Called periodically for this actor to process messages that have been received since the last batching interval. If messages for multiple different topics have been received, then this method will be called once for each different topic. The messages will be passed as an array of tuples (offset, message). """ raise NotImplemented @property def log(self): return getLogger(self.__module__)

Software engineering is hard. Even a small software project involves making countless trade-offs between the ideal solution and “good enough” code. Software engineering at a startup is even harder because the requirements are often vague and in constant flux, and economic realities force us to release less-than-perfect code all the time.

Over time these decisions pile up and the technical debt becomes overwhelming. Axial has hit this point a few times. Until about two years ago, Axial’s platform was a single monolithic Django application that was becoming increasingly bloated, slow and unmaintainable. At that time, the decision was made to “go SOA” and we started to decompose this Django application into smaller services mostly based on Flask, and, more recently, Pyramid.

Some of the services that we’ve broken out since then are small and focused and make logical sense as independent services, but others turned out to be awkward and inefficient and resulted in brittle and tightly-coupled code. Further, it has become clear that our current architecture does not align well with the features and functionality in our roadmap, such as real-time updates to our members. We realized that we needed a new architecture. Thus was born Reaxial.

The design keywords for Reaxial are reactive, modular and scalable. Reactive means that the system responds immediately to new information, all the way from the backend to the frontend. Modular means that parts of the system are decoupled, making it easier and faster to develop and test (and discard) new features without disturbing existing code. Scalable means that as we add members, we can simply add hardware to accommodate the additional load without slowing down the site.

To achieve these goals, we knew that we needed to use some sort of messaging middleware. After researching the various options out there, including commercial solutions like RTI Connext and WebSphere, and open-source packages like RabbitMQ, nanomsg and NATS, we settled on Apache Kafka. Kafka was developed by LinkedIn and offers a very attractive combination of high throughput, low latency and guaranteed delivery. LinkedIn has over 300 million users, which is an order of magnitude more than we ever expect to have to support, so we are confident that Kafka will scale well as we grow. Further, because Kafka retains messages for a long time (days or weeks), it is possible to replay messages if necessary, improving testability and modularity. With Kafka as the underlying message bus, the rest of the architecture took shape:

Probably the most important new service is the entity service. The entity service handles CRUD for all top-level entities, including classes like Company, Member and Contact, among many others. Whenever an entity is created or updated, a copy of it is published on the message bus, where it can be consumed in real-time by other services. Simple CRUD does not handle the case where multiple entities need to be created or updated in a transaction, so to handle that the entity service also offers a special API call, create_entity_graph, that can create and update a set of related entities atomically. In the Reaxial architecture, most features will be implemented as a service that subscribes to one or more entity classes, and then reacts to changes as they occur by either make further updates to those entities or by creating or updating some other entity.

Recall that our design goals were to enable real-time updates all the way to the member. To accomplish this, we created a subscription service that uses SockJS to support a persistent bidirectional socket connection to the browser. This service, written in NodeJS, subscribes to changes in all the entity classes and allows the browser to subscribe to whatever specific entities on which it wants updates, and for which the user session is permissioned to see, of course.

We have deployed these components to production and are just starting to use them for some new features. As we gain confidence in this new infrastructure, we will slowly transition our existing code base to the Reaxial model. We decided to deploy Reaxial gradually so that we can start to reap the benefits of it right away, and to give us an opportunity to detect and resolve any deployment risks before we are fully dependent on the new architecture. We have a lot of work ahead of us but we are all quite excited about our modular, scalable and reactive future.

Corps of Engineers

Axial // Factum non Verbum

Tag Archives: kafka

Reaxial Update – On Stages And Actors

Reaxial – A reactive architecture for Axial

Share this:

Share this: