Reaxial – A reactive architecture for Axial

Software engineering is hard. Even a small software project involves making countless trade-offs between the ideal solution and “good enough” code. Software engineering at a startup is even harder because the requirements are often vague and in constant flux, and economic realities force us to release less-than-perfect code all the time.

Over time these decisions pile up and the technical debt becomes overwhelming. Axial has hit this point a few times. Until about two years ago, Axial’s platform was a single monolithic Django application that was becoming increasingly bloated, slow and unmaintainable. At that time, the decision was made to “go SOA” and we started to decompose this Django application into smaller services mostly based on Flask, and, more recently, Pyramid.

Some of the services that we’ve broken out since then are small and focused and make logical sense as independent services, but others turned out to be awkward and inefficient and resulted in brittle and tightly-coupled code. Further, it has become clear that our current architecture does not align well with the features and functionality in our roadmap, such as real-time updates to our members. We realized that we needed a new architecture. Thus was born Reaxial.

The design keywords for Reaxial are reactive, modular and scalable. Reactive means that the system responds immediately to new information, all the way from the backend to the frontend. Modular means that parts of the system are decoupled, making it easier and faster to develop and test (and discard) new features without disturbing existing code. Scalable means that as we add members, we can simply add hardware to accommodate the additional load without slowing down the site.

To achieve these goals, we knew that we needed to use some sort of messaging middleware. After researching the various options out there, including commercial solutions like RTI Connext and WebSphere, and open-source packages like RabbitMQ, nanomsg and NATS, we settled on Apache Kafka. Kafka was developed by LinkedIn and offers a very attractive combination of high throughput, low latency and guaranteed delivery. LinkedIn has over 300 million users, which is an order of magnitude more than we ever expect to have to support, so we are confident that Kafka will scale well as we grow. Further, because Kafka retains messages for a long time (days or weeks), it is possible to replay messages if necessary, improving testability and modularity. With Kafka as the underlying message bus, the rest of the architecture took shape:

Reaxial Architecture

Probably the most important new service is the entity service. The entity service handles CRUD for all top-level entities, including classes like Company, Member and Contact, among many others. Whenever an entity is created or updated, a copy of it is published on the message bus, where it can be consumed in real-time by other services. Simple CRUD does not handle the case where multiple entities need to be created or updated in a transaction, so to handle that the entity service also offers a special API call, create_entity_graph, that can create and update a set of related entities atomically. In the Reaxial architecture, most features will be implemented as a service that subscribes to one or more entity classes and then reacts to changes as they occur, either by making further updates to those entities or by creating or updating some other entity.
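To make that concrete, here is a hedged sketch of what a Reaxial-style consumer might look like in Python with the kafka-python client; the topic name, group id, and handler function are illustrative assumptions, not our actual service code:

import json

from kafka import KafkaConsumer

# Hypothetical topic and group names; the real Reaxial services differ.
consumer = KafkaConsumer('entity.company',
                         group_id='company-indexer',
                         bootstrap_servers=['localhost:9092'])

for message in consumer:
    # Each entity update arrives as a JSON document on the bus.
    company = json.loads(message.value.decode('utf-8'))
    handle_company_change(company)  # hypothetical reaction to the change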

Recall that one of our design goals was to enable real-time updates all the way to the member. To accomplish this, we created a subscription service that uses SockJS to support a persistent bidirectional socket connection to the browser. This service, written in NodeJS, subscribes to changes in all the entity classes and allows the browser to subscribe to updates for whatever specific entities it cares about, provided of course that the user session is permissioned to see them.

We have deployed these components to production and are just starting to use them for some new features. As we gain confidence in this new infrastructure, we will slowly transition our existing code base to the Reaxial model. We decided to deploy Reaxial gradually so that we can start to reap the benefits of it right away, and to give us an opportunity to detect and resolve any deployment risks before we are fully dependent on the new architecture. We have a lot of work ahead of us but we are all quite excited about our modular, scalable and reactive future.

Understanding Product Management

When I first started at Axial, I had a hard time understanding what it meant to be a Product Manager. The role is often more on the nebulous side, especially at a fast-growing company. The result is a shifting set of responsibilities as the needs of the company change.

Some days I would focus on collecting and analyzing data between releases; other nights I would find myself frantically triaging an urgent bug with the rest of the engineering team. Still other days I would interview users, trying to bridge the gap between what they were trying to do and what the product was capable of.

office space reference 

But you know, I’d still get this question every now and then

In between this shifting set of day-to-day responsibilities, I also read a lot of articles on how to get good at product management. Here’s a brief list of what others thought being a Product Manager meant:

  • Prioritizing
  • Communicating clearly and effectively
  • Thinking strategically
  • Empathizing deeply with customers
  • Having an eye for design
    • Having an attention to detail
  • Being passionate
  • Being a product visionary
    • and then communicating that vision
  • Having deep domain expertise
  • Evangelizing
  • Being technical
    • But not too technical
  • Leading
  • Hustling
  • Having integrity (???)
  • ninja/unicorn/rockstar *

 

ninja unicorn rockstar guru
* I kid, but that list gets pretty ridiculous

 

Having read a lot of articles on Product Management, I highly recommend the following:

 

good gallant bad goofus
Every time I read ‘Good Product Manager/Bad Product Manager’

 

Now that I’ve gone through a couple of product development cycles here at Axial, I think I have a better understanding of Product Management. As I understand it, the role boils down to three parts of a continuous cycle:

  • Shipping
  • Analyzing
  • Iterating

 

Shipping

There’s no getting around this. You can’t really be much of a Product Manager if you don’t have a product. That means turning a vague idea into something running in production, and doing whatever’s necessary to ship it.

If there’s anything standing between your product and production, you need to do whatever is necessary to move it out of the way. No QA support? Time to roll up your sleeves and test the feature by hand. Some older business requirement blocking an engineer? Find the right person to talk to and negotiate.

 

simpsons reference
But there’s usually more than one way to cross a wall

 

Analyzing

Just because you shipped something doesn’t mean your responsibilities are over. Tracking and defining success are just as important as shipping. Not every product is a success, but you need to track how users are (or aren’t!) using your new product.

 

dark knight reference
Ideally you want this amount of logging

Sometimes you’ll inherit a product that’s already been shipped by someone else, but the feature doesn’t contain all the data that you need to really answer the questions you may have. Depending on the usage of the feature, it may be a more efficient use of time to add that additional logging to get better visibility before rolling haphazardly into a grand redesign. 

Based on the data that you have available, you can begin to construct hypotheses about usage patterns: “Our users don’t use X because of Y. Therefore if we change Y, users will start to use X”.

 

Iterating

Assuming that iterating on the product/feature is the right decision, it’s time to start making improvements to your product. That means taking the hypotheses that you formed from your analysis and figuring out how to properly test them.

One way to test a hypothesis is to talk to your users. This can be far cheaper than spending engineering time, as long as you ask the right questions in the right way. During interviews with users, remember that what people say is not the same thing as what people actually do.

Another way to test your hypothesis is to plan out tweaks and improvements to your feature. Shipping those improvements of course takes you right back into the virtuous loop of shipping -> analyzing -> iterating.

Communicating

Remember, no Product Manager works in isolation. That means that as you go through the virtuous loop of shipping -> analyzing -> iterating, you need to be a master of communication, both within the team you’re working on and across the company.

A lot of people like to emphasize the importance of being able to explain technical details to a non-technical audience. I tend to agree, with the corollary that it’s just as important to be able to effectively represent the needs of the business to a technical audience.

Of course, a framework like this is just an abstract way to think about Product Management. Just as most of your users probably don’t care whether you use the latest technologies in your product, the rest of the organization only cares about what you do as a Product Manager insofar as you produce results against key metrics.

Hopefully this was helpful to you! Leave questions or comments down below.

Migrating the Axial stack to PostgreSQL

Axial has used MySQL for most of its life. We had a variety of complaints with it, ranging from poor query planning to silently accepting (and discarding) invalid data, yet the effort involved in migrating to PostgreSQL always seemed too great compared to the benefits. Then we started using Redshift for data warehousing. Redshift uses the Postgres client libraries, so we had yet another reason to migrate to Postgres, and it finally happened.

Migrating the data

Preparing for the migration took one engineer (me) about three weeks. The first task was to actually move all the data, converting data types when necessary. pgloader was a fantastic tool for this, but there were still some hiccups. During this process, I discovered that despite our foreign key constraints (and yes, we were using InnoDB) there were a handful of rows with foreign keys that pointed to rows which did not exist. We decided that the least bad course of action here would be to create new dummy rows and point these foreign keys to the dummy rows. We also had a few rows with data that was invalid UTF-8 text, even though the character set was UTF-8. For these there was nothing to do except strip out the bytes that weren’t UTF-8.

Once our data was sane again, it was time to migrate it. pgloader knows about MySQL’s types, including common hacks (since MySQL has no boolean type, people typically use a tinyint(1) instead). However, some of our boolean columns were wrongly created as wider integers, which pgloader doesn’t automatically convert. Fortunately, pgloader makes it very easy to override its type adaptation on a column-by-column basis, and if you need conversion rules that it doesn’t have built in, you can write additional rules in Common Lisp. (No beard necessary!)

Here’s our configuration file:

LOAD DATABASE
FROM mysql://user:pass@127.0.0.1/production
INTO postgresql://user:pass@127.0.0.1/primary_db

WITH include no drop, create tables, create indexes, downcase identifiers, foreign keys, reset sequences, batch rows = 1000

SET maintenance_work_mem to '128MB', work_mem to '12MB'

CAST column ams_msg.migrate_message using nullify-column,
column transaction_profile.control using empty-string-to-null,
column event.is_staff to boolean drop typemod keep default keep not null using tinyint-to-boolean,
column event.is_masquerading to boolean drop typemod keep default keep not null using tinyint-to-boolean,
column preferences_userpreferences.email_buyer_received_opportunity_report_daily to boolean drop typemod keep default keep not null using tinyint-to-boolean,
column preferences_userpreferences.show_import_marketing to boolean drop typemod keep default keep not null using tinyint-to-boolean,
column preferences_userpreferences.show_find_and_send_marketing to boolean drop typemod keep default keep not null using tinyint-to-boolean

EXCLUDING TABLE NAMES MATCHING ~/ams_/, ~/cms_/, 'jogging_log', 'RedisOppEmails_tbl', ~/announcements_announcement/, 'bak_opportunity_expiraiton_date', ~/company_sellerprofile/, ~/djcelery_/, 'numbers_tbl', ~/network_updates_statusupdate/, 'opp_invite_redis_dump_tbl', 'site_announcements_siteannouncement', 'south_migrationhistory', ~/tmp_m/, ~/waffle_/, 'inbound_email_inboundemail', 'audit_log_auditlog', ~/analytics_/

AFTER LOAD DO
$$ UPDATE alembic_version SET version_num = '4e97c60f45e0'; $$;

We start off by telling pgloader to create the tables and indexes, lowercase all identifiers and keep foreign keys, and reset the sequences for primary keys to the next value they should have after copying over the data. We’re also batching rows; pgloader requires a bit of tuning to keep it from running out of memory (compiling it with Clozure Common Lisp instead of Steel Bank Common Lisp definitely helps with the memory usage). Most of the custom column casts are booleans, but one of them uses a very simple custom cast that turns all data into nulls. We did this because this was a column we wanted to keep in MySQL in case the migration went poorly, but we knew we would not need in Postgres. We also exclude certain tables matching either regular expressions or strings, and finish by setting our alembic migration version number. This ran smoothly and quickly.

Migrating the code

Our codebase used the Django ORM and SQLAlchemy for most of its SQL access, but we still had some bad habits. Some queries would use 1 and 0 with boolean fields instead of Python’s True and False values. Other queries were written in Django templates that were rendered and executed by a custom system called tplsql. These queries often used MySQL specific functions and index hints. We also needed to start using timezones on our DateTime data, since Postgres supports (and indeed expects) timezones. These were simple to fix and easily found by exercising the app against the Postgres database.
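As a small, hedged illustration of the datetime change (not our actual code), Postgres’s timestamp with time zone columns expect timezone-aware values:

import pytz
from datetime import datetime

naive = datetime.utcnow()        # what our MySQL-era code produced; tzinfo is None
aware = datetime.now(pytz.utc)   # timezone-aware, which Postgres expects

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC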

A couple of other issues were more troublesome. Some of our code used an upsert idiom that worked a bit like this:

try:
    create_a_thing()
except IntegrityError:
    update_the_thing()

This doesn’t work in Postgres because an integrity error means the transaction is now in an invalid state and you can’t continue using it. The simplest approach is to wrap the existing idiom in savepoints, so that’s what we did. A better idiom would use Postgres’s support for CTEs, but given everything else we were changing, we decided to be conservative.
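A minimal sketch of the savepoint-wrapped version, assuming a SQLAlchemy session and keeping the placeholder functions from the snippet above:

from sqlalchemy.exc import IntegrityError

try:
    # begin_nested() issues a SAVEPOINT; if the INSERT violates a constraint,
    # only the savepoint is rolled back and the outer transaction stays usable.
    with session.begin_nested():
        create_a_thing()
except IntegrityError:
    update_the_thing()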

We also had queries (usually using the tplsql system) which were misusing GROUP BY. If you tell a SQL engine to group by some column, every other column you select needs to be either an aggregate or part of the GROUP BY itself; otherwise the engine doesn’t know which row’s value to return. MySQL will silently just pick a row, but PostgreSQL complains. So we had to restructure those queries to properly work with GROUP BY, in the process making sure we were actually querying for the data we wanted.

Going live

After the migration had been extensively tested, we had to do the release. This would be a multi-hour downtime, since we needed to make sure no new data was going into MySQL while we were moving it to PostgreSQL. So we ordered a pizza and waited till Friday night, whereupon the migration failed two hours into the process. It turned out the postgres user in our production environment didn’t have all the permissions that it had had in test, and some of those permissions were important. So once we fixed the permission problems, we tried again on Sunday. This time it went off without a hitch and everyone came in on Monday to a faster, more reliable Axial.

Anonymizing data with SQLAlchemy and Ansible

One of the key properties of a good configuration management tool is that it makes your system configuration idempotent.

“Idempotence (/ˌaɪdɨmˈpoʊtəns/ eye-dəm-poh-təns) is the property of certain operations in mathematics and computer science that can be applied multiple times without changing the result beyond the initial application.”

– http://en.wikipedia.org/wiki/Idempotence

Given a system, a desired target state, and the sequence of steps required to take an unconfigured host to that state, an idempotent CM (configuration management) tool will make only the subset of changes required to bring the system to the desired target state. On a system that is already in the desired state, repeated runs will have no effect, as the tool makes zero changes, since none are required.

Ansible is a configuration management tool written in Python that performs its actions by way of modules. Ansible configurations are defined in YAML-formatted files called playbooks, which, by way of modules, compose a description of what the state of the system should be.

---
- hosts: localhost

  tasks:
  - name: Ensure sample.txt is correctly permissioned
    file:
      path:   /data/sample.txt
      owner:  bobby
      mode:   0600

In the above example, a playbook ensures that the file “sample.txt” located in the “/data” directory on the system is owned by the user “bobby”, and that only bobby can read or write to the file (as specified by mode 0600).

If this playbook is run in Ansible, the file module will examine the file at that specified location path, and make changes to its ownership and permissions if they do not match what we’ve defined in the playbook.

An Ansible module can be written in any language, but the default set of modules provided with Ansible are written in Python.

One could technically write a set of carefully crafted shell scripts to do tasks like these, but a CM unit such as an Ansible module has a well-specified interface; when you write against it, you automatically get access to all the tooling and features that the CM framework provides. The most obvious benefit is that your operation becomes a unit that you can very easily combine with other units to define a non-trivial workflow.

In this instance, we wanted to use Ansible to ensure that certain types of data within a given database were anonymized (where necessary). This might be for ensuring that sensitive user information is removed before a dataset is used for testing or development. Since Ansible does not have this capability out-of-the-box, we added the functionality as a module by way of the Python SQLAlchemy library.
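For context, here is a hedged skeleton of what such a module’s entry point might look like; the parameter names and the anonymize helper are illustrative assumptions, not the module’s actual interface:

from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            db_url=dict(required=True),            # hypothetical parameter name
            tables=dict(type='list', default=[]),  # hypothetical parameter name
        ),
        supports_check_mode=True,
    )
    # anonymize() is a stand-in for the SQLAlchemy-driven logic described below;
    # in check mode it only counts the rows that would change.
    rows = anonymize(module.params, check_mode=module.check_mode)
    module.exit_json(changed=bool(rows), rows_changed=rows)

if __name__ == '__main__':
    main()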

SQLAlchemy

SQLAlchemy is quite a capable database toolkit and ORM (object-relational mapper) for Python, and it’s been covered to some extent here on the Corps blog.

SQLAlchemy gives us the tooling we need to interact with the variety of databases in a (mostly) vendor agnostic way using plain Python, and Ansible gives us a framework within which we can define system configuration and combine complex actions to take a system to a desired state, or define and execute a specification for a particular process.

Our basic aim is to examine a database’s schema, recognize columns that look like they contain the type of data we want to anonymize, and make changes to the data only where necessary.  The type of data that should be anonymized will vary, but in this case we wanted to apply changes to user contact information.

An example of anonymization happens when we encounter a column that contains email information. The anonymization function that is applied returns a query that transforms all email values in this column into an anonymized address of the form 'user+8b7578454fdb311af0ce727f8944bd0a@example.com'. The hash (8b75…d0a) is generated from the original email address, so you get the same output for the same input every time. This assures consistency across the dataset for all email columns that are modified.
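A minimal sketch of such a transformation in plain Python, assuming an MD5 digest (the 32-character hash above is consistent with one); the actual module builds the equivalent expression into an UPDATE query:

import hashlib

def anonymize_email(original):
    # Deterministic: the same input always yields the same digest, so one real
    # address maps to the same fake address in every column and table.
    digest = hashlib.md5(original.strip().lower().encode('utf-8')).hexdigest()
    return 'user+%s@example.com' % digest

print(anonymize_email('lisa@gmail.com'))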

Finding these columns happens when SQLAlchemy is examining the database. This is done by way of SQLAlchemy’s reflection functionality.  The identification and marking of columns is performed during the reflection phase, as this is when SQLAlchemy is building its understanding of what the database’s structure looks like.

During reflection, we want to flag columns of interest (candidates for anonymization). There’s more than a few ways of doing this, but SQLAlchemy offers a very interesting feature which came in handy for this.

SQLAlchemy Events

The SQLAlchemy event API, introduced here, allows you to register callback functions or methods that fire whenever certain operations take place within SQLAlchemy, effectively letting you hook any arbitrary logic you desire into its internal operation.

There are events for a large portion of the operations that SQLAlchemy performs, from DDL operations to the moment when a transaction is rolled back. As a result, events can be put to quite a number of purposes.

We use the ‘column_reflect’ event to step in and “decorate” the Column objects that SQLAlchemy creates while it performs database reflection.

from sqlalchemy import ..., event

class DBAnonymizer(object):
    def __init__(self, params):
        ...
        # subscribe our column decorator function to column reflection events
        event.listens_for(Table, 'column_reflect')(
            self.column_decorator_callback)

    ...
    def column_decorator_callback(self, inspector, table, column_info):
        ...

(Above, we’re calling the event.listens_for event subscription decorator directly as a function to register our own instance method that we want to use for column reflection callbacks.)

By “decorate”, we basically mean that we mark the column to be anonymized with the query generation function that is pertinent to the type of contact information it contains.

By default, the module attempts to guess which columns should be decorated by looking at the column names.  But we can make additional decisions as specified by the parameters passed in to our module from the user’s playbook, such as ignoring all columns except those from a particular table, or decorating an explicitly specified list of columns.
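A hedged sketch of what that name-based guessing might look like inside the reflection callback; the hint list and the 'anonymize' marker are illustrative, not the module’s actual rules:

# Column-name fragments that suggest contact information (illustrative only).
CONTACT_HINTS = ('email', 'phone', 'first_name', 'last_name')

def column_decorator_callback(self, inspector, table, column_info):
    name = column_info['name'].lower()
    for hint in CONTACT_HINTS:
        if hint in name:
            # "Decorate" the reflected column: record which anonymizer applies.
            column_info.setdefault('info', {})['anonymize'] = hint
            break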

When reflection is complete, the database structure is stored in SQLAlchemy’s MetaData object, which now also contains columns that have been “marked” by us.

Idempotence in Database Operations

Following the idempotence model, any actions we would execute should only happen to the column values that aren’t in the state we want.

Since we’re working in a database, that’s easily accomplished using a WHERE clause. We simply write a WHERE clause on our UPDATE that matches only the rows that are not yet in the desired state. Any rows that are already in the desired state don’t match, and won’t be updated.

Additionally, Ansible defines a special mode called check mode, which runs a playbook against a system to report what changes would be made without actually making any. It’s very useful when you want to get an idea of how far a system is from the state defined in the playbook, without modifying anything on it.

We implement check mode by simply changing our anonymizing queries from UPDATEs into SELECTs, which then shows us how many rows do not match our desired state.
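A minimal sketch of that pattern with SQLAlchemy Core, under the assumption that anonymized emails always take the user+…@example.com form; the real module derives its conditions and replacement expressions from the decorated column metadata:

from sqlalchemy import select, func, update

def anonymize_email_column(conn, table, column, check_mode=False):
    # Rows already in the anonymized format are skipped, which is what keeps
    # the operation idempotent.
    needs_change = ~column.like('user+%@example.com')
    if check_mode:
        # Check mode: count how many rows would change, but modify nothing.
        return conn.execute(
            select([func.count()]).select_from(table).where(needs_change)
        ).scalar()
    # Build the replacement value server-side (here with Postgres's md5() function).
    new_value = func.concat('user+', func.md5(column), '@example.com')
    result = conn.execute(
        update(table).where(needs_change).values({column.name: new_value})
    )
    return result.rowcount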

Building as many of our configuration operations as possible into well-defined units such as Ansible modules lets us keep the processes we use in a consistent, composable format that we can easily build upon.

If you’re interested in the anonymization module, you can grab it here.

The case for styleguides + class spokes for more agile development

As a UX designer, one of the biggest challenges to agile development and processes I’ve found is that iterations and new features often get designed and developed with less regard to previous iterations and styles than there should be. If your organization or site doesn’t have a well defined and comprehensive style guide to work from, or at the very least, established and reusable stylesheets, the problem often grows from a minor annoyance into serious technical debt that must be resolved.

If your process has been agile for long enough, and you haven’t focused on that level of detail, you end up with multiple CSS files full of inconsistent styles that conflict with each other. You also accumulate a lot of duplication of the same things (and a lot of leftovers that should be removed). Embarrassingly, we recently audited our front-end code and discovered that we have over 50 different font-family strings, when only 3 are really necessary. We could be so much more efficient about this.

How does this happen?

As I’m designing a new feature or changes to existing functionality, there are often new elements that no existing style fully covers. Sometimes there is also a need to improve the existing design to make it more visually appealing, functional, or obvious to the users. These are real-world needs and an important part of the iterative process, but they also open the door for problems. Often, in providing specifications for these new designs, I’ll try to specify an existing classname that matches the style I want, or state “make it like this other class we use here”. Without a style guide to refer to, and without good habits of checking for overlap with existing styles, most of the time I just end up detailing the individual styles in the specs and verifying that they look right upon implementation. For our engineers, this makes it easy to develop without having to care too much about prior styles, but it ALWAYS causes problems, either during QA or later on when a style changes or a stylesheet doesn’t load, and hunting down the offending conflict is like finding a needle in a haystack.

We recognized and complained about the mess, but couldn’t prioritize fixing it. It’s like when you know your car is a mess, but you’re just too tired after driving it every day to get it clean, and you know it’s going to rain tomorrow and mess it up again… The problem is, the longer we let it stay dirty, the more work we created for ourselves in design, implementation, testing, and reusability.

LET’S DO THIS!

We cycled through the Kübler-Ross Model of Five Stages of Grief:

  1. Denial and Isolation: what got us here in the first place
  2. Anger: what we experience when things break during implementation, testing, or later on.
  3. Bargaining: what we do back and forth to get around it when it breaks, but we don’t really solve the problem.
  4. Depression: sinks in when you realize how far you have to go, how many places you’ll have to pick through to clean up, and how much work you have to put into documenting the right way.
  5. Acceptance: Is this sustainable? No? Then LET’S DO THIS:

So, how do you fix it?

We started with a comprehensive audit of the most common elements of our application that had the most variation, and identified the two pieces of lowest-hanging fruit we could start resolving: buttons and text styles.

Buttons:

We had old button styles from bootstrap, buttons that still referenced bootstrap but overrode certain things, and then several generations of newer buttons. We identified the buttons, the pages they lived on, and then went about defining a NEW button style that would cover all button implementations. With a style defined for every kind of button, we started building a button spoke that contained everything every button needed. Now, we have a plan of action of which buttons on which pages we can replace using the single reusable spoke. Every new button a designer designs must now match (or make a case for adding an additional button style, which can then be added to the spoke). Every new button an engineer implements must use the styles from the button spoke. While we’ve only just recently implemented the button spoke and started slowly converting existing buttons to use it, both the product and engineering teams have a much higher degree of confidence that the new buttons will be the easiest part of a design to implement.

Before: (provided in every single design spec & mockup every time a button is used)
“Here is the button, and the specs for this button”
button-spec-example

After: (stated in implementation spec)
“use Regular Primary Icon Button” from the button spoke.

Text Styles

We currently have over 50 different font-family strings, and countless variations on weight, size, and line-height. We only need 3 font families, and 10 size variations with specific weights and line-heights for each. Our next project will be to define a type spoke that specifies these, and then begin converting pages to refer to it. Already, as we provide specs to the engineers, we’re referencing text by the proper family name, size, and weight, making documentation and testing much easier.

Takeaways

For Engineers: ask your designers to use the existing class names already available to them. If they don’t know them, work with them to develop a library or reference. And when an implementation calls for new styles, either add them to the stylesheets in a re-usable way, or reference as much of the existing style as possible.

For Designers: look at the code and figure out what you currently have, and what you actually want. Define your styles methodically and with reason, and if something needs to be different, communicate WHY.

Validating JSON-RPC Messages

Here at Axial we have standardized on JSON-RPC 2.0 for our inter-service communications. We recently decided to start using JSON Schemas to validate our JSON-RPC 2.0 messages. JSON Schema (http://tools.ietf.org/html/draft-zyp-json-schema-04) is a flexible declarative mechanism for validating the structure of JSON data. In addition to allowing for definitions of simple and complex data types, JSON Schema also has primitives ‘allOf’, ‘anyOf’ and ‘oneOf’ for combining definitions together. As we will see, all three of these primitives are useful for validating even just the envelope of JSON-RPC 2.0 messages.

The simplest JSON Schema is just an empty object:

{ }

This JSON Schema will validate any JSON document. The schema can be made more restrictive by specifying a “type”. For example, this JSON Schema:

{ "type": "object" }

would validate any JSON document which is an object. The other basic schema types are array, string, number, integer, boolean and null. Most of the basic schema types come with additional keywords that further constrain the set of valid documents. For example, this JSON Schema could be used to validate an object that must contain a ‘name’ property which is a string:

{
    "type": "object",
    "properties": {
        "name": { "type": "string" }
    },
    "required": [ "name" ]
}

If “name” was not in the “required” list (or if “required” was not specified) then objects would not have to actually have a “name” property to be considered valid (however, if the object happened to have a “name” property whose value was not a string, then the object would not validate). Normally, objects are allowed to contain additional properties beyond those specified in the “properties” object; to change this behavior, there’s an optional “additionalProperties” keyword that can be specified with ‘false’ as its value.

The combining primitives ‘allOf’, ‘anyOf’ and ‘oneOf’ can be used to express more complicated validations. They work pretty much as you might expect; ‘allOf’ allows you to specify a list of JSON Schema types which all must be valid, ‘anyOf’ means that any one of the list of types must be valid, and ‘oneOf’ means that EXACTLY one of the list of types must be valid. For example:

{
    "anyOf": [
        { "type": "string" },
        { "type": "boolean" }
    ]
}

would validate either a string or a boolean.

It turns out that all three of these combining primitives are useful for accurately validating JSON-RPC objects. JSON-RPC objects come in two flavors; the request and the response. Requests have the following properties:

  • jsonrpc: A string specifying the version of the JSON-RPC protocol. Must be exactly “2.0”.
  • method: A string containing the name of the method to invoke. Must *not* start with “rpc.”.
  • id: An identifier for this request established by the client. May be a string, a number or null. May also be missing, in which case the request is a “Notification” and will not be responded to by the server.
  • params: Either an array or an object, contains either positional or keyword parameters for the method. Optional.

Responses must have these properties:

  • jsonrpc: As for the request, a string which must be exactly “2.0”.
  • id: This must match the ‘id’ of the request, but could be null if the request ‘id’ was null or if there was an error parsing the request ‘id’.

In addition, if the request was successful, the response must have a ‘result’ property, which can be of any type. Conversely, if the request was not successful, the response must NOT have a ‘result’ property but must instead have an ‘error’ property whose value is an object with these properties:

  • code: An integer indicating the error type that occurred.
  • message: A string providing a description of the error.
  • data: A basic JSON Schema type providing more details about the error. Optional.

Here’s a schema for validating JSON-RPC Request objects:

{
    "title": "JSON-RPC 2.0 Request Schema”,
    "description": "A JSON-RPC 2.0 request message",
    "type": "object",
    "properties": {
        "jsonrpc": { "type": "string", "enum": ["2.0"] },
        "id": { "oneOf": [
            { "type": "string" },
            { "type": "integer" },
            { "type": "null" }
        ] },
        "method": { "type": "string", "pattern": "^[^r]|^r[^p]|^rp[^c]|^rpc[^.]|^rpc$" },
        "params": { "oneOf": [
            { "type": "object" },
            { "type": "array" }
        ] }
    },
    "required": [ "jsonrpc", "method" ]
}

Note the use of “oneOf” in the definition of the ‘id’ and ‘params’ properties. In addition, the ‘pattern’ in the definition of the ‘method’ property could use some explanation. Recall above that methods starting with “rpc.” are not allowed in JSON-RPC. JSON Schema provides for the use of regular expressions to constrain string types, and the pattern here matches all strings except those starting with “rpc.”. If you are familiar with Perl-compatible regular expressions, you are probably thinking that this pattern would be more elegantly written as “^(?!rpc\.)”, which is certainly true, but for the widest compatibility, it is recommended not to use “advanced” features like look-ahead in JSON Schema patterns.
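To give a feel for how we use a schema like this, here is a hedged Python sketch using the jsonschema package; request_schema is assumed to hold the schema above, and the function name is ours for illustration:

import json
import jsonschema

def validate_request(raw_message, request_schema):
    """Parse a JSON-RPC 2.0 request and validate its envelope."""
    message = json.loads(raw_message)
    # Raises jsonschema.ValidationError if the envelope is malformed.
    jsonschema.validate(message, request_schema)
    return message

# A valid notification: no 'id', so the server will not send a response.
# validate_request('{"jsonrpc": "2.0", "method": "ping"}', request_schema)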

The JSON Schema for validating JSON-RPC Response objects is a bit more complicated:

{
    "title": "JSON-RPC 2.0 Response Schema",
    "description": "A JSON-RPC 2.0 response message",
    "allOf": [
        {
            "type": "object",
            "properties": {
                "jsonrpc": { "type": "string", "enum": ["2.0"] },
                "id": { "oneOf": [
                    { "type": "string" },
                    { "type": "integer" },
                    { "type": "null" }
                ] }
            },
            "required": [ "jsonrpc", "id" ]
        },
        {
            "oneOf": [
            {
                "type": "object",
                "properties": {
                    "result": { },
                    "jsonrpc": { },
                    "id": { }
                },
                "required": [ "result" ],
                "additionalProperties": false
            },
            {
                "type": "object",
                "properties": {
                    "error": {
                        "type": "object",
                        "properties": {
                            "code": { "type": "integer" },
                            "message": { "type": "string" },
                            "data": {
                                "anyOf": [
                                    { "type": "string" },
                                    { "type": "number" },
                                    { "type": "boolean" },
                                    { "type": "null" },
                                    { "type": "object" },
                                    { "type": "array" }
                                ]
                            }
                        },
                        "required": [ "code", "message" ]
                    },
                    "jsonrpc": { },
                    "id": { }
                },
                "required": [ "error" ],
                "additionalProperties": false
            } ]
        }
    ]
}

In this example, “allOf” is used to combine an object that defines the properties shared between error and success responses with “oneOf” two different objects that define the specific properties in either an error or success response. The “additionalProperties” property setting is needed because it is an error to supply both ‘result’ and ‘error’ properties. Because of the use of “additionalProperties” it was also necessary to specify the “jsonrpc” and “id” properties in both the error and success responses, but no further type information is required there because their types are fully specified in the first “allOf” definition. Although the “anyOf” used in the definition of the “data” property of the error object could be replaced with just an empty schema, it is convenient here as it clearly documents what types are allowed.

As you can see, JSON-RPC envelopes are actually rather tricky to validate accurately, but JSON Schemas are flexible and generic enough to do the job. We also use JSON Schemas to validate the parameters to and return data from our JSON-RPC methods; look for the details of how that works and how it integrates with our RPC service definitions in a future blog post.

Named Locks in MySQL and Postgres

Axial recently hit a major milestone with the release of AMS (Axial Messaging Service). AMS provides users with an end-to-end email solution (much like Google’s Gmail) that seamlessly integrates with their experience on Axial (much like LinkedIn’s InMail). Of all the issues that arose while developing AMS, none were as simple and destructive as the one presented below. Our solution was as simple and beautiful as the problem itself; and that… is worth writing about, my friends.

Consider the case where lisa@gmail.com sends an email to two Axial members, Scuba and Doug. The SMTP envelope might look something like this:

From: lisa@gmail.com
To: scuba@mail.axial.net, doug@mail.axial.net
Subject: Our next meeting
Message-ID: <123-abc@mail.google.com>

Hey guys! Shall we meet tomorrow at 2 PM?

We use Postfix as an MTA, which means Postfix is responsible for receiving the message and invoking the AMS inbound processor as a maildrop_command. We’ve configured Postfix to deliver each message once per recipient, with the philosophy that failure to deliver to scuba@mail.axial.net should not prevent delivery to doug@mail.axial.net. This means the AMS inbound processor will be invoked twice, once with Delivered-To: scuba@mail.axial.net and another with Delivered-To: doug@mail.axial.net. The following diagram shows Postfix delivering to AMS once per recipient:

[diagram: Postfix delivers the message to AMS once per recipient]

The steps for processing an inbound email look something like:

  • decode the message
  • look at the SMTP headers to see who the email is From and Delivered-To
  • record the email in our relational DB
  • store the email in the corresponding IMAP mailboxes

The last two steps involve storing and retrieving data. If you’ve ever dealt with two concurrent processes manipulating the same data at once, then you’re probably familiar with the need for inter-process synchronization. To illustrate this, the following diagram shows both processes appending to Lisa’s sent mailbox at once:

[diagram: both processes appending to Lisa’s sent mailbox at once]

The arrows are red because there is a high chance the message gets appended to Lisa’s sent mailbox not once but twice. Although each process first checks to see if the message is already in Lisa’s sent mailbox, there is a chance they both check at the same time, in which case they both end up appending.

We simply need to ensure that a given message is only processed by one worker at a time. A file system lock won’t do the trick, given messages can be processed on different servers and each has its own file system. However, given all of our servers reference the same dedicated SQL server, can we somehow use that as a distributed locking mechanism? Yes! With a named lock, of course!

Remember this is still a single message with a unique Message-ID (in this case <123-abc@mail.google.com>). If we use the Message-ID as the name of our lock, we can use the following logic to get the mutual exclusion we’ve been longing for:

  • Get the Message-ID from the SMTP header
  • Attempt to obtain a lock whose name is <123-abc@mail.google.com>
    • If we CAN get the lock then continue processing the inbound email and release the lock when done.
    • If we CANNOT get the lock then immediately return 75 (Temporary Failure) to Postfix. Postfix will retry shortly.

With the logic above we can guarantee each message will be processed sequentially. Specifics for using named locks in both MySQL and Postgres can be found below.

Named Locks with MySQL

GET_LOCK('<123-abc@mail.google.com>', 10)

Attempt to get the named lock, waiting up to 10 seconds. Return 1 if lock was obtained or 0 if not obtained.

RELEASE_LOCK('<123-abc@mail.google.com>')

Release the named lock. Return 1 if the lock was released, 0 if the lock is held by another thread, or NULL if the lock does not exist.

Named Locks with Postgres

It just so happens that we recently switched from MySQL to Postgres. When migrating the locking mechanism above we learned Postgres provides advisory locks in many flavors. The big differences are:

  • Rather than taking a string, Postgres takes either one 64-bit key or two 32-bit keys as the name of the lock.
  • Postgres does not allow a timeout to be specified. That’s fine for us, since the 10-second timeout above was fairly arbitrary anyway.

We went with pg_try_advisory_xact_lock, which obtains an exclusive transaction level lock if available. Because this lock is at the transaction level it will automatically be released at the end of the transaction and cannot be released explicitly. This has a big advantage over the MySQL implementation, where cautious exception handling was required in order to ensure the lock is always released.
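Since the advisory lock functions take integers rather than strings, we derive a 64-bit key from the Message-ID ourselves. A hedged SQLAlchemy sketch of how that might look (the hashing scheme and function name are illustrative, not necessarily what AMS does):

import hashlib
import struct

from sqlalchemy import text

def try_message_lock(connection, message_id):
    # Derive a signed 64-bit key from the Message-ID string.
    digest = hashlib.sha1(message_id.encode('utf-8')).digest()
    key = struct.unpack('>q', digest[:8])[0]
    # Transaction-level advisory lock: released automatically at COMMIT or
    # ROLLBACK, so there is no explicit release to forget.
    return connection.execute(
        text('SELECT pg_try_advisory_xact_lock(:key)'), {'key': key}
    ).scalar()

If this returns False, the inbound processor returns exit code 75 so Postfix retries later, as described above.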


Thanks to:

  • Ben “Hurricane” Holzman – for pointing out that MySQL supports named locks
  • Jon “Inklesspen” Rosebaugh – for migrating the use of named locks to Postgres

Testing Front-End Components using Karma, Jasmine and Spokes

As your applications grow more complex and involved on the client-side, it becomes incredibly important to find ways to create reusable and encapsulated components that you can share between services. At Axial, we use spokes to solve that problem. With spokes, we can generate small executable Javascript files that encapsulate all of the Javascript, CSS and HTML necessary for our components and include them wherever we see fit.

There’s one more thing that we can do to gain even more confidence in the execution of our front-end components. Unit Testing. By breaking our application into small testable parts, or units, we can be sure that it is working exactly as we intend, protect against regressions in the future and most importantly refactor with confidence.

Using spokes is the first step in creating those small testable units within our application. The next step is actually writing the tests! In this article we are going to use Jasmine to write the tests themselves, and Karma to aid in creating our testing environment and utilizing spokes to their fullest extent.

To start, you’ll want to follow these directions to get Karma up and running on your machine. Once that’s all set, let’s create a new project and set up our Karma Configuration.

mkdir bootstrap-spoke && cd bootstrap-spoke

Then create a file called karma.conf.js with these basic settings.

module.exports = function(config) {
    config.set({
        frameworks: ['jasmine'],
        files: [
            'spec/*.js'
        ],
        browsers: ['Chrome']
    });
};

Here, we’ve told Karma to look for all Javascript files in the spec directory. Let’s create that along with our first test file.

mkdir spec && touch spec/main_spec.js

We’ll write a basic test just to confirm everything is up and running.

in spec/main_spec.js

describe("Our First Test", function() {
    it('should pass', function() {
        expect(true).toBe(true);
    })
})

and run:

karma start

[screenshot: Karma output showing our first test passing]

Awesome! Now that we know Karma is finding our test files and we’re seeing the output we expect, we can start adding spokes to our project. We are going to make a fairly basic spoke that will include Backbone and its dependency Underscore in our project.

mkdir spokes && touch spokes/backbone.cfg

We’ll give our spoke the following config:

[backbone]
js = ["underscore.min.js", "backbone.min.js"]

Grab those files from http://backbonejs.org/ and http://underscorejs.org/ respectively, and place them under spokes/js.

Now, we just want to make sure we’re able to generate our spokes from the command line before we get fancy and have Karma do the work for us.

spokec -c spokes/backbone.cfg -p spokes backbone > backbone.spoke.js

You should now see a new file named backbone.spoke.js in your root directory with the combined contents of both the Underscore and Backbone libraries. Let’s have Karma include any generated spokes in its build.

files: [
    '*.spoke.js',
    'spec/*.js'
],

And write a simple test to make sure Backbone and Underscore have been loaded correctly:

describe('Backbone', function() {
    it('should be loaded', function() {
        expect(Backbone).toBeDefined();
    })
})

describe('Underscore', function() {
    it('should be loaded', function() {
        expect(_).toBeDefined();
    })
})

All tests pass!

[screenshot: Karma output showing all tests passing]

Now, we could simply run this command to generate our spoke anytime we run our specs but that would get old quick if we are modifying these files with any sort of frequency. We want Karma to automatically create the spokes for us before it runs its specs. Plus, I don’t like having these new generated files hanging around my test directory.

Enter Karma Preprocessors

A preprocessor will ingest files that you provide it, and perform a particular action on each. When it’s done, it will return the file and in most cases include the results of its work in the current Karma build. For example, the popular Karma Coffee Preprocessor will simply take all .coffee files and compile them to .js files at build time.

We’re going to make our own preprocessor, and have it compile our spokes and automatically include its results in our test build.

First things first: delete backbone.spoke.js and its entry in karma.conf.js so we know we aren’t accidentally including that instead of our newly generated spokes.

Creating Karma-Spoke-Preprocessor

In order to create and utilize our own preprocessor, we need to create a new project (outside of our current directory) and include it in our backbone-spoke project as a Karma plugin. In a separate terminal window:

mkdir karma-spoke-preprocessor && cd karma-spoke-preprocessor

Karma plugins are loaded via NPM, so there is a little bit of boilerplate we need in order to get things up and running. First, I’ll create a package.json to describe our new project:

{
    "name": "karma-spoke-preprocessor",
    "version": "",
    "description": "A Spoke Preprocessor for Karma",
    "main": "index.js",
    "scripts": {
        "test": "echo "Error: no test specified" && exit 1"
    },
    "author": "Nader Hendawi",
    "license": "ISC"
}

and a new file index.js with the following (which I will explain shortly):

var createSpokePreprocessor = function(args, config, logger, helper, basePath) {
    return function(content, file, done) {
        done(content);
    };
};

createSpokePreprocessor.$inject = ['args', 'config.spokePreprocessor', 'logger', 'helper', 'config.basePath'];

// PUBLISH DI MODULE
module.exports = {
    'preprocessor:spoke': ['factory', createSpokePreprocessor]
};

This may look intimidating at first, but it’s really rather straightforward once you understand what’s going on. The first thing we want to do is create a function that will act on our preprocessed files. We are calling this createSpokePreprocessor here and it simply returns another function.

When all is said and done, Karma will be calling this function for each file that needs to be preprocessed. So, in our return function we are provided arguments for the contents of the file, some metadata about the file and a done function.

The contents of the file will be just that…a string representation of the file in question. The file argument will provide us with an object resembling the following:

{ path: '{{PATH}}/file.ext',
    originalPath: '{{PATH}}/file.ext',
    contentPath: '{{PATH}}/file.ext',
    mtime: Wed Apr 30 2014 16:39:32 GMT-0400 (EDT),
    isUrl: false,
    sha: 'a573bab0110b98ceaba61506246c18c4cdb179f9' }

Lastly, we have a done argument. This is a function that we must call when our preprocessor has finished its work so that Karma knows to proceed onto the next one. This function will take an argument that represents the output of our preprocessing. For now, we are simply passing the content back into the done function so that the file doesn’t undergo any processing.

Let’s throw some logging in there just to see that our app is actually using our preprocessor:

return function(content, file, done) {
    console.log(file);
    done(content);
};

Using Our Preprocessor

Adding a preprocessor to our project is very easy. Back in our karma.conf.js add the following:

preprocessors: {
    'spokes/*.cfg': 'spoke'
},

This will tell Karma to preprocess all .cfg files in the spokes directory using the spoke preprocessor.

If we were to run our tests now, everything would run just fine…but we’re not seeing our logging from the preprocessor. That’s because, according to Karma, there are no .cfg files in spokes, so it has nothing to pass to the spoke preprocessor. We have to tell karma to include this directory in its build.

files: [
    'spec/*.js',
    'spokes/*.cfg'
],

Now, if we run karma start again we should be good to go….but we aren’t.

[screenshot: Karma error output]

Oh yeah, we never told our project about our new preprocessor. Normally we’d do this by simply installing it via npm (npm install karma-spoke-preprocessor), but our plugin only lives on our machine for now.

One way to do this is to provide the install command with the path to our module like so:

npm install ../karma-spoke-preprocessor

This would work, but we would need to rebuild our spoke preprocessor and reinstall it in our project directory every time we made a change.

NPM LINK

The npm link command is extremely helpful for local package development. The docs have a much more detailed explanation but essentially, we want to symlink our spoke preprocessor’s directory in our local npm installation:

In our karma-spoke-preprocessor directory:

npm link

We should see output similar to the following:

/usr/local/lib/node_modules/karma-spoke-preprocessor -> /Users/nhendawi/Projects/blog_posts/karma-spoke-preprocessor

This will tell npm to provide our karma-spoke-preprocessor directly from its directory rather than a remote repository.

Then, we need to tell our backbone-spoke project to use this symlink’ed version of our plugin. In backbone-spoke:

npm link karma-spoke-preprocessor

We should see the following output which confirms that we have successfully installed a linked version of our preprocessor:

/Users/nhendawi/node_modules/karma-spoke-preprocessor -> /usr/local/lib/node_modules/karma-spoke-preprocessor -> /Users/nhendawi/Projects/blog_posts/karma-spoke-preprocessor

Now, we can make changes to our preprocessor and have them be immediately available to our app!

Running karma start now, we should see the output from our log:

{ path: '/Users/nhendawi/Projects/blog_posts/fake-app/spokes/backbone.cfg',
    originalPath: '/Users/nhendawi/Projects/blog_posts/fake-app/spokes/backbone.cfg',
    contentPath: '/Users/nhendawi/Projects/blog_posts/fake-app/spokes/backbone.cfg',
    mtime: Wed Apr 30 2014 16:39:32 GMT-0400 (EDT),
    isUrl: false,
    sha: 'a573bab0110b98ceaba61506246c18c4cdb179f9' }

[screenshot: Karma run showing the preprocessor log output]

Great! Our project directory is using our preprocessor for the files that we have configured.

However, our tests are failing since we are no longer including Backbone and Underscore in our build. We’ll solve that once we actually generate our spokes and include them in our project.

Generating the spokes

Configuration Options

Now that we know our files are being delivered safely to our preprocessor, we have to find a way to use them to generate their corresponding spokes. To do that, we are going to pass in some configuration options from our Karma project and use them in our preprocessor.

In karma.conf.js:

spokePreprocessor: {
    options: {
        path: "spokes",
    }
},

By passing in an object named after our preprocessor, we can provide context-specific configuration options right to our preprocessor. In our case, we want to give our spokec command line client some information on what it should use for config and file paths when generating spokes.

To use these in our preprocessor is just as easy:

var createSpokePreprocessor = function(args, config, logger, helper, basePath) {
    config = config || {}

    var defaultOptions = {
        path: "/"
    };

    var options = helper.merge(defaultOptions, args.options || {}, config.options || {});

    return function(content, file, done) {
        console.log(file);
        done(content);
    };
};

With this, we can take the config that has been passed in from karma and access it directly in our preprocessor code. I’ve added some defaultOptions just for good measure and used the Karma helper class to merge it with our passed in options. We could just as easily set this to be something like:

{
    path: "spokes",
}

This would allow us to avoid having to define these in our karma.conf.js at all, but we’ll leave it as is for the purposes of this tutorial and to show how easily we can configure our plugins.

Now that we have our configuration options all ready, we just need to find a way to run our spokec command and let it work its magic.

Enter Exec

In order to do that, we are going to use the exec npm package. Exec will allow us to “Call a child process with the ease of exec and safety of spawn”. In our case, we will be using Exec to run our spokec -c etc etc command directly from within our preprocessor’s return function.

First, add it to our package.json.

"dependencies": {
    "exec": "0.1.1"
},

and run npm install to add it to our preprocessor project.

Then, require it at the beginning of our index.js.

var exec = require('exec');

For each file that is passed into our preprocessor, we want to run the following command:

spokec -c CONFIG_PATH -p FILE_PATH FILENAME | cat

Now, CONFIG_PATH and FILE_PATH are both available in our config options and filename can be parsed easily from our file argument:

var filename = file.originalPath.replace(basePath + '/' + options.path + '/', '').split('.')[0];

It’s important to note that we are piping the output into the cat command. This means that the output from spokec will be provided directly to our exec callback method as the out argument. This will then be provided to the done function and finally evaluated within our project.

So, our return function will now look like this:

return function(content, file, done) {
    var filename = file.originalPath.replace(basePath + '/' + options.path + '/', '').split('.')[0];

    exec('spokec -c ' + file.originalPath + ' -p ' + options.path + ' ' + filename + " | cat", function(err, out, code) {
        if (err instanceof Error)
            throw err;

        process.stderr.write(err);
        done(out);
    });
};

Now head back to our project directory, and run karma start:

[screenshot: Karma output with all tests passing again]

And all of our tests are back to passing!

UserName Spoke

Let’s take this one step further and really see how we can test a spoke complete with HTML, JS and CSS. We’re going to follow the steps in the Spoke Project on GitHub and create the username spoke within our project.

When all is said and done, our project directory structure should look like the following:

[screenshot: project directory structure]

There are a couple of changes that we will make for the purposes of this tutorial. First off, since we are already including the backbone spoke as part of our build, we are going to remove the following line from username.cfg

spokes = ['backbone']

If we run karma start now, we’ll see the following error:

[screenshot: karma error output]

Oops, we never included jQuery in either of our spokes. To get things moving, we’ll grab a copy of jQuery and add it to our Karma build:

files: [
    'spokes/js/jquery.min.js',
    'spec/*.js',
    'spokes/*.cfg'
],

Testing our Backbone Model

At this point, our tests are still passing but we haven’t added any new ones for our Username Spoke. Let’s change that. First, let’s do some simple tests for our UsernameModel to ensure it behaves correctly. In a new file at spec/username_spec.js:

describe("UsernameModel", function() {
    it('should exist', function() {
        expect(UsernameModel).toBeDefined();
    })

    it('should have the correct defaults', function() {
        var default_model = new UsernameModel;

        expect(default_model.get('first_name')).toBe('');
        expect(default_model.get('last_name')).toBe('');
        expect(default_model.get('is_internal')).toBe(false);
        expect(default_model.get('is_masq')).toBe(false);
    })

    it('should honor passed in attributes', function() {
        var new_model = new UsernameModel({is_internal: true, is_masq: true});

        expect(new_model.get('is_internal')).toBe(true);
        expect(new_model.get('is_masq')).toBe(true);
    })
})

I normally don’t like bundling so many assertions into each it block, and some of these tests would be considered redundant, but I’m making an exception here both for the purposes of this tutorial and because of the simplicity of our spoke. In each test, we instantiate a new model with the attributes we’d like to test and simply check that it has the expected values. If we had a more complicated model, this is also where we could mock out any HTTP requests we make to the server, or any events that fire when our model changes.

Testing our Backbone View

Testing the UsernameView takes a little bit more work than its Model counterpart. For one, we need to instantiate a model to pass into our view, and provide it with a DOM element in which to render itself. So, our it blocks will have a little more setup:

describe("UsernameView", function() {
    it('should exist', function() {
        expect(UsernameView).toBeDefined();
    })

    it('should set the model correctly', function() {
        var model = new UsernameModel({first_name: "Nader", last_name: "Hendawi"}),
        view = new UsernameView({model: model});

        view.render();

        expect(view.model.get('first_name')).toBe("Nader");
        expect(view.model.get('last_name')).toBe("Hendawi");

    })

    describe('template rendering', function() {
        it('should render the correct text when name is too long', function() {
            var model = new UsernameModel({first_name: "Nader", last_name: "Hendawi The Magnificent"}),
            view = new UsernameView({model: model});

            view.render();

            expect(view.$el.html().trim()).toBe("N.\n\nHendawi The Magnificent");
        })

        it('should render the correct text when the name is short', function() {
            var model = new UsernameModel({first_name: "Nader", last_name: "Hendawi"}),
            view = new UsernameView({model: model});

            view.render();

            expect(view.$el.html().trim()).toBe("Nader\n\nHendawi");
        })

        it('should render the internal badge when necessary', function() {
            var model = new UsernameModel({first_name: "Nader", last_name: "Hendawi", is_internal: true}),
            view = new UsernameView({model: model});

            view.render();

            expect(view.$el.html().trim()).toBe("Nader\n\nHendawi\n\n(internal)");
        })

        it('should render the masq badge when necessary', function() {
            var model = new UsernameModel({first_name: "Nader", last_name: "Hendawi", is_masq: true}),
            view = new UsernameView({model: model});

            view.render();

            expect(view.$el.html().trim()).toBe("Nader\n\nHendawi\n\n(masq)");
        })
    })
})

Feel free to go through each test and see exactly what’s going on. Some of the setup logic can easily be moved into a beforeEach block such as:

beforeEach(function() {
    model = new UsernameModel({first_name: "Nader"})
});

where each subsequent test would simply update this one model object. But, for now I think it’s okay to repeat some of this code as each spec does test a distinct case.

If we run our tests now, we should see them all passing!

[screenshot: karma output with all tests passing]

Next Steps

This is really the tip of the iceberg when it comes to testing our front-end components. From here we could easily add more spokes, create more complex view logic, integrate some communication with a RESTful backend service…anything we would want to do in a modern client-side application. But, we’re no longer at the mercy of the assumption that our code just works. By building up a sufficient test framework around our components, we have the liberty to add new features or refactor existing components with the confidence that we haven’t caused any regressions or bugs along the way. To change the output of our UsernameView’s template, we would simply update the tests with our new expected result, watch them fail, and rewrite the template until our tests pass.

But the testing doesn’t stop with Backbone. If you’re writing applications in Angular, Ember, React…whatever, the process is almost identical to the one we’ve followed here. A lot of the time, the hardest part will be getting our test environment into a state where we can easily test single units of our application. In our case, that’s where spokes, and ultimately our karma-spoke-preprocessor, come in.

Leveraging Karma’s plugin interface, we were able to create a preprocessor that ingests our spoke configuration and automatically embeds its result in our test build. That’s awesome! And there’s so much more we can do. In future articles, I’ll be exploring ways to build our own test reporters for Karma, refactoring existing JavaScript code to make it more easily testable, and other exciting ways to make our front-end architecture more robust and reliable.

Preventing errant git pushes with a pre-push hook

At Axial, we use a centralized repository workflow; everyone works off the main git remote, making branches and pull requests as needed. Broken code is allowed on topic branches, but the latest code on master is expected to always work.

However, since master is just an ordinary git branch and the git command-line interface has a lot of different ways to do the same thing, we’ve had a few incidents where a developer intends to push a topic branch to the remote, creating a new branch there, but instead pushes the topic branch to master. Luckily, git 1.8.2 added support for a pre-push hook.

Git hooks are small scripts that can interact with different parts of the git operation. For example, a pre-commit hook can check that your code is properly formatted, or a post-receive hook can be responsible for kicking off a build. In our case, pre-push will run before any push and has the ability to abort it.

The pre-push hook is an executable script named .git/hooks/pre-push. It is called with two command-line arguments (the name and URL of the remote) and given, on standard input, one line per ref being pushed along with the corresponding ref, if any, on the remote side. See the description in githooks(5) for full details.
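As a concrete example, the push shown a little further down would call the hook with origin and git@github.com:axialmarket/repo.git as its two arguments and feed it a single line like this on standard input:

refs/heads/topicbranch 6eadeac2dade6347e87c0d24fd455feffa7069f0 refs/heads/master f1d2d2f924e986ac86fdf7b36c94bcdf32beec15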

The first part of the hook assembles data about the push. We obtain the current branch using git symbolic-ref HEAD and check for a force push by inspecting the command-line arguments of the parent process (the git push in question). We also split out the branch names from the refs in the commits when possible. (A git push origin HEAD unfortunately won’t have the branch name associated with it, which is why we get the current branch.)

import os
import shutil
import subprocess
import tempfile
from collections import namedtuple

# `parser` (the hook's argparse.ArgumentParser) and PROTECTED (the set of
# protected branch names) are defined elsewhere in the script.
Push = namedtuple('Push', ['commits', 'remote_name', 'remote_url',
                           'current_branch', 'removing_remote', 'forcing'])
Commit = namedtuple('Commit', ['local_ref', 'local_sha1', 'remote_ref', 'remote_sha1',
                               'local_branch', 'remote_branch'])

def assemble_push(args, lines):
    commits = []
    for line in lines:
        split = line.split()
        if len(split) != 4:
            parser.exit(status=1,
                        message="Could not parse commit from '{}'\n".format(line))
        # local_branch
        local_branch = split[0].split('/')[-1] if '/' in split[0] else None
        split.append(local_branch)
        # remote_branch
        remote_branch = split[2].split('/')[-1] if '/' in split[2] else None
        split.append(remote_branch)
        commits.append(Commit(*split))
    current_ref = subprocess.check_output(['git', 'symbolic-ref', 'HEAD']).rstrip()
    current_branch = current_ref.split('/')[-1]
    pid = os.getppid()
    push_command = subprocess.check_output(['ps', '-ocommand=', '-p', str(pid)])
    forcing = ('--force' in push_command or '-f' in push_command)
    removing_remote = set()
    for commit in commits:
        if commit.local_ref == "(delete)":
            removing_remote.add(commit.remote_branch)
    return Push(commits=commits,
                remote_name=args.remote_name, remote_url=args.remote_url,
                current_branch=current_branch, removing_remote=removing_remote,
                forcing=forcing)

Now that we’ve assembled all the info, we can check the push for things we want to prohibit.

Push(commits=[Commit(local_ref='refs/heads/topicbranch',
                     local_sha1='6eadeac2dade6347e87c0d24fd455feffa7069f0',
                     remote_ref='refs/heads/master',
                     remote_sha1='f1d2d2f924e986ac86fdf7b36c94bcdf32beec15',
                     local_branch='topicbranch',
                     remote_branch='master')],
     remote_name='origin',
     remote_url='git@github.com:axialmarket/repo.git',
     current_branch='topicbranch',
     removing_remote=set([]),
     forcing=False)

In this push, you can see that I’m attempting to push topicbranch on the local repo to master on the remote; this is the first category of mistake we want to prevent.

def check_unmerged(push):
    for commit in push.commits:
        compare = commit.local_branch
        if commit.local_ref == 'HEAD':
            compare = push.current_branch
        if commit.remote_branch in PROTECTED and \
                compare != commit.remote_branch:
            msg = ("You cannot push the local branch '{}' to the protected "
                   "remote branch '{}'\n".format(
                       compare, commit.remote_branch))
            parser.exit(1, message=msg)

PROTECTED is a set of branch names, currently master and qa. commit.local_branch will be None if the local ref doesn’t have a branch name. We have a special case for HEAD, where we instead use the current branch; otherwise we err on the side of caution and reject the push. This function prohibits accidentally pushing a topic branch directly to master or qa. We have a similar check for deleting a protected branch, based on the removing_remote property on the push object.
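That deletion check isn’t shown here, but a minimal sketch of it (hypothetical code, not the hook verbatim) could be as simple as:

def check_remove_protected(push):
    # Hypothetical sketch: refuse to delete master or qa on the remote.
    removing_protected = push.removing_remote & PROTECTED
    if removing_protected:
        msg = ("You cannot delete the protected remote branch(es) "
               "{}\n".format(', '.join(sorted(removing_protected))))
        parser.exit(1, message=msg)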

We use alembic for database migrations. When two people create migrations in their topic branches and merge them together, this creates a branched database history and alembic will refuse to apply migrations until you resolve the branch. We have some scripts to warn you when you get an alembic branch after a merge, but people have sometimes forgotten to fix that and pushed the branch to qa or even master. So here’s a check for that! It’s a bit more complicated than the last one.

def check_alembic_branch(push):
    for commit in push.commits:
        if commit.remote_branch not in PROTECTED:
            continue
        awd = tempfile.mkdtemp(prefix='alembic.')
        treeish = "{}:share/migrations".format(commit.local_sha1)
        tar = subprocess.Popen(['git', 'archive', '--format=tar', treeish],
                               stdout=subprocess.PIPE)
        extract = subprocess.Popen(['tar', 'xf', '-'], stdin=tar.stdout, cwd=awd)
        tar.stdout.close()  # Allows extract to receive a SIGPIPE if tar exits.
        extract.communicate()
        if extract.returncode != 0:
            parser.exit(1, message="unable to check for alembic branches\n")
        branches = subprocess.Popen(['alembic', 'branches'], cwd=awd,
                                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output = branches.communicate()
        if branches.returncode != 0:
            parser.exit(1, message="unable to check for alembic branches\n")
        # cleanup awd! otherwise we pollute /tmp
        shutil.rmtree(awd)
        if len(output[0]) > 0:
            msg = ("Alembic migration conflict!\n{0}\n"
                   "Fix this first!\n")
            parser.exit(1, message=msg.format(output[0]))

The good news is alembic has a command to let you know if a branch exists. The bad news is alembic expects to be able to work on an actual file tree on disk. The working tree will often correspond to the tree being pushed, but we can’t guarantee that will be the case. So, we make a temporary directory and use git archive to extract the migrations directory for the appropriate refs. This is actually pretty fast.

There are plenty more checks we can add following these examples: we can prevent force-pushing to master, ensure Jenkins has successfully built the topic branch first, and so on. This addresses our immediate pain points while remaining easy to extend in the future.
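For example, a force-push guard could build on the forcing flag we already collect; a rough sketch (not the actual hook code) might look like:

def check_force_push(push):
    # Hypothetical sketch: block force-pushes to protected branches.
    if not push.forcing:
        return
    for commit in push.commits:
        if commit.remote_branch in PROTECTED:
            msg = ("You cannot force-push to the protected remote branch "
                   "'{}'\n".format(commit.remote_branch))
            parser.exit(1, message=msg)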

[image via git-scm.com]

Test Data Generation

Exhaustively testing software runs into the Halting Problem: no software program, no matter how small, can be proven (or tested) to show that it will “always” work as expected. Even the simplest program:

#!/usr/bin/env python
print("hello world")

can break if the python executable is moved to a different directory on your system, or due to operating system failures or physical hardware failures.

Realistically, today’s programs are increasingly complex and built on top of multiple frameworks, each with its own bugs. And all of this before you get to testing different browsers and browser versions for web applications. What test engineer hasn’t heard: “This does not work on IE9”? There are an infinite number of factors that can cause a program to fail.

In addition to the challenge of testing complex programs, there is increasing pressure to test and ship faster. To keep pace, you have to automate a sufficient amount of testing. When testing web interfaces, even with popular functional testing tools like Selenium WebDriver or QuickTest Pro (QTP), your automated tests can quickly become brittle and obsolete against a constantly changing feature set.

Data-Driven Automation

To keep tests from becoming overly brittle, test engineers try to separate test data from testing logic, choosing to store test data in a database or config file rather than hard-coding test data with test code. Tools like pytest’s mark.parametrize allow you to make the most of this test data by treating each test entry as a separate test-case.
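As a quick, hypothetical illustration of that separation with pytest (validate_capital_range and the values below are made up for the example; in practice the rows would come from a config file or database):

import pytest

from myapp.validation import validate_capital_range  # hypothetical module under test

# Each tuple is one test case: (cap_min, cap_max, expected_valid).
CAPITAL_RANGE_CASES = [
    (1000000, 5000000, True),    # typical range
    (0, 0, False),               # boundary: empty range
    (5000000, 1000000, False),   # min greater than max
]

@pytest.mark.parametrize("cap_min,cap_max,expected_valid", CAPITAL_RANGE_CASES)
def test_capital_range(cap_min, cap_max, expected_valid):
    # pytest reports each tuple as its own test case
    assert validate_capital_range(cap_min, cap_max) == expected_valid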

Remember that the goal is still not full coverage; you’re not trying to test every possible permutation of bytes for every text field. The goal is good coverage with minimal test data. Taking a tip from The Art of Software Testing, we can quickly get to “better coverage” with a minimum of test data by using equivalence partitioning and boundary testing. But even with consistent application of these techniques, we’re left with a few questions:

  1. Who is going to write out all the test data for each field?
  2. With multiple interdependent form fields, how are we going to maintain the data?

The Answer? Automation!

Think of each field in a form as a node in a directed graph. Once you model the form that way, all you need to automate test data generation is a data format for representing each node and a graph-generation utility. Given inputs that conform to our equivalence partitioning and boundary testing, we can then generate a minimal amount of test data with a trivial amount of code.

A first pass at graphing a simple form looks like this:

[fig1: directed graph of the form fields]

  1. Start -> ObjSell, ObjRaise, ObjExplore
  2. ObjSell, ObjExplore -> NextButton
  3. ObjRaise -> CapSenior, CapMezz, CapEquity, CapAll
  4. CapSenior, CapMezz, CapEquity, CapAll ->  RangeTrue, RangeFalse
  5. RangeTrue -> CapMin, CapMax
  6. RangeFalse -> CapMin
  7. CapMin -> CapMax, NextButton
  8. CapMax -> NextButton

Which would generate test cases like this:

  1. START->ObjSell->NextButton
  2. START->ObjRaise->CapSenior->RangeTrue->CapMin->CapMax->NextButton
  3. START->ObjRaise->CapSenior->RangeFalse->CapMin->NextButton
  4. START->ObjRaise->CapMezz->RangeTrue->CapMin->CapMax->NextButton
  5. START->ObjRaise->CapMezz->RangeFalse->CapMin->NextButton
  6. START->ObjRaise->CapEquity->RangeTrue->CapMin->CapMax->NextButton
  7. START->ObjRaise->CapEquity->RangeFalse->CapMin->NextButton
  8. START->ObjRaise->CapAll->RangeTrue->CapMin->CapMax->NextButton
  9. START->ObjRaise->CapAll->RangeFalse->CapMin->NextButton
  10. START->ObjExplore->NextButton

With this one simple example, you can start to see the power of using graphs to generate maintainable test data. If fields change, you need only update the node description to reflect any changes and regenerate your test data.
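To make that concrete, here is a rough sketch (in Python, and not our actual utility) of how an adjacency list like the one above can be walked to produce test paths. Note that a naive walk like this yields every path through the graph, a superset of the ten cases listed, so the real node format adds whatever constraints are needed to prune the combinations:

# Rough sketch: enumerate every START -> NextButton path in the graph above.
GRAPH = {
    'START':      ['ObjSell', 'ObjRaise', 'ObjExplore'],
    'ObjSell':    ['NextButton'],
    'ObjExplore': ['NextButton'],
    'ObjRaise':   ['CapSenior', 'CapMezz', 'CapEquity', 'CapAll'],
    'CapSenior':  ['RangeTrue', 'RangeFalse'],
    'CapMezz':    ['RangeTrue', 'RangeFalse'],
    'CapEquity':  ['RangeTrue', 'RangeFalse'],
    'CapAll':     ['RangeTrue', 'RangeFalse'],
    'RangeTrue':  ['CapMin', 'CapMax'],
    'RangeFalse': ['CapMin'],
    'CapMin':     ['CapMax', 'NextButton'],
    'CapMax':     ['NextButton'],
}

def enumerate_paths(graph, node='START', end='NextButton'):
    """Yield every path from node to end as a list of node names."""
    if node == end:
        yield [node]
        return
    for child in graph.get(node, []):
        for path in enumerate_paths(graph, child, end):
            yield [node] + path

for path in enumerate_paths(GRAPH):
    print('->'.join(path))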

The next step is automatically generating test cases and hooking it up to a test runner. Stay tuned!