Brian Rumschlag System Architect at Allied Payment
What service does Allied Payment provide its customers?
We are an online bill-payment company. We process bill payments for banks and credit unions. If you use online banking services with a regional bank, they typically outsource that service to companies like ours. We’ve been in business about two and a half years. We have seven financial institutions live and should be getting three or four more this month. Our first customer was First Financial out of Abilene, Texas, and shortly thereafter came City Bank of Texas.
We provide an API for mobile banking providers to submit payments. We also have several user interfaces, which integrate with a bank’s online banking system.
One of our most interesting features is PicturePay. This allows you to pay a bill by taking a picture of the coupon. Kind of like when you would take a picture of a check to make a deposit, only this is for bill payments. Get your phone bill in the mail, take a picture of it, and you’re done.
What brought you to a NoSQL database? What kind of business problems were you facing that caused you to evaluate NoSQL technologies? What then caused you to, out of that, choose Cassandra and DataStax Enterprise?
Our schema was rapidly evolving as we added features, and being locked into our read model, schema changes were getting more and more painful. As we added features, we needed to add new columns and tables and things like that. We were looking for a way to pull out whole objects at a time and not do these complicated joins.
You see, writes are very simple, they’re consistent, you got one place to write them; but it makes reads very difficult because you’re joining all these tables together. If you have anything but a basic parent-and-child relationship, it gets very difficult to model and pull all that down.
If our application does about a hundred reads per write, then why on earth are we making our writes simple and our reads difficult? Our architecture is based around event sourcing. Every state change in the system is encapsulated there. If I added a payment or changed a customer’s name, then that’s all encapsulated in an event. Then we have a process that reads those events, denormalizes them, and writes them into our data store. That’s where the read models are calculated. What we were seeing was that our writes were pretty rapid, but when we read all that information back, we needed quite a bit of caching in order to get any kind of performance out of it. That’s partly what brought us to NoSQL.
Our architecture enabled us to essentially write new denormalizers and then run with them. You still have every state change. It wasn’t a data conversion where you read the current schema out, turn it into some object, and then commit that. We were able to just replay those events into Cassandra. That was the challenge we were looking at. We also looked at some document databases, and we noticed that a lot of our data tends to be pretty clustered.
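The event-sourcing flow described above can be sketched roughly like this. The event types, field names, and read-model shape are illustrative assumptions, not Allied Payment's actual schema:

```python
from dataclasses import dataclass

# Hypothetical event types; every state change is captured as one of these.
@dataclass
class PaymentAdded:
    customer_id: str
    payment_id: str
    amount: float

@dataclass
class CustomerRenamed:
    customer_id: str
    name: str

class PaymentsByCustomerDenormalizer:
    """Projects a stream of events into a read model keyed by customer."""
    def __init__(self):
        self.read_model = {}  # customer_id -> list of payment dicts

    def apply(self, event):
        # This projection only cares about payments; other events pass through.
        if isinstance(event, PaymentAdded):
            self.read_model.setdefault(event.customer_id, []).append(
                {"payment_id": event.payment_id, "amount": event.amount}
            )

# Replaying the event log rebuilds the read model from scratch, which is how
# a brand-new denormalizer can be introduced without a schema migration.
events = [
    PaymentAdded("cust-1", "pay-1", 42.50),
    CustomerRenamed("cust-1", "Jane Doe"),
    PaymentAdded("cust-1", "pay-2", 19.99),
]
denorm = PaymentsByCustomerDenormalizer()
for e in events:
    denorm.apply(e)
print(denorm.read_model["cust-1"])
```

The key property is that the read model is disposable: dropping it and replaying the log produces the same result, so new read models can be added at any time.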
Regarding how Cassandra underpins your applications, is it more external facing, internal facing or both?
Cassandra’s hosting most of our read models which is, when we’re querying against things, usually what we’re querying against. Whether it’s a customer looking for something or our internal operations folks looking for data, they tend to read those out of the Cassandra database. We are able to recreate everything that is in Cassandra out of the event storage that is still kept with SQL Server.
The event storage is basically a log that gives you a list of events. You think of it like a transaction log. It gives you a list of all of the state changes that happened in the system.
At any point in time, I can go back and recreate what the state of the system looks like. That lets us use something like Cassandra even though we didn’t know how it would behave in Windows Azure, anything like that, because we were able to create the data that was in Cassandra over and over again.
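The point-in-time reconstruction described above can be sketched as replaying only the events up to a given moment. The event shapes and timestamps here are illustrative assumptions:

```python
# A time-ordered event log: (timestamp, (kind, customer_id, value)).
events = [
    (1, ("set_name", "cust-1", "J. Doe")),
    (2, ("add_payment", "cust-1", 80.00)),
    (3, ("add_payment", "cust-1", 1200.00)),
]

def state_as_of(event_log, as_of):
    """Rebuild state by replaying events up to (and including) `as_of`.
    Assumes the log is sorted by timestamp, like a transaction log."""
    state = {"name": None, "payments": []}
    for ts, (kind, _cust, value) in event_log:
        if ts > as_of:
            break
        if kind == "set_name":
            state["name"] = value
        elif kind == "add_payment":
            state["payments"].append(value)
    return state

print(state_as_of(events, as_of=2))  # state after the first two events
```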
How does Cassandra impact your day-to-day business?
Here’s an example of a typical query. We take a customer and say, “give me all of the payments he’s ever made.” That could be 10, or a hundred, or maybe a thousand payments. If you’re looking through that in one big table while payments are being inserted at the same time, a lot of users are going to wait. Even with indexing, you’re still going through all of those reads. It was rare that we were looking at every payment from the beginning. We were mainly looking at payments for each customer and then payments by day. What payments do we have to process today? What payments do we have to process tomorrow?
That’s where the column store and Cassandra benefited us, because we’ll have a row keyed on the day, or keyed on the customer’s identifier, and we just add additional columns that contain the serialized payment updates.
That reduced our queries from 75 milliseconds or 80 milliseconds to 9 milliseconds, nearly a 90% reduction in response time!
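The wide-row idea described above, one partition per customer or per day with each payment as another column, can be sketched like this. The table layout and field names are illustrative assumptions, not Allied Payment's actual schema:

```python
from collections import defaultdict

# Roughly the kind of CQL layout being described: the partition key
# (customer_id or day) picks a single wide row, and each payment becomes
# another cell in that row. Names here are hypothetical.
SCHEMA = """
CREATE TABLE payments_by_customer (
    customer_id text,
    payment_id  timeuuid,
    payload     blob,        -- serialized payment update
    PRIMARY KEY (customer_id, payment_id)
);
CREATE TABLE payments_by_day (
    day        text,
    payment_id timeuuid,
    payload    blob,
    PRIMARY KEY (day, payment_id)
);
"""

# In-memory stand-in for the two tables: each dict key is one partition, so
# reading all of a customer's (or a day's) payments is a single-partition
# lookup, not a join or a full scan.
payments_by_customer = defaultdict(list)
payments_by_day = defaultdict(list)

def record_payment(customer_id, day, payload):
    # Each write lands in both read models (denormalized on purpose).
    payments_by_customer[customer_id].append(payload)
    payments_by_day[day].append(payload)

record_payment("cust-1", "2014-05-01", b"phone bill $80")
record_payment("cust-1", "2014-05-02", b"mortgage $1200")
record_payment("cust-2", "2014-05-02", b"car loan $350")

print(payments_by_customer["cust-1"])  # all of one customer's payments
print(payments_by_day["2014-05-02"])   # everything to process that day
```

Writing the same payment into two tables trades write amplification for reads that never touch more than one partition, which is the source of the latency drop described above.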
Great. It sounds like you started that with a relational database. Which one was that?
We were running Microsoft SQL Server on both Azure and EC2.
Let’s turn our attention now to some of the tech aspects of the company. Give me an idea of what your infrastructure is composed of. What kind of platforms do you use? Do you do your work on premise? Is it in the cloud?
Sure. Our entire infrastructure is in the cloud. We run in Microsoft Azure, as well as EC2. We are in the process of adding a third provider running on a private cloud in Rackspace.
Do you use our software on EC2 and Azure or just EC2?
We actually use Cassandra/DataStax Enterprise in both Azure and EC2.
Do you have the database itself span multiple availability zones or do you just use a single availability zone? How is that done?
Right now, they are within a single availability zone and then they are replicated across the cloud providers.
Which document databases did you evaluate?
Mongo and Raven.
What type of growth in capacity are you looking at in the near term?
We’re anticipating our volume to go up by 100 percent this quarter. We seem to be on a curve of doubling and re-doubling our capacity.
We’ve got 10 financial institutions up and will have another four this month. Then we’ve got five or six more in the pipeline to be done before the end of the quarter in June. That’s why we needed our solution to scale horizontally by just adding additional nodes, which Cassandra does very well.
Do you foresee yourself performing analytics on your Cassandra data?
Certainly. We try to provide our customers a better picture of who they’re paying – such as are they paying their mortgage through that bank or do they have the mortgage through another bank or a car loan or things like that. That still really is important because most of the financial institutions that we’re dealing with are pretty traditional. They’re a four-billion-dollar bank. They’ve got 20, 25 branches, but operationally they’re not sophisticated enough to know what they’re looking for.
So we ask, “how can we use that information to target advertising, or send mail with the bills that they’re paying, and get a better picture of what’s going on?” We’re still in the data collection phase as far as going through and looking at aggregating payment information together to get a better picture of the customer’s makeup. It would help our fraud detection by getting payments and seeing what fraudulent payments look like and feeding that through a neural network that would enable us to better predict if the payment is legitimate.
At the moment, we do things like look at the geolocation, ask them where they were, and control fraud through limits. We only allow them to pay so much, that kind of thing, so we can analyze it better, determine the purpose of the payment, etc. Eventually, we want Hadoop to learn to figure out whether or not a payment is legitimate.
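The limit-based control described above can be sketched very simply: cap what a customer can pay in a day and flag anything over the cap for review. The threshold and function names are illustrative assumptions, not Allied Payment's actual rules:

```python
# Hypothetical daily payment cap used as a coarse fraud control.
DAILY_LIMIT = 2500.00

def check_payment(amount, total_paid_today):
    """Return 'ok' if the payment fits within the daily limit, else 'review'.

    A payment that would push the customer's running daily total over the
    cap is held for manual review rather than rejected outright.
    """
    if total_paid_today + amount > DAILY_LIMIT:
        return "review"
    return "ok"

print(check_payment(300.00, 500.00))   # within the limit
print(check_payment(2200.00, 500.00))  # would exceed the limit
```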
How do you manage your clusters?
We use OpsCenter to check things like the size of each keyspace and how many keys it has. We bring up OpsCenter mainly when we want to poke at the data. It has a nice interface that lets us run queries to see what’s going on.
What benefits have you experienced since moving to a NoSQL database?
We’ve been able to greatly simplify our read models. Having a schema-less data store has allowed us to be very aggressive in adding new functionality.
If someone was brand new to this world of NoSQL and Cassandra, what advice would you give them in terms of getting started, what to concentrate on, etc.?
I guess the most interesting part to me was the difference in the data model. Even between a column store like Cassandra and document stores like Mongo, you have to think about how you’re going to store your data so that you get the benefit you want, as opposed to just trying to move normalized data into Cassandra. You have to wrap your head around really wide rows: lots of columns in a row, as opposed to many rows.