The importance of being available – (tech stuff)
Having got the philosophy and design of a high availability solution out of the way in my last post, this one is about the crunchy hard tech of implementation. (Fans of my musings will have to sit this one out.)
Westhawk Ltd had a contract to implement a high availability, database-backed SIP service. What did we use, and why?
After a lot of research and consultation with the customer we decided not to use Oracle (which is our normal first choice) and to go for MySQL as the database engine. Our research had turned up the clustered version of MySQL as a good candidate for the job.
Clustered MySQL has a very interesting solution to the problem of how to keep a pair of database machines in sync. All transactions are committed on both machines simultaneously. The transaction doesn’t complete until both machines have committed it. This is a ‘shared nothing’ architecture. It could be slow, but the default for the NDB engine it uses is to hold the whole database in RAM. There are some odd limitations involved in this solution. For example: a newly created table on one machine will automatically be copied to the other, but a view won’t.
That’s the back end – what about the VoIP server?
We knew we needed something open source, reliable and flexible.
The customer had come to us on the assumption that we would use Asterisk, which we considered and then decided not to. We plumped for FreeSWITCH.
Why FreeSWITCH?
First off, because of the specific requirements there is very little ‘state’ info to preserve: no users, no registrations etc. In fact it is all about call routing. The way that FreeSWITCH’s xml_curl works makes it very simple to switch the ‘database’ from one call to the next, so that when one database dies we can fail over to the other.
Here’s what happens:
Every inbound call generates an HTTP POST to one of two nominated servers.
The POST contains all the ‘channel variables’: dialed number, ANI, channel, IP address etc. The web server replies with a dynamically generated dialplan for this call, telling FreeSWITCH how to deal with it. The dialplan is in XML, so it is easy to generate from a database with standard web tools. If the first web server does not respond, the second is queried instead. (No connections to drop, no MySQL proxy etc.)
Now, I could write a complex dialplan in Asterisk that emulated this behavior, but with FreeSWITCH I get it out of the box. (xml_cdr works the same way, writing the call detail records to the database via HTTP.)
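To make that exchange concrete, here is a minimal Python sketch of both halves: the web side turning the posted channel variables into an XML dialplan, and the "try the first server, then the second" behaviour. It is an illustration, not the code from this project. The ROUTES table (standing in for the database), the gateway name carrier_a, and the function names are all made up for the example; the POST field names Caller-Destination-Number and Caller-Context and the ‘not found’ reply are how I understand xml_curl to behave, so check them against your own FreeSWITCH version.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical routing table; in a real system this is a database query.
ROUTES = {"5551000": "sofia/gateway/carrier_a/5551000"}


def build_dialplan(channel_vars, routes=ROUTES):
    """Return an XML document telling FreeSWITCH how to route one call."""
    # Field names are assumptions about what xml_curl posts.
    dialed = channel_vars.get("Caller-Destination-Number", "")
    context = channel_vars.get("Caller-Context", "default")

    doc = ET.Element("document", {"type": "freeswitch/xml"})
    target = routes.get(dialed)
    if target is None:
        # No route for this number: reply with a 'not found' result.
        section = ET.SubElement(doc, "section", {"name": "result"})
        ET.SubElement(section, "result", {"status": "not found"})
    else:
        section = ET.SubElement(doc, "section", {"name": "dialplan"})
        ctx = ET.SubElement(section, "context", {"name": context})
        ext = ET.SubElement(ctx, "extension", {"name": "route_" + dialed})
        cond = ET.SubElement(ext, "condition", {
            "field": "destination_number",
            "expression": "^" + re.escape(dialed) + "$",
        })
        ET.SubElement(cond, "action",
                      {"application": "bridge", "data": target})
    return ET.tostring(doc, encoding="unicode")


def first_reply(servers, post, channel_vars):
    """Ask each server in turn; return the first reply that arrives.

    `post` is any callable (url, channel_vars) -> str that raises
    OSError when the server cannot be reached.
    """
    for url in servers:
        try:
            return post(url, channel_vars)
        except OSError:
            continue  # this server is down, try the next one
    return None  # nobody answered
```

Because each call is a fresh request, there is nothing to reconnect or replay when a server dies: the next call simply lands on whichever server answers.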
So for this specific system, FreeSWITCH has done most of the work for me.
I’m not saying that it is now my default choice – it isn’t – but for this specific task it was the easier option.
We chose to use Glassfish as the web server, which in retrospect was overkill: it has a stack of clustering options we didn’t even touch.
We handle the high availability on the front end by having the phone company route their inbound SIP requests to a ‘floating’ IP address. We use the high availability support in SuSE 11 to move the IP address to the ‘active’ node and restart FreeSWITCH when needed.
There were of course problems – my ‘favorite’ being when I spent a day working out that the default Linux HA config assumes a cluster consists of 3 or more machines and won’t fail over a service unless it has a quorum, which the lone survivor of a two-machine cluster can never have.
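The arithmetic behind that lost day is short enough to write down. This is a sketch of the usual strict-majority rule, not the cluster software's actual code, and has_quorum is a name invented for the example.

```python
def has_quorum(nodes_alive, nodes_configured):
    """Strict majority: more than half of the configured nodes are up."""
    return 2 * nodes_alive > nodes_configured


# Three machines, one dies: the remaining two can still act.
three_node_survives = has_quorum(2, 3)

# Two machines, one dies: the survivor holds exactly half, not a majority,
# so under this rule it refuses to take over the service.
two_node_survives = has_quorum(1, 2)
```

With a pair of machines, the very failure you built the cluster for is the one case where the default rule says "do nothing", so the quorum policy has to be changed for two-node setups.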
Don’t underestimate the added complexity of building a system with 5 hosts as compared with a single system – the system administration takes a lot longer and is quite a bit more complex.
On the other hand, we have built a high availability VoIP routing system based on free open source software with a clear and extensible architecture.