
Startup News

Pakistani Startup News

» Selecting the Right Cloud Platform

While the array of cloud platforms (Platform as a Service) has grown significantly in recent years, it is still hard to find the perfect host for your website. The obvious advantages of cloud platforms over dedicated servers start with cost and extend to ease of maintenance. For instance, on EC2 it takes 30 minutes or less to fire up a new server instance, and Amazon’s per-hour billing makes EC2 a very attractive option. AutoScale is another great add-on to the EC2 platform: it lets the customer automatically adjust computing power based on user load.

We recently did extended research on the available cloud platforms for a client. The requirements, however, were not straightforward. The client’s top priorities were the platform’s ability to scale automatically, ease of maintenance, and infrastructure performance. Then, to make the task challenging and exciting, the client wanted geographical fail-over in case of a disaster. Amazon alone had 2 major outages in 2011, so users are quite concerned about business continuity during a disaster.

We evaluated many options, including Amazon EC2, RDS, Heroku, Bluebox, RackSpace, High Velocity and Xeround, and concluded that no single provider could meet all of our client’s requirements. So we divided the architecture into an app layer and a DB layer and chose a different host for each. In such a division the biggest concern is network latency, but the interesting fact is that most cloud platforms are built on top of EC2 with additional wrapping, so latency within the EC2 network is not huge at all. We ran some tests with Heroku (app) and Xeround (DB) and found network latency to be less than half a second per transaction.

You can find below a presentation we put together after our analysis of different cloud hosts. There is a slide that links to a Google spreadsheet with calculation of estimated cost for our proposed infrastructure. Feel free to drop a comment to correct me or ask me

Presentation:

» Kung-Fu Chop To the Load Beast – Part I

A common goal every website or online service provider shares is to get as many visitors or server hits as possible. That, of course, does not include spam or denial-of-service attacks on the server. While more traffic means more business, it also creates the need to ensure the scalability and load capacity of the server. Interestingly, most engineers rush to test the load capacity of the server, put tons of load on it the very first day and expect to see great results. Guess what? Either the server becomes non-responsive (meaning it probably crashed), or it takes the load but the engineer does not realize the hardware was overkill and receives a shockingly high server invoice at the end of the month.

Clearly one wants neither too small a server nor one that is too big and expensive. To look the load creep in the eye, you need a strategy. Here is part I of my kung-fu attempt to chop it down. I would like to keep it high level and strategic. In part II I will discuss specific details of our recent optimization exercise on Rails. A disclaimer: server tuning and optimization is a big topic with tons of material available, and what follows is my experience with load testing and optimizing a Ruby on Rails server for a messaging engine project.

Before You Start Optimizing and Scaling:

a) Size the Beast: Estimate your target load on the server. I prefer a number like transactions per minute rather than transactions per second. A per-second figure becomes very small when you have heavy processing and a normal transaction spans a few seconds. It is also important to clearly define what a transaction means in your system: does a single DB hit count as a transaction, or does serving a user request end to end count as one?

b) Benchmark Response Times: The goal of optimization and scaling should not be solely to handle more load on the server, but also to serve requests within a decent response time. A server that handles tons of load but keeps users waiting for long periods will soon put the CEO out of business.

c) Choose an appropriate load generation mechanism: This could be a free tool like JMeter or SoapUI, which can generate massive numbers of HTTP hits on the server. These tools are quite flexible, from configuring the exact load to put on the server using multi-threading to attaching a data pool to vary the request data. You can also write your own code to generate load if the request structure is complex. In our case, we used both.
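For the "write your own code" route, here is a minimal Python sketch of a multi-threaded load generator with a data pool. It is not the code we used on the project; the names run_load, timed_call and http_get are made up for illustration, and the request function is pluggable so that a complex request structure can be built however you need.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_call(send_request, payload):
    """Run one request and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        send_request(payload)
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def run_load(send_request, payloads, workers=10):
    """Send every payload using `workers` parallel threads and summarize."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: timed_call(send_request, p), payloads))
    wall = time.perf_counter() - started
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "errors": sum(1 for ok, _ in results if not ok),
        # Throughput in transactions per minute, as suggested in (a).
        "per_minute": len(results) / wall * 60 if wall > 0 else 0.0,
        # Middle element of the sorted latencies (upper median for even counts).
        "median_s": latencies[len(latencies) // 2] if latencies else 0.0,
        "max_s": latencies[-1] if latencies else 0.0,
    }


def http_get(url):
    """Hypothetical request function: a plain GET with a 10 second timeout."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
```

The data pool is simply the list of payloads. For example, run_load(http_get, list_of_urls, workers=20) would hit each URL in list_of_urls once with 20 parallel threads, where list_of_urls is whatever varied request data you prepare for your own server.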

Places to look for Optimization:

a) Start with code optimizations. Hotspots are DB calls, third-party web service calls and parsing of large JSON or XML documents. In my experience, doing DB writes and JSON/XML parsing asynchronously (wherever possible) greatly improves system performance and user experience. We optimized one of our routines by 800% using asynchronous DB writing.

b) Application server threads: The number of application server request threads should always be in proportion to the hardware muscle. You don’t want too much or too little parallel request handling on the application server. Too much leads to CPU or memory starvation, and too little means you are getting an oversized server invoice at month end. With our fairly standard request size, we enabled 50 MaxClients for Apache on a standard EC2 XLarge instance and are hitting about 50% of CPU capacity.

c) Caching: Caching saves us from disk and network latency by reusing already fetched data. It is available at multiple levels, from web server caching and the SQL caching provided by standard RDBMSs to third-party caches such as Memcached.

d) DB Indexing: This is nothing new or cutting-edge and has been in use for a long time, but there is a catch. Normally we create DB indexes on the tables we hit the most in searches. However, if there are also massive CUD operations (Create, Update and Delete) on the table, indexes will really slow them down, because the database has to update the index B-trees on every write.
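To make the asynchronous DB writing from point (a) concrete, here is a minimal sketch in Python. Our project was on Rails, so this is not our implementation; the AsyncWriter class, the messages table and the use of SQLite are all assumptions chosen to keep the example self-contained. The request path only puts the row on a queue, and a background thread does the actual insert.

```python
import queue
import sqlite3
import threading


class AsyncWriter:
    """Queue rows in the request path; a background thread writes them."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, sender, body):
        """Return immediately; the row is persisted later by the worker."""
        self._queue.put((sender, body))

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._queue.put(None)  # sentinel: tells the worker to exit
        self._thread.join()

    def _run(self):
        # The connection is created here because sqlite3 connections
        # should be used from the thread that created them.
        conn = sqlite3.connect(self._db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (sender TEXT, body TEXT)"
        )
        while True:
            item = self._queue.get()
            if item is None:
                break
            conn.execute("INSERT INTO messages VALUES (?, ?)", item)
            conn.commit()
        conn.close()
```

The trade-off is durability: anything still sitting in the queue is lost if the process dies, so this approach fits writes the user does not need confirmed before getting a response.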

Guide Yourself in Load Testing:

a) A cyclic approach is what works. Run more than one test and record each of them. I have found it useful to keep a simple spreadsheet that records the details and results of every test run. I suggest recording basic information like the hardware profile, changes in settings or hardware since the previous test, load put on the server, throughput of the server, exceptions or crashes, and duration of the test.

b) It’s important to make one change at a time to the system, be it a DB index, memcached, or more memory. If we make more than one change for a test run, it will be hard to determine the positive or adverse effect of each change independently.

c) Profile your system: We recorded the following information during the tests: CPU, memory and disk usage using Munin, system throughput using New Relic, and system response times using JMeter.

d) Do not forget longevity tests: While we run many short tests, it is important to run 10-hour or day-long tests as well to find any dormant memory leaks that might crash the system after a few days.
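The test log from point (a) does not have to be a hand-edited spreadsheet. As a rough sketch, the same record can be kept in a CSV file that any spreadsheet tool opens. The column names and the record_run and compare_last_two helpers below are hypothetical and only mirror the fields suggested above.

```python
import csv
import os

# Columns follow the fields suggested in (a); the names are illustrative.
FIELDS = ["run", "hardware", "change", "load_per_min",
          "throughput_per_min", "errors", "duration_min"]


def record_run(path, **run):
    """Append one test run to a CSV log, writing the header on first use."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(run)


def compare_last_two(path):
    """Return the throughput change between the last two recorded runs."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if len(rows) < 2:
        return None
    before, after = rows[-2], rows[-1]
    delta = float(after["throughput_per_min"]) - float(before["throughput_per_min"])
    return {"change": after["change"], "throughput_delta": delta}
```

Because each row names the single change made since the previous run, compare_last_two attributes the throughput difference to that change, which only holds if the one-change-at-a-time rule from (b) was followed.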

Below is my attempt to picture the optimization process in a simple flow chart:

In part II I will discuss specifics of our recent load testing exercise on Rails
