3 differences between in-house and AWS

Wednesday, December 11, 2013

Our guest blogger Craig Lawton this week writes about the differences between in-house and AWS infrastructure.  To read more from Craig, please visit his blog at CoffeeScroll.com





To support Gartner’s latest hybrid cloud predictions and become a broker of cloud services, IT will need to minimise the architectural differences between on-premises and cloud infrastructure and use standard cloud building blocks.

On-premises IT and AWS are different though. Traditional enterprise applications have evolved on open, interoperable, heterogeneous platforms, while Amazon evolved out of the requirements of an online retailer that needed serious scale.

Here is a typical cloud-based application pattern in AWS. This is a single-site deployment with a data-only DR site. Compare it to the typical enterprise stack. There are some differences.

Load balancing

Traditional load balancers are customisable, tuneable, and built on high-performance hardware. Load balancers from the likes of Citrix, F5 and Radware can perform a wide range of functions (e.g. load balancing, firewalling, vulnerability protection, address rewriting, VPN, VDI gateway).
Load balancing in AWS is a bit different to what you may be used to. Amazon’s native offering, the Elastic Load Balancer (ELB), is presented as a service to customers and the underlying hardware is completely abstracted. An ELB can balance load across Availability Zones, a form of geo-load balancing that you have to pay extra for with on-premises load balancers. It also natively supports auto-scaling of EC2 instances, of course.

AWS ELB has some shortcomings though. You cannot assign a static IP to a load-balanced endpoint, log HTTP(S) traffic, drain hosts, or configure different load-balancing algorithms, all of which are standard functionality in an on-premises load balancer.
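
To illustrate what choosing an algorithm actually buys you, here is a toy Python sketch (not AWS code; the server names and connection counts are made up) contrasting the two most common choices, round-robin and least-connections:

```python
import itertools
from collections import Counter

# Two algorithms an on-premises load balancer typically lets you pick
# between, but which classic ELB does not expose as a configuration option.

def round_robin(servers):
    """Yield servers in a fixed rotation, ignoring their current load."""
    return itertools.cycle(servers)

def least_connections(servers, active):
    """Pick the server currently handling the fewest active connections."""
    return min(servers, key=lambda s: active[s])

servers = ["web-1", "web-2", "web-3"]          # hypothetical instance names
rr = round_robin(servers)
print([next(rr) for _ in range(4)])            # rotation wraps back to web-1

active = Counter({"web-1": 12, "web-2": 3, "web-3": 7})
print(least_connections(servers, active))      # web-2 is least loaded
```

Round-robin spreads requests evenly regardless of how busy each node is; least-connections adapts to uneven request costs, which matters for long-lived connections.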

An alternative load balancing option in AWS is open-source software like HAProxy, or spinning up the equivalent of an F5 appliance inside an AMI. The benefit of these approaches is that they more closely represent your internal set-up, making hybrid load configuration easier. The downside is that they can be more expensive and take more effort to set up. These alternatives are shown greyed out in the diagram above.
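
As a taste of what the HAProxy route gives you, here is a minimal configuration fragment (the host names and addresses are hypothetical, and a real deployment also needs `global` and `defaults` sections) that restores the algorithm choice and per-server control that ELB hides:

```
# Minimal HAProxy sketch: an HTTP front end balancing across two back ends.
frontend www
    bind *:80
    mode http
    default_backend app_servers

backend app_servers
    mode http
    balance leastconn          # algorithm choice ELB does not expose
    server web1 10.0.1.10:8080 check   # 'check' enables health checking
    server web2 10.0.1.11:8080 check
```

Because the same configuration runs on-premises and inside an EC2 instance, the load balancing tier behaves identically on both sides of a hybrid deployment.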

Storage

AWS storage has “unlimited” scale, so businesses don’t need to worry about outlaying for big storage arrays every year.

For application and database storage in AWS, EBS is really your only choice. EBS I/O performance can be poor and unpredictable compared to on-premises storage options, though. For better performance you need to pay extra for provisioned IOPS. You can also build your own RAID volumes from EBS block devices, but this can break EBS snapshots and lead to higher costs.

The most requested EBS feature is the ability to mount an EBS disk to more than one EC2 instance. On-premises infrastructure has always had the option of using shared storage as a way of creating clusters, transferring data and sharing configuration. This EBS constraint seems like a deliberate choice to force you to create EC2 instances that stand alone with no inter-dependencies, as a way of mandating/supporting the auto-scaling philosophy.

Databases

A highly available database hosted in AWS looks quite different to its in-house equivalent. One reason is that workloads in AWS are often web applications with a high read requirement. Databases in AWS are also built on EBS storage, with its constraints on shared storage, and AWS has been strongly influenced by MySQL and its replication methodology. Here is a basic architecture for a highly available on-premises and AWS database:

AWS v Onprem DB

An AWS database can have multiple slave databases, which can also serve read operations to improve performance. Replication, and the management of failover between nodes, is scripted or manual. The storage replication occurs within the database tier, so it may be slower. Transaction logs maintain database consistency.
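
The “scripted failover” step can be sketched in a few lines of Python. The replica names and log positions below are hypothetical, and real tooling (MySQL’s replication status commands, for example) reports progress differently, but the core decision is the same:

```python
# Toy failover sketch: when the master fails, promote the replica that
# has applied the most of the master's replication log, to minimise the
# number of lost transactions.

replicas = {
    "replica-a": 15020,   # hypothetical bytes of binary log applied
    "replica-b": 15344,
    "replica-c": 14990,
}

def pick_new_master(replicas):
    """Return the most caught-up replica as the promotion candidate."""
    return max(replicas, key=replicas.get)

print(pick_new_master(replicas))   # replica-b has applied the most log
```

The remaining (harder) work a real failover script does is repointing the other replicas and the application at the newly promoted master, which the in-house clustering software described below handles for you.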

In a typical in-house database, expensive and complicated clustering software provides well-integrated availability. The data replication occurs within the storage tier, which should be faster and leverages existing storage assets. A single floating DNS name/IP address fronts the entire DB tier, which simplifies application set-up. There is no opportunity to get extra read performance from failover/standby servers, though.

*******

There are other differences between AWS and in-house that I’ll cover in a follow-up blog. I’d be interested to know what you like or dislike about the different approaches to infrastructure and how it has affected your planning.


*******

Craig's original blog post can be found via this link