AWS Well-Architected Review

Ryan Rafferty
Dec 10, 2020

Wherever you are in your cloud journey, it is important to ensure that your workloads are running optimally. Many factors contribute to a well-architected environment. AWS offer a bootcamp that explains what is required to have a well-architected environment on AWS. This article shares some of the things I’ve learnt from attending the bootcamp. Please read on to learn more.

The Well-Architected Bootcamp focuses on a number of key areas that benefit customers by helping them understand and adopt best practices for building on AWS. The objective of the course is to work through the 5 pillars of the AWS Well-Architected Framework and lay out a framework that matches your enterprise AWS workloads. The result is that you can make informed business decisions and bring your technology posture and environments up to par.

AWS Well-Architected is not a one-time, set-and-forget task. It is a framework that you use continually throughout the entire lifecycle of your applications. You can use it before you start designing your applications and workloads, or in your live production environment. Used continually, it keeps you aligned with new AWS service capabilities and simply becomes part of how you operate your workloads within AWS.

The 5 Pillars & Their Value

The 5 pillars are Operational Excellence, Security, Reliability, Performance Efficiency and Cost Optimisation. These are the areas against which you should review and understand all your AWS workloads.

Most customers assume that their AWS set-up is architected and designed well. However, once we start running a deeper assessment of their AWS infrastructure, customers often find areas for improvement across these 5 pillars. Looking through each of the five pillars will help you understand the following:

  • How your workloads are configured
  • How your workloads are positioned
  • What risks have been identified
  • How often they’ve been reviewed
  • What the remediation of those risks is, and
  • What improvements can be made over time

Design Principles

One of the biggest advantages of adopting the Well-Architected framework is leveraging its general design principles: Capacity, Testing, Automation, Evolutionary Architecture, Data-Driven Architecture and Game Days.

If we focus on the Automation aspect of your current IT environment, you may have a team, or perhaps an individual team member, that responds to security events manually (let’s say there’s a need to respond at 1AM). This manual effort can result in errors. Applying the design principle of Automation, we can replicate the response as an automated system at low cost. Services like SNS and CloudWatch Events can automate system responses and trigger self-healing remediation events.
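
To make the idea of an automated response more concrete, here is a minimal Python sketch of what a handler for a security event might look like. It is an illustration only: the event shape, the `REMEDIATIONS` table and the `notify` callable (standing in for something like an SNS publish) are all assumptions made for this example, not real AWS APIs.

```python
from typing import Any, Callable, Dict

# Hypothetical mapping from a finding type to the name of a remediation step.
# In a real set-up each name would correspond to an automated action.
REMEDIATIONS: Dict[str, str] = {
    "UnauthorizedAccess": "isolate_instance",
    "PublicBucket": "block_public_access",
}


def handle_security_event(event: Dict[str, Any], notify: Callable[[str], None]) -> str:
    """Pick a remediation for an event and send a notification about it.

    `event` is an assumed, simplified event with a "detail" section.
    `notify` is any callable that delivers a message (an assumption standing
    in for a notification service).
    """
    finding_type = event.get("detail", {}).get("type", "Unknown")
    # Anything we have no scripted response for still goes to a human.
    action = REMEDIATIONS.get(finding_type, "escalate_to_on_call")
    notify(f"{finding_type}: {action}")
    return action


if __name__ == "__main__":
    messages = []
    handle_security_event({"detail": {"type": "PublicBucket"}}, messages.append)
    print(messages)
```

Known event types get a scripted response at any hour, and only the unknown ones wake somebody up.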

When you think about the general design principles and look at the workloads you initially implemented and are now running, Well-Architected prompts you to consider that your capacity needs now are very different from what they once were. AWS services have virtually infinite capacity and can scale on demand at any time, so you no longer need to guess or estimate your capacity needs. Applying these design principles to your resources means you can spin services up and down quickly and easily.

Reliability Pillar

This pillar looks specifically at failure management and at ensuring systems are architected to withstand failure and to be self-healing. During a Well-Architected review, a specific question asks “How is your system designed to withstand failures now?” The question is intended to establish whether your environments and workloads are designed to withstand failures. If you think back to when you first implemented your applications, features or workloads, the problem in complex systems is usually not the bad decisions that were made but the decisions that were never made. For example, when deploying new services in your cloud environment, someone may have hard-coded IP addresses wherever possible, not taking into consideration that resources can receive a new IP address when they are stopped and started, or replaced. This is the aim of Well-Architected: to give you an awareness of how you can run your workloads more reliably.

If there is one thing that is constant, it is change. When failures happen and you don’t have a backup plan, it indicates that testing was not taken into account. Implementing automation wherever possible, before failures take place, is crucial when it comes to recovering from them. The Well-Architected framework helps you understand where your failure points are, and building scripted responses enables your workloads to recover automatically.
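
As a rough illustration of a scripted recovery response, the Python sketch below checks a workload's health and restarts it a bounded number of times. The `is_healthy` and `restart` callables are placeholders I have assumed for this example; in practice they would wrap whatever health check and restart mechanism your workload uses.

```python
from typing import Callable


def recover(is_healthy: Callable[[], bool],
            restart: Callable[[], None],
            max_attempts: int = 3) -> bool:
    """Try to bring a workload back to health without human intervention.

    Returns True if the workload is healthy by the end, False if it is
    still failing after `max_attempts` restarts (time to page someone).
    """
    for _ in range(max_attempts):
        if is_healthy():
            return True
        restart()
    # One final check after the last restart attempt.
    return is_healthy()


if __name__ == "__main__":
    # Simulated workload that needs two restarts before it is healthy again.
    state = {"restarts": 0}

    def fake_restart() -> None:
        state["restarts"] += 1

    print(recover(lambda: state["restarts"] >= 2, fake_restart))
```

Capping the number of attempts is a deliberate choice: the script handles the common failures, and anything it cannot fix is handed to a person rather than retried forever.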

Operational Excellence Pillar

There are many things to consider once you are operating workloads in the cloud. Most commonly, you may initially have spent more time on the operational end of your workload lifecycle than on automation. If you do not consider automation and treat everything as code that can be tested and executed consistently, you will not benefit from faster deployments. Things like runbooks also make operations easier than relying on annotated documentation alone. Making small, frequent and reversible changes makes it easier to roll back during a deployment, and having this ability makes it easier to prevent downtime.

Using the Well-Architected framework allows you to think and plan ahead so that you can anticipate failure.

Security Pillar

When it comes to security best practices, there are immediate things you can do, such as making sure you have a strong identity foundation in the first place and a well-thought-out credential and identity management plan. Traceability is also very important: being able to see every action, to understand why and how that action was carried out, and to tell whether the change or action was legitimate.

Another aspect is end-to-end encryption, both at rest and in transit, which is a core principle for security. When it comes to automating security, nobody wants to get that 2AM wake-up call and have to carry out a manual response to a breach of information.

Building strong identity foundations and following the principle of least privilege helps ensure that access to your data is controlled.

Preparing ahead of time for security incidents and testing your recovery procedures are definitely key in this area.

Performance Efficiency Pillar

Scaling out with AWS is quite easy. However, if it is not managed well, costs can blow out very quickly. Consider using a serverless architecture, which can reduce a lot of effort and save cost that you can invest back into the business.

When focussing on performance efficiency, look for workloads that would benefit from having their constraints addressed, which can free up time and reduce cost. Using the right technology for the right application means you can do what you need, then measure and iterate efficiently.

Cost Optimisation Pillar

AWS want you to save money in the cloud. When you stop treating the cloud as a data centre and instead treat it as a consumption model, you can think differently about how your AWS environment works. Take your expected use into consideration, along with a thorough analysis of your attributed expenditure, and you can start to get real insight into what is costing money and what is actually being used to deliver business value.

By carrying out a Well-Architected review of your enterprise you can reduce your monthly AWS bill. Through the analysis you also gain insight into some of the excellent services you can take advantage of, such as Spot Instances and serverless technologies that scale well and can be cheaper to run.

The Intent of the Reviews Themselves

Being Well-Architected is not about exposing things you are doing wrong. It is not a finger-pointing exercise, and the most important part to understand is that reviews are NOT audits. It’s not a test with a hard pass or fail result; it’s a means of having a conversation with you about your architecture and understanding where your risks are. It can help you expose risks in your environment that you were not aware of, and help you focus and prioritise. It’s about understanding proven best practices and being able to measure whether or not you are aligned to them.

Well-Architected reviews are also not about an “unachievable perfect future state”; they draw on all the experience AWS have gained from customers over the years. AWS approach Well-Architected the same way they design services, where 90–95% of features are based on what customers are asking for. This helps drive an understanding of what is actually working for customers, which ultimately helps you.

Operational Readiness

This is very useful for reducing downtime or cost that you may not be aware of. One of the best reasons to do a review on Operational Readiness is to become aware of the risks in your environment and identify problem areas before you launch an application or new feature into production. When customers ask how they know whether they are ready to launch into production, Well-Architected can be used to ensure governance and best practice. Risks that have been identified can be allocated to a series of sprints and worked on over time, reducing those risks and making your workloads better.
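
The idea of allocating identified risks to a series of sprints can be sketched in a few lines of Python. This is only an illustration: the risk names, the `per_sprint` capacity and the simple "high before medium" ordering are assumptions for the example, not part of any AWS tooling.

```python
from typing import List, Tuple

# Lower number = tackled earlier. HIGH and MEDIUM are the only risk levels
# assumed here.
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1}

Risk = Tuple[str, str]  # (description, severity)


def plan_sprints(risks: List[Risk], per_sprint: int) -> List[List[Risk]]:
    """Order risks by severity and split them into sprint-sized batches."""
    # sorted() is stable, so risks of equal severity keep their original order.
    ordered = sorted(risks, key=lambda risk: SEVERITY_ORDER[risk[1]])
    return [ordered[i:i + per_sprint] for i in range(0, len(ordered), per_sprint)]


if __name__ == "__main__":
    # Hypothetical findings from a review.
    findings = [
        ("No runbook for failover", "MEDIUM"),
        ("Root account without MFA", "HIGH"),
        ("Backups never restored in a test", "HIGH"),
    ]
    for number, sprint in enumerate(plan_sprints(findings, per_sprint=2), start=1):
        print(f"Sprint {number}: {[name for name, _ in sprint]}")
```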

Making Automation & Experimentation Easier

Quite often in your enterprise you will have a wide range of AWS resources running in one or more environments. There may be many services running that you are not aware of. Thinking about automation and treating everything as code, so that events in your enterprise automatically trigger alerts and self-healing responses, will reduce overhead and get your product to market faster, which ultimately provides better business value.

The Well-Architected Tool — A Journey to Learn, Measure and Improve

If you are currently thinking about building in the cloud, you will be able to reduce the risk of things going wrong before you go into production. A great analogy I heard recently: let’s say you drive past a construction site where an apartment development is starting, and you observe the foundations and structures. Imagine the planning and the hours that were spent ensuring the foundational aspects were right before you could see the resulting buildings going up. In essence, building this concrete foundation is what the Well-Architected tool will help you achieve. It wants you to have a solid foundation that keeps both your current cloud workloads and your planned workloads strong and secure, maximising the business value of your technology portfolio.

Recognising risks, and addressing and mitigating the findings in your environment, means you can build and deploy faster, freeing up capacity in your organisation. It allows you to think about capacity management, security issues & automation.

Within the AWS Management Console, you can find a self-serve option for the Well-Architected Tool. You can use the tool to run through the questions and provide feedback about your current workloads. It is designed to make use of how you answer the questions, breaking things down into two areas: the tool itself and the data that is derived from the tool. If you decide to use the self-service tool and do the Well-Architected review yourself, it is recommended that you carry it out with both your business leadership and your technical leads, as there will need to be alignment between the two.

During the feedback stage of the self-service tool in the console, you will notice that the Well-Architected Tool resides within each region for compliance. This keeps your information and notes secured within the region you specify. You can also skip questions that are not applicable to you or your workloads, or prioritise whichever pillar is most important to you; at the end of the questionnaire it won’t show any risks associated with the questions you skip.
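
To show how skipped questions drop out of the final risk picture, here is a small Python sketch that tallies risks per pillar from a list of answers. The answer records and the risk labels (`HIGH`, `MEDIUM`, `NONE`, `NOT_APPLICABLE`) are an assumed, simplified representation made up for this example; it is not the tool's actual data model or API.

```python
from collections import defaultdict
from typing import Dict, List


def summarise_risks(answers: List[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count HIGH and MEDIUM risks per pillar.

    Questions marked NOT_APPLICABLE (skipped) or NONE contribute nothing,
    so they never appear as a risk in the summary.
    """
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: {"HIGH": 0, "MEDIUM": 0})
    for answer in answers:
        if answer["risk"] in ("HIGH", "MEDIUM"):
            summary[answer["pillar"]][answer["risk"]] += 1
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical answers from a single workload review.
    review = [
        {"pillar": "Security", "risk": "HIGH"},
        {"pillar": "Security", "risk": "NOT_APPLICABLE"},
        {"pillar": "Reliability", "risk": "MEDIUM"},
        {"pillar": "Cost Optimisation", "risk": "NONE"},
    ]
    print(summarise_risks(review))
```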

Having the ability to document where you’re at in your enterprise is very valuable & it’s important to see how you’re stacking up against best practices for each workload.

If you choose to do a Well-Architected review but would prefer to do it with an AWS Partner, there are various partners to choose from. A partner will walk you through your own console whilst having meaningful discussions about each of the specific questions, along with best practice guidance.

If you are about to go live with a new capability, are introducing a new feature, or are looking to reduce or mitigate risk, and would prefer to get assistance with the review, definitely feel free to reach out to me.

Engaging with AWS partners is free, and what comes out of the review stage at the end is usually a statement of work that runs through how you can address the risks that have been identified, along with areas that need to be improved. You also have the choice, if you are already on Enterprise Support, of having your Technical Account Manager arrange for AWS to help carry this out.

Another notable point is that, because AWS want to see you succeed, there is an incentive if you use a partner in the network. After the initial review and an approved statement of work, if 25% of high risk issues are remediated in production, you as the customer get $5,000 USD credited to your account, and this is per workload. So, let’s say you have 5 workloads in production, each with a security issue or failure point of some sort that has been resolved. You get $5K per workload, for a total of $25K in savings. So not only do you benefit from the cost saving but also from an improvement in your technology posture. This is a great benefit of having the review done by a partner.
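
The arithmetic in that example can be written out as a short Python sketch. The $5,000 credit and the 25% threshold are the figures quoted above; the function, the field names and the sample workloads are my own assumptions for illustration, and the exact eligibility rules should be confirmed with AWS or your partner.

```python
from typing import Dict, List


def total_credit(workloads: List[Dict[str, int]],
                 credit_per_workload: int = 5000,
                 threshold: float = 0.25) -> int:
    """Sum the credit across workloads that remediated enough high risk issues.

    Each workload is an assumed record with `hri_total` (high risk issues
    found) and `hri_remediated` (high risk issues fixed in production).
    """
    qualifying = [
        w for w in workloads
        if w["hri_total"] > 0 and w["hri_remediated"] / w["hri_total"] >= threshold
    ]
    return len(qualifying) * credit_per_workload


if __name__ == "__main__":
    # Five hypothetical workloads, each with 1 of 4 high risk issues fixed (25%).
    five_workloads = [{"hri_total": 4, "hri_remediated": 1} for _ in range(5)]
    print(total_credit(five_workloads))  # 5 x 5000 = 25000
```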

If you have any other experience in Well-Architected reviews or improvements, please share! Thanks for reading.

Summary

  • The Well-Architected reviews are not audits
  • The reviews are time boxed and you as a customer are made aware of the time expectation
  • The reviews can run in 2 sessions
  • There is a statement of work involved but you are under no obligation to commit
  • Reviews are specific to the pillar areas
  • If you feel there is part of the review not needed it is ok to leave it out
  • A High Risk Issue is a practice discovered during the review that does not follow best practice and could negatively impact revenue and reputation
  • A workload is a collection of resources and code; it can consist of a subset of resources in a single AWS account or multiple resources spanning multiple AWS accounts
  • Well-Architected Tool helps you review and compare the latest in best practices
  • Lenses are a unique set of questions based on a common type of workload
