Coding with Lilla: Exam prep 2

Chapter one was “Introduction to the Google Professional Cloud Architect Exam”. It felt more like a high-level description of the topics of the exam, as the introduction itself already happened at the beginning of the book.

The very first page of the chapter started with a list of all the aspects of software architecting for the cloud.

I was hoping to read about each one of these separately, which did not happen, but they were present in the other topics that came up.

The first is analyzing the business requirements. These usually come from non-technical people and require a big-picture mindset. Gathering them should be the very first step in planning and analysis, as they are the closest to how the end customers perceive the piece of software. They therefore include SLOs, but they can also relate to speed of development, compliance and capital expenditure, which is basically the long-term investment in machines, plants, buildings etc.

The greatest benefit of moving an application to the cloud is the reduction of operational expenses. On GCP, it is very cheap to have your infrastructure in multiple physical data centres, even on different sides of the world. That means fewer people plugging in cables and having to press the On button when someone accidentally “sudo shutdown now”s. It also means having the option to delegate management of some services (like databases or queueing) to GCP, to use only the computing resources that are needed thanks to autoscaling, or to allow preemptible VMs (cheaper VMs that GCP can stop at any time if it needs the resources, and that run for at most 24 hours). All of this saves a huge part of the operational expenses.

Another business requirement could be accelerating development. The cloud architect should be familiar with agile methodologies, but also with how managed services, CI/CD and microservices can support developers in pushing out new features and bug fixes faster. Of course, all these things have to be put into context. Sometimes it is better to keep managing a low-effort service yourself and have the engineers focus on new features or on making the existing ones leaner and stronger. The same goes for microservices. If there is pressure to move to the cloud because the current infrastructure cannot support the user base properly, it is also better to “lift and shift” than to spend double the amount of time rewriting everything to fit into GCP.

Reporting on SLOs (service-level objectives) is about monitoring the application and making sure it delivers the level of service the end-users are expecting. The most general things users expect are availability (is the service reachable and does it do what I want it to do?), reliability (does the service work well with higher load, or do timeouts pop up?), scalability (is the service able to serve a higher user base when needed, but also save resources when the users are not online?) and durability (how likely is it that the stored data is still accessible after some time?). These four also appear as nonfunctional requirements later on.
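To make the availability numbers more concrete for myself, here is a minimal Python sketch (mine, not from the book) that turns an availability target into the downtime it tolerates. The function name `allowed_downtime` and the 30-day period are my own assumptions for illustration.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, period: timedelta) -> timedelta:
    """Return how much downtime an availability SLO tolerates over a period."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The share of the period that may be "down" is whatever the SLO leaves over.
    return period * (1 - slo_percent / 100)


if __name__ == "__main__":
    month = timedelta(days=30)  # assumed reporting period
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% availability -> {allowed_downtime(target, month)} of downtime")
```

With a 30-day period, a 99.9% target leaves roughly 43 minutes of downtime, which shows how quickly each extra "nine" shrinks the budget.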

Reducing the time to recover from an incident is also an important business requirement. The book defines an incident as a disruption of the service that causes degradation or unavailability. When there is an outage, the users usually cannot consume the service. Besides this direct impact, which leads to loss of revenue, a service that does not come back up fast also risks bad press (loss of reputation), loss of users, and loss of data. Setting up proper monitoring, alerting and logging can shorten the time it takes to notice that something is wrong, and, in some cases, it can also prevent incidents.
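As a small illustration of what "time to notice" and "time to recover" could look like as numbers, here is a sketch that averages both over a list of incidents. The `Incident` type, the helper names and the two incidents are all hypothetical; nothing here comes from the book or from a real monitoring tool.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Incident(NamedTuple):
    started: datetime   # when the disruption began
    detected: datetime  # when monitoring/alerting noticed it
    resolved: datetime  # when the service was healthy again


def _mean(deltas: Iterable[timedelta]) -> timedelta:
    deltas = list(deltas)
    if not deltas:
        raise ValueError("need at least one incident")
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_detect(incidents: Iterable[Incident]) -> timedelta:
    return _mean(i.detected - i.started for i in incidents)


def mean_time_to_recover(incidents: Iterable[Incident]) -> timedelta:
    return _mean(i.resolved - i.started for i in incidents)


# Hypothetical incident log, for illustration only.
incidents = [
    Incident(datetime(2023, 11, 1, 9, 0), datetime(2023, 11, 1, 9, 10), datetime(2023, 11, 1, 9, 40)),
    Incident(datetime(2023, 11, 12, 14, 0), datetime(2023, 11, 12, 14, 30), datetime(2023, 11, 12, 15, 20)),
]

if __name__ == "__main__":
    print("mean time to detect:", mean_time_to_detect(incidents))
    print("mean time to recover:", mean_time_to_recover(incidents))
```

Better alerting mainly attacks the first number, and the second one cannot be smaller than the first, so detection time puts a floor under recovery time.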

Compliance means respecting the legal regulations and protecting users as best as possible. There are five industry regulations mentioned:

HIPAA (health)
COPPA (children’s privacy)
SOX (financial)
PCI (credit card payments)
GDPR (privacy)

Technical requirements can be broken down into functional and non-functional requirements. The non-functional ones should be measured by SLIs (service-level indicators, the metrics behind SLOs; the SRE book has more info on them).
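The relationship between an SLI and an SLO can be sketched in a few lines: the SLI is the measured value, the SLO is the target it is compared against. This is only my own toy example, assuming a request-based availability SLI; the function names and the request counts are made up.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Percentage of requests that were served successfully (the SLI)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= good_requests <= total_requests:
        raise ValueError("good_requests must be between 0 and total_requests")
    return 100 * good_requests / total_requests


def meets_slo(sli_percent: float, slo_percent: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli_percent >= slo_percent


if __name__ == "__main__":
    # Hypothetical numbers: 999,500 successful requests out of 1,000,000.
    sli = availability_sli(999_500, 1_000_000)
    for slo in (99.9, 99.99):
        print(f"SLI {sli:.2f}% vs SLO {slo}% -> met: {meets_slo(sli, slo)}")
```

The same measurement can pass one target and fail a stricter one, which is why the SLO has to be agreed on as a business requirement before the metric means anything.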

For technical requirements, there are mainly three areas:

Compute – virtual machines and containers (Compute Engine, App Engine, Kubernetes Engine)
Storage – SQL (Cloud SQL, Spanner, BigQuery), NoSQL (Bigtable, Cloud Datastore) and archival storage or buckets in Cloud Storage
Networks – VPCs and hybrid cloud (when the company network has to be connected to a network in the Google datacenters)

This chapter was actually really good at refreshing my memory on storage solutions and giving an overview of the technical things that will be needed. It also had the case studies in there, but two of them were missing, so I had to look them up online, and one was taken out this year. I am planning to read them again when I sit down for chapter two. This time I got 14/15 on the review questions at the end of the chapter. The only thing I got wrong was related to a specific business requirement. I have to make sure to read each question twice and better understand the circumstances it describes.

Sunday, December 3, 2023
