In this post, I will go over two chapters: one on designing for technical requirements and the other on Compute services.
Designing for Technical Requirements

This chapter covered three broad categories of technical requirements: high availability, scalability and reliability.
Availability is the “continuous operation of a system at sufficient capacity to meet the demands of ongoing workloads”, and is measured in percentages. What this means in practice is that requests coming from clients should be responded to promptly. Many types of failures affect the availability of a service, starting from bugs introduced by developers to DNS server hiccups.
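Those percentages map directly to the downtime a service is allowed. A quick back-of-the-envelope sketch of the standard "nines" arithmetic (my own illustration, not from the book):

```python
# Translate an availability target (in percent) into the maximum
# downtime it permits per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years

def max_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year allowed at a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% -> {max_downtime_minutes(pct):.1f} min/year")
```

Going from two nines to four nines shrinks the yearly downtime budget from days to under an hour, which is why each extra nine costs so much engineering effort.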
For Compute services, Google already takes responsibility for the availability of the infrastructure in most cases. They are not responsible for bugs resulting from coding errors, but if a physical server has an issue, they perform live migration of the VM to a healthy host. There are still things the Cloud Architect or DevOps engineer has to take care of, like provisioning redundant resources and using Infrastructure-as-Code. The more managed the service is (Compute Engine -> Kubernetes Engine -> App Engine), the less there is a need to care about compute availability.
Storage services on GCP can be set up to take automatic backups and to run in different zones (isolated locations within the same region), so Google is already taking care of most of the availability settings. Even persistent disks require only minimal effort to be as available as needed: a redundant copy can be kept in a second zone, and disks can be resized to fit growing data.
With networks, two things have to be mentioned for availability: redundant connections and Premium Tier networking.
Scalability is the ability of software to grow (or shrink) based on incoming traffic. Horizontal scalability means deploying additional instances of a service so it can handle higher load, while vertical scalability means adding CPU, memory or disk to an existing instance so it can respond to more requests. Managed services often offer auto-scaling, and Kubernetes has a configurable auto-scaler object type (the HorizontalPodAutoscaler). Managed instance groups can also scale by increasing or decreasing the number of instances in them.
Reliability is covered by most of the same practices as availability: redundancy and following DevOps best practices are important for both. SRE practices make sure that monitoring, alerting, incident response and post-mortems are properly set up.
Designing Compute Systems

In this chapter, there were four GCP offerings discussed in more detail. I have heard from colleagues that Anthos has recently been added to the curriculum, so I will definitely have to look into that more.
The four services were Compute Engine, App Engine, Kubernetes Engine and Cloud Functions. I have a few years of experience with both Compute and Kubernetes and at least one year with Cloud Functions.
Compute Engine is an Infrastructure-as-a-Service offering, providing virtual machines that run in data centres all around the world. There are numerous machine types, each fitting different use cases: some with more vCPUs, some with more memory, and there are also custom/configurable types. Service accounts are used by software that does not have a user account but still needs permissions to perform operations on GCP; they can and should be associated with VMs. Persistent disks make sure that data survives the virtual machine, and their encryption keys can be managed with Cloud KMS to secure the app better. Shielded VMs should be used when security is extremely important. Instance groups can be managed (instances created from a common template) or unmanaged (heterogeneous VMs, mostly grouped for load balancing).
Kubernetes Engine is the managed Kubernetes offering on GCP. It lets you run your own Kubernetes cluster, with very little need for configuration.
App Engine is a Platform-as-a-Service offering. It runs application code in a container without the need to configure the underlying infrastructure. It comes in two environments: Standard (supporting only certain languages and runtimes) and Flexible, which is more general.
Cloud Functions are handy when a piece of code needs to run after something happens: functions are triggered by events. These events can originate in other GCP-managed services, and functions can also be invoked via HTTP webhooks. Even Stackdriver logs can trigger a cloud function by being exported to Pub/Sub, the messaging service.
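A minimal sketch of what such a Pub/Sub-triggered function looks like in Python (the function name and payload are my own invention; the `(event, context)` signature is the standard one for background functions):

```python
import base64

def handle_log_entry(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    `event['data']` carries the base64-encoded message payload, e.g. a
    log entry exported from Stackdriver. Here we just decode it.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Received log entry: {payload}")
    return payload
```

Deployed with a Pub/Sub trigger, the function runs once per message, and GCP handles scaling and retries.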
It is highly advised to manage all of this with some sort of Infrastructure-as-Code – Deployment Manager is perfect for it.
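Deployment Manager can even take its templates as Python: it calls `generate_config(context)` and deploys whatever resource list comes back. A trimmed sketch (a real instance would also need disk and network properties; names, zone and machine type here are illustrative):

```python
# Deployment Manager Python template, heavily trimmed for illustration.
# DM invokes generate_config(context); context.env holds deployment
# metadata and context.properties holds user-supplied template inputs.
def generate_config(context):
    deployment = context.env["deployment"]
    return {
        "resources": [{
            "name": f"{deployment}-vm",
            "type": "compute.v1.instance",
            "properties": {
                "zone": context.properties.get("zone", "europe-west1-b"),
                "machineType": "zones/europe-west1-b/machineTypes/e2-small",
                # Real deployments also need disks and networkInterfaces.
            },
        }]
    }
```

Because the template is plain Python, it can be unit-tested locally by calling `generate_config` with a stub context before anything is deployed.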
I did do the tests at the end of both chapters and although I did make mistakes, I feel like I am going the right way. I’ve been thinking about signing up soon for the exam, but first I want to finish the book. I do have a date in my mind, in any case.