Cloud computing quickstart for data engineering
What
Cloud computing is the use of a network of remote servers hosted on the internet to store, manage and process data.
No need to invest in hardware upfront
Rapid provisioning of resources
Efficient global access through deployments in different regions
Major cloud providers include Amazon, Microsoft, Google, Alibaba, Oracle and IBM. As Amazon is the biggest one, we will take an overview of its platform to cover the basics needed for data engineering.
AWS - Amazon Web Services
AWS offers more than 140 services for computation, storage, databases, networking and development tools.
The services can be accessed in 3 ways:
SDKs: https://aws.amazon.com/tools/ - Software development kits, available in many programming languages. The advantages of managing infrastructure through code (IaC - Infrastructure as Code) are shareability, reproducibility, multiple deployments and maintainability. For development with Python we can use the popular boto3 library.
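As a minimal sketch of SDK access with boto3, the function below lists the names of the S3 buckets in an account. The function name `list_bucket_names` is hypothetical and chosen only for illustration. The sketch assumes boto3 is installed and AWS credentials are already configured when no client is passed in.

```python
def list_bucket_names(s3_client=None):
    """Return the names of all S3 buckets visible to the given client."""
    if s3_client is None:
        # Assumption: boto3 is installed and credentials are configured
        # (for example via environment variables or ~/.aws/credentials).
        import boto3
        s3_client = boto3.client("s3")

    # list_buckets returns a dict whose "Buckets" entry is a list of
    # dicts, each carrying the bucket's "Name".
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]
```

Calling `list_bucket_names()` without arguments talks to the real account. Passing a client explicitly makes it possible to try the function against a stub without touching AWS.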
As there are over a hundred services available, you might feel overwhelmed at first sight. To make the start a bit easier, we have put together a glossary of the services you will need for data engineering, with links to their documentation. There are many more services than the ones mentioned below, so feel free to dive deeper into the AWS documentation.
Amazon EC2 is a web service that provides secure, resizable compute capacity in the cloud. If we want to manage the components ourselves, we can run PostgreSQL on EC2 instead of using Amazon RDS or Amazon DynamoDB, and use a Unix file system on EC2 instead of Amazon S3.
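To get a feel for EC2 from code, here is a small sketch that maps each instance ID to its current state using boto3. The function name `instance_states` is hypothetical. The sketch assumes configured credentials and a default region, and it reads only the first page of results, so accounts with many instances would need pagination.

```python
def instance_states(ec2_client=None):
    """Return a dict mapping EC2 instance IDs to their state names."""
    if ec2_client is None:
        # Assumption: boto3 is installed, credentials are configured,
        # and a default region is set.
        import boto3
        ec2_client = boto3.client("ec2")

    # describe_instances groups instances into "Reservations".
    response = ec2_client.describe_instances()
    states = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # State names include "pending", "running" and "stopped".
            states[instance["InstanceId"]] = instance["State"]["Name"]
    return states
```

A self-managed setup such as PostgreSQL on EC2 means these instances have to be monitored and patched by us, which is the trade-off against managed services like Amazon RDS.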