DevOps & SRE Essential Skills List for Engineers

Source: Internet
Author: User
Guide
This article lists the basic technologies and essential skills for DevOps and SRE work. You can use it as a checklist to evaluate yourself or others, or to prepare for your next DevOps or SRE (Site Reliability Engineer) interview. (Note: this list reflects the author's personal opinion.)

What's next? Learn more about the DevOps ecosystem:

  • First, be sure to understand the importance of culture; you can read more in the 15-point DevOps checklist.
  • You should master the *nix system and have a good understanding of how Linux distributions work.
  • Select an operating system for production. You don't need to master every operating system, which would only make your work harder; choose one and master it.
  • Be comfortable in the terminal. There may be a GUI for managing servers, but you should still prefer the terminal: it is faster and safer and, frankly, easier to use once you have mastered it.
  • How to get CPU/system information (cat /proc/version, /proc/cpuinfo, uptime, etc.)
  • How cron jobs work. Set up a cron job for a specific date, time, or month.
  • Understand the differences between *nix operating systems, and know how to find out which operating system a machine is running (e.g. cat /etc/lsb-release)
  • The difference between shells: sh/dash/bash/ash/zsh
  • How to set and unset environment variables. Exported variables last only for the current session; how do you make them permanent?
  • What the shell configuration files are (~/.bashrc, .bash_profile, .environment...) and how to add settings to a program's initialization file
  • A working knowledge of Vim, its configuration (.vimrc), and a few basic tips is necessary.
  • How logging works on *nix systems, what log levels are, and how to use log management tools (rsyslog, logstash, fluentd, logwatch, awslogs...)
  • How swapping works and what swappiness is (swapon -s, /proc/sys/vm/swappiness, sysctl vm.swappiness...)
  • Be comfortable with a scripting language. Bash is a must-know (other scripting languages, such as Python and Perl, are also very useful).
  • Master useful commands, such as process monitoring commands (ps, top, htop, atop...), system performance commands (nmon, iostat, sar, vmstat...) and network troubleshooting and analysis (nmap, tcpdump, ping, traceroute, airmon, airodump ..).
  • What is your backup strategy? How do you test the reliability of your backups?
  • Do you know ext4, NTFS, and FAT? Do you know the Union File System (UnionFS)?
  • How to view/set network configuration on the system?
  • How to set static/dynamic IP addresses on machines in different subnets (hint: CIDR)
  • Use network packet analysis to understand how the network works: tcpdump, Wireshark...
  • Are you familiar with the OSI model and the TCP/IP model? What is the difference between TCP and UDP? Do you know VXLAN?
  • How to set up firewalls (iptables, or at least ufw): set rules, list rules, route traffic, block protocols/ports...
  • How to check, set, and back up router settings
  • How does DNS work? How do you set up a DNS server (BIND, Unbound, PowerDNS, Dnsmasq...)? What is the difference between recursive and authoritative DNS? How do you troubleshoot DNS (nslookup, dig...)?
  • Be familiar with DNS record types such as A, AAAA, CNAME, and TXT
  • What happens when you enter google.com in your browser? Trace the request through the browser cache, local DNS cache, local network configuration (hosts file), routing, DNS, network and web protocols, and caching systems, all the way to the web server. (Even the most basic questions are hard to answer once you analyze them in depth.)
  • Be familiar with CDN providers (Fastly, Akamai, etc.)
  • Be familiar with how SSL/TLS and digital certificates (HTTPS) work
  • Learn about SSL certificates (requires encryption)
  • Be familiar with more secure protocols and tools: TLS, STARTTLS, SCP, SSH, SFTP, FTPS...
  • Understand the difference between PPTP, OpenVPN, L2TP/IPSec
  • Learn to set up record sets for domains (you can use hosted cloud services such as Route 53 or Cloudflare)
  • How SSH works, how to debug it, and how to generate SSH keys and log in to other machines without a password
  • What is an init system? Do you know systemd (used by Ubuntu since 15.04), Upstart (developed by Ubuntu), SysV...?
  • Compile a piece of software from its source code (gcc, make, and related tools)
  • How to compress/decompress files in different formats via terminal (mainly: tar/tar.gz)
  • How to set up a web server (Apache, Nginx...)
  • Learn to use "awk, sed, sort, uniq" to manipulate Nginx/Apache log files
  • What is the difference between Nginx and Apache? When to use Nginx? When to use Apache? When and how to use them in the same Web application?
  • How to set up a reverse proxy (Nginx..)
  • How to set up a cache server (Squid, Nginx, Varnish...)
  • How to set up a load balancer (HAproxy, Nginx...)
  • How to build an API gateway for your microservices (Ambassador, Kong, Traefik, Nginx...)
  • Be familiar with systemd and how to use systemctl and journalctl to analyze and manage services
  • Be familiar with OAuth, SAML, and Auth0 integration
  • Be familiar with RESTful APIs, webhooks, GraphQL, and gRPC
  • Secure an Elasticsearch (ES) cluster (X-Pack (commercial); open source: ReadOnlyREST, Search Guard)
  • Take ES snapshots (snapshot and incremental) using the snapshot API or esdump (note: Node.js/npm is required)
  • Use database backups
  • Learn Python (pip + setup.py) and Bash. Have you started using Golang as a scripting language? Try it out.
  • Develop cloud computing skills. Start by choosing a cloud infrastructure provider: Amazon Web Services, Google Cloud Platform, DigitalOcean, Microsoft Azure. Or use OpenStack to create your own private cloud.
  • What about a staging server? What is your test strategy: unit tests? End-to-end tests? Do you really need a staging server? Search Google for "Staging servers must die".
  • Read about PaaS/IaaS/SaaS/CaaS/FaaS/DaaS and serverless architecture
  • Learn how to use and configure cloud resources through the provider's cloud shell or CLI, or through its SDK in your application
  • Learn how to use at least one configuration management and remote execution tool (Ansible, Puppet, SaltStack, Chef, etc.). Base your choice on criteria such as syntax, performance, template language, push vs. pull model, architecture, integration with other tools, scalability, and usability.
  • Packer for image building
  • Integrate Jenkins into CI/CD
  • Set up Consul (for service discovery)
  • Start researching "infrastructure as code" and infrastructure provisioning automation tools such as Terraform and Packer
  • Start researching containers and Docker. How does the underlying container architecture (cgroups and namespaces) work?
  • Get familiar with basic Docker commands (logs/inspect/top/ps/rm). Also explore Docker Hub (pushing and pulling images)
  • Start researching container orchestration tools: Docker Swarm, Kubernetes, Mesosphere DC/OS, Alibaba Cloud ECS
  • Read about stateless and stateful applications
  • Learn to build a small Docker image for your application (Alpine is a good base). Install only the packages you need.
  • Know the default port numbers of the most commonly used services (e.g. SSH (22), HTTP (80), HTTPS (443), etc.)
  • Learn networking from a distributed-systems perspective (for example, building networks in the container world). Keep the eight fallacies of distributed computing in mind so that they do not catch you off guard.
  • Learn about L4/L7 load balancers.
  • Learn how to ensure the security of proxy servers and reverse proxy servers (Nginx, Traefik, Ambassador...) and understand how their network systems work.
  • Be familiar with tools that help create distributable, portable development environments (for example, Vagrant and Docker).
  • Manage secrets when deploying applications; HashiCorp Vault can help.
  • If you are using Kubernetes, understand all of its components and how they work.
  • Learn to work with the built-in features of K8s first, and then learn Helm and Istio.
  • Understand how to monitor and what to monitor (from both the operating system and application perspectives).
  • At a certain stage you will need tracing to understand and dig into application behavior, and the application has to support it directly.
  • If you are dealing with (big) data engineering related applications, you must be familiar with Hadoop, HBase, Zookeeper, Spark and how to set up related clusters
  • Learn how to set up and tune Redis according to application requirements, and how to add authentication.
  • Understand the nature of the application: CPU-intensive, memory-intensive, I/O-intensive, and then understand how to deal with it accordingly.
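
As a rough illustration of the log-analysis item in the checklist (using awk, sed, sort, and uniq on Nginx/Apache logs), the sketch below does in Python what a pipeline such as awk '{print $9}' access.log | sort | uniq -c | sort -rn would typically do: it counts how often each value of a field appears. The sample lines, the regular expression, and the count_field helper are all assumptions made for this example; they assume the common "combined" access-log layout and are not taken from the original article.

```python
import re
from collections import Counter

# Hypothetical sample lines in the common "combined" access-log layout.
# In practice you would read these from a real log file instead.
SAMPLE_LOG = """\
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"
203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "POST /api/login HTTP/1.1" 200 512 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:09 +0000] "GET /index.html HTTP/1.1" 500 0 "-" "Mozilla/5.0"
"""

# Assumed layout: client IP, two ignored fields, [timestamp], "METHOD PATH PROTOCOL", status.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_field(log_text, field):
    """Count occurrences of one named field (ip, method, path, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed layout
            counts[match.group(field)] += 1
    return counts


if __name__ == "__main__":
    # Most frequent status codes first, similar to "sort | uniq -c | sort -rn".
    for status, hits in count_field(SAMPLE_LOG, "status").most_common():
        print(hits, status)
```

Passing "ip" or "path" instead of "status" gives the busiest clients or the most requested URLs. For quick one-off checks on a server the shell pipeline is usually faster to type; a script like this becomes useful once the analysis has to be repeated or extended.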