Interview question at enclaive.io

Role expectations
DevOps / DevSecOps / CloudOps / GitOps
- Ability to write technical documentation: READMEs, tutorials, installation and configuration guides (see docs.enclaive.cloud).
- Familiarity with the ops tech stack:
  - Terraform
  - Ansible
  - Docker
  - Helm
  - Linux
  - Virtualization (KVM/QEMU, Proxmox, libvirt, OpenShift, Rancher)
  - Kubernetes (OpenShift, RKE, GKE, AKS, EKS)
  - Cloud: AWS, GCP, Azure, and on-premises

Challenge: “Ship a Service — End-to-End CI/CD on Managed Kubernetes”

Scenario
Our team owns a small HTTP API (a simple /healthz endpoint is enough). You must:
- provision cloud infrastructure for a managed Kubernetes cluster;
- containerize and test the app;
- build a CI/CD pipeline that goes from commit → container → security checks → Helm deploy;
- implement safe rollout and rollback;
- add basic observability.
Choose one of Amazon EKS, Azure AKS, or Google GKE. If you do not have an account, sign up for a free tier.

What we're assessing
- Terraform fluency for cloud and Kubernetes provisioning
- Ansible for configuring CI runners or build hosts (or image pre-baking)
- Docker image design and best practices
- Helm packaging and environment promotion
- CI/CD pipeline orchestration and quality gates
- Kubernetes rollout strategies and rollback
- Secrets and IAM hygiene
- Observability and troubleshooting
- Clean architecture, reproducibility, and documentation

Requirements

1) Infrastructure (Terraform)
Create a minimal, production-sensible Kubernetes baseline on EKS, AKS, or GKE:
- VPC/VNet with at least 2 subnets (multi-AZ / multi-zone).
- Managed node pool (or Autopilot on GKE, with a justification for that choice).
- Private container registry (ECR/ACR/Artifact Registry).
- IAM/role assignments scoped tightly so that the CI job can push images and deploy with kubectl/helm.
- Outputs: kubeconfig (handled securely), registry URL, and cluster name.
- A destroy path, plus documented cost guardrails (e.g., small node sizes, TTL labels).
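Cost guardrails can also be enforced mechanically in the PR pipeline. The following is a minimal sketch that assumes EKS and reads the JSON form of a Terraform plan (`terraform show -json tfplan`). The instance-type allowlist, the node cap, and the `ttl` tag key are hypothetical team conventions, not part of the challenge. The script flags node groups that break them.

```python
"""Cost-guardrail check over a Terraform plan rendered as JSON.

Assumes EKS (aws_eks_node_group) and hypothetical team conventions:
small instance types only, a capped max_size, and a "ttl" tag on every node group.
Produce the input with:
  terraform plan -out=tfplan && terraform show -json tfplan > plan.json
"""
import json
import sys

ALLOWED_INSTANCE_TYPES = {"t3.small", "t3.medium"}  # hypothetical allowlist
MAX_NODES = 3                                        # hypothetical cap per node group
REQUIRED_TAGS = {"ttl"}                              # hypothetical TTL tag key


def check_plan(plan: dict) -> list[str]:
    """Return human-readable violations; an empty list means the plan passes."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_eks_node_group":
            continue
        after = rc.get("change", {}).get("after")
        if after is None:  # resource is being destroyed; nothing to check
            continue
        addr = rc.get("address", "<unknown>")
        for itype in after.get("instance_types") or []:
            if itype not in ALLOWED_INSTANCE_TYPES:
                violations.append(f"{addr}: instance type {itype} not allowed")
        for scaling in after.get("scaling_config") or []:
            if scaling.get("max_size", 0) > MAX_NODES:
                violations.append(f"{addr}: max_size {scaling['max_size']} > {MAX_NODES}")
        missing = REQUIRED_TAGS - set((after.get("tags") or {}).keys())
        if missing:
            violations.append(f"{addr}: missing tags {sorted(missing)}")
    return violations


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python check_plan.py plan.json  (non-zero exit fails the CI step)
    with open(sys.argv[1]) as fh:
        problems = check_plan(json.load(fh))
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```

Because the check runs against the plan, an oversized node group is rejected before `terraform apply` creates anything billable.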
2) Build Host / Runner (Ansible)
Use Ansible either to configure a self-hosted CI runner VM or to build a reusable image that includes docker/buildx, kubectl, helm, Terraform, and your cloud CLI.
- Log in to the registry via OIDC or short-lived credentials.
- Provide an Ansible playbook and inventory (local, cloud, or containerized runner).
- Idempotence matters.

3) Application (Docker)
A tiny HTTP service (any language) where:
- /healthz returns 200 and a JSON payload.
- The payload contains the value of an environment variable, e.g. SYS_ENV=helloworld.
The Dockerfile must:
- Use multi-stage builds.
- Run as non-root.
- Set the environment variable SYS_ENV=helloworld.
- Use a minimal base image and a sensible HEALTHCHECK.
- Tag images with app: and app:main (or :latest for dev only, if justified).

4) Helm Deployment
Create a Helm chart charts/app with:
- Configurable replicas, resources, and liveness/readiness probes.
- values.dev.yaml and values.prod.yaml.
- Ingress (or Gateway) + Service.
- HorizontalPodAutoscaler (HPA) based on CPU (and optionally on RPS/custom metrics).
Implement a safe rollout strategy. Pick one:
- a rolling update with surge/unavailable limits, or
- canary/blue-green (Argo Rollouts is acceptable, but keep it simple and documented).
Provide an automated rollback step that is triggered when health checks fail.

5) CI/CD Pipeline
Use GitHub Actions. The pipeline should include:
On pull request to main:
- Lint and test the app.
- Docker build (no push) and Trivy image scan (fail on HIGH/CRITICAL findings).
- Terraform fmt/validate/plan (no apply).
- Helm lint and chart unit tests (helm-unittest or chart-testing).
- IaC security scan (e.g., Checkov or tfsec) that exits non-zero on high-severity issues.
On merge to main:
- Build and push the image to the registry with tags : and :main.
- Run Terraform apply to ensure the infrastructure is reconciled.
- Deploy to dev using Helm with values.dev.yaml.
- Post-deploy smoke test: hit /healthz via a job or script; fail the pipeline on a non-200 response.
Promotion to prod (manual approval job):
- Deploy to prod with values.prod.yaml.
- Apply the rollout strategy and verify health checks.
- On failure, roll back automatically to the previous release.
Artifacts and reporting:
- Upload the SBOM (e.g., Syft) and scan results.
- Publish a deployment summary with the image tag, chart version, and links/logs.

6) Secrets & IAM
- Use cloud-native secret storage or HashiCorp Vault for CI.
- In the cluster, mount secrets as environment variables (specifically SYS_ENV=helloworld) or as files, using ExternalSecrets (bonus) or native Secret objects encrypted at rest (explain the trade-offs).
- Prefer OIDC-based authentication from CI to the cloud (no long-lived keys).

7) Observability
- Expose a Prometheus-style metrics endpoint in the app (even a single counter is fine).
- Install a minimal metrics stack:
  - Option A: kube-state-metrics + Prometheus (can be lightweight).
  - Option B: cloud-native managed metrics (e.g., CloudWatch metrics for EKS).
- Add basic logging guidance (e.g., structured logs; rely on cloud logging).
- Provide a simple dashboard or kubectl query recipe to validate app health and HPA behavior.

Deliverables
A repository with this structure:

├─ app/
│  ├─ src/…            # simple HTTP server
│  ├─ tests/…          # unit tests
│  ├─ Dockerfile
│  └─ README.md        # how to run locally
├─ charts/
│  └─ app/…            # Helm chart + values.dev.yaml + values.prod.yaml
├─ infra/
│  ├─ terraform/
│  │  ├─ main.tf       # providers, cluster, node pool, registry, IAM
│  │  ├─ variables.tf
│  │  ├─ outputs.tf
│  │  └─ README.md
│  └─ ansible/
│     ├─ inventories/
│     ├─ roles/
│     ├─ playbooks/runner.yml
│     └─ README.md
├─ .github/workflows/ or .gitlab-ci.yml
├─ SECURITY.md         # threat model & hardening notes
├─ OPERATIONS.md       # runbooks: deploy, rollback, debugging, cleanup
└─ README.md           # top-level overview & quickstart

Docs to include:
- README.md: cloud chosen, prerequisites, how to run CI locally, how to authenticate, high-level flow diagram.
- OPERATIONS.md:
  - Deploy: dev → prod.
  - Rollback: helm history/rollback or Argo Rollouts revert.
  - Troubleshooting: common kubectl commands, logs, events.
  - Cleanup: terraform destroy order and caveats.
- SECURITY.md:
  - IAM roles/policies overview; why least privilege is sufficient.
  - Secrets approach and rotation story.
  - Supply-chain controls: SBOM, image/IaC scanning, provenance (bonus: cosign).
- Tutorial.md:
  - Topic: Automating the deployment and integration of a web service in a GKE/AKS/EKS Kubernetes cluster.
  - Write a tutorial for, and in the style of, docs.enclaive.cloud. Use markup.

Success Criteria (Scoring Rubric, 120 pts)
- Terraform (20 pts)
  - Correct cluster, registry, and IAM (10)
  - Variables, modules, state handling, and destroy path (6)
  - Cost-aware and documented (4)
- Ansible (10 pts)
  - Idempotent runner setup / golden image (6)
  - Clear inventory and docs (4)
- Docker (15 pts)
  - Multi-stage, minimal, non-root, healthcheck (8)
  - Tagged images and caching strategy (4)
  - Unit tests wired into the build (3)
- Helm (20 pts)
  - Clean chart, sensible values, probes, resources (10)
  - HPA and ingress (6)
  - Rollout strategy implemented (4)
- CI/CD (25 pts)
  - PR checks: tests, lint, scans, plan (10)
  - Main: build/push, deploy to dev, smoke test (8)
  - Manual prod gate + rollback automation (7)
- Security & Observability (10 pts)
  - OIDC or short-lived credentials; secrets managed properly (5)
  - Metrics/logging accessible; basic dashboard or commands (5)
- Docs & Tutorial (20 pts)
  - Clear READMEs, runbooks, diagrams (6)
  - Tutorial (14)
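For requirement 2 ("Idempotence matters"), a common way to demonstrate idempotence is to run the playbook twice and require that the second run reports `changed=0` for every host. The following is a sketch that parses the PLAY RECAP summary printed by `ansible-playbook`. The script name and the `inventories/local` path in the comment are hypothetical.

```python
"""Idempotence check: a second ansible-playbook run must change nothing."""
import re
import subprocess
import sys

# Matches PLAY RECAP lines such as:
#   runner-01 : ok=12 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")


def parse_recap(output: str) -> dict[str, dict[str, int]]:
    """Map each host in the PLAY RECAP to its counters (ok, changed, failed, ...)."""
    stats = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            counters = re.findall(r"(\w+)=(\d+)", match["stats"])
            stats[match["host"]] = {key: int(value) for key, value in counters}
    return stats


def not_idempotent(output: str) -> list[str]:
    """Hosts that still reported changes (or failures) on a repeat run."""
    return sorted(host for host, s in parse_recap(output).items()
                  if s.get("changed", 0) > 0 or s.get("failed", 0) > 0)


def run_playbook(cmd: list[str]) -> str:
    # check=True turns a failed play (non-zero exit) into an exception.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python idempotence.py ansible-playbook -i inventories/local playbooks/runner.yml
    cmd = sys.argv[1:]
    run_playbook(cmd)                                  # first run converges the host
    offenders = not_idempotent(run_playbook(cmd))      # second run must be a no-op
    if offenders:
        print("not idempotent on:", ", ".join(offenders))
    sys.exit(1 if offenders else 0)
```

Tasks that typically fail this check are `command`/`shell` steps without `creates:` or `changed_when:`.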
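For requirements 3 and 7, the service can be very small. The following is a sketch in Python using only the standard library. `/healthz` returns 200 with a JSON payload that includes `SYS_ENV`, `/metrics` exposes one Prometheus-style counter, and access logs are written as one JSON object per line. The counter name `app_http_requests_total`, the `PORT` variable, and the `serve` argument are assumptions for illustration.

```python
"""Tiny HTTP service: /healthz (JSON incl. SYS_ENV) and /metrics (Prometheus text)."""
import json
import os
import sys
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

_lock = threading.Lock()
_requests_total = 0  # process-wide counter exposed on /metrics


class Handler(BaseHTTPRequestHandler):
    def _send(self, status: int, body: bytes, content_type: str) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        global _requests_total
        with _lock:  # handlers run in parallel threads
            _requests_total += 1
            count = _requests_total
        path = urlsplit(self.path).path  # ignore any query string
        if path == "/healthz":
            payload = {"status": "ok", "SYS_ENV": os.environ.get("SYS_ENV", "")}
            self._send(200, json.dumps(payload).encode(), "application/json")
        elif path == "/metrics":
            text = (
                "# HELP app_http_requests_total Total HTTP requests handled.\n"
                "# TYPE app_http_requests_total counter\n"
                f"app_http_requests_total {count}\n"
            )
            self._send(200, text.encode(), "text/plain; version=0.0.4")
        else:
            self._send(404, b'{"error": "not found"}', "application/json")

    def log_message(self, fmt: str, *args) -> None:
        # Structured access log: one JSON object per line on stderr.
        print(json.dumps({"client": self.client_address[0], "msg": fmt % args}),
              file=sys.stderr)


def make_server(port: int) -> ThreadingHTTPServer:
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


if __name__ == "__main__" and sys.argv[1:] == ["serve"]:
    # Container entrypoint, e.g. CMD ["python", "server.py", "serve"]
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

Because `SYS_ENV` is read on each request, the same image can report the value set by the Dockerfile's `ENV` or by a Kubernetes Secret without rebuilding.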
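If the rolling-update option in requirement 4 is chosen, it helps to know how Kubernetes turns `maxSurge`/`maxUnavailable` into pod counts. Percentages are resolved against `spec.replicas`: `maxSurge` rounds up and `maxUnavailable` rounds down. If both resolve to 0, the controller uses `maxUnavailable = 1` so the rollout is not blocked. The following sketch reproduces that arithmetic so that values.dev.yaml and values.prod.yaml can be checked on paper.

```python
"""How a Deployment's RollingUpdate limits resolve to pod counts."""


def _resolve(value, replicas: int, round_up: bool) -> int:
    """Int values are absolute; "N%" strings are a share of replicas."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])  # integer math avoids float rounding
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> dict:
    """Upper bound on total pods and lower bound on available pods during a rollout."""
    surge = _resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = _resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # Kubernetes does the same so the rollout can progress
    return {"max_pods": replicas + surge, "min_available": replicas - unavailable}
```

For example, `rollout_bounds(3, 1, 0)` gives at most 4 pods and never fewer than 3 available. This is a conservative choice for prod, at the cost of one extra pod's worth of capacity during the rollout.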
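To validate the CPU-based HPA from requirement 4 (and the HPA behavior check in requirement 7), it helps to predict what the controller should do. The documented core rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. Scaling is skipped when the ratio is within a tolerance (0.1 by default), and the result is clamped to `minReplicas`/`maxReplicas`. The following is a simplified sketch: it ignores stabilization windows, not-ready pods, and missing metrics, so treat it as a back-of-the-envelope check, not a reimplementation of the controller.

```python
"""Simplified HorizontalPodAutoscaler replica calculation for one resource metric."""
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no scaling action
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))  # respect HPA bounds
```

For example, 2 pods at 90% average CPU against a 60% target should scale to 3. This is the value to compare against the `REPLICAS` column of `kubectl get hpa` during a load test.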
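The post-deploy smoke test in requirement 5 only has to fail the pipeline on a non-200 response. The following is a sketch using the Python standard library. It retries for a short while to ride out pod startup and exits non-zero if no attempt gets a 200. The script name and the example URL are hypothetical. In CI, the URL could point at the ingress host or at a `kubectl port-forward`.

```python
"""Post-deploy smoke test: exit non-zero unless GET <url> returns HTTP 200."""
import http.client
import sys
import time
import urllib.request


def smoke_test(url: str, attempts: int = 10, delay: float = 3.0,
               timeout: float = 5.0) -> bool:
    """Return True as soon as one attempt gets a 200; otherwise False."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
                print(f"attempt {attempt}: HTTP {resp.status}", file=sys.stderr)
        except (OSError, http.client.HTTPException) as exc:
            # urllib raises HTTPError (an OSError subclass) for 4xx/5xx responses,
            # and URLError for connection problems such as "connection refused".
            print(f"attempt {attempt}: {exc}", file=sys.stderr)
        if attempt < attempts:
            time.sleep(delay)
    return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python smoke_test.py https://dev.example.com/healthz
    sys.exit(0 if smoke_test(sys.argv[1]) else 1)
```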
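For the automated rollback in requirements 4 and 5, one simple approach has two layers:
- The Helm 3 `--atomic` flag reverts an upgrade whose resources never become ready.
- An explicit `helm rollback RELEASE` (with no revision, Helm targets the previous one) covers the case where Helm reports success but an application-level check fails.

The following is a sketch of that flow. The `image.tag` values key is a hypothetical chart convention, and the Helm flags should be checked against the Helm version installed on the runner.

```python
"""Helm deploy with automated rollback (sketch; assumes the Helm 3 CLI is on PATH)."""
import subprocess
from typing import Callable

Runner = Callable[[list[str]], int]


def run_cmd(cmd: list[str]) -> int:
    print("+", " ".join(cmd))
    return subprocess.run(cmd).returncode


def deploy(release: str, chart: str, values_file: str, image_tag: str,
           namespace: str, health_check: Callable[[], bool],
           run: Runner = run_cmd) -> bool:
    """Upgrade the release; roll back if Helm or the post-deploy check fails."""
    upgrade = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace, "-f", values_file,
        "--set", f"image.tag={image_tag}",  # hypothetical values key
        "--atomic",                         # Helm 3: undo the upgrade if it fails
        "--wait", "--timeout", "5m",
    ]
    if run(upgrade) != 0:
        return False  # --atomic has already restored the previous revision
    if health_check():
        return True
    # Pods became ready but the app-level check failed: revert to the previous
    # revision. On a first-ever install there is no previous revision to return to.
    run(["helm", "rollback", release, "--namespace", namespace, "--wait"])
    return False
```

`health_check` can be any callable, for example one that probes `/healthz` through the prod ingress. The pipeline step then fails whenever `deploy` returns False. Injecting `run` keeps the function unit-testable without a cluster.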