Role expectations

DevOps/DevSecOps/CloudOps/GitOps
Ability to write technical documentation: README, tutorials, installation and configuration guides. See docs.enclaive.cloud
Familiarity with the tech stack:
Ops
Terraform
Ansible
Docker
Helm
Linux
Virtualization (KVM/QEMU, Proxmox, Libvirt, OpenShift, Rancher)
Kubernetes (OpenShift, RKE, GKE, AKS, EKS)
Cloud: AWS, GCP, Azure and On-premise

Challenge: “Ship a Service — End-to-End CI/CD on Managed Kubernetes”
Scenario
Our team owns a small HTTP API (a simple /healthz endpoint is enough). You must:
provision cloud infrastructure for a managed Kubernetes cluster,

containerize and test the app,

build a CI/CD pipeline that goes from commit → container → security checks → Helm deploy,

implement safe rollout and rollback,

add basic observability.

Choose one of Amazon EKS, Azure AKS, or Google GKE. If you do not have an account, sign up for a free tier.
What we’re assessing
Terraform fluency for cloud & k8s provisioning

Ansible for configuring CI runners or build hosts (or image pre-bake)

Docker image design & best practices

Helm packaging & environment promotion

CI/CD pipeline orchestration and quality gates

Kubernetes rollout strategies & rollback

Secrets & IAM hygiene

Observability & troubleshooting

Clean architecture, reproducibility, and documentation

Requirements
1) Infrastructure (Terraform)
Create a minimal, production-sensible K8s baseline on EKS/AKS/GKE:

VPC/VNet with at least 2 subnets (multi-AZ / multi-zone).

Managed node pool (or Autopilot on GKE, but justify the choice).

Private container registry (ECR/ACR/Artifact Registry).

IAM/role assignments tightly scoped for the CI job to push images & deploy with kubectl/helm.

Outputs: kubeconfig (securely handled), registry URL, and cluster name.

Include a destroy path and document cost-guardrails (e.g., small node sizes, TTL labels).

2) Build Host / Runner (Ansible)
Use Ansible to configure a self-hosted CI runner VM or to build a reusable image that includes:

docker/buildx, kubectl, helm, Terraform, and your cloud CLI.

Login to registry via OIDC or short-lived credentials.

Provide an Ansible playbook and inventory (local, cloud, or containerized runner).

Idempotence matters.

3) Application (Docker)
A tiny HTTP service (any language) with:

/healthz returns 200 and a JSON payload. The payload contains the value of an environment variable, e.g. SYS_ENV=helloworld
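The /healthz requirement above could be met with something as small as the following sketch. It uses only the Python standard library. The "status" field, the port 8080, and the names healthz_payload and make_server are assumptions chosen for illustration; the brief only asks for a 200 response and a JSON payload that includes SYS_ENV.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def healthz_payload() -> dict:
    """Build the JSON body. SYS_ENV is read from the environment on each call."""
    # The "status" key is an illustrative assumption, not part of the brief.
    return {"status": "ok", "SYS_ENV": os.environ.get("SYS_ENV", "")}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps(healthz_payload()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Any other path is not served by this tiny API.
            self.send_error(404)


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create (and bind) the server; the caller decides when to start serving."""
    return HTTPServer((host, port), Handler)
```

To run it locally, call make_server().serve_forever() with SYS_ENV set in the environment, then request /healthz on the chosen port.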

Dockerfile must:

Use multi-stage builds
Run as non-root.
Set the environment SYS_ENV=helloworld
Set a minimal base and sensible HEALTHCHECK.

Tag images with app: and app:main (or :latest for dev only if justified).

4) Helm Deployment
Create a Helm chart charts/app with:

Configurable replicas, resources, liveness/readiness probes.

values.dev.yaml and values.prod.yaml.

Ingress (or Gateway) + Service.

HorizontalPodAutoscaler (HPA) based on CPU (and optionally RPS/custom metrics if you like).
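For choosing HPA values, it helps to know how the autoscaler arrives at a replica count. The sketch below reproduces the documented core rule, desired = ceil(current replicas × current metric / target metric), with the default 10% tolerance and clamping to the min/max bounds. It is a simplified model for reasoning about values files, not the controller itself: it ignores pod readiness, missing metrics, and stabilization windows. The function and parameter names are illustrative.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_utilization: float,
    target_cpu_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA scaling decision for a CPU utilization target."""
    if target_cpu_utilization <= 0:
        raise ValueError("target_cpu_utilization must be positive")

    ratio = current_cpu_utilization / target_cpu_utilization

    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # The result is always kept within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas at 90% average CPU against a 60% target yields 3 replicas, while 3 replicas at 63% against the same target stay at 3 because the ratio is within tolerance.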

Implement a safe rollout strategy:

Pick one: rolling update with surge/unavailable limits, or canary/blue-green (Argo Rollouts acceptable, but keep it simple and documented).

Provide an automated rollback step triggered when health checks fail.
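One way to wire the automated rollback is a small wrapper around the Helm CLI, sketched below under a few assumptions: the chart exposes an image.tag value (hypothetical, it depends on how the chart is written), the health check is supplied by the caller, and a 5 minute timeout is acceptable. It relies on helm upgrade --install --wait and on helm rollback without a revision argument, which rolls back to the previous release. On a first-ever install there is no previous release, so the rollback step would fail and needs separate handling.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Execute a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def upgrade_cmd(release: str, chart: str, values_file: str,
                namespace: str, image_tag: str) -> List[str]:
    # "image.tag" is an assumed value key; adjust it to the chart's values.
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "-f", values_file,
        "--set", f"image.tag={image_tag}",
        "--wait", "--timeout", "5m",
    ]


def rollback_cmd(release: str, namespace: str) -> List[str]:
    # No revision given: Helm rolls back to the previous release.
    return ["helm", "rollback", release, "--namespace", namespace, "--wait"]


def deploy_with_rollback(release: str, chart: str, values_file: str,
                         namespace: str, image_tag: str,
                         health_check: Callable[[], bool],
                         runner: Runner = run) -> bool:
    """Deploy, verify health, and roll back if either step fails."""
    upgraded = runner(upgrade_cmd(release, chart, values_file,
                                  namespace, image_tag)) == 0
    if upgraded and health_check():
        return True
    runner(rollback_cmd(release, namespace))
    return False
```

The runner parameter exists so the logic can be exercised without a cluster; in a pipeline the default run is used and a False return value should fail the job.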

5) CI/CD Pipeline
Use GitHub Actions. Pipeline should include:
On Pull Request to main:
Lint & test app.

Docker build (no push), Trivy image scan (fail on high/critical).

Terraform fmt/validate/plan (no apply).

Helm lint and chart unit tests (helm-unittest or chart-testing).

IaC security scan (e.g., Checkov or tfsec) with non-zero exit for high issues.

On Merge to main:
Build & push image to registry with tags : and :main.

Terraform apply to ensure infra is reconciled.

Deploy to dev using Helm with values.dev.yaml.

Post-deploy smoke test: hit /healthz via a job or script; fail pipeline if non-200.
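The post-deploy smoke test can be a short script along these lines. It is a sketch using only the standard library; the retry count, the delay, and the function names are assumptions. Retries are included because an ingress or load balancer may need a moment before it routes traffic after a deploy.

```python
import time
import urllib.request
from typing import Callable


def check_healthz(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTP error statuses, refused connections, DNS failures,
        # and timeouts (urllib's URLError and HTTPError subclass OSError).
        return False


def smoke_test(url: str, attempts: int = 5, delay_seconds: float = 3.0,
               check: Callable[[str], bool] = check_healthz,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Return a process exit code: 0 if healthy, 1 after all attempts fail."""
    for attempt in range(1, attempts + 1):
        if check(url):
            return 0
        if attempt < attempts:
            sleep(delay_seconds)
    return 1
```

In a pipeline step, pass the result to sys.exit, for example sys.exit(smoke_test(url)) with url pointing at the deployed /healthz endpoint, so that a non-200 response fails the job.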

Promotion to prod (manual approval job):
Deploy to prod with values.prod.yaml.

Apply rollout strategy; verify health checks.

If failure, automated rollback to previous release.

Artifacts & reporting:
Upload SBOM (e.g., Syft) and scan results.

Publish deployment summary with image tag, chart version, and links/logs.
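For the deployment summary, GitHub Actions renders any Markdown appended to the file named by the GITHUB_STEP_SUMMARY environment variable. The sketch below builds such a summary; the table layout and the function names are illustrative assumptions, and the caller supplies the values.

```python
import os
from typing import Optional


def render_summary(environment: str, image_tag: str,
                   chart_version: str, run_url: str) -> str:
    """Render a small Markdown table describing one deployment."""
    lines = [
        f"## Deployment to {environment}",
        "",
        "| Item | Value |",
        "| --- | --- |",
        f"| Image tag | `{image_tag}` |",
        f"| Chart version | `{chart_version}` |",
        f"| Logs | [workflow run]({run_url}) |",
    ]
    return "\n".join(lines) + "\n"


def publish_summary(markdown: str, path: Optional[str] = None) -> None:
    """Append to the job summary file, or print when running outside Actions."""
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        print(markdown)
        return
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown)
```

The run URL can be assembled from the GITHUB_SERVER_URL, GITHUB_REPOSITORY, and GITHUB_RUN_ID variables that the runner provides.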

6) Secrets & IAM
Use cloud-native secret storage or HashiCorp Vault for CI.

In the cluster, expose secrets as env vars (specifically SYS_ENV=helloworld) or as files, sourced via ExternalSecrets (bonus) or native Secret objects encrypted at rest (explain the trade-offs).

Prefer OIDC-based auth for CI to cloud (no long-lived keys).

7) Observability
Expose Prometheus-style metrics endpoint in app (even a counter is fine).

Install minimal metrics stack:

Option A: kube-state-metrics + Prometheus (can be lightweight).

Option B: Cloud-native managed metrics (e.g., CloudWatch metrics for EKS).

Add basic logging guidance (e.g., structured logs; rely on cloud logs).

Provide a simple dashboard or kubectl query recipe to validate app health & HPA behavior.
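A Prometheus-style metrics endpoint needs little more than a counter rendered in the text exposition format. The following is a hand-rolled sketch to show the shape of the output; the metric name healthz_requests_total and the class name are assumptions, and a real service would more likely use an official Prometheus client library.

```python
import threading


class Counter:
    """A monotonically increasing counter rendered in Prometheus text format."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        # The lock keeps increments safe when requests are handled concurrently.
        with self._lock:
            self._value += amount

    def render(self) -> str:
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )
```

The app would call inc() on each handled request and return render() from a /metrics path with the content type text/plain; version=0.0.4.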

Deliverables
Repository with this structure:
├─ app/
│  ├─ src/…                  # simple HTTP server
│  ├─ tests/…                # unit tests
│  ├─ Dockerfile
│  └─ README.md              # how to run locally
├─ charts/
│  └─ app/…                  # Helm chart + values.dev.yaml + values.prod.yaml
├─ infra/
│  ├─ terraform/
│  │  ├─ main.tf             # providers, cluster, node pool, registry, IAM
│  │  ├─ variables.tf
│  │  ├─ outputs.tf
│  │  └─ README.md
│  └─ ansible/
│     ├─ inventories/
│     ├─ roles/
│     ├─ playbooks/runner.yml
│     └─ README.md
├─ .github/workflows/ or .gitlab-ci.yml
├─ SECURITY.md               # threat model & hardening notes
├─ OPERATIONS.md             # runbooks: deploy, rollback, debugging, cleanup
└─ README.md                 # top-level overview & quickstart

Docs to include:

README.md: cloud chosen, prerequisites, how to run CI locally, how to authenticate, high-level flow diagram.

OPERATIONS.md:

Deploy: dev→prod.

Rollback: helm history/rollback or Argo Rollouts revert.

Troubleshooting: common kubectl commands, logs, events.

Cleanup: terraform destroy order & caveats.

SECURITY.md:

IAM roles/policies overview; why least-privilege is sufficient.

Secrets approach & rotation story.

Supply-chain controls: SBOM, image/IaC scanning, provenance (bonus: cosign).

Tutorial.md:
Topic: Automating the deployment and integration of a web service in a GKE/AKS/EKS Kubernetes cluster
Write a tutorial in the style of docs.enclaive.cloud. Use markup.

Success Criteria (Scoring Rubric, 120 pts)
Terraform (20 pts)
Correct cluster, registry, and IAM (10)

Variables, modules, state handling, and destroy path (6)

Cost-aware and documented (4)

Ansible (10 pts)
Idempotent runner setup / golden image (6)

Clear inventory & docs (4)

Docker (15 pts)
Multi-stage, minimal, non-root, healthcheck (8)

Tagged images & caching strategy (4)

Unit tests wired into build (3)

Helm (20 pts)
Clean chart, sensible values, probes, resources (10)

HPA and ingress (6)

Rollout strategy implemented (4)

CI/CD (25 pts)
PR checks: tests, lint, scans, plan (10)

Main: build/push, deploy dev, smoke test (8)

Manual prod gate + rollback automation (7)

Security & Observability (10 pts)
OIDC or short-lived creds; secrets managed properly (5)

Metrics/logging accessible; basic dashboard or commands (5)

Docs & Tutorial (20 pts)
Clear READMEs, runbooks, diagrams (6)

Tutorial (14)

enclaive.io

Interview question at enclaive.io
