Join Neurons Lab as a Cloud Engineer on a delivery engagement with a regulated EU BFSI enterprise (German-speaking client). The product is an AI / RAG-based enterprise productivity tool running in production across the client's internal teams. You will pick up a CDK-based codebase already deployed inside the client's AWS account, take over from the outgoing engineer, and own cloud delivery end-to-end: production hardening, remediation of security findings, RAG infrastructure stability, and SSO/RBAC integration with the client's identity stack.
This is a pure delivery role on a live, customer-managed AWS environment. Data protection is the single most important constraint on every architectural and operational decision.
Reporting: you report to the AI Architect on the engagement and collaborate day-to-day with the AI Delivery Manager and ML Engineer.
Own and extend the existing AWS CDK codebase deployed inside the client's AWS account.
Operate the production stack: ECS Fargate, ECR, ALB (public + internal), VPC, CDN, S3, AWS Bedrock.
Run the data layer: Postgres, Redis, vector database (Qdrant or similar), LLM observability (Langfuse or similar).
Triage and remediate AWS Security Hub / Health Dashboard findings independently — the client expects us to handle this end-to-end.
Integrate SSO and RBAC with the client's identity stack.
Keep the RAG stack reliable as additional pilot teams onboard; partner with the ML Engineer on retrieval-quality incidents.
Own cost tracking and capacity planning for the client's Bedrock + ECS spend.
Document CDK constructs, runbooks, and incident playbooks so handover to the next engineer takes days, not weeks.
This is a short-term contract engagement starting May 4, 2025, for a duration of 5 weeks at 0.3 FTE (approximately 12 hours per week). The schedule is flexible; however, some availability during standard business hours will be required for team syncs and occasional client meetings.
Advanced AWS CDK (primary) — must be able to extend an existing CDK codebase from day one, not just author from scratch.
AWS Bedrock hands-on experience — model invocation patterns, IAM scoping, cost monitoring.
ECS Fargate in production: task definitions, service auto-scaling, ALB target groups, blue/green or rolling deploys.
Networking: VPC design, public/private ALB patterns, CloudFront, private subnet egress.
RAG-stack ops: deploying and operating a vector database, Postgres (RDS/Aurora), Redis (ElastiCache), and an LLM observability layer on AWS.
AWS Security Hub / Inspector / Health Dashboard — finding triage and remediation in restricted client environments.
Python — FastAPI backends, MLOps automation, deployment glue.
Identity & access: SSO (Okta, Microsoft Entra ID (formerly Azure AD), or Amazon Cognito), RBAC, IAM least-privilege design.
Terraform — secondary; useful for modules supplied by the client's IT team.
Working in restricted client AWS accounts — limited permissions, async approvals, wiki/docs-portal handovers.
Communication: clear written and verbal English. German is a strong plus, not required.
Certification (one required): AWS Certified Solutions Architect (Associate or Professional) or AWS Certified DevOps Engineer (Professional).
Working knowledge of the AWS Well-Architected Framework, especially the Security and Reliability pillars as applied to BFSI.
Familiarity with EU AI Act obligations relevant to RAG / GenAI products.
GDPR fundamentals as they apply to credentials, logs, and EU data residency.
5+ years in cloud / DevOps / cloud engineering, with 2+ years of hands-on AWS CDK in production.
2+ years operating AI/ML or GenAI workloads on AWS (Bedrock, SageMaker, or comparable).
Direct experience deploying inside a regulated client's AWS account (BFSI, healthcare, government, or similar) — not just internal sandbox environments.
Track record of stepping into an existing codebase mid-project and shipping within 1–2 weeks.
Comfortable being the only Cloud Engineer on a small (3–4 person) delivery team.