We are building a customer-facing data platform from the ground up. The platform requires real-time data ingestion, high availability, and integration with Generative AI.
You will lead architecture decisions on Azure, sitting between Backend Engineering and Data Engineering. You will own a code-first approach: Python-based pipelines and APIs, not drag-and-drop ETL tools.
Key Responsibilities
Design event-driven ingestion architectures. Select appropriate Azure compute (Functions, Durable Functions, containers) based on workload characteristics.
Design OLTP schemas for sub-second application retrieval, not just offline reporting.
Architect data flows for retrieval-augmented generation (RAG) workflows. Ensure unstructured text is processed and indexed for vector search.
Build reusable Python frameworks. Enforce unit testing, CI/CD, and code reviews within the data team.
Own synchronization between operational databases (SQL/NoSQL) and analytical workloads. Design for consistency, replay capability, and clear separation between serving and analytical layers.
Required Skills
10+ years in IT, with 5+ years as a Lead or Architect.
Software engineering background (APIs, microservices) before moving into data. You understand that cloud data engineering is software development.
Expert Python skills for backend processing, API integration, and automation. Clean, testable, production-grade code.
Hands-on Azure experience: Functions (Consumption and Flex Consumption plans), Event Grid, Event Hubs. You understand the tradeoffs between serverless, container-based, and dedicated compute.
Experience building pipelines that feed LLMs or vector databases.
Lakehouse architecture: medallion patterns, Delta Lake semantics, separation of storage and compute. Experience with Microsoft Fabric or Azure Databricks.
Data modeling for both transactional systems and analytics. You know the difference between designing for an app vs. a data lake.
Good to Have
DP-203, DP-600, or Databricks Data Engineer Associate certification.
Azure AI Search experience.
Commercial SaaS product background.
Tech Stack
Python, SQL, PySpark, Spark, Azure Data Lake (ADLS Gen2), Azure Functions, Event Grid, Event Hubs, Delta Lake, Microsoft Fabric, Azure Databricks, REST APIs, Kafka, Airflow, Docker, Kubernetes, Git, Terraform, Data Pipelines, Streaming, Big Data, Data Governance, Data Quality
Regards
Talent Acquisition
3Pillar Global