Data Analytics Principal Software Engineer required to architect and design scalable data platforms, driving the data architecture vision across Data Lake, Pipelines, and/or Mesh.
You will need a software developer background with hands-on experience on several large enterprise data lake projects, preferably with strong Python skills. You must also have experience as a technical lead across multiple teams (both onshore and offshore) building data platforms, customer-facing data products and/or machine learning systems, along with experience of product analytics tools (Mixpanel, Power BI, Athena). Experience working with LLMs in data engineering and using AI as an accelerator is also key to this role.
Technology Requirements:
Data Architecture & Design: Data Lakes (e.g., AWS S3, Azure Data Lake, Google Cloud Storage), Data Mesh principles, domain-oriented data ownership and federated governance, data modeling (OLAP/OLTP, dimensional modeling, schema evolution)
Data Engineering & Pipelines: ETL pipelines (using tools like AWS Glue, Apache Spark), MapReduce, streaming data platforms (e.g., Kafka, SQS), real-time and batch processing paradigms
Cloud & Infrastructure: cloud-native data services (AWS Glue, Azure Synapse, GCP BigQuery, Databricks), Infrastructure-as-Code (IaC) using Terraform, CloudFormation, Lake Formation
Programming & Scripting: Python and SQL, C#, CI/CD pipelines and DevOps practices for data workflows
Data Governance & Security: Data cataloging and lineage tools (e.g., Collibra, Apache Atlas, OpenMetadata), data privacy, encryption, access control (e.g., IAM, RBAC, ABAC), and compliance frameworks (GDPR)
Observability & Reliability: Monitoring and alerting for data systems, data quality frameworks (e.g., Great Expectations, Monte Carlo), designing for resilience, fault tolerance, and disaster recovery