ABOUT THE TEAM

Data Engineering is part of the Data Team, which builds and maintains the data infrastructure behind analytics, reporting, experimentation, and business decision-making across the organization.

We work with a data lake, change data capture (CDC) pipelines, compute clusters, workflow orchestration tools such as Apache Airflow, and large-scale data processing workloads. Our goal is to make data reliable, accessible, timely, and useful for teams across the business.

This role is ideal for someone who enjoys building systems, tracing data problems to their root cause, and improving the reliability and quality of data pipelines that others depend on.

ABOUT THE ROLE

We’re looking for a Junior/Mid Level Data Engineer who is curious, technically hands-on, and eager to grow into building reliable, production-grade data systems.

You’ll work with senior engineers, analysts, product teams, and business stakeholders to build, maintain, and improve data pipelines and datasets used for analytics, reporting, and operational decision-making.

This role suits someone who enjoys going beyond surface-level fixes: tracing data issues across systems, validating assumptions, reading pipeline logic, improving data quality checks, and learning how to operate data systems at scale.

You do not need to know everything from day one, but you should be comfortable learning quickly, asking good questions, writing clean SQL and Python, and taking ownership of assigned work.

WHAT YOU’LL DO

Build and maintain data pipelines

• Design, build, and maintain data pipelines that ingest, transform, validate, and serve data for analytics and business use cases.

• Work with batch data processing workflows and gradually learn streaming or CDC-based patterns where applicable.

• Support end-to-end pipeline development from source ingestion to data lake storage, transformation, modeling, and serving layers.

• Write SQL and Python code for data transformation, automation, validation, and pipeline support.

• Help improve pipeline reliability, performance, and maintainability over time.

Support data infrastructure and orchestration

• Work with workflow orchestration tools such as Apache Airflow to schedule, monitor, and troubleshoot data jobs.

• Support workloads running on compute clusters and data lake environments.

• Help maintain datasets, tables, partitions, schemas, and transformation logic used by analytics and reporting teams.

• Assist in improving data pipeline documentation, runbooks, and operational playbooks.

Data quality, reliability, and troubleshooting

• Investigate pipeline failures, data discrepancies, freshness issues, and unexpected metric changes.

• Perform root-cause analysis by checking source data, transformation logic, job logs, SQL queries, and downstream reports.

• Build or improve data quality checks covering completeness, freshness, accuracy, and duplication, as well as anomaly detection.

• Work with analysts, product teams, and engineers to clarify expected data behavior and resolve issues.

• Help implement fixes that prevent problems from recurring, rather than temporary patches.

Engineering practices

• Write clean, readable, modular, and maintainable code.

• Participate in code reviews and learn good engineering practices such as testing, version control, dependency management, and CI/CD.

• Follow team standards for naming, documentation, data modeling, and pipeline development.

• Contribute to technical documentation, including data flow notes, pipeline logic, data contracts, and troubleshooting guides.

Collaboration and communication

• Partner with Data Analysts, BI users, Product, Engineering, and Operations teams to understand data needs and translate them into reliable datasets and pipelines.

• Explain data issues, pipeline behavior, and trade-offs clearly to both technical and non-technical stakeholders.

• Raise risks early when data quality, pipeline stability, or delivery timelines may be affected.

WHAT WE’RE LOOKING FOR

Technical Competencies — Junior Level

• Comfortable writing SQL queries involving joins, CTEs, aggregations, and filtering, with a basic awareness of query performance.

• Able to write Python scripts for data transformation, automation, validation, or analysis.

• Basic understanding of data pipelines, ETL/ELT concepts, and data warehousing or data lake concepts.

• Familiarity with version control, preferably Git.

• Basic understanding of data quality concepts such as freshness, completeness, accuracy, duplication, and anomaly checks.

• Willingness to learn orchestration tools such as Apache Airflow and distributed processing concepts.

Technical Competencies — Mid Level

• Solid experience building or maintaining production data pipelines.

• Strong SQL skills, including awareness of query optimization and data modeling considerations.

• Good Python coding ability with attention to clean, reusable, and testable code.

• Hands-on experience with workflow orchestration tools, preferably Apache Airflow.

• Experience working with data lakes, warehouses, or large-scale analytical datasets.

• Understanding of data modeling concepts such as OLTP vs OLAP, partitioning, fact/dimension tables, and how models affect usability and performance.

• Able to troubleshoot pipeline failures, performance issues, and data quality problems with minimal supervision.

• Familiarity with observability concepts such as logs, metrics, alerts, SLAs/SLOs, and pipeline monitoring.

Behavioural Competencies

• Strong curiosity and willingness to learn deeply.

• Enjoys solving ambiguous technical problems and tracing issues to their root cause.

• Strong sense of ownership over assigned pipelines, tasks, and data quality.

• Analytical thinking and attention to detail.

• Clear communication with both technical and non-technical stakeholders.

• Able to work under guidance while progressively taking more independent ownership.

• Comfortable asking questions, receiving feedback, and improving through code reviews.

• Reliable, structured, and proactive in following through on issues.

THE “EXTRA MILE” MINDSET WE VALUE

We value engineers who don’t stop at “the job passed” or “the query returned results.” You demonstrate this by:

• Treating data issues as problems to understand, not just tickets to close.

• Tracing data from source to transformation to downstream usage when needed.

• Reading documentation, pipeline code, SQL logic, job logs, and configuration to understand behavior.

• Speaking with other teams to clarify source system behavior, business definitions, and ownership boundaries.

• Creating small scripts, checks, or reproducible examples to validate assumptions.

• Testing fixes carefully before promoting them into production workflows.

• Thinking about how to prevent the same issue from happening again.

• Improving documentation so the next person can debug faster.

NICE-TO-HAVES / ADVANTAGES

• Experience with Apache Airflow or other workflow orchestration tools.

• Experience with Spark, EMR, Databricks, Flink, or similar distributed processing tools.

• Experience with CDC, event-driven data, Kafka-style patterns, or streaming pipelines.

• Experience with cloud data platforms and object storage.

• Experience implementing data quality checks or anomaly detection.

• Familiarity with CI/CD, unit testing, integration testing, or pipeline testing.

• Active involvement in open-source projects, technical writing, hackathons, side projects, or data engineering portfolios.

• For fresh graduates: strong academic track record, internships, competitions, leadership experience, or standout technical projects.

QUALIFICATIONS

Junior Level

• 0–2 years of experience in data engineering, software engineering, analytics engineering, or BI engineering, or a strong portfolio of data projects.

• Bachelor’s degree in Computer Science, Software Engineering, Data Science, Statistics, Mathematics, Engineering, Economics, or equivalent practical experience.

• Internships, academic projects, freelance work, or side projects involving SQL, Python, pipelines, automation, or data processing are welcome.

Mid Level

• 2–5 years of experience in data engineering, analytics engineering, or software engineering, or in working with production data systems.

• Experience building, maintaining, or operating data pipelines in a production or business-critical environment.

• Strong practical experience with SQL, Python, orchestration, and data quality practices.

WHAT SUCCESS LOOKS LIKE IN THE FIRST 3–6 MONTHS

• You understand the team’s core data pipelines, datasets, orchestration patterns, and common failure points.

• You reliably deliver SQL, Python, and pipeline tasks that are reviewed, tested, documented, and production-ready.

• You contribute to improving pipeline reliability through better checks, monitoring, documentation, or fixes.

• You proactively identify data quality issues and help trace them to the root cause.

• You can take a moderately defined data engineering task, clarify requirements, propose an approach, and deliver with guidance.

• You build trust with analysts, engineers, and stakeholders by being dependable, curious, and thorough.

• If you join at mid level, you begin owning selected pipelines or datasets end-to-end and help guide junior teammates through reviews, debugging, and best practices.