Career Category
Information Systems
Job Description
Roles & Responsibilities
• Develop, test, and maintain data pipelines using Databricks, PySpark, and Python.
• Ingest, transform, and process structured and semi-structured data from multiple sources.
• Support the development of scalable ETL/ELT workflows for analytics, reporting, and machine learning use cases.
• Work with data engineers, analysts, and data scientists to understand data requirements and deliver reliable datasets.
• Perform data cleansing, validation, and quality checks to ensure accuracy and consistency.
• Optimize Spark jobs and Databricks notebooks for performance, reliability, and cost efficiency.
• Create and maintain documentation for data pipelines, workflows, data definitions, and processes.
• Assist in troubleshooting pipeline failures, data issues, and performance bottlenecks.
• Follow best practices for version control, code quality, testing, and deployment.
• Support basic AI/ML data preparation activities, including feature engineering, dataset creation, and model input preparation.
• Monitor scheduled jobs and workflows to ensure timely and successful data delivery.
• Collaborate with cross-functional teams in an Agile or iterative development environment.
Basic Qualifications and Experience
• Bachelor's degree in Computer Science, Data Engineering, Information Systems, Engineering, Mathematics, or a related field with 2-6 years of relevant experience, or equivalent practical experience.
Must-Have Qualifications
• Bachelor's degree in Computer Science, Data Engineering, Information Systems, Engineering, Mathematics, or a related field, or equivalent practical experience.
• Hands-on experience with Python for data processing, scripting, and automation.
• Strong working knowledge of PySpark and distributed data processing concepts.
• Proven hands-on experience using Databricks for data engineering, including notebooks, clusters, jobs, workflows, Delta tables, and performance optimization.
• Ability to build, maintain, and troubleshoot scalable ETL/ELT pipelines in Databricks.
• Experience working with Delta Lake and lakehouse architecture concepts.
• Working knowledge of SQL for querying, transforming, and validating data.
• Ability to work with structured and semi-structured data formats such as CSV, JSON, Parquet, and Delta.
• Understanding of data engineering concepts such as ETL/ELT, data pipelines, data lakes, data warehouses, batch processing, and data quality.
• Basic understanding of AI and machine learning concepts, including features, training datasets, model inputs/outputs, and model evaluation basics.
• Experience supporting data preparation or feature engineering for AI/ML use cases.
• Familiarity with cloud-based data platforms such as AWS, Azure, or GCP.
• Understanding of Git or other version control tools.
• Strong analytical, problem-solving, and troubleshooting skills.
• Good communication skills and ability to work collaboratively with technical and non-technical stakeholders.
• Willingness to learn new tools, technologies, and data engineering best practices.
Preferred Qualifications
• Exposure to Delta Lake, Unity Catalog, or lakehouse architecture.
• Experience with workflow orchestration tools or Databricks Jobs.
• Familiarity with CI/CD practices for data engineering projects.
• Exposure to machine learning workflows using MLflow, scikit-learn, or similar tools.
• Experience with Tableau, Power BI, or similar data visualization tools to create dashboards, support reporting needs, validate datasets, and perform exploratory analysis.
• Understanding of data governance, security, and access control concepts.
• Experience working in an Agile/Scrum environment.
Shift Information
This position may require working later shifts (evening or night), depending on business needs.