Job Responsibilities
Infrastructure Management:
– Design, develop, and maintain robust and scalable data pipelines to handle large datasets using both on-premise and cloud platforms (e.g., AWS, GCP, Azure).
– Implement and manage data storage solutions, including databases and data lakes, ensuring data integrity and performance.
Data Integration:
– Integrate data from various internal and external sources such as databases, APIs, flat files, and streaming data.
– Ensure data consistency, quality, and reliability through rigorous validation and transformation processes.
ETL Development:
– Develop and implement ETL (Extract, Transform, Load) processes to automate data ingestion, transformation, and loading into data warehouses and lakes.
– Optimize ETL workflows to ensure efficient processing and minimize data latency.
Data Quality & Governance:
– Implement data quality checks and validation processes to ensure data accuracy and completeness.
– Develop data governance frameworks and policies to manage data lifecycle, metadata, and lineage.
Collaboration and Support:
– Work closely with data scientists, AI engineers, and developers to understand their data needs and provide technical support.
– Facilitate effective communication and collaboration between the AI/data teams and other technical teams.
Continuous Improvement:
– Identify areas for improvement in data infrastructure and pipeline processes.
– Stay up to date with the latest industry trends and technologies in data engineering and big data.
Skills and Experience
Education:
– Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field. A Master’s degree is a plus.
Experience:
– At least 3–5 years of experience in data engineering or a similar role.
– Proven experience with on-premise and cloud platforms (AWS, GCP, Azure).
– Strong background in data integration, ETL processes, and data pipeline development.
– Experience leading the design and development of high-performance AI and data platforms, including IDEs, permission management, data pipelines, code management, and model deployment systems.
Skills:
– Proficiency in scripting and programming languages (e.g., Python, SQL, Bash).
– Strong knowledge of data storage solutions, including SQL and NoSQL databases and data lakes.
– Experience with big data technologies (e.g., Apache Spark, Hadoop).
– Experience with CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI).
– Understanding of data engineering and MLOps methodologies.
– Awareness of security best practices in data environments.
– Excellent problem-solving skills and attention to detail.
Preferred Qualifications:
– Hands-on experience managing an on-premise Spark cluster, covering both deployment and day-to-day usage.
Salary:
$2,000 – $3,500, depending on the candidate’s skills and experience.
Allowances:
- Onboarding allowance, meal allowance, and transportation allowance.
- Flight ticket to return home every six months.
Benefits:
- All necessary equipment and office supplies provided to support your work.
- Annual salary review and extensive promotion opportunities across all positions.
- Participation in company activities such as holiday celebrations, monthly events, running clubs, team building, year-end parties, and annual company trips.
- Bonuses for holidays and special occasions, quarterly and annual outstanding employee awards, project bonuses, and a 13th-month salary bonus (or more).
- Commitment to employee competency development through professional training programs.
- Time off: 7 days off per month, with flexible scheduling.