
Long Bui

I am Long, Data Engineer and Technical Writer



Exposing Data Engineer

CloudOps, DevOps, SecOps, and DataOps are among the most relevant keywords that search engine platforms associate with the supporting skills of a data engineer.

Overview

1. Introduction
2. Revisiting the scope of the Data Engineer
3. Supporting Data Pipeline Components
4. Exposing the scope of the Data Engineer
    - Required Skills for Data Engineers
    - Strongly supportive tools
5. Use case: building a Data Pipeline from scratch

Introduction

This topic does not focus on data engineering concepts.

The Data Engineering Community is on Slack, in the channel data-engineering-community.

Scope of a Data Engineer

Generally, the responsibility of a Data Engineer is crucial in ensuring that a company or organization can operate as a data-driven business. Data Engineers build and participate in data systems that effectively store, process, and analyze large amounts of data, so that the business gains valuable insights and others can make decisions more easily.

  • Building and maintaining data pipelines for efficient data processing and storage.
  • Implementing and managing data warehouses, data lakes, and other data storage systems.
  • Developing and maintaining data processing applications and tools.
  • Designing and implementing data security and access control policies.
  • Optimizing data retrieval and analysis performance.
  • Collaborating with data analysts, data scientists, and other stakeholders to ensure data quality, integrity, and usability.
  • Troubleshooting and debugging data infrastructure and systems issues.
  • Staying up-to-date with emerging data technologies and industry best practices.
  • Developing and maintaining documentation and training materials for data infrastructure and systems.
  • Ensuring compliance with relevant data privacy and security regulations.
  • Shaping and modeling data into structured formats driven by business or product needs, and organizing and optimizing data so that it delivers value.
  • Supporting DevOps and Security teams, where needed, with details of data operations (aka DataOps) and data security (aka Data Governance).
  • Monitoring, responding to, and resolving data issues and incidents within the SLA.
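To make the pipeline-building and data-quality responsibilities above more concrete, here is a minimal sketch of an extract-transform-load flow using only the Python standard library. It assumes a small hypothetical CSV of orders as the source and an in-memory SQLite database as the target; the table name, columns, and the two quality rules (amount is required, order_id is unique) are illustrative assumptions, not a prescribed design.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, an API, or a queue.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,5.50
3,carol,5.50
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple[int, str, float]]:
    """Transform: cast types, drop rows without an amount, de-duplicate on order_id."""
    seen: set[int] = set()
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed quality rule: amount is required
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # assumed quality rule: order_id must be unique
        seen.add(order_id)
        clean.append((order_id, row["customer"], float(row["amount"])))
    return clean


def load(conn: sqlite3.Connection, records: list[tuple[int, str, float]]) -> int:
    """Load: write the cleaned records into a table and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count = load(conn, transform(extract(RAW_CSV)))
    print(f"loaded {count} rows")
```

Keeping extract, transform, and load as separate functions makes each step easy to test on its own and to schedule later as individual tasks in an orchestrator such as Apache Airflow.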

Supporting Components for Data Engineers

Skill sets of a data engineer

  • Programming languages: Data engineers need to be proficient in at least one programming language such as Python, SQL, JavaScript (Node.js), Java, or Scala.
  • Big data technologies: Data engineers should be familiar with various big data technologies such as Databricks, Hadoop, Spark, Kafka, Hive, and HBase.
  • Cloud services: Data engineers should be familiar with cloud services such as AWS, Azure, and Google Cloud Platform.
  • Data storage: Data engineers should be familiar with different types of data storage such as relational databases, NoSQL databases, and data lakes.
  • ETL tools: Data engineers should be familiar with ETL (Extract, Transform, Load) and orchestration tools such as Airbyte, Apache NiFi, Apache Airflow, and AWS Glue.
  • Data modeling: Data engineers should be familiar with data modeling techniques such as relational modeling, dimensional modeling, and schema design.
  • Data visualization: Data engineers should be familiar with data visualization tools such as Tableau, Power BI, and Looker.
  • Data governance: Data engineers should be familiar with data governance principles and regulations such as GDPR, CCPA, and HIPAA.
  • Communication and collaboration: Data engineers should have excellent communication and collaboration skills, as they often work with cross-functional teams including data scientists, business analysts, and software engineers.
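Data modeling and SQL go hand in hand. As a minimal sketch of dimensional modeling, the example below builds a tiny star schema (one fact table and two dimension tables) in an in-memory SQLite database and runs a typical analytical query against it. The table names, columns, and sample rows are hypothetical and only meant to show the shape of the technique.

```python
import sqlite3

# A tiny star schema: one fact table keyed to two dimension tables.
# All names and rows here are hypothetical, for illustration only.
SCHEMA = """
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,  -- e.g. 20240115 for 2024-01-15
    year INTEGER NOT NULL,
    month INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity INTEGER NOT NULL,
    revenue REAL NOT NULL
);
"""


def build_model() -> sqlite3.Connection:
    """Create the schema in memory and insert a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240115, 2024, 1), (20240210, 2024, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 20240115, 2, 2000.0),
                      (2, 20240115, 1, 300.0),
                      (1, 20240210, 1, 1000.0)])
    return conn


def revenue_by_category(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Join the fact table to a dimension and aggregate, highest revenue first."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.revenue) AS total_revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY total_revenue DESC
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_category(build_model()))
```

The fact table holds the measurable events (quantity, revenue), while the dimension tables hold descriptive attributes used for filtering and grouping; that split is what keeps analytical queries like the one above simple.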

Supporting skill sets of a data engineer

Software Development Tools

  • Jira
  • Azure DevOps
  • Rally
  • Asana
  • Git Workflow

Testing and debugging

  • Logging
  • Pytest
  • VSCode
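Logging and Pytest work well together when testing transformation code. The sketch below assumes a hypothetical parse_amount helper that logs and skips bad values instead of crashing the pipeline, followed by plain test functions that pytest would discover through their test_ prefix. The helper and its rules are illustrative only.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_amount(value: str) -> Optional[float]:
    """Hypothetical helper: parse a raw amount, logging and returning None on bad input."""
    try:
        amount = float(value)
    except ValueError:
        logger.warning("Could not parse amount %r; skipping", value)
        return None
    if amount < 0:
        logger.warning("Negative amount %r; skipping", value)
        return None
    logger.debug("Parsed amount %s", amount)
    return amount


# pytest collects functions named test_* and reports failing asserts with their values.
def test_parse_valid_amount():
    assert parse_amount("19.99") == 19.99


def test_parse_invalid_amount_returns_none():
    assert parse_amount("abc") is None


def test_parse_negative_amount_returns_none():
    assert parse_amount("-5") is None


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    for raw in ["19.99", "abc", "-5"]:
        print(raw, "->", parse_amount(raw))
```

Saved in a file such as test_amounts.py (the filename is only an example), running pytest in that directory would collect and run the three test functions, and the warnings logged for bad values give a trail to follow when debugging in VSCode or in production logs.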