Meet the Data Engineer: He builds the systems that support data and analytics
To close out our series on data and analytics roles and why they are important to business, we’re going to explore the position of data engineer. Unlike data analysts, who focus on understanding business data, or data scientists, who leverage data to make business predictions, data engineers have an entirely different focus. In our introductory post, we summed it up like this:
Data engineers ensure raw data is accurately collected, securely and reliably stored, and structured for easy use by data scientists and analysts.
And this is no small task. Data engineers must be able to speak the language of data science and analytics so they can define project requirements. They must also be adept architects and coders comfortable working with the latest cloud technologies. As such, mature data teams often require multiple data engineers with varying expertise to support the work of a single data scientist. It’s no surprise that demand for data engineers could soon overtake demand for data scientists and analysts.
Why businesses hire data engineers for their data teams
If you’ve already added a data analyst, data scientist, or both to your team, you might wonder if you really need a data engineer. After all, don’t data analysts and data scientists work with code, too? The answer is, to some degree, yes, and in smaller companies this may be more than enough. But as the amount and complexity of data grow, so do the business demands on it. Data scientist and analyst roles focus on making sense of data and understanding what it means for the business, and the code they write serves that end. However, gaining access to and bringing together the massive amounts of data collected by businesses requires purpose-built systems. And building those systems requires highly specialized engineering skills. Enter the data engineer.
In the most general sense, data engineers support the work of data scientists and analysts. A data scientist or analyst will design the models they need to answer business questions. Data engineers figure out how to acquire, store, and combine the necessary raw data in ways the analytics team can use. A typical data engineering project will require an expert who can:
- Understand the purpose of the desired analysis and its requirements.
- Identify the sources of raw data and the granularity of information needed.
- Develop technical frameworks to filter, extract, and combine the raw data so it can be used for analysis.
- Build systems to hold and provide speedy access to the transformed data.
- Assess and ensure the quality of the transformed data.
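To make the list above more concrete, here is a minimal sketch of those tasks in Python. Everything in it is an assumption made for illustration: the sample records, the `order_facts` table, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever warehouse or data lake a real team would use.

```python
import sqlite3

# Identify sources: hypothetical raw data from an orders feed and a customer list.
RAW_ORDERS = [
    {"order_id": 1, "customer_id": 10, "amount": "25.00"},
    {"order_id": 2, "customer_id": 11, "amount": "40.50"},
    {"order_id": 3, "customer_id": 10, "amount": None},      # missing amount
    {"order_id": 4, "customer_id": 99, "amount": "12.00"},   # unknown customer
]
RAW_CUSTOMERS = [
    {"customer_id": 10, "region": "East"},
    {"customer_id": 11, "region": "West"},
]


def transform(orders, customers):
    """Filter out unusable rows and combine the two sources."""
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    rows = []
    for order in orders:
        if order["amount"] is None:
            continue  # filter: drop orders with no amount
        region = region_by_customer.get(order["customer_id"])
        if region is None:
            continue  # filter: drop orders that match no known customer
        rows.append((order["order_id"], region, float(order["amount"])))
    return rows


def load(rows):
    """Store the transformed rows where analysts can query them quickly."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_facts ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO order_facts VALUES (?, ?, ?)", rows)
    # An index on the column analysts group by keeps lookups fast as data grows.
    conn.execute("CREATE INDEX idx_order_facts_region ON order_facts (region)")
    conn.commit()
    return conn


def quality_report(conn, raw_count):
    """Run basic quality checks on the loaded data."""
    loaded = conn.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
    negative = conn.execute(
        "SELECT COUNT(*) FROM order_facts WHERE amount < 0"
    ).fetchone()[0]
    return {
        "raw": raw_count,
        "loaded": loaded,
        "dropped": raw_count - loaded,
        "negative_amounts": negative,
    }


conn = load(transform(RAW_ORDERS, RAW_CUSTOMERS))
report = quality_report(conn, len(RAW_ORDERS))
# The kind of query an analyst might run against the prepared table.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM order_facts GROUP BY region")
)
```

In this toy run, two of the four raw orders are dropped during filtering, and the analyst-facing query returns one total per region. Real projects differ mainly in scale and tooling, not in the shape of the work.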
Everything is coming up data engineers!
Even a couple of years ago, investing in data engineering was an afterthought to the much-hyped focus on data science. But the field is rapidly coming into its own. Due to the complexity of dealing with massive (and continuously growing) amounts of data, the role keeps broadening, with endless variations, specializations, and combinations of both. And while data engineering generalists do exist, accomplishing the five steps of a typical engineering project listed above is rarely a one-person job. Instead, it’s common for data engineers to focus on one of three areas: data storage, data flow, or data application.
Data engineers focused on data storage build and maintain the systems necessary to hold the data a business collects. This can include distributed systems, such as databases, or centralized systems, such as data warehouses and data lakes. These systems may be housed on premises, in the cloud, or in a combination of the two.
Storage-focused engineers have a wide range of job titles, including:
- Data Architect
- Data Warehouse Engineer
- Big Data Engineer
- Cloud Engineer
- Database Engineer
- Data Modeler
- Database Administrator
Data engineers focused on data flow plan and build systems like data pipelines. They facilitate the transfer of data among systems, and they design filters and algorithms to ensure that the necessary data is accessible to data scientists and that it is combined and organized for analysis. These data engineers perform jobs similar to those of traditional ETL engineers, although with newer tool sets.
Job titles for data engineers focused on data flow include:
- Data Pipeline Engineer
- ETL Engineer
- Data Integration Engineer
- Data Quality Engineer
Some data engineering roles focus intensely on preparing data for analytic access. These engineers work closely with data scientists to scale and adapt models for business intelligence and visualization tools. They must be fluent in data science concepts but must also be skilled coders, bridging the gap between the two disciplines.
Job titles for data engineers focused on data application include:
- Machine Learning Engineer
- Business Intelligence Engineer
- Analytics Engineer
- dbt Engineer
What to expect when hiring a data engineer
Hiring a data engineer is no simple feat. We find clients are often surprised at how much of a challenge it can be to fill the role. Here are some important considerations to keep in mind when you are looking to add a data engineer to your data team:
While all data and analytics roles are in high demand, demand for data engineers has started to outpace that for the much-touted data scientist and data analyst roles. As a career, data engineering has recently seen 40% to 50% year-over-year growth. And we don’t expect demand to wane anytime soon, as businesses ask even more of their data. Like other analytics professionals, data engineers are heavily recruited. This means experienced data engineers are regularly invited to apply to premium job openings. It is now rare for a data engineer to need to initiate a job search.
Because of the extremely specialized nature of data engineering, the role comes at a premium.
According to Indeed at the time of writing, the average salary for a data engineer in the U.S. is just over $135,000 per year. At Dataspace, we often fill openings for data engineers that pay more than data scientist roles. We’ve also seen highly experienced data engineering lead roles paying in excess of $200,000. In fact, even the junior-level data engineer roles we work on rarely pay less than $100,000.
As we note above, in some organizations it’s difficult for a single data engineer to satisfy the needs of an established data team. While there is a lot of overlap among the three focus areas of data engineering, many data teams require multiple data engineers. And while you may find data analysts or data scientists directly supporting end user business departments, you will rarely find data engineers doing so. Their work is to support the data team.
The skills in the data engineering wheelhouse are vast and depend on the particular focus area. That said, all data engineers should understand the analysis process to some degree, since that is what they are supporting. As with other roles in data science and analysis, SQL knowledge is a must. But data engineers must also be able to code in programming languages such as Python, Scala, and Java.
Storage-focused data engineers will also need skills in database platforms, such as Oracle and MySQL; cloud technologies, such as Amazon Web Services and Google Cloud Platform; and big data tools, such as Hadoop and Spark.
Data engineers focused on data flow will rely heavily on their SQL and programming skills, but also require knowledge of ETL tools and APIs.
Data engineers who work in data application will require strong knowledge of data science concepts, building algorithms, and machine learning. Additionally, they often must be able to work with data visualization and business intelligence tools, such as Tableau and Power BI.
This is by no means an exhaustive list, but it gives you an idea of the variety and complexity of the data engineering skill set.
Because data engineering has such a broad scope, it’s vital to specify the experience and skills required for the role. Not all of these technologies are interchangeable, and maintaining any one data stack can require in-depth knowledge of the particular tools that it runs on. If you are hiring your first data engineer(s), look for career experience. Building out your company’s first serious data architecture is not a junior role.
When reviewing resumes, keep in mind that much data engineering expertise is gained hands-on. College degrees are becoming less and less important for data engineers. It’s impossible for a single degree program to churn out an appropriately prepared data engineer, because the field is simply too fast-paced and mutable. Additionally, many talented data engineers evolve into the field from other programming and development careers.
Furthermore, businesses with highly specialized demands should be prepared to consider non-traditional candidates, who may be remote workers, overseas talent, or domestic candidates who require sponsorship.
Mature data teams require expert data engineers
If your business already employs multiple analytics-focused data experts, it’s only a matter of time until you’ll need to add data engineering power to your team. And if you’re already there, your data engineering needs will likely continue to grow. Modern business runs on data. As businesses demand more insight from the information they collect, they require more support systems to meet that demand. Data engineering will continue to boom as a high-value role in business.
Would you like help with the challenge of filling a vital data engineer role?
Does your data team require engineering talent with a highly specialized skill set? We can help you evaluate your hiring needs and find candidates with unique expertise. Contact our data talent recruiters to get started.