Data professionals: An overview of specialisations and responsibilities

The data industry is one of the fastest-growing areas of IT. To personalise marketing communications and public services, corporations and governments accumulate and process vast amounts of diverse data. Working with data is a substantial challenge for the IT industry as well: software can be competitive only if it has effective algorithms for storing, processing and protecting data. Data are becoming one of the most valuable assets for gaining a competitive advantage. As a consequence of this trend, companies are seeking employees with data skills.

Demand forecasts for data specialists

The market demand for data specialists is growing along with data volumes. The Bureau of Labor Statistics (USA) ranks these specialists among the 20 fastest-growing specialities. The Final Report on the European Data Market Monitoring Tool 2020 predicted that the number of data specialists in the EU may increase by 41% between 2019 and 2025.

The situation in the labour market reflects the data professional skills gap. This metric shows the difference between supply and demand for data skills. Between 2019 and 2025, the unmet demand for data skills in the European Union is forecast to increase from 399,000 to 484,000 jobs. The structure of demand for data specialists is also changing due to the rapid emergence of new methods and technologies for working with data. These methods and technologies affect not only the corporate sector but also the population. An IDC report stated that by 2025, 6 billion users, or 75% of the world’s population, will interact with online data every day.
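The size of the forecast change in the skills gap can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above; the variable names are illustrative.

```python
# Unmet demand for data skills in the EU, as quoted in the text above.
gap_2019 = 399_000
gap_2025 = 484_000

# Absolute and relative growth of the skills gap over the period.
absolute_growth = gap_2025 - gap_2019
relative_growth = absolute_growth / gap_2019

print(f"Gap grows by {absolute_growth:,} jobs ({relative_growth:.1%})")
```

In other words, the forecast implies that unmet demand grows by roughly a fifth over six years.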

It is, therefore, important to understand the areas of responsibility, core competencies and skills of the various data specialists; the following sections discuss these areas in detail.

Data professionals: specialisations and niches, key competencies, traits

According to the definitions adopted in the EU, data professionals are workers who collect, store, manage, and/or analyse, interpret, and visualise data as their primary or as a relevant part of their activity.

Data specialists fall into two main groups. The first collects, stores and manages data: data architects, data engineers, data QA engineers, database developers and database administrators. The second analyses, interprets and visualises data; it is represented by data scientists and data analysts. For clarity, it is convenient to follow the data flow model as a sequence: collecting, storing, transforming and analysing data, then building forecasts based on those data.

Data architect

Data architects determine what data must be collected according to project requirements, and how it should be stored, organised and used. These experts form the data environment and develop the rules and standards for it using conceptual, logical and physical data models. To perform these tasks, data architects should be skilled in data modelling or involve a data modeller — a systems analyst specialising in computer databases.

Data architects are also concerned with integrating data within an organisation and ensuring the security and availability of information. They develop a detailed data processing plan and provide data engineers, database developers and other data users with the necessary tools for working with data.

Database developer

Database developers deal with the design, development and optimisation of databases. They are also involved in preparing technical documentation and reports on database operations. Maintaining the technological relevance of databases and their modernisation are also the responsibility of database developers.

Where database development prioritises business intelligence (BI) tasks, some companies define this position as a database and BI developer. The BI engineer described below also works with data using BI technologies.

Database administrator (DBA)

These specialists are involved in data storage and organisation, database maintenance and data infrastructure support. DBAs deal with capacity planning, configuration, monitoring, troubleshooting and database security; they also provide access to information for authorised users, organise backups and restore data.
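As a minimal sketch of the backup-and-restore duty, the example below uses SQLite through Python's standard sqlite3 module. The in-memory databases, the customers table and its rows are assumptions made for illustration; a production DBA would use the tooling of the specific database system.

```python
import sqlite3

# Hypothetical live database (in memory, for illustration only).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Linus",)])
live.commit()

# Back up: copy the whole live database into a second connection.
backup_db = sqlite3.connect(":memory:")
live.backup(backup_db)

# Simulate data loss on the live database.
live.execute("DELETE FROM customers")
live.commit()

# Restore: copy the backup into a fresh database that replaces the damaged one.
restored = sqlite3.connect(":memory:")
backup_db.backup(restored)

restored_rows = restored.execute("SELECT name FROM customers ORDER BY id").fetchall()
print(restored_rows)
```

The backup is taken before the simulated loss, so the restored database contains the rows that the live one no longer has.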

Similar positions include data managers and data coordinators. Data managers are also expected to guide staff in the use of databases and to create rules and procedures for data sharing. Data coordinators, in addition to working with databases, are also expected to execute data queries.

Data engineer

Data engineers are engaged in collecting, processing, storing and transforming data. They ensure the readiness of the data for further use, as well as the resilience, scalability and security of these data.

Representatives of this speciality build, test and update the data infrastructure of a project or organisation. These engineers essentially power everything that the data architect has designed.

Typical functional responsibilities in this position are as follows:

  • Ensuring data collection (particularly large, real-time data)
  • Organisation and optimisation of data storage and access
  • Data governance: all established data systems require monitoring, support and validation
  • Data transformation

The last item deserves particular attention. Data engineers use ETL (extract, transform and load) systems to move information from its sources into a data warehouse. Data can be stored and exchanged in various formats, which fall into two groups: binary formats, which use arbitrary sequences of bytes, and text formats, which are based on plain text.

It is important to bring all data to a single standard. Thus, in the transformation process, the data are first extracted, then the source formats are compared, and the target format is determined. The process ends by converting the data and saving them in the transformed form. Transformation can be done manually, with programming languages, or with custom-built or cloud-based ETL systems.
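The extract, transform and load steps can be sketched with the Python standard library alone. Everything specific in the example is an assumption made for illustration: the CSV content, the orders table and the use of an in-memory SQLite database as a stand-in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data in a text format (CSV); names and values are made up.
SOURCE_CSV = """order_id,amount,currency
1, 10.50 ,eur
2,7,EUR
3,3.25,usd
"""

def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: bring every record to one target standard."""
    return [
        (int(r["order_id"]), float(r["amount"].strip()), r["currency"].strip().upper())
        for r in rows
    ]

def load(records, conn):
    """Load: save the transformed records in the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(SOURCE_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to test the transformation rules independently of the source and the target.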

Data engineers build pipelines for processing and transferring data from the source to the user.

Data warehouse engineer

Data warehouse engineers create and manage corporate data warehouses. These experts cover the full data warehouse life cycle, from design to deployment and customisation. They also design and code data warehouse software and ETL processes, and they are responsible for corporate applications for working with data as well as for the relationship between local and cloud infrastructures. Working with all the stakeholders, the data warehouse engineer optimises warehouse functioning, troubleshoots data access problems and analyses queries.

Data warehouse engineers are also responsible for data migration — moving data from one warehouse to another. For example, migrations from servers to the cloud or from one cloud to another are widespread.

BI engineer

BI focuses on examining the impact of data on operations and business profitability. BI engineers consider customer requirements, translate technical concepts into business terms and format the data so that they can be used in business analysis.

The efforts of such professionals are aimed at processing data to produce useful insights. Other areas of focus include designing and creating a data repository using BI tools and developing data preparation processes and data models for analysis and reporting. BI engineers load the data and give analysts and other users convenient access to them. These engineers frequently also research and analyse the selected data.

Data QA engineer

Data quality assurance is about identifying and correcting any data anomalies through data profiling and cleansing processes.

Data QA engineers are responsible for validating data pipelines and ensuring the accuracy of the datasets that are produced. Such engineers prepare data quality reports using various test methods, such as automated tests and statistical analysis.
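An automated data quality check of this kind might look like the sketch below. The dataset, the field names and the three rules (unique id, age present, age within 0 to 120) are assumptions chosen for illustration, not a standard rule set.

```python
# Hypothetical pipeline output; the field names and rules are assumptions.
rows = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 2, "age": 29, "email": "c@example.com"},
    {"id": 4, "age": 212, "email": "d@example.com"},
]

def quality_report(rows):
    """Return a list of (row_index, problem) pairs found in the dataset."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        if row["id"] in seen_ids:
            problems.append((index, "duplicate id"))
        seen_ids.add(row["id"])
        if row["age"] is None:
            problems.append((index, "missing age"))
        elif not 0 <= row["age"] <= 120:
            problems.append((index, "age out of range"))
    return problems

report = quality_report(rows)
for index, problem in report:
    print(f"row {index}: {problem}")
```

A check like this can run after every pipeline execution, so that anomalies are reported before the data reach analysts.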

We now turn to the experts who apply the data.

Data scientist

Data scientists create and implement models for extracting knowledge, which is further analysed and becomes the basis for forecasting and decision-making. They research what data must be collected and used to achieve the business objectives.

Data science is an interdisciplinary field. Hence, a data scientist must have knowledge in many related fields, such as data mining, machine learning (ML) and big data. Data scientists work with raw data using statistical and mathematical techniques, computing and modelling.

The tasks of a data scientist include:

  • Designing, training and testing models and algorithms for data processing
  • Preparing large datasets for analysis
  • Analysing algorithm results and building statistical reports
  • Automating forecasting and decision-making processes by creating and maintaining model pipelines in which all models are automatically trained and updated, so that their forecasts are based on the latest available data
  • Conducting exploratory data analysis with a focus on big data and designing experiments to validate hypotheses
  • Creating requirements for engineering teams to collect business-critical data
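The first task in the list, designing, training and testing a model, can be illustrated with a deliberately small sketch. It fits a straight line by ordinary least squares on a training set and measures the error on held-out data. The data points, the split and the function names are assumptions made for illustration; real projects would normally rely on an ML library.

```python
# Hypothetical data: advertising spend (x) and sales (y); values are made up.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1), (6, 13.0), (7, 14.9), (8, 17.2)]

# Hold out the last two observations to test the model on unseen data.
train, test = data[:6], data[6:]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def mean_absolute_error(points, slope, intercept):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

slope, intercept = fit_line(train)
test_error = mean_absolute_error(test, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={test_error:.2f}")
```

Evaluating on observations that were not used for fitting is what distinguishes testing a model from merely describing the training data.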

Data scientists are a connecting link for all participants in the data flow. On the one hand, such experts determine the requirements for engineers and developers on how to work with data. On the other hand, they create the basis for the work of analysts and data users. These professionals understand how to adapt data configuration and data parameters to meet the needs of computer systems and enterprises. Data scientists took part in an AI-driven Leadership Tool project.

Data analyst

Data analysts are responsible for validating, grouping, transforming and reporting data that is used for decision-making. These experts analyse and visualise data insights for stakeholders. Their main task is to discover useful information, substantiate conclusions and provide data-driven recommendations for achieving business goals.

It is important for a data analyst to be able to organise and analyse A/B testing as well as to conduct exploratory data analysis (EDA), which identifies trends and dependencies that are important for decision-making. The data analyst’s role in a software development project is evident in the Market Analysis Tool case.
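One common way to analyse an A/B test is a two-proportion z-test on conversion rates. The sketch below is one such approach, not the only one; the visitor and conversion counts are invented for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: variant A vs variant B (numbers are made up).
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value suggests that the difference between the variants is unlikely to be due to chance alone; the threshold for acting on it is a business decision agreed before the experiment starts.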

A review of a data analyst’s work would be incomplete without touching on a similar position — a BI analyst. Data analysts tend to focus more on future forecasts and trends by performing predictive analytics, whereas BI analysts summarise and present past data by performing descriptive analytics.

Machine learning engineer

Machine learning (ML) engineers design, implement and deploy machine learning models. They build artificial intelligence systems that leverage huge data sets to generate and develop algorithms capable of learning and making predictions.

To train a model well, the training data should cover the situations the model is expected to encounter. ML engineers are responsible for building the machine learning data pipelines and data transformation pipelines.
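A typical step in a data transformation pipeline is feature scaling. The sketch below standardises a numeric feature, learning the mean and standard deviation from the training data only and reusing them for later data. The Standardiser class and the values are hypothetical and serve only as an illustration.

```python
import statistics

class Standardiser:
    """Scale a numeric feature to zero mean and unit variance."""

    def fit(self, values):
        # Learn the parameters from the training data only.
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)
        return self

    def transform(self, values):
        # Reuse the learned parameters for any later data.
        return [(v - self.mean) / self.std for v in values]

# Hypothetical feature values; the numbers are made up.
train_values = [10.0, 20.0, 30.0, 40.0]
new_values = [25.0, 50.0]

scaler = Standardiser().fit(train_values)
train_scaled = scaler.transform(train_values)
new_scaled = scaler.transform(new_values)
print(train_scaled, new_scaled)
```

Separating fit from transform keeps information from new data out of the training step, which is why pipelines usually apply the same fitted transformation at training and prediction time.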

Current niche specialisations for ML engineers include natural language processing (NLP), object detection, computer vision and deep learning — one of the machine learning areas based on the use of neural networks.

Full-stack data scientist. Does such a professional exist?

Recently, it has become increasingly common to define one type of data specialist as a full-stack data scientist. These scientists can be defined as generalists who can work at all major stages of the data cycle, from data analysis to the deployment of automated predictive solutions. Several years ago, Forbes, in fact, raised the question: ‘The Full Stack Data Scientist: Myth, Unicorn, or New Normal?’

It should be noted that the trend of universalisation is inherent in the entire IT sector, and many professionals combine, in one way or another, the skills of two or more specialisations. Working with data is no exception.

Figure: Data industry fields

Interest in the appearance of full-stack data scientists is quite logical. By understanding the business needs and not only analysing the data and developing the model but also deploying that model and integrating it with the business application, these scientists increase the value created in software development projects.

However, the number of such full-cycle specialists is unlikely to be sufficient for the market. It takes too much time and practice to explore and master all three required areas, namely data science, production code and business acumen, equally well. It is more likely that the few professionals who can design and implement end-to-end full-cycle data solutions are best suited for small companies, especially start-ups.

Requirements for data specialists

To recruit a data professional for a software development project, it is necessary to specify the key skills and qualities required of such a team member. The following section lists the requirements for a qualified data specialist.

Proven competence

To work with data, employees must be educated in computer science, applied mathematics or other related fields. Recently, data science has become a separate speciality in an increasing number of higher education institutions, providing good basic training for future data professionals.

Professional certificates also help evaluate the overall level and the strengths of data specialists. For example, training and certifications from database software companies such as Microsoft, MongoDB and Oracle are valued. Depending on the cloud infrastructure on which the project is being implemented, project team members will benefit from certification by cloud vendors such as AWS, Microsoft Azure, Google and IBM.

Technical skills

Mathematics, statistics and software coding form the basic knowledge of data specialists. Working with data requires an advanced mathematical apparatus that includes, inter alia, calculus, linear algebra and probability theory, as well as discrete mathematics subjects such as set theory, combinatorics, graph theory, algorithmics, information theory, Markov chains (Markov processes) and Petri nets (place/transition nets).

Among the programming languages, the most common in the data community are R, Python and SQL. Java, Ruby, C++ and Perl are also frequently used when working with data. These specialists work with SQL and NoSQL databases, data lakes, large amounts of information and large-scale processing systems.

For each specialisation in the data industry, a different set of technical skills is relevant. Data scientists, for example, require knowledge in the areas of GLM/regression, decision trees, time series, AI techniques, transfer learning, classical ML and deep learning. The ability to work with visualisation tools, including creating dynamic dashboards, is also highly important for data professionals. In particular, tools such as SAP Business Objects, Tableau, Power BI, QlikSense, Dash and Shiny are common.

Figure: Technical skills of data specialists

Analytical skills

Critical thinking and analytical skills are expected from any data professional. These qualities help professionals capture relationships, patterns and trends in the data, formulate and test hypotheses and make predictions about future events.

Soft skills

Priorities in the development of soft skills can also differ for various specialisations. For example, data engineers are more likely to focus on effective communication within a project team.

Data analysts and data scientists are expected to have the skills to communicate with stakeholders and the ability to represent and visualise data. Basic writing and communication skills are no longer sufficient for data specialists. A storytelling method and expository writing aimed at explaining specific information about the data are required for data professionals to be well understood by stakeholders.

Domain expertise

Each project is implemented in a specific targeted subject area. A data professional with experience in the desired domain increases the potential of the project team. Such a specialist easily adapts to the project, quickly finds the reasons for the difficulties that have arisen and suggests ways to solve them.

Domain expertise helps to better understand the client’s needs and the product’s key features. In addition, new niche specialisations are constantly emerging in the data industry. Hiring highly specialised professionals creates additional opportunities for successfully solving the specific business issues of the chosen domain.

Seniority

Data specialists follow the same gradation of seniority as other IT specialities: junior, middle and senior.

Expertise, experience and a high level of proficiency in professional tools are important for working on the development of IT solutions. Therefore, it is better to invite a senior specialist to work with the data. Such a team player brings to the project best practices and proven methods. Typically, a senior data specialist has at least 5 years of experience. For a senior database developer, for example, it is desirable to have at least 8 years of successful work.

However, this rule does not apply to all specialities in the data industry. For example, a data QA engineer or a data analyst may well begin their employment without prior experience. The recommended work experience thresholds for each speciality are shown below.

Figure: Work experience threshold for data professionals

To summarise the qualities and skills of data experts, it should be added that for almost all categories of these professionals, coding skills and knowledge of software engineering are important. The success of all types of engineers working in this field largely depends on a deep understanding of the patterns of processing, structuring and storing data.

For data scientists, knowledge of mathematics, statistics and data modelling is essential. In professions focused on using data, understanding the specifics of a client’s business objectives and the subject area comes to the fore. Such specialists are also distinguished by an ability to clean, select, visualise and report data.

The most important skills for the various data specialities are ranked in the following table.

Table: Data professionals' skills

Let us visualise the evolutionary paths of data professionals and their possible transitions to related specialities.

Figure: Evolutionary paths of data professionals

Conclusion

The role of data professionals is substantial in both software development projects and post-project support of the implemented IT product. Statistics show that neglecting data work is expensive. According to a Gartner study, ‘The average financial impact of poor data quality on organisations is $15 million per year.’

Conversely, engaging data specialists enables companies to leverage data to add value to their business, create a data-driven environment and provide information support for effective decision-making. Collaboration with an effective software development team of experienced experts is highly likely to ensure the high quality of the software product and IT infrastructure.

Thank you for reading this article. If you have any questions, please feel free to write to us. Contact the SSA Group team to open new business opportunities through data-driven solutions.
