Data professionals: An overview of specialisations and their role in creating data-driven solutions

The data industry is one of the fastest-growing areas of IT. The global market for data is estimated to reach US$86.1 billion by 2024, and the upward trend in the volume and cost of data is expected to continue. To personalise marketing communications and public services, corporations and governments accumulate and process vast amounts of diverse data. Working with data is a substantial challenge for the IT industry as well. Software can be competitive only if it has effective algorithms for storing, processing and protecting data. Data are becoming one of the most valuable assets for a competitive advantage. As a consequence of this trend, companies are targeting employees with data skills.

Prospects for working with data. Demand forecasts for data specialists

The market demand for data specialists is growing in proportion to the growth in data volumes. The Bureau of Labor Statistics (USA) ranks this occupation among the 20 fastest-growing specialties, with the number of people working with data expected to increase by 31% during 2019–2029. The Final Report on the European Data Market Monitoring Tool 2020 predicted that the number of data specialists in the EU may increase by 41% between 2019 and 2025.

The situation in the labour market reflects the data professional skills gap. This metric shows the difference between supply and demand for data skills. Between 2019 and 2025, the unmet demand for data skills in the European Union is forecast to increase from 399,000 to 484,000 jobs. The structure of demand for data specialists is also changing due to the rapid emergence of new methods and technologies for working with data. These methods and technologies affect not only the corporate sector but also the population. An IDC report stated that by 2025, 6 billion users, or 75% of the world’s population, will interact with online data every day. Hence, technologies and personnel structure for working with data will also change.

It is, therefore, important to understand the areas of responsibility, core competencies and skills of the various data specialists; the following sections discuss these areas in detail.

Data professionals: specialisations and niches, key competencies, traits

The many employees who are engaged in data can be referred to as data professionals or data specialists. For example, according to the definitions adopted in the EU, ‘Data professionals are workers who collect, store, manage, and/or analyse, interpret, and visualise data as their primary or as a relevant part of their activity. Data professionals must be proficient with the use of structured and unstructured data, should be able to work with a huge amount of data and be familiar with emerging database technologies’ (Final Report on the European Data Market Monitoring Tool 2020).

There are many different data workflows. The field of working with data is growing and becoming more complicated, and it is, therefore, logical to divide work and specialisations between different data experts.

Data specialists comprise two main groups. The first group works mostly on collecting, storing and managing data; it includes data architects, data engineers, data QA engineers, database developers and database administrators (DBAs). In the second group, the analysis, interpretation and visualisation of data prevail; these professionals include data scientists and data analysts. For clarity, it is convenient to follow the data flow model: the sequence of collecting, storing, transforming and analysing data and building forecasts based on those data. Hence, the following section discusses those who collect and process data.

Data architect

Data architects, based on business requirements, determine what data must be collected for the requirements of projects and enterprises, as well as how these data should be stored, organised, integrated and used. These experts form the data environment and develop the rules and standards for it using conceptual, logical and physical data models. To perform these tasks, data architects may be skilled in data modelling or involve a data modeller—a systems analyst specialising in computer databases.

Data architects are also concerned with integrating data within an organisation and ensuring the security and availability of information. This specialist develops a detailed data processing plan and provides the necessary tools for working with data for data engineers, database developers and other project participants.

Database developer

Database developers deal with the design, development and optimisation of databases. They are also involved in preparing technical documentation and reports on database operations. Maintaining the technological relevance of databases and their modernisation are also the responsibility of database developers. With an emphasis on prioritising business intelligence (BI) tasks in database development, some companies define this position as a database and BI developer. The BI engineer described below also works with data using BI technologies.

Database administrator (DBA)

These specialists are involved with data storage and organisation, database maintenance and data infrastructure support. DBAs deal with capacity planning, configuration, monitoring, troubleshooting and database security; they also provide access to information for authorised users, organise backups and restore data. Similar jobs can also be referred to as data managers and data coordinators. Data managers are also expected to manage staff in the use of databases and create rules and procedures for data sharing. Data coordinators, in addition to working with databases, are also assumed to execute data queries.

Data engineer

Data engineers are engaged in collecting, processing, storing and transforming data. They ensure the readiness of the data for further use, as well as the resilience, scalability and security of these data.

Representatives of this specialty build, test and update the data infrastructure of a project or organisation. These engineers essentially power everything that the data architect has designed.

Typical functional responsibilities in this position are as follows:

  • Ensuring data collection (particularly large, real-time data)
  • Organisation and optimisation of data storage and access
  • Data governance: all established data systems require monitoring, support and validation
  • Data transformation

The latter item is worthy of particular attention. Data engineers use ETL (extract, transform and load) systems to move information from its sources into a data warehouse. Data can be stored and exchanged in various formats, which fall into two broad groups: binary formats, which use arbitrary sequences of bytes, and text formats, which are based on plain text.

Data formats are constantly evolving. Previously, raw data (unstructured) formats were more commonly used. Then, chunk-based (container-based) formats were developed. This type of format includes, for example, widespread XML. The desire to improve the convenience of working with data for programmers and users has led to the introduction of formats such as, in particular, JSON and YAML. Directory-based formats are also used, which can be thought of as file systems (OLE, etc.). Similarly, new solutions are being sought in data interchange to improve the concept and standards of EDI (electronic data interchange) and the implementation of other similar developments. Thanks to data transformation, users will be able to perform various operations with the data, such as analysis or reporting.
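As an illustration of moving between the text formats mentioned above, the following minimal sketch converts a small XML record into JSON using only the Python standard library. The record, its field names and the helper xml_record_to_dict are hypothetical and serve only as an example; real documents with nested elements would need a more careful mapping.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical customer record stored as XML (illustrative data only).
XML_RECORD = """
<customer id="42">
  <name>Ada Lovelace</name>
  <country>UK</country>
</customer>
"""

def xml_record_to_dict(xml_text: str) -> dict:
    """Flatten a one-level XML element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    record = dict(root.attrib)  # attributes such as id="42"
    for child in root:
        # Each direct child element becomes a key/value pair.
        record[child.tag] = (child.text or "").strip()
    return record

record = xml_record_to_dict(XML_RECORD)
json_text = json.dumps(record, sort_keys=True)
print(json_text)
```

Note that XML carries no type information here, so the id is kept as a string; deciding on types is part of the transformation step.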

The transformation of data when moving from legacy systems to new computer systems, or when moving data contained in legacy systems to new databases, raises specific concerns. Legacy systems are computer systems or software that are outdated or built using previous-generation technologies. Newer computer systems are most often built to other standards. Accordingly, it becomes necessary either to replace the legacy system or to ensure compatibility between the legacy system and the new system. If legacy computer systems or data storage are being decommissioned, data migration is performed. If the data contained in legacy systems are to be used in newer systems, data transformation is carried out.

An important condition for data processing is bringing the data to a single, uniform standard. To achieve this, a main (target) format is usually chosen, to which the data taken from different sources are converted. Thus, in the transformation process, the data are first extracted, then the source formats are compared, and the target format is determined. The process ends by converting the data and saving them in the transformed form. Transformation can be done manually, with programming languages, or with custom-built or cloud-based ETL systems.
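The extract, transform and load sequence described above can be sketched in a few lines of Python. This is a simplified illustration, not a production ETL system: the two sources, their field names, the target schema (id, name, signup_date) and the in-memory SQLite store are all assumptions made for the example.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources describing the same entity in different layouts.
CSV_SOURCE = "customer_id,full_name,signup\n1,Ada Lovelace,2021-03-01\n"
JSON_SOURCE = '[{"id": 2, "name": "Alan Turing", "signup_date": "01/04/2021"}]'

def extract():
    """Extract: read raw rows from each source as dictionaries."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows

def to_iso(date_text):
    """Convert DD/MM/YYYY to ISO YYYY-MM-DD; leave ISO dates untouched."""
    if "/" in date_text:
        day, month, year = date_text.split("/")
        return f"{year}-{month}-{day}"
    return date_text

def transform(csv_rows, json_rows):
    """Transform: map both source layouts onto the target (id, name, signup_date)."""
    target = [(int(r["customer_id"]), r["full_name"], to_iso(r["signup"])) for r in csv_rows]
    target += [(int(r["id"]), r["name"], to_iso(r["signup_date"])) for r in json_rows]
    return target

def load(rows, conn):
    """Load: save the transformed rows in the target store."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(*extract()), conn)
loaded = conn.execute("SELECT id, name, signup_date FROM customers ORDER BY id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easier to test each one and to swap a source or the target store without touching the rest.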

Data engineers build pipelines for processing and transferring data from the source to the user. Thus, the integrity of the process of working with data is achieved—from collection to conversion into the desired format.

Data warehouse engineer

Data warehouse engineers create and manage corporate data warehouses. These experts maintain a full stack of data warehouses—from design to deployment and customisation. These specialists also code and design data warehouse software and ETL, and they are responsible for corporate applications for working with data as well as for the relationship between local and cloud infrastructures. Working with all the stakeholders, the data warehouse engineer optimises warehouse functioning, troubleshoots data access problems and analyses queries.

Each organisation, for reasons of economic efficiency, information security and management optimisation, can decide to change the location and conditions for storing its data. Therefore, data warehouse engineers should always be ready to plan and conduct data migration—moving data from one warehouse to another.

For example, migrations from servers to the cloud or from one cloud to another are widespread. When the legacy systems mentioned above are decommissioned, data are also migrated to new systems. Such migration is preceded by careful preparation, which typically includes selecting, preparing, extracting and transforming data. Validating the migrated data is also part of the migration process.
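One common way to validate migrated data is to compare row counts and content checksums between the source and the target. The sketch below assumes two in-memory SQLite databases and a hypothetical orders table; the function table_fingerprint is an illustrative helper, and real warehouses would usually compute such checks per partition or per column.

```python
import hashlib
import sqlite3

def table_fingerprint(conn, table):
    """Return (row_count, sha256 digest) over the rows, ordered by the first column."""
    # The table name is interpolated directly, so it must come from trusted code.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()

def make_db(rows):
    """Create an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

source = make_db([(1, 10.0), (2, 25.5), (3, 7.25)])
target_ok = make_db([(1, 10.0), (2, 25.5), (3, 7.25)])
target_bad = make_db([(1, 10.0), (2, 25.0), (3, 7.25)])  # one amount was altered

migration_ok = table_fingerprint(source, "orders") == table_fingerprint(target_ok, "orders")
migration_bad = table_fingerprint(source, "orders") == table_fingerprint(target_bad, "orders")
print(migration_ok, migration_bad)
```

A matching count with a different digest, as in the second comparison, points to altered values rather than missing rows.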

BI engineer

BI focuses on examining the impact of data on operations and business profitability. BI engineers consider customer requirements, translate technical concepts into business terms and format the data so that they can be used in business analysis.

The efforts of such professionals are aimed at processing data to produce useful insights. Other areas of focus include designing and creating a data repository using BI tools and developing data preparation processes and data models for analysis and reporting. BI engineers provide data loading and convenient access to these data for analysts and other users. These engineers frequently also perform research and analysis of the selected data.

Data QA engineer

Initially, the data can come in various forms:

  • text – in the form of symbols denoting language tokens
  • numerical – as numbers and signs of mathematical operations
  • visual – in the form of images, videos, events, objects, etc.
  • sound – obtained acoustically

The data may be inconsistent. Data quality assurance is about identifying and correcting any data anomalies through data profiling and cleansing processes.

Data QA engineers are responsible for validating data pipelines and ensuring the accuracy of the datasets that are produced. Such engineers prepare data quality reports using various test methods, particularly automated tests, statistical analysis tests, etc.
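A minimal sketch of such automated checks is shown below. The dataset, its field names and the validity rules (a non-empty email, an age between 0 and 120) are assumptions chosen for illustration; in practice the rules come from the project's data quality requirements.

```python
from collections import Counter

# Hypothetical dataset produced by a pipeline; field names are illustrative.
ROWS = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": 41},
    {"id": 3, "email": "c@example.com", "age": -5},
]

def profile(rows):
    """Profile the dataset: count missing values, duplicate ids and out-of-range ages."""
    id_counts = Counter(r["id"] for r in rows)
    return {
        "row_count": len(rows),
        "missing_email": sum(1 for r in rows if r["email"] is None),
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "invalid_age": sum(1 for r in rows if not 0 <= r["age"] <= 120),
    }

def cleanse(rows):
    """Cleanse: drop rows that fail the checks, then keep the first remaining row per id."""
    seen, clean = set(), []
    for r in rows:
        valid = r["email"] is not None and 0 <= r["age"] <= 120
        if valid and r["id"] not in seen:
            seen.add(r["id"])
            clean.append(r)
    return clean

report = profile(ROWS)
clean_rows = cleanse(ROWS)
print(report)
print([r["id"] for r in clean_rows])
```

The profiling report can feed a data quality report, while the cleansing step decides what actually reaches downstream users.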

By consistently examining a typical data flow, we now move the discussion to those experts who apply the data.

Data scientist

Data scientists create and implement models for extracting knowledge, which is further analysed and becomes the basis for forecasting and decision-making. A representative of this specialisation researches what data must be collected and used to achieve the business objectives.

Data science is an interdisciplinary field. Hence, a data scientist must have knowledge in many related fields, such as data mining, machine learning (ML) and big data. Data scientists work with raw data using statistical and mathematical techniques, computing and modelling.

The tasks of a data scientist include:

  • Designing, developing, training and testing models and algorithms for data processing
  • Preparing large datasets for analysis
  • Analysing algorithm results and building statistical reports
  • Automating forecasting and decision-making processes: creating and maintaining model pipelines in which all models are automatically trained and updated so that their forecasts are based on the latest available data
  • Performing exploratory data analysis with a focus on big data and designing experiments to validate hypotheses
  • Creating requirements for engineering teams to collect business-critical data

Data scientists represent a connecting link for all participants in the data flow. Such experts, on the one hand, determine the structure for working with data for engineers and developers, and on the other hand, create the basis for the work of analysts and data users. These professionals understand how to adapt data configuration and data parameters to meet the needs of computer systems and enterprises.

Data scientists took part in an AI-driven Leadership Tool project. Read more in the case study.

Data analyst

Data analysts are responsible for validating, grouping, transforming and reporting data as an informative basis for decision-making. These experts analyse, interpret, visualise and report data to stakeholders. This specialty is characterised by the tasks of discovering useful information, substantiating conclusions and providing information support to prepare business solutions and strategies.

Let us define the difference between a data scientist and a data analyst. Data analysts primarily use models based on relationships and patterns in the datasets that the data scientist creates. It is important for a data analyst to be able to organise and analyse A/B testing as well as conduct exploratory data analysis (EDA), which identifies trends and dependencies that are important for business and product decisions. The data analyst’s role in a software development project is evident in the Market Analysis Tool case.
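To illustrate what analysing an A/B test can involve, the sketch below applies a two-proportion z-test to the conversion counts of two variants. The figures are invented for the example, and the z-test is only one of several reasonable choices; it assumes large, independent samples.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 5,000 users per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference between the variants is unlikely to be due to chance alone; whether it matters for the business is a separate question for the analyst.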

A review of a data analyst’s work would be incomplete without touching on a similar position—a BI analyst. Data analysts tend to focus more on future forecasts and trends by performing predictive analytics, whereas BI analysts summarise and present past data by performing descriptive analytics.

Machine learning engineer

Machine learning (ML) is data intensive. Input data for ML can be classified in the following manner:

  • Feature vector – characterising the individual, as a rule, measured properties of an object or phenomenon
  • Distance matrix – contains the distances between the set’s elements
  • Time series – a sequence of data collected at different times by the parameters of an object or process
  • Digital image or video sequence
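The first two input types are closely related: a distance matrix can be derived from a set of feature vectors. The sketch below does this with Euclidean distance; the feature values are hypothetical, and other distance measures may suit a given task better.

```python
import math

# Hypothetical feature vectors: (height_cm, weight_kg) for three objects.
FEATURES = [
    (170.0, 65.0),
    (173.0, 69.0),
    (170.0, 65.0),
]

def distance_matrix(vectors):
    """Build a symmetric matrix of Euclidean distances between feature vectors."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])  # requires Python 3.8+
            matrix[i][j] = matrix[j][i] = d
    return matrix

D = distance_matrix(FEATURES)
for row in D:
    print(row)
```

In practice, features measured in different units are usually scaled first so that no single feature dominates the distance.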

To prepare the model well, the training data must describe all possible situations. In any case, one should strive for the training dataset to be sufficiently diverse and representative. ML engineers design, create, implement and deploy ML models to solve business tasks and can develop ML models on their own or in conjunction with a data scientist.

ML engineers are responsible for the ML data pipelines and data transformation pipelines. A valuable skill in this profession is the ability to deploy ML models of any complexity and integrate them with other systems.

Current niche specialisations for ML engineers include natural language processing (NLP), object detection, computer vision, image processing and deep learning, an area of ML based on the use of neural networks.

Full-stack data scientist. Does such a professional even exist?

In recent years, it has become increasingly common to define one type of data specialist as a full-stack data scientist. However, a standard definition of the profile of such a specialist has yet to be agreed upon. Nevertheless, these scientists can be defined as generalists who can work at all major stages of the data cycle, from data analysis to the deployment of automated predictive solutions. Several years ago, Forbes, in fact, raised the question: ‘The Full Stack Data Scientist: Myth, Unicorn, or New Normal?’

It should be noted that the trend of universalisation is inherent in the entire IT sector, and many professionals somehow combine the skills of two or more specialisations. Working with data is no exception.

Data industry fields

Interest in the appearance of full-stack data scientists is quite logical. By understanding the needs of the business and not only analysing the data and developing the model but also deploying that model and integrating it with the business application, these scientists increase the value created in software development projects.

However, the number of such full-cycle specialists is unlikely to be sufficient for the market. As rightly pointed out in the aforementioned Forbes article, the likelihood that one person will be able to master all three required areas (data science, production code and business acumen) equally well is particularly low. It takes too much time and practice to explore all three areas. It is more likely that the few professionals who can design and implement end-to-end full-cycle data solutions are best suited for small companies, especially start-ups. In turn, the following trends are expected for data specialists engaged in large-scale projects.

First, studying business requirements and implementing the proposed models in enterprise practices has become an important vector for developing data specialists. In addition, it is occasionally possible to compensate for the lack of any specific experience through the use of ready-made solutions. As in other areas of IT, for example, in software development, services, frameworks and libraries for working with data are gradually emerging, making it easier to access best practices and use proven patterns.

Requirements for data specialists

To recruit a data professional for a software development project, it is always necessary to detail a set of key skills and qualities of such a team member. The following section lists the requirements for a qualified data specialist.

Proven competence

To work with data, employees must be educated in computer science, applied mathematics or other related fields. Recently, data science has become a separate specialty in an increasing number of higher education institutions, providing good basic training for future data professionals.

Professional certificates also help evaluate the overall level and the strengths of data specialists. For example, training and certifications from database software companies such as Microsoft, MongoDB and Oracle are appreciated. Depending on the cloud infrastructure on which the project is being implemented, project team members will benefit from the certification of cloud vendors. In this context, it is worth noting the certifications issued by the leading cloud-solution providers:

  • IBM – data analyst professional, data science professional, data engineering professional, etc.
  • Google – data analytics, professional data scientist, professional ML engineer, etc.
  • Microsoft – Azure data scientist, data analyst, Azure database administrator, Azure data engineer, etc.
  • AWS – data analytics, etc.

And this list could be continued.

Technical skills

Mathematics, statistics and software coding form the basic knowledge of data specialists. Working with data requires an advanced mathematical apparatus that includes, inter alia, calculus, linear algebra and probability theory, as well as discrete mathematics subjects such as set theory, combinatorics, graph theory, algorithmics, information theory, Markov chains (Markov processes) and Petri nets (place/transition (PT) nets).

Among the programming languages, the most common in the data community are R, Python and SQL. When working with data, Java, Ruby, C++ and Perl are also frequently used. Organising data storage requires the ability to work with SQL and NoSQL databases, data lakes and large amounts of information, and to support the architecture of large-scale processing systems and databases.

For each specialisation in the data industry, a different set of technical skills is relevant. Data scientists, for example, require knowledge in the areas of GLM/regression, decision trees, time series, AI techniques, transfer learning, classical ML and deep learning. The ability to work with visualisation tools, including creating dynamic dashboards, is also highly important for data professionals. In particular, tools such as SAP BusinessObjects, Tableau, Power BI, Qlik Sense, Dash and Shiny are rather common.

Technical skills of data specialists

Analytical skills

Critical thinking and analytical skills are expected from any data professional. These qualities help professionals capture relationships, patterns and trends in the data, formulate and test hypotheses and make predictions about future events.

Soft skills

Priorities in the development of soft skills can also differ for various specialisations. For example, data engineers are more likely to focus on effective communication within a project team.

Data analysts and data scientists are expected to have the skills to communicate with stakeholders and the ability to represent and visualise data. Basic writing and communication skills are no longer sufficient for data specialists. A storytelling method and expository writing aimed at explaining specific information about the data are required for data professionals to be well understood by stakeholders.

Domain expertise

Each project is implemented in a specific targeted subject area. A data professional with experience in the desired domain increases the potential of the project team. Such a specialist easily adapts to the project, quickly finds the reasons for the difficulties that have arisen and suggests ways to solve them.

Knowledge of the specifics of the business in a particular industry helps to better understand the client’s needs and the product’s key features. In addition, new niche specialisations are constantly emerging in the data industry. Hiring highly specialised professionals creates additional opportunities for successfully solving the specific business problems of the chosen domain.

Seniority

There is the same gradation of seniority among data specialists as in other IT specialties—specialists at the junior, middle and senior levels.

Expertise, experience and a high level of proficiency in professional tools are important for working on the development of IT solutions. Therefore, it is better to invite a senior specialist to work with the data. Such a team player brings to the project best practices and proven methods. Typically, a senior data specialist has at least 5 years’ experience. For a senior database developer, for example, it is desirable to have at least 8 years’ successful work.

However, this rule does not apply to all specialties in the data industry. For example, a data QA engineer and a data analyst may well begin their employment without prior experience. The following are the recommended work experience thresholds for working successfully in each specialty.

Recommended work experience thresholds for data specialties

To summarise the qualities and skills of data experts, it should be added that for almost all categories of these professionals, coding skills and knowledge of software engineering are important. The success of all types of engineers working in this field largely depends on a deep understanding of the patterns of processing, structuring and storing data.

For data scientists, knowledge of mathematics, statistics and data modelling is essential. In professions focused on using data, understanding the specifics of a client’s business objectives and the subject area comes to the fore. Such specialists are also distinguished by an ability to clean, select, visualise and report data.

The most important skills for the various data specialties are ranked in the following table.

Ranking the importance of skills for different data specialties

Let us visualise the evolutionary paths of data professionals and their possible transitions to related specialties.

Evolutionary paths of data professionals

Conclusion

As can be observed from the table, there are many complexities and important features in working with data. The role of data professionals is substantial in both software development projects and post-project support of the implemented IT product.

Statistics show that neglecting work with data is expensive. According to a Gartner study, ‘The average financial impact of poor data quality on organisations is $9.7 million per year.’ It is also estimated that US businesses lose $3.1 trillion annually due to poor data quality.

Conversely, engaging data specialists enables companies to leverage data to add value to their business, create a data-driven environment and provide information support for effective decision-making. Collaboration with an effective software development team of experienced experts is highly likely to ensure the high quality of the software product and IT infrastructure. Such teams of specialists with professional certificates in the core data technologies and cases of successfully implemented large-scale projects have been assisting SSA Group clients to meet the most complex challenges in the field of data science and big data since 2007.

Thank you for reading this article. If you have any questions, please feel free to write to us. Contact the SSA Group team to open new business opportunities through data-driven solutions.
