Data Science for the Biomanufacturing Workforce


In November 2021, I attended the “Broadening Data Science Education for the Future Biomanufacturing Workforce” virtual meeting [1]. The goal of the event was to discuss the unique challenges of educating and training future biomanufacturing employees in data science, artificial intelligence (AI), and machine learning. According to one presentation, an R&D World survey found that “62% of professionals say AI will lead to faster R&D, but is held back by a skills gap and data bias” [2].

The meeting opened with an introduction to the complexities of cell manufacturing. As noted in previous blogs [3,4], biotechnology-based therapies are becoming more individualized and cellular. CAR-T (chimeric antigen receptor T-cells), regenerative medicine, gene therapy, and others use a person’s own cells to treat their disease or repair their tissues. Unlike small molecules (think aspirin) or purified proteins (think insulin or antibodies), cells are living materials that need to be isolated, prepared, grown, maintained, and packaged in special ways. They have a natural variability that needs to be understood in terms of safety and efficacy. Understanding the critical variability parameters, and how they can be used in predictive ways, requires that we collect the right kinds of data from a large enough number of processes. The resulting data need to be organized into models that can in turn be interrogated by new analytical methods to monitor processes and ensure safety and efficacy. Importantly, the manufacturing systems that produce cell- and tissue-based therapies must track the chain of custody (provenance) from individual, to lab, and back to that individual with absolute certainty. As one might imagine, data management and analysis systems are critical to cell manufacturing. Hence the need for a data-science-capable workforce.

After the introductory details about cell manufacturing processes, the need for new kinds of biosensors that can measure processes in real time was discussed, along with issues related to the cell manufacturing supply chain. From here, presentations moved into workforce issues in the context of two challenges. The first was developing relevant data science courses and hands-on experiences. The second, and greater, challenge was how to stimulate interest in cell manufacturing to build the entry-level workforce needed to meet employment demand. With respect to the first challenge, the presenters had consortia, clear ideas, recommendations, and examples of success. When the entry-level workforce (here defined as two-year degrees/certificates) and student recruiting were discussed, the details became less defined and more aspirational.

A good way to start recruiting individuals is to describe the jobs. Toward that goal, a slide I liked presented the data life cycle in the context of job roles and skills. The figure (see below), from a National Academies report [5], lists the data life cycle phases from questions to using data in assessments. Job roles are listed below the data life cycle, and a shaded grid shows which roles contribute in the different data life cycle phases. Job roles are organized in terms of two Data Science Capability Levels: 1) Acumen (practitioner abilities), and 2) Literacy (data use and knowledge). Data engineer, scientist, and analyst roles fall into acumen. These are the people who write algorithms, develop data models and databases, perform machine learning, and mine data. The data users, domain experts, and leaders (decision makers and managers) need data literacy to develop the questions, review data for consistency, identify anomalous patterns, and make decisions. It's worth noting that these roles overlap and people can (and will) move into different roles as their careers develop. What’s important about the chart is that it illustrates that effective data science requires a village of individuals with different kinds of deep expertise. A later presentation, from NIH, took this concept a step further and talked about skill development profiles that range from support, to planning, to software development, to data use.

Data-related job roles and the data life cycle (modified from [5]). Several job roles are required to support data analysis and decision making in cell therapy and regenerative medicine. Tasks associated with data engineering, science, and analysis are listed below the table. 

As I listened to the presentations, it occurred to me that highly technical areas such as biomanufacturing become opaque as acronyms and jargon creep into the conversation. TLA*s were liberally sprinkled throughout all presentations. Some acronyms were appropriately defined on a slide and maybe expanded in speech. Mostly they were not, or you had forgotten the definition by the time the acronym came up again. Over the course of the five-hour event, more than 30 acronyms were used: NIIMBL, CMaT, armi, CQA, CPP, MoA, QbD, IoT, In-Line, At-Line, PAT-based, AI, MSC, LARS, GBM, many cryptic biology terms, EWD, FMNet, DSC-WAV, DSERT, NASEM, TIER, USP, DSP, NCMC, NIST, SCB, SMART. Which ones do you know?

While useful for condensing speech and communicating with experts, acronyms and specialized speech create barriers and limit participation. That is something to consider when we want to be inclusive and recruit new people.

Notes and References

*TLA = Three-letter acronym.

[1] To view a replay of the event visit
[2] -
[3] Individualized Medicine Creates Job Opportunities
[4] What is a Biomedical Device?
[5] Empowering the Defense Acquisition Workforce to Improve Mission Outcomes Using Data Science, page 55
†Presentations were given by scientists and others from the National Academies, Georgia Tech, Amherst College, FDA, NIH/NLM, Bristol Myers Squibb, Intel, Kordon Consulting, and Carnegie Mellon University.


Submitted by Todd Smith on Sat December 04, 2021.
