We Need More Data Science
I have become more and more fascinated with Enterprise AI lately. Some of the work being done in how we manage, interpret, and apply data is incredible. One thing I have noticed is that the once nascent market of Data Science is exploding. What's equally astounding is not just the need for more data scientists, but also the development of tools that enable better and more scalable Data Science, while enabling other business functions to do Data Science too. What gets me excited is what happens when data scientists, engineers, business analysts, marketing directors, product leads, and HR leaders are all able to build and deploy AI-based applications.
But first, let's briefly revisit Artificial Intelligence (AI), which can be simply defined as computer systems with the ability to perform tasks that ordinarily require human intelligence. Within AI, there are subsets of capabilities that are powered by Machine Learning (ML) and Deep Learning (DL). Machine Learning is the ability of computer systems to automatically learn and improve from experience without being explicitly programmed, accessing and parsing through data to repeat tasks or notice patterns. Deep Learning is a part of ML, but it uses neural networks, which are loosely modeled on how our brains learn. This allows machines to solve complex problems even when using a data set that is very diverse, unstructured, and interconnected. ML generally works best with numerical data, categorical data, time-series data, and text data. Deep Learning is more specialized for images, video, audio, and other more difficult types of data.
Data Science is the practice of analyzing and interpreting complex digital data in order to assist in decision-making. Data Science applies ML and DL to data (numbers, text, images, video, audio, etc.) and produces specialized AI systems to do specific tasks, such as checking for risks in supply chains or looking for fraud within banking transactions. These AI systems produce enterprise value by automating, optimizing, or producing actionable insight, which impacts earnings.
Now that we have covered data science and the field of AI in general terms, let's go a bit deeper. First, it's important to note that being a data scientist requires a diverse array of skills, and there are not enough of these highly skilled individuals. There are predicted to be 2.7 million open jobs in data analysis, data science, and related careers in 2020, with 39% growth in employer demand for both data scientists and data engineers by 2020 (source: IBM). In fact, data scientists have an average earning potential of $8,736 more per year than any other bachelor's degree job (source: IBM).
Second, it's important to know that data science sits in the center of a skills Venn diagram of domain expertise (do you know your industry?), programming skills (are you technically capable?), and mathematical/statistical skills (can you apply the right thinking?).
Third, the skills and tools for data science are rapidly advancing. The old way of making predictions and getting insights followed these steps:
Prepare your dataset from the data source
Import the data
Structure your dataset
Assess and validate the model
Collect new data and retrain the model
Deploy
Monitor and manage
Make predictions and get insights
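To make the manual workflow above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic, the function names are hypothetical, and the model is a one-feature least-squares line. The list does not name a fitting step, so the sketch treats training as implied between structuring the dataset and validating the model. Deployment and monitoring are left out.

```python
import random
import statistics

def make_dataset(n=200, seed=0):
    """Stand-in for preparing, importing, and structuring data.
    Synthetic rows (x, y) where y = 3x + 2 plus noise (made up for illustration)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 * x + 2.0 + rng.gauss(0, 1.0)))
    return rows

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

def mean_abs_error(model, rows):
    """Average absolute gap between predictions and observed values."""
    return statistics.fmean(abs(y - predict(model, x)) for x, y in rows)

rows = make_dataset()
train, test = split(rows)
model = fit_line(train)                      # implied training step
test_error = mean_abs_error(model, test)     # assess and validate
new_rows = make_dataset(n=50, seed=1)        # collect new data
model = fit_line(train + new_rows)           # retrain by hand
prediction = predict(model, 4.0)             # make a prediction

print(f"validation error: {test_error:.2f}")
print(f"prediction at x=4.0: {prediction:.2f}")
```

Every call at the bottom is something a person has to remember to run, in order, each time the data changes. That is the part the newer tooling tries to take over.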
Most machine learning algorithms need to be parameterized, and even though some empirical strategies can help, this remains complex: there is generally no deterministic way to find the optimal configuration. There is also a risk of error, because creating and maintaining ML/DL models and AI systems involves choices and manual interventions that affect the efficiency of the ML/DL pipeline.
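As a rough illustration of why parameterization is a search rather than a formula, the sketch below tunes a single parameter, the number of neighbors k in a tiny k-nearest-neighbors classifier, by trying a handful of values and keeping the one that scores best on held-out data. The data, the candidate values, and the function names are all assumptions made up for this example; a real project would usually lean on an established library instead.

```python
import random

def make_points(n, seed):
    """Synthetic two-class data (made up): class 0 near (0, 0), class 1 near (2, 2)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 2.0 * label
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data

def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], point))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > k else 0

def accuracy(train, validation, k):
    hits = sum(knn_predict(train, point, k) == label for point, label in validation)
    return hits / len(validation)

train = make_points(200, seed=1)
validation = make_points(100, seed=2)

# Odd values only, so a two-class vote can never tie.
candidate_ks = [1, 3, 5, 9, 15, 25]
scores = {k: accuracy(train, validation, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

for k in candidate_ks:
    print(f"k={k:>2}  validation accuracy={scores[k]:.2f}")
print("best k on this validation set:", best_k)
```

The winner depends on the candidate list and on which rows happen to land in the validation set, which is the sense in which there is no deterministic route to the optimum.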
The new way involves MLOps. Machine Learning Operations (MLOps) is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operations (Ops). Practicing MLOps means that you advocate for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management. Simply put, MLOps is the set of technologies and practices that provide a scalable and governed means to rapidly deploy and manage ML applications in production environments.
MLOps is being enhanced by enterprise tools that turn AI processes into simpler and more efficient steps. This is called AutoML. Automated machine learning (AutoML) is a general discipline that involves automating any part of the process of building AI systems. By working on various stages of the machine learning/deep learning process, engineers develop solutions to expedite, enhance, and automate parts of the AI system pipeline. These tools enable data scientists to do their job better and faster. But these tools will also allow anyone to do data science work. Business analysts already use these tools, and soon we will see data science tools across all orgs within a business, from HR to Marketing.
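One small piece of what AutoML tools automate is model selection: trying several kinds of model, scoring each the same way, and keeping the winner without a person in the loop. The sketch below is a toy version of that idea, not how any particular product works. The candidate models, the synthetic data, and names such as `auto_select` are assumptions introduced for illustration.

```python
import random
import statistics

def make_rows(n=120, seed=3):
    """Synthetic data (made up): a curved relationship y = x^2 plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        rows.append((x, x * x + rng.gauss(0, 0.5)))
    return rows

# Each candidate is a fit function that takes rows and returns predict(x).
def fit_mean(rows):
    mean_y = statistics.fmean(y for _, y in rows)
    return lambda x: mean_y

def fit_line(rows):
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def make_knn(k):
    """Build a fit function that averages the k nearest training targets."""
    def fit(rows):
        def predict(x):
            nearest = sorted(rows, key=lambda row: abs(row[0] - x))[:k]
            return statistics.fmean(y for _, y in nearest)
        return predict
    return fit

def cross_val_error(fit, rows, folds=5):
    """Mean absolute error averaged over k-fold cross-validation."""
    fold_errors = []
    for i in range(folds):
        held_out = rows[i::folds]
        training = [row for j, row in enumerate(rows) if j % folds != i]
        predict = fit(training)
        fold_errors.append(statistics.fmean(abs(y - predict(x)) for x, y in held_out))
    return statistics.fmean(fold_errors)

def auto_select(rows, candidates):
    """Score every candidate the same way and return the best name plus all scores."""
    results = {name: cross_val_error(fit, rows) for name, fit in candidates.items()}
    return min(results, key=results.get), results

candidates = {
    "mean": fit_mean,
    "line": fit_line,
    "knn_3": make_knn(3),
    "knn_7": make_knn(7),
}
best_name, results = auto_select(make_rows(), candidates)

for name, error in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:<6} cross-validated error = {error:.2f}")
print("selected:", best_name)
```

The person running this only supplies data and a list of candidates, which hints at why such tools can reach people outside the data science team.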
So what's the actual impact we are talking about here? For one, companies that lead in AI adoption are the ones that are investing more in their future. Look below at which sectors lead when measured by their adoption of AI. In fact, the global projected spend on AI technologies in 2020 was $125B and the projected global GDP impact by 2030 is $15.7T. Yes, trillion.
In conclusion, we need more data scientists to implement AI systems, and we need to empower data scientists with the best tools. We also need AutoML, data infrastructure, and data management tools so that all kinds of business functions can do data science and scale AI systems and applications within their organizations. This is still a growing market with immense potential, and we are just beginning to see the breadth of impact it will have.
Opinions expressed are solely my own and do not express the views or opinions of my employer, 137 Ventures.