How Many Data Scientists are out there?

Editor’s note: This post by Gregory Piatetsky first appeared at KDnuggets.com. In it he dives into a key question regarding the possible shortage of data scientists. -bg

Many people have read the McKinsey report on Big Data (May 2011), which predicted:

The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.


However, so far the shortage appears to be much smaller.

The job title “Data Scientist” has grown tremendously in popularity, according to job site indeed.com.

[Figure: Job trend for Data Scientist positions, 2006-2014]

However, notice that the demand stopped increasing sometime in 2013. 

As of March 13, 2014, a search for “Data Scientist” jobs (US-based) on indeed.com gives only 1,000 positions. We find about 10,000 jobs when searching for Data Scientist without quotes, but many of these jobs have the title “Scientist” or have something to do with data, and do not necessarily represent “Data Scientist” positions.

Of course, many people may do similar work without having the title of “data scientist”. 

Several estimates may be relevant. 

Kaggle is the leading platform for data science competitions and claims to be the world’s largest community of data scientists. Kaggle reached 100,000 members in July 2013, reported 110,000 in Sep 2013 and 120,000 on Oct 23, 2013, and was reported to have 140,000 on Feb 24, 2014.
The latest numbers, from Kaggle CEO Anthony Goldbloom, are 157,142 Kaggle members, of whom 67,776 were active in the last 6 months.
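
To put the Kaggle figures in proportion, here is a minimal sketch of the arithmetic in Python. The numbers are the ones quoted above; the variable names and the `share` helper are my own illustration and not anything Kaggle publishes.

```python
# Figures quoted in the post; names below are illustrative assumptions.
kaggle_members_july_2013 = 100_000   # milestone reported for July 2013
kaggle_members_latest = 157_142      # latest count from Kaggle's CEO
kaggle_active_6_months = 67_776      # members active in the last 6 months


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction, rejecting a non-positive whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Fraction of all members who were active in the last 6 months.
active_share = share(kaggle_active_6_months, kaggle_members_latest)

# Relative membership growth since the July 2013 milestone.
growth_since_july_2013 = share(
    kaggle_members_latest - kaggle_members_july_2013,
    kaggle_members_july_2013,
)

print(f"Active in the last 6 months: {active_share:.1%}")        # 43.1%
print(f"Growth since July 2013:      {growth_since_july_2013:.1%}")  # 57.1%
```

So fewer than half of the registered members were recently active, which is worth keeping in mind when reading membership totals as a count of practicing data scientists.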

A quick examination of the top 10 ranked Kagglers shows that only one has a title of “Data Scientist”. The top 10 include neuroscience researchers, PhD mathematicians and physicists, and while they are clearly talented competitors on Kaggle, their actual jobs may not involve data science.

LinkedIn has many groups related to data science, Big Data and Analytics – see my analysis, Top 2013 LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science.

The two largest of these groups are:


Most members of these groups do not have the job title “Data Scientist”. There is a “Data Scientists” LinkedIn group, but at present it has only 6,750 members.

LinkedIn Data Scientist Peter Skomoroch (@PeteSkomoroch) wrote:

Using the public LinkedIn search interface, with the job title in quotes – I see 12,170 members with the phrase “data scientist” anywhere their profile. Using the advanced search facet to look only at profiles with a current or past title containing the phrase “data scientist”, I see 6,896 results. Doing a plain keyword search will return many members that mention the words “data” or “scientist” anywhere in their profile, but the majority of those people have nothing to do with data science.


He further estimated that perhaps 150-250K people would be a match for a data scientist based on their skills and education. 

I remain optimistic that data scientist is a great profession, but I doubt that there is a demand for 100,000 new data scientist positions. There may be a re-branding of existing positions, or creation of teams which collectively do the data science job.

 

Gregory Piatetsky-Shapiro, Ph.D., is a well-known expert in Business Analytics, Data Mining, and Data Science. Gregory is the Editor and Publisher of KDnuggets.com, a Business Analytics “Guru” on Twitter, and a Top Influencer in Big Data, Data Mining, and Data Science. Gregory is a co-founder of KDD (Knowledge Discovery and Data mining conferences) and SIGKDD, the professional organization for Knowledge Discovery and Data Mining. Gregory has over 60 publications and has edited several books and collections on data mining and knowledge discovery.


Bob Gourley writes on enterprise IT. He is a founder and partner at Cognitio Corp and publisher of CTOvision.com.
