
10 Data Science Platforms and Tools to Shape Your Future.

Over the past few months, we’ve released a number of articles looking at the top skills in data science-related fields, such as NLP, machine learning engineering, and others. As we wrote each one and looked at the data from 25,000 job descriptions, we noticed that a number of data science platforms kept showing up across job roles. As such, here’s our rundown of the top 10 data science platforms that employers are looking for in 2022.





Programming


As expected, the most consistent requirement across roles was programming languages. The first two should be no surprise: Python led each job description by a wide margin, with many descriptions also looking for R users. Scala also saw a decent number of mentions, largely thanks to its prevalence in big data and its usefulness in scaling to large projects.

Related ODSC East 2022 Sessions:

  • Sculpting Data for ML: The first act of Machine Learning: Jigyasa Grover, Machine Learning Engineer and Rishabh Misra, Senior Machine Learning Engineer | Twitter

  • Programming with Data: Python and Pandas: Daniel Gerlanc | Sr. Director – Data Science & ML Engineering | Ampersand

  • Distributed Python with Ray: Hands-on with the Ray Core APIs: Stephanie Wang | Software Engineer | Anyscale

  • Beyond the Basics: Data Visualization in Python: Stefanie Molin | Data Scientist, Software Engineer, Author of Hands-On Data Analysis with Pandas | Bloomberg

  • The Data Science Key to an Unpredictable Future: Hands-on Guide to Solving Complex Challenges with Python and Gurobi: Ehsan Khodabandeh, PhD | Principal Operations Research Scientist | Decision Spot


Data Engineering


Even outside of dedicated data engineering job descriptions, other data science disciplines are expected to know some core data engineering skills, mostly around workflow pipelines. This includes popular tools like Apache Airflow for scheduling and monitoring workflows, while those working with big data pipelines opt for Apache Spark. Kafka is the only notable streaming platform to make any lists, which makes sense given that it’s the gold standard for real-time analytics and streaming.
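To make the idea of a workflow pipeline concrete, here is a minimal sketch in plain Python of what a scheduler does at its core: it runs tasks in dependency order and passes results downstream. This is not Airflow's API. The `run_pipeline` helper and the three task names are hypothetical, chosen only for illustration, and the example assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after the tasks it depends on.

    tasks: maps a task name to a function taking a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    Returns the execution order and each task's result.
    """
    # static_order() yields every task after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical three-step pipeline: extract -> transform -> load
tasks = {
    "extract": lambda upstream: [1, 2, 3],
    "transform": lambda upstream: [x * 10 for x in upstream["extract"]],
    "load": lambda upstream: sum(upstream["transform"]),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order, results = run_pipeline(tasks, dependencies)
print(order)
print(results["load"])
```

Real workflow tools layer scheduling, retries, and monitoring on top of this dependency-ordering idea.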

Related ODSC East 2022 Sessions:

  • Tutorial: Building and Deploying Machine Learning Models with TensorFlow and Keras: Yong Tang, PhD | Director of Engineering | MobileIron

  • Vector Databases: Bob van Luijt | CEO & Co-Founder | SeMI Technologies

  • An Introduction to Drift Detection: Ed Shee | Head of Developer Relations | Seldon

  • What We’ve Learned Pushing Nearly 100M Hours of GPU Compute: Daniel Kobran | COO and Co-Founder | Paperspace

  • Automation for Data Professionals: Devavrat Shah, PhD | Professor, Founding Director, Co-founder, CTO | Statistics and Data Science at MIT, IkigaiLabs

Cloud Platforms


A common theme among all job descriptions in 2022 is something that likely wouldn’t have been as dominant 5 or 10 years ago – cloud processing. The only two to make multiple lists were Amazon Web Services (AWS) and Microsoft Azure. Most major companies are using one of the two, so excelling in one or the other will help any aspiring data scientist. Google Cloud is picking up steam, but is likely not going to reach the market dominance of AWS and Azure any time soon.

Related ODSC East 2022 Sessions:

  • Data Science in the Cloud-Native Era: Yuan Tang | Founding Engineer, Co-chair | Akuity, Kubeflow

  • Building and Operating Cloud Native Analytics Systems at Scale: Scott Haines | Software Architect | Twilio

  • Scaling AI Workloads with the Ray Ecosystem: Robert Nishihara | CEO and Co-Founder | Anyscale

  • Run Azure Machine Learning anywhere in multi-cloud or on-premises: Doris Zhong | Product Manager | Microsoft

  • The Wisdom of the Cloud: Allen Downey, PhD | Computer Science Professor | Olin College and Author of Think Python, Think Bayes, Think Stats and Isaac Slavitt | Co-founder, Data Scientist | DrivenData

Frameworks


A few machine learning frameworks appeared on multiple lists. Both PyTorch and TensorFlow/Keras are still the go-to machine learning frameworks for a number of tasks, largely thanks to their ability to scale and handle more resource-intensive tasks like deep learning; these two frameworks aren’t limited to just basic ML. Scikit-learn also earns a top spot thanks to its success with predictive analytics and general machine learning. Knowing all three frameworks covers the most ground for aspiring data science professionals.

Related ODSC East 2022 Sessions:

  • Quick to Production With the Best of Both Spark and TensorFlow: Ronny Mathew | Senior Data Scientist | Rue Gilt Groupe

  • Tower of Babel: Making Apache Spark, Apache Mahout, Kubeflow, and Kubernetes Play Nice: Trevor Grant | Director of Developer Relations | Arrikto

  • Quantization in PyTorch: Jerry Zhang | Software Engineer | Meta

  • Profiling and Optimizing PyTorch Applications with the PyTorch Profiler: Sabrina Smai | Program Manager | Microsoft

  • Intermediate Machine Learning with Scikit-learn: Evaluation, Calibration, and Inspection: Thomas Fan | Senior Software Engineer | Quansight Labs

Honorable Mention: Hadoop

The Apache Hadoop framework is an ecosystem in itself, as it’s actually a collection of open-source tools. It allows for the distributed processing of large data sets across clusters of computers using simple programming models. This makes it a very attractive platform for general data science and big data.
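The “simple programming models” mentioned above are best known through MapReduce. As a rough, single-machine sketch of that model in plain Python (this does not use Hadoop itself, and the function names here are hypothetical), a word count can be split into a map step that emits key/value pairs, a shuffle step that groups them by key, and a reduce step that aggregates each group:

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get the final counts."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # On a real cluster the map calls would run in parallel across machines;
    # here they simply run one after another.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big clusters", "data science"])
print(counts)
```

Because each map call only sees its own document and each reduce call only sees one key, the work can be spread across many machines, which is what makes the model a good fit for large data sets.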

Learn more about data science platforms and tools at ODSC East 2022

All of these data science platforms, frameworks, and tools will be represented at ODSC East 2022 this April 19th-21st. By registering for ODSC East 2022 – now 30% off – you’ll be able to see all of the sessions mentioned above and more. This includes our virtual Career Lab & Expo where you can see what our hiring partners are looking for and how all of these data science frameworks will help you get a job.

