June 26, 2020

Data Science and Machine Learning – Comparison

Author: Edwin Lisowski, CSO & Co-Founder

Reading time: 8 minutes

In our past articles, we compared machine learning with other data-based technologies: artificial intelligence and deep learning. Today, however, we are going back to the roots of AI to talk about data science. Data science sits at the very beginning of every AI-related project and algorithm, machine learning included. How so? And what is the difference between data science and machine learning? You’ll find out in a minute.

We ought to begin by explaining data science itself. It’s a crucial and absolutely essential element of AI. Without data science, we wouldn’t have business intelligence, machine learning, deep learning, computer vision, augmented reality, and many other fascinating technologies that drive our world and make it better, quicker, and more efficient. You can think of data science as an umbrella that encompasses all these technologies.

Difference between data science and machine learning

Generally speaking, data science is a field of study that aims to extract meaning and insights from data. Typically, we talk about business-related information here. Such data can be used in many business, educational, and commercial endeavors.

Let’s stop here for a second and think about what kind of data we are talking about. The list is very long and can’t be squeezed into just one article, but let’s look at a couple of examples:

  • Social media data (users’ activity, popular posts, types of comments, shares…)
  • Google (users’ activity, most popular and trending searches)
  • Your website (your customers’ activity, sections they visit, contact details they leave)
  • Healthcare (patient data, diseases, drugs, treatments, medical images)
  • Finances (stock data, banking data, currency exchange rates)
  • HR (employees, salaries, sick leaves, efficiency)
  • Transportation (flights, vehicles, traffic accidents)

And the list goes on and on. We’ve barely scratched the surface here! Moreover, you have to remember that data can be divided into three main categories:

Unstructured data

It’s data that lacks a specific form or structure. The body of an email is an example of unstructured data. So is a medical image (for instance, an X-ray of a broken leg). Because it doesn’t have any specific structure, unstructured data is challenging and time-consuming to process and analyze. Today, we have various applications and algorithms that help us classify and analyze unstructured data. The most straightforward example is Google Images, a search engine that allows you to search for a picture identical or similar to yours. It’s a great example of how unstructured data can be analyzed much more quickly.

Semi-structured data

It’s the most complicated type of data, because it contains unstructured and structured elements alike. Let’s get back to our email example, which can illustrate both unstructured and semi-structured data. If you write an email with just text or a picture, it’s unstructured data. But you can add an attachment containing, let’s say, an MS Excel sheet with information about orders from the past year. In that situation, we are dealing with semi-structured data: the email contains both unstructured and structured elements.

Structured data

It’s the simplest type of data to work with. It refers to data that can be processed and stored in a fixed format. Structured data is highly organized information that can be readily stored in a database and accessed from it. Again, an MS Excel sheet is a typical example of structured data. Almost everything that can be put in a table, listed, or organized into fixed fields is structured.
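To make the three categories more concrete, here is a minimal Python sketch. The email text, the order records, and the helper names `flatten_orders` and `to_csv` are all made up for illustration. JSON stands in for semi-structured data here, which is an assumption of this sketch rather than the email-with-attachment example used above.

```python
import csv
import io
import json

# Unstructured: free text with no fixed fields (hypothetical email body).
email_body = "Hi, please find last year's orders attached. Thanks!"

# Semi-structured: records share some keys, but fields can be nested or optional.
raw_json = """
[
  {"order_id": 1, "customer": {"name": "Ann"}, "total": 120.5},
  {"order_id": 2, "customer": {"name": "Bob"}, "total": 80.0, "note": "gift"}
]
"""

COLUMNS = ["order_id", "customer_name", "total"]


def flatten_orders(json_text):
    """Turn semi-structured order records into rows with fixed columns."""
    rows = []
    for record in json.loads(json_text):
        rows.append({
            "order_id": record["order_id"],
            "customer_name": record.get("customer", {}).get("name", ""),
            "total": record.get("total", 0.0),
        })
    return rows


def to_csv(rows):
    """Structured: every row has the same columns, like a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


orders = flatten_orders(raw_json)
print(to_csv(orders))
```

Note that the optional `note` field is dropped during flattening: moving data into a fixed format usually means deciding which fields to keep.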

One more thing you need to know is that we stop talking about data at some point and start calling it big data. Big data is still the same data, but this term refers to information that is so voluminous that it cannot be processed or analyzed using conventional data processing techniques.
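One basic idea behind handling data that is too large for conventional processing is to work through it in chunks instead of loading everything at once. The sketch below illustrates only that idea, on a single machine, with hypothetical helper names (`read_in_chunks`, `streaming_mean`); real big data systems typically also distribute the work across many machines.

```python
def read_in_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def streaming_mean(values, chunk_size=1000):
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else 0.0


# A generator stands in for a data source too large to load at once.
readings = (i % 10 for i in range(1_000_000))
print(streaming_mean(readings))
```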

With this foundation developed, we can talk about data science and data analysis.

Data science

Data science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data[1]. The thing is, you can possess massive amounts of data, but until it’s cleaned, processed, and analyzed, it’s useless. Big data can make an impact on your company only when it’s thoroughly ingested and analyzed. Furthermore, you always have to compare your new findings and insights with past ones. By doing so, you can spot trends and patterns, and that’s what you need.

Data science comprises many operations, including:

  • Cleansing
  • Preparation
  • Analysis
  • Retrieval
  • Transformation
  • Ingestion

Which of these operations need to be done depends on the way you store your data. Companies typically use data warehouses and data lakes. Although these are different solutions, they have the same goal: to store big data effectively and ensure that it’s readily available at every stage of your project.
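A few of the operations listed above (cleansing, preparation, and transformation) can be sketched in plain Python. The records, field names, and the `cleanse` and `transform` helpers below are hypothetical, and min-max scaling is just one possible transformation chosen for illustration.

```python
raw_records = [
    {"name": " Alice ", "age": "34", "salary": "5200"},
    {"name": "Bob", "age": "41", "salary": "4100"},
    {"name": " Alice ", "age": "34", "salary": "5200"},  # duplicate row
    {"name": "Carol", "age": "29", "salary": "n/a"},     # unparseable salary
    {"name": "Dan", "age": "", "salary": "6300"},        # missing age
    {"name": "Eve", "age": "38", "salary": "4650"},
]


def cleanse(records):
    """Trim names, parse numbers, and drop duplicate or unparseable rows."""
    seen = set()
    cleaned = []
    for record in records:
        name = record["name"].strip()
        try:
            age = int(record["age"])
            salary = float(record["salary"])
        except ValueError:
            continue  # skip rows with missing or malformed numbers
        key = (name, age, salary)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"name": name, "age": age, "salary": salary})
    return cleaned


def transform(records):
    """Add a salary_scaled field rescaled to the 0..1 range (min-max scaling)."""
    if not records:
        return []
    salaries = [r["salary"] for r in records]
    low, high = min(salaries), max(salaries)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return [{**r, "salary_scaled": (r["salary"] - low) / span} for r in records]


prepared = transform(cleanse(raw_records))
for row in prepared:
    print(row)
```

In practice, these steps are usually done with dedicated tools, but the order is typically similar: clean first, then transform, so that bad rows do not distort the scaling.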

Data scientists gather data from various sources and try to extract critical information from it. Next, that insight can be used in many AI-related projects, or just to improve the efficiency of a company. Data science is responsible for bringing structure to big data, searching for patterns, and improving the decision-making process so that an organization can implement the changes and adjustments that suit its needs.

Companies use data science to build recommendation engines, predict users’ behavior, forecast future sales, assess stock prices, and perform many other tasks.

Now, the question arises–where is the place for machine learning in all this?

The relation between data science and machine learning

Simply put, machine learning could never have come into existence without big data and data science. But let’s start from the beginning. Machine learning, in general, refers to a group of techniques and methods that allow computers and other machines to learn from data.

As a result, after initial training, ML algorithms and applications can perform their tasks without human assistance or additional programming.

This means that data science is the base for every ML project. If you want to program and train machine learning models, you need datasets they can operate on. Data science is responsible for designing and optimizing these datasets so that they can be useful in various AI-related endeavors, machine learning among them.

What does training an ML model look like?

It’s a massive simplification, but the typical process looks more or less like this:

  1. You input data into the algorithm. You tell the machine what the features or independent variables (input) are and what the expected output (dependent variable) is. In other words, you give the ML application bricks and train it to build a building out of them.
  2. The machine learns the relationships between the independent and dependent variables present in the data. The data provided at this stage is called the training set.
  3. Once the learning phase, or training, is complete, the ML model is tested on a new dataset: a piece of data that the model has not encountered before. This new dataset is called the test dataset.
  4. Once the model has been trained and verified to give reliable, accurate results, it is deployed to a production setup, where it is used on more new datasets. The model is trained so that it can analyze these new datasets on its own and, where the system supports it, learn from them in order to improve over time.
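The steps above can be sketched with a toy example. This is a minimal illustration using only the Python standard library, not a production workflow: the dataset is synthetic (generated from y = 3x + 2 plus noise), the model is a straight line fitted by ordinary least squares, and all function names are hypothetical.

```python
import random


def make_dataset(n=200, seed=0):
    """Synthetic (x, y) pairs: x is the feature, y the expected output."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 0.5)  # assumed relationship plus noise
        data.append((x, y))
    return data


def train_test_split(data, test_ratio=0.25, seed=0):
    """Shuffle a copy of the data and split it into training and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def fit_line(train):
    """Step 2: learn slope and intercept from the training set (least squares)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, rows):
    """Step 3: measure the prediction error on a set of (x, y) rows."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)


data = make_dataset()                         # step 1: features and outputs
train_set, test_set = train_test_split(data)
model = fit_line(train_set)                   # step 2: training
test_error = mean_squared_error(model, test_set)  # step 3: unseen data
print("slope, intercept:", model)
print("test MSE:", test_error)
```

Keeping the test set separate from the training set is the key design choice here: an error measured on data the model has already seen would say little about how it will behave in production.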

As you can see, data plays a crucial role in this process and is present at every single stage of training an ML model. So this is the exact relation between machine learning and data science:

Data science provides a necessary foundation that allows machine learning specialists to build and train their machine learning models and algorithms.

Data science and machine learning – conclusion

Today, machine learning is applied almost everywhere, in both B2B and B2C companies. And no wonder, because it offers many benefits:

  • It allows companies to cut costs (machines can perform tasks previously reserved for human workers)
  • It allows them to make more informed decisions (ML is broadly used in business intelligence and business data analysis)
  • It offers high precision (for instance, in some medical tasks, such as analyzing medical images, ML predictions can match or exceed human accuracy)
  • It increases effectiveness (machine learning algorithms are much quicker than human employees, can operate 24/7, and are never tired, sick, or on vacation)

If you want to find out more about machine learning and see how it’s applied in our world, we encourage you to read two of our other articles: one about machine learning techniques and one about machine learning in marketing.

If you are interested in data science or machine learning solutions – Addepto is at your service. We help companies with every AI-related challenge and project. We are keen to show you all of the benefits that await you around the corner called AI. Drop us a line or give us a call and find out how your work can be enhanced.

References

[1] Hemant Sharma. What Is Data Science? A Beginner’s Guide To Data Science. Nov 25, 2020. URL: https://www.edureka.co/blog/what-is-data-science/. Accessed Jun 26, 2020.
