Importance of Mathematics in Data Science
Hitesh Nayak, September 13, 2021


Data science has evolved into a combination of several disciplines: computer science, statistics, business knowledge and communication excellence, with mathematics at its heart. If practiced with the right philosophy, mathematics as a subject develops logic and the mindset to consider distinct routes to a solution.

Let me divide my point of view into the following parts:

  • Scenarios where mathematical thinking is more important than mathematics itself
  • How mathematics is the centre of statistics and computer science
  • What are the major areas of mathematics to focus on to get started with data science?
    • How is it important in Neural networks?
    • How is it important for statistical learning algorithms?
  • Answer to the famous question: “Why should we learn mathematics as all we have to do is import a library and get going!”

Scenarios where mathematical thinking is more important than mathematics itself

Let me start with the most dreaded task in the data world, i.e., getting data into a form that can be consumed for a purpose. This means creating a database with a good schema for faster use, getting the right variables into the right form to suit analytical functions, and the famous Exploratory Data Analysis (EDA) that takes 99.99% of the time 😁. What all of these tasks need most is the ability to think mathematically.

Let’s say we have a continuous or categorical variable. We could use the whole variable as it is, but suppose we must work with an aggregated form of it. How do we represent the variable for each group with just one or two numbers? Here mathematical thinking matters: understanding that, in a simplistic scenario, the variable can be summarised by its mean and skewness. This reduces the storage needed and makes modelling easier, because the data reaches a usable format while retaining the essence of its spread.
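To make the idea concrete, here is a minimal sketch of collapsing a variable into a mean and a skewness per group. The segment names and values are hypothetical, and the skewness used is the simple population version (the mean of cubed z-scores), which is only one of several conventions.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # a constant group has no asymmetry
    return fmean(((v - mu) / sigma) ** 3 for v in values)


def summarise_by_group(rows):
    """Collapse (group, value) pairs into one (mean, skewness) pair per group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: (fmean(vals), skewness(vals)) for g, vals in groups.items()}


# Hypothetical order values for two customer segments (illustrative data only).
rows = [
    ("retail", 10.0), ("retail", 12.0), ("retail", 11.0), ("retail", 40.0),
    ("wholesale", 100.0), ("wholesale", 110.0), ("wholesale", 90.0),
]

summary = summarise_by_group(rows)
for group, (mean, skew) in summary.items():
    print(f"{group}: mean={mean:.2f}, skewness={skew:.2f}")
```

Two numbers per group now stand in for the full column. The one large retail order shows up as positive skewness, which the mean alone would hide.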

How mathematics is the centre of statistics and computer science
 
 Let me start with statistics, which is the study of guessing. Therefore, it becomes the laying stone of machine learning and artificial intelligence (It is not a subset of mathematics just because we use numbers, if it were then physics and physical chemistry would also become mathematics😀). Statistics use many mathematical approaches to arrive at a conclusion. For example, understanding how a probability density function works so that we can use the right distribution for a problem statement. Looking at the data we must know how that function integrates to form a curve that is replicative of our data distribution. This is one of the simplest examples – we have many other aspects of statistics where mathematical theory is needed to bridge the distance between data and informed guess.
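As a small illustration of what integrating a density means, the sketch below approximates the area under a normal probability density function with the trapezoidal rule and compares it with the value from the cumulative distribution function. The height model (mean 170, standard deviation 10) is a made-up assumption, and `integrate_pdf` is a helper written for this example, not a library function.

```python
from statistics import NormalDist


def integrate_pdf(dist, lower, upper, steps=10_000):
    """Approximate the area under dist.pdf on [lower, upper] (trapezoidal rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width


# Hypothetical model: heights in cm, assumed normal with mean 170 and sd 10.
heights = NormalDist(mu=170, sigma=10)

# Probability that a height falls within one standard deviation of the mean.
area = integrate_pdf(heights, 160, 180)
exact = heights.cdf(180) - heights.cdf(160)
print(f"numerical area = {area:.4f}, from CDF = {exact:.4f}")
```

The two numbers agree closely (about 68% of the mass lies within one standard deviation), which is the link between the shape of the curve and the probabilities we read from it.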
 

Similarly, let’s discuss computer science. Let me narrow it down to the two aspects of computer science that impact data science most: databases, and using a programming language, e.g. Python, to do a machine learning task. Both involve numerous matrix operations. Database operations or machine learning tasks implemented without knowledge of linear algebra can get the work done, but the result will be sub-optimal. That is where the expertise of database performance engineers and machine learning engineers comes into play: they optimise the implementation for suitable cost and resources.

What are the major areas of mathematics to focus on to get started with data science?

Listing every area of mathematics relevant to data science is difficult, but these are the few that data scientists need most of the time:

  • Linear algebra
  • Calculus
  • Probability theory
  • Statistics

These four areas broadly cover most of a data scientist’s dependency on mathematics. Let me explain some parts of this with examples.

How is it important in Neural networks?

When we try to understand and implement neural networks, it is important to have an intuitive understanding of which algorithm to use. To build that intuition, every enthusiast must, at least once, grind through the derivation of a computational graph leading to the loss function, and then differentiate the loss function to get a complete understanding of backpropagation. This helps in avoiding the vanishing gradient issue and in choosing a loss function that fits the kind of data.
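Here is a minimal sketch of that derivation for the smallest possible network: one sigmoid neuron with a squared-error loss. The weights and inputs are hypothetical values picked for illustration. The hand-derived gradient is checked against a finite difference, and the last lines show why stacked sigmoids are prone to vanishing gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    """Computational graph: z = w*x + b -> a = sigmoid(z) -> loss = (a - y)^2."""
    z = w * x + b
    a = sigmoid(z)
    loss = (a - y) ** 2
    return z, a, loss


def backward(w, b, x, y):
    """Chain rule, walking the graph from the loss back to the parameters."""
    _, a, _ = forward(w, b, x, y)
    dloss_da = 2.0 * (a - y)   # d(loss)/da
    da_dz = a * (1.0 - a)      # derivative of the sigmoid, never above 0.25
    dz_dw, dz_db = x, 1.0      # derivatives of z = w*x + b
    return dloss_da * da_dz * dz_dw, dloss_da * da_dz * dz_db


# Hypothetical parameter and data values, chosen only for illustration.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backward(w, b, x, y)

# Check the hand-derived gradient against a central finite difference.
eps = 1e-6
numeric_w = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
print(f"analytic dL/dw = {grad_w:.6f}, numeric dL/dw = {numeric_w:.6f}")

# Each sigmoid in a chain contributes a derivative factor of at most 0.25
# (weights aside), so the bound on that product shrinks geometrically with depth.
for depth in (1, 5, 10):
    print(f"depth {depth}: bound on sigmoid factors = {0.25 ** depth:.2e}")
```

Real frameworks automate exactly this bookkeeping. Doing it once by hand makes it clear where each factor in the gradient comes from.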

How is it important for statistical learning algorithms?
 
 When we understand statistical learning algorithm, we try to find the best fit and check which algorithm will work best or give the best accuracy. Like neural networks, we then see how the algorithm arrives at a conclusion or stoppage point. To understand this, we go through various mathematical implementations on our computers. If we don’t understand the base ideology behind it, we can never explain that the algorithm is best for a purpose.
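One way to see how an algorithm reaches a best fit and a stopping point is a sketch like the one below: a straight line fitted by gradient descent on the mean squared error, stopping once the loss no longer improves by more than a tolerance. The data, learning rate and tolerance are assumptions chosen for illustration. A library would normally solve this problem directly.

```python
def fit_line(xs, ys, learning_rate=0.05, tolerance=1e-10, max_iters=100_000):
    """Fit y ~ slope*x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    prev_loss = float("inf")
    for iteration in range(1, max_iters + 1):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        loss = sum(r * r for r in residuals) / n
        # Stopping point: the loss has effectively stopped improving.
        if prev_loss - loss < tolerance:
            break
        prev_loss = loss
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_intercept = 2.0 * sum(residuals) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept, iteration


# Hypothetical noiseless data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept, iterations = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, iterations={iterations}")
```

The learning rate is the part that needs the maths: too large and the loss diverges, too small and the stopping rule fires only after many more iterations.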

So, mathematics is the centre of any major aspect that data science touches upon.

To conclude, let me try to answer the rhetorical question.

Answer to the famous question: “Why should we learn mathematics as all we have to do is import a library and get going!”

Well, let me put it with an example. Let’s say we need an algorithm for scene text detection. For this, we can use a CNN model and get a result out. The burden then falls on the person who doesn’t know the maths behind it: they keep tuning the CNN model to improve accuracy, because the model alone won’t serve the purpose. There are ways of optimising it by merging algorithms, which can give better results. Wondering how? Take some time to connect with us, get our use case on this, and see how we solve the problem in no time by finding the right fit.
