What is Data Science?

  • July 29, 2019
  • Data Science, Machine Learning

What is Data Science? An easy introduction to Data Science terminologies.

So you want to be a “data scientist”?

Data scientist has been called “the sexiest job of the 21st century,” probably by someone who has never visited a fire station. Data science is a buzzword, and it doesn’t take a great deal of investigating to find analysts breathlessly prognosticating that over the next 10 years, we’ll need billions and billions more data scientists than we currently have.

But what is data science? Josh Wills of Cloudera put it this way: “A data scientist is a person who is better at statistics than any software engineer and better at software engineering than any statistician.” Complementing data scientists are business analysts, who are more familiar with business models and paradigms and can ask good questions of the data.

The term “data scientist” is widely credited to D.J. Patil, who was Chief Scientist at LinkedIn. In 2011, Forbes placed him second on its list of data scientists, just behind Larry Page of Google.

The World Wide Web inventor Tim Berners-Lee is often quoted as having said, “Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.” This quote suggests a kind of pyramid, where data are the raw materials that make up the foundation at the bottom of the pile, and information, knowledge, understanding, and wisdom represent higher and higher levels of the pyramid. In one sense, the major goal of the data scientist is to help people turn data into information and onwards up the pyramid.

For Free, Demo classes Call:  8605110150

Registration Link:Click Here!

Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, and preservation of large collections of information.

The Data Scientist’s Active Role

Data scientists play active roles in the design and implementation of four related areas of data work, the “4 A’s”:

  • Data Architecture,
  • Data Acquisition,
  • Data Analysis and
  • Data Archiving

The architecture of Data Science:

Data science comprises several domains. Computer science is one domain you need in order to be a good data scientist; apart from that, you need math as well as business expertise, so you should understand what your business does and how it is performing. You should also understand linear algebra and statistical programming techniques. If you are able to cover all these domains, you will be able to do data science.


Domain Knowledge: The data scientist must quickly learn how the data will be used in a particular context.

Communication: A data scientist must possess strong skills for learning the needs and preferences of users.

How data can be represented: A data scientist must have a clear understanding of how data can be stored and linked.

Data Transformation and Analysis: When data become available for use by decision-makers, the data scientist must know how to transform, summarize, and make inferences from the data.

Presentation: Although numbers often have the edge in precision, a good data display can often be a more effective means of communicating results to data users.

Machine learning: Machine learning is the ability of computers to learn without being explicitly programmed.

Focus on Quality: No matter how good a set of data may be, there is no such thing as perfect data. A data scientist must know the limitations of the data they work with, know how to quantify its accuracy, and be able to make suggestions for improving the quality of the data in the future.
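To make “quantify its accuracy” a little more concrete, here is a minimal plain-Python sketch that reports, for each field, the fraction of records that are missing or fail a simple validity check. The records, field names, and validity rules are hypothetical, chosen only for illustration; in practice a library such as pandas is commonly used for this kind of profiling.

```python
# Hypothetical records: each dict is one row; None marks a missing value.
records = [
    {"age": 34, "city": "Pune", "income": 52000},
    {"age": None, "city": "Mumbai", "income": 61000},
    {"age": 29, "city": None, "income": -1},
    {"age": 41, "city": "Pune", "income": None},
]

def quality_report(rows, validators):
    """Return, per field, the fraction of rows whose value is missing or fails its check."""
    report = {}
    for field, is_valid in validators.items():
        bad = sum(1 for row in rows if row.get(field) is None or not is_valid(row[field]))
        report[field] = bad / len(rows)
    return report

# Assumed validity rules, chosen only for illustration.
validators = {
    "age": lambda v: 0 <= v <= 120,
    "city": lambda v: isinstance(v, str) and v != "",
    "income": lambda v: v >= 0,
}

print(quality_report(records, validators))  # {'age': 0.25, 'city': 0.25, 'income': 0.5}
```

A report like this gives a number to point at when suggesting improvements, for example “half of the income values are unusable”.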

Biggest Confusion Between AI, Machine Learning, Data Science and Deep Learning

One of the big confusions for many of our potential students is the difference between artificial intelligence, machine learning, deep learning, and data science. These terms are still maturing, some of them are not yet concrete, and there is a lot of media hype in which people use them interchangeably. So we want to clarify and throw some light on the differences between these terms. Of course, the explanation I’m giving is not necessarily the only correct one, as some of these areas are evolving very fast, but let me take a shot at explaining what these terms mean. First, let’s understand what artificial intelligence is.

What is Artificial Intelligence?

Artificial intelligence is a broad area of computer science that makes machines seem like they have human intelligence.

What is Machine learning?

Machine learning is a subset of Artificial intelligence which provides machines the ability to learn automatically and improve from experience without being explicitly programmed.

What is Data Science?

Data science is a field that uses computer science and machine learning to learn from data, interpret it, and visualize the results.

What is Deep Learning?

Deep learning is a much more recent area that has taken shape since 2006, and it is all about using multi-layer neural networks. Right now it accounts for much of the impact of AI: when people refer to Google’s AI systems, they are mostly referring to Google’s deep learning systems. Some of the most important advances in AI in the last 10 years have happened in this small sub-area called deep learning.
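To show what a “multi-layer neural network” computes, here is a minimal sketch of a forward pass through a small network: each layer takes weighted sums of its inputs plus a bias, the hidden layers apply a ReLU activation, and the output layer applies a sigmoid. The layer sizes and weights are made up for illustration; a real deep learning system learns its weights from data (typically with backpropagation) and uses a framework rather than plain Python.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: output_j = sum_i inputs_i * weights[j][i] + biases[j]."""
    return [sum(x * w for x, w in zip(inputs, row)) + b for row, b in zip(weights, biases)]

def relu(values):
    """Hidden-layer activation: negative values become 0."""
    return [max(0.0, v) for v in values]

def sigmoid(v):
    """Output activation: squashes a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, layers):
    """Pass x through every hidden layer (ReLU), then a single-unit output layer (sigmoid)."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    out_weights, out_biases = layers[-1]
    return sigmoid(dense(h, out_weights, out_biases)[0])

# Hand-picked (hypothetical) weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, 0.0]),
    ([[1.0, -1.0, 0.5], [0.2, 0.4, -0.3]], [0.0, 0.0]),
    ([[0.7, -0.5]], [0.1]),
]

print(forward([1.0, 2.0], layers))
```

Stacking more of these layers is what makes a network “deep”; training adjusts the numbers in `layers` instead of writing them by hand.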

Let us move forward and see what exactly machine learning is.

Machine learning is a technique for teaching programs to use data to generate algorithms, instead of explicitly programming an algorithm from scratch. It is a field of computer science that originates from research into artificial intelligence.

Here are some of the terms used in machine learning:

  • Features: This refers to distinctive traits that help define the outcome
  • Samples (cases): A sample is a single item of data, such as an image, an audio clip, a document, or a record in a CSV file
  • Feature extraction: This refers to the processing of a feature vector where data is transformed from a high-dimensional space to a lower-dimensional space.
  • Training set: This refers to a set of data used to discover potentially predictive relationships
  • Testing set: This refers to a set of data used to test predictions
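
The terms above can be tied together in a small sketch, assuming tiny made-up 4×4 “images”: each sample is 16 raw pixel values, feature extraction reduces it to 2 features (the average brightness of the top and bottom halves), and the samples are shuffled and split into a training set and a testing set. The helper names `make_sample`, `extract_features`, and this `train_test_split` are hypothetical; scikit-learn provides a ready-made `train_test_split` for the last step.

```python
import random

def make_sample(bright_top):
    """Build one hypothetical 4x4 'image' as 16 pixel values: bright top half or bright bottom half."""
    top = [0.9] * 8 if bright_top else [0.1] * 8
    bottom = [0.1] * 8 if bright_top else [0.9] * 8
    return top + bottom

def extract_features(pixels):
    """Feature extraction: 16 raw values -> 2 features (mean brightness of top and bottom halves)."""
    return [sum(pixels[:8]) / 8, sum(pixels[8:]) / 8]

def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle sample indices, then hold out a fraction of them as the testing set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Samples (cases): 10 images alternating between the two kinds, plus their labels.
samples = [make_sample(i % 2 == 0) for i in range(10)]
labels = ["top" if i % 2 == 0 else "bottom" for i in range(10)]

# Features: the 2-dimensional vectors that describe each sample.
features = [extract_features(s) for s in samples]

X_train, y_train, X_test, y_test = train_test_split(features, labels)
print(len(X_train), len(X_test))  # 7 3
```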
What is machine learning?

Machine learning algorithms allow us to build self-learning machines that evolve by themselves without being explicitly programmed. Based on user behavior, data patterns, and past experience, they make important decisions about the future. Personally, I consider machine learning a descendant of artificial intelligence.

Machine learning finds applications in various fields: it can be used for data mining, natural language processing, image recognition, robotics development, and more.

There is a huge list of applications of machine learning. I hope you are now clear on what exactly machine learning is; if you have any doubts or questions, you can ask me.

So let’s move forward and look at the general steps involved in implementing machine learning.

We start by collecting data, because you cannot train your machine without data. Once the data is collected, you clean it: you make sure that your data contains all the useful fields and that there are no useless fields (certain columns, for example) that you don’t actually require.

After that, you analyze your data. Take the example of an e-commerce website: it analyzes whatever you search for and your past purchase history. Suppose you are searching for watches; it will analyze what kind of watches you are looking for, whether a smartwatch or, if you can afford one, a Rolex. Next, you train the algorithm based on that analysis. Then you test the algorithm: you provide a set of data for which you know the actual output and check whether the algorithm’s output matches it. Finally, you use the algorithm. These are the general steps involved in machine learning; if you have any doubts so far, you can ask me.
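The steps above (collect, clean, analyze, train, test, use) can be sketched end to end in plain Python. The data is a hypothetical table of customers’ ages and whether they bought a smartwatch, and the “algorithm” is deliberately simple: a threshold halfway between the average age of buyers and non-buyers (a nearest-centroid rule). It illustrates the workflow only, not how any e-commerce site actually builds its models.

```python
def mean(xs):
    return sum(xs) / len(xs)

# 1. Collect: hypothetical customer rows; "session_id" is a field the model does not need.
raw = [
    {"age": 22, "session_id": "a1", "bought_smartwatch": 1},
    {"age": 25, "session_id": "a2", "bought_smartwatch": 1},
    {"age": None, "session_id": "a3", "bought_smartwatch": 0},
    {"age": 31, "session_id": "a4", "bought_smartwatch": 1},
    {"age": 48, "session_id": "a5", "bought_smartwatch": 0},
    {"age": 55, "session_id": "a6", "bought_smartwatch": 0},
    {"age": 27, "session_id": "a7", "bought_smartwatch": 1},
    {"age": 60, "session_id": "a8", "bought_smartwatch": 0},
    {"age": 45, "session_id": "a9", "bought_smartwatch": 0},
    {"age": 29, "session_id": "a10", "bought_smartwatch": 1},
]

# 2. Clean: keep only the useful fields and drop rows with missing values.
clean = [(r["age"], r["bought_smartwatch"]) for r in raw if r["age"] is not None]

# Hold back the last rows, whose outputs we know, for testing.
train, test = clean[:6], clean[6:]

# 3. Analyze: average age of non-buyers (0) and buyers (1) in the training data.
def class_means(rows):
    return {label: mean([age for age, y in rows if y == label]) for label in (0, 1)}

# 4. Train: threshold halfway between the two averages (nearest-centroid rule in 1D).
means = class_means(train)
threshold = (means[0] + means[1]) / 2

def predict(age):
    return 1 if age < threshold else 0

# 5. Test: fraction of held-out rows where the prediction equals the actual output.
accuracy = mean([predict(age) == y for age, y in test])

# 6. Use: predict for a new customer.
print(threshold, accuracy, predict(35))  # 38.875 1.0 1
```

The split happens before the analysis so that the test rows play no part in training; otherwise the test would overstate how well the model works on new data.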

Machine learning Process


Applications of Machine Learning:

Now let’s look at some applications of machine learning.

  • Medicine – Diagnosis from medical imaging; diagnosing diseases.
  •  Recommendation:

When you visit Amazon or Flipkart, they recommend certain products based on your previous history. This is done using machine learning.

  • Weather forecasting:

Analyzing huge amounts of sensor data and then predicting the outcome. For example, the calculations behind the daily weather forecast you hear from a forecaster are not possible to do manually, so you need machines to perform them. This is another use of machine learning.

  • Computer Vision:

Another domain is computer vision: given an image, you want to find out what objects appear in it and where they appear.

  • NLP – Sentiment Analysis
  • Forecasting product sales quantities taking seasonality and trend into account.
  • Optimizing product location at a supermarket retail outlet.
  • Fraud detection
  • Determine whether or not someone will default on a home mortgage.
  • Netflix movies recommendation
  • Self-driving cars on the road.
  • Amazon product recommendations
  • Accurate results in Google Search
  • Speech recognition in your smartphone
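
As an illustration of recommending from previous history, here is a minimal sketch of one simple approach, assuming hypothetical purchase histories: items bought by other users are scored by how much those users’ histories overlap with yours, and items you already own are skipped. Production recommenders at companies like Amazon or Netflix are far more sophisticated; this only shows the basic idea.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of items bought.
histories = {
    "asha": {"smartwatch", "earbuds", "phone case"},
    "ravi": {"smartwatch", "earbuds"},
    "meera": {"smartwatch", "fitness band"},
    "john": {"laptop", "mouse"},
}

def recommend(user, histories, k=2):
    """Score items bought by other users by history overlap; return the top k with a positive score."""
    mine = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(mine & items)  # how similar the other user's history is to this user's
        for item in items - mine:    # only items this user has not bought yet
            scores[item] += overlap
    return [item for item, score in scores.most_common(k) if score > 0]

print(recommend("ravi", histories))  # ['phone case', 'fitness band']
```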

Let us move forward and look at the various types of machine learning.

Types of Machine Learning:

Machine learning is broadly divided into three categories:

  • Supervised Learning – Using labeled data to create a model that can predict the output for unseen inputs.

In this technique, a model learns to make predictions with the help of a labeled dataset. What is a labeled dataset? It is a dataset for which you already know the target answer for each example.

Supervised Learning is basically of two types:

1.1) Regression: Used when the output variable is continuous, i.e., when a change in one variable is associated with a change in other variables.

1.2) Classification: Used when the output variable is categorical, i.e., it takes one of two or more classes.

Ex – predicting a house’s price from a dataset of existing house prices, recommendation systems, etc.
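
The two kinds of supervised learning can be contrasted in a short sketch. For regression, it fits a straight line by ordinary least squares to hypothetical house areas and prices and predicts a continuous price; for classification, it labels a new house as “flat” or “villa” by copying the label of its nearest labeled example (1-nearest-neighbor). All data values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Regression: hypothetical house areas (sq. ft.) and prices (in lakh rupees), a continuous output.
areas = [500, 750, 1000, 1250, 1500]
prices = [25, 37, 50, 62, 75]
slope, intercept = fit_line(areas, prices)
print(round(slope * 1100 + intercept, 1))  # 54.8

# Classification: hypothetical (area, bedrooms) examples with a categorical label.
labeled = [((500, 1), "flat"), ((900, 2), "flat"), ((2500, 4), "villa"), ((3000, 5), "villa")]

def classify(point):
    """1-nearest-neighbor: return the label of the closest labeled example."""
    def squared_distance(example):
        (area, rooms), _label = example
        return (area - point[0]) ** 2 + (rooms - point[1]) ** 2
    return min(labeled, key=squared_distance)[1]

print(classify((2700, 4)))  # villa
```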

  • Unsupervised Learning – Using unlabeled data to create a function that describes hidden structure in the data.

Unsupervised learning is basically of two types:

2.1) Clustering: The method of dividing objects into clusters so that objects in the same cluster are similar to each other and dissimilar to the objects belonging to other clusters.

2.2) Association: Discovering the probability of the co-occurrence of items in a collection.

Ex – grouping handwritten digits, clustering websites based on the counts of particular words on each web page to understand what those websites are actually about, etc.
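
Both unsupervised techniques can be sketched in a few lines of plain Python with invented data. Clustering uses k-means on one-dimensional word counts (for example, how often a word appears on each of six web pages); association estimates how often items co-occur in shopping baskets using support and confidence, the standard measures from association-rule mining. The starting cluster centers and the basket contents are assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """k-means on numbers: assign each value to its nearest center, then move centers to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep a center unchanged if no value was assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Clustering: hypothetical counts of the word "cricket" on six web pages, starting centers 0 and 10.
centers, clusters = kmeans_1d([2, 3, 4, 40, 42, 45], centers=[0, 10])
print(clusters)  # [[2, 3, 4], [40, 42, 45]]

# Association: hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of baskets containing every item in itemset (estimated co-occurrence probability)."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}), confidence({"bread"}, {"butter"}))  # 0.5 0.6666666666666666
```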

  • Reinforcement Learning – An agent learns by interacting with its environment and finding out which actions lead to the best outcome.
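
Reinforcement learning is easiest to see in its simplest setting, the multi-armed bandit, where an agent repeatedly chooses one of several actions and receives a reward from the environment. This minimal sketch uses an epsilon-greedy agent: most of the time it picks the action with the best average reward so far, and occasionally it explores at random. The payout probabilities are hypothetical and hidden from the agent, which has to discover the best action by interacting.

```python
import random

def run_bandit(true_payouts, steps=2000, epsilon=0.1, seed=42):
    """Epsilon-greedy agent: usually exploit the best estimated action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(true_payouts)
    estimates = [0.0] * n_actions  # agent's running average reward per action
    counts = [0] * n_actions       # how many times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                          # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # Environment: reward 1 with the action's hidden payout probability, else 0.
        reward = 1.0 if rng.random() < true_payouts[action] else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # incremental mean
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
print(counts.index(max(counts)))  # the action the agent settled on
```

Full reinforcement learning adds states, so the best action can depend on the situation, but the loop of act, observe reward, and update estimates is the same.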

The most popular Machine Learning languages:

  • Python – a general-purpose, high-level programming language that supports object-oriented and functional programming.
  • R – a statistical programming language.
  • SAS – a software suite developed by SAS Institute for advanced analytics, multivariate analysis, business intelligence, data management, and predictive analytics.
  • MATLAB – a multi-paradigm numerical computing tool.
  • SQL – stands for Structured Query Language, a language designed for interacting with databases. SQL is another must-learn language for data scientists, for creating and querying data.
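
As a small taste of SQL for creating and querying data, this sketch uses Python’s built-in `sqlite3` module to create an in-memory table, insert a few rows, and aggregate them. The `sales` table and its contents are hypothetical; the same SQL runs, with minor dialect differences, on other relational databases.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, month TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("watch", "Jan", 10), ("watch", "Feb", 14), ("phone", "Jan", 7), ("phone", "Feb", 5)],
)

# Total quantity per product, largest first.
rows = conn.execute(
    "SELECT product, SUM(quantity) AS total FROM sales GROUP BY product ORDER BY total DESC"
).fetchall()
print(rows)  # [('watch', 24), ('phone', 12)]
conn.close()
```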


Data Analysis and visualization libraries:

  • Pandas – a powerful data analysis library.
  • Matplotlib – a data visualization library.
  • Seaborn – a Python data visualization library based on Matplotlib.

Frameworks for general machine learning:

  • NumPy – an extension package for scientific computing with Python
  • Scikit-learn – a machine learning framework
  • SciPy – a scientific computing library

3 free Python IDEs for Machine Learning

  • Anaconda (a Python distribution that includes the Spyder IDE and Jupyter Notebook)
  • PyCharm and
  • Visual Studio

Why Python for Machine Learning

We have C, C++, Java, and .NET, so why Python?

Because Python has been in the market for a very long time and its community is very big, it is easy to find Python developers and support. Other reasons:

  • Python is more productive.
  • Large community
  • Supports AI libraries.
  • Simple and easy to learn.
  • It has more than 150000 open source libraries.
  • Python is a dynamically typed programming language.


Do I need Hadoop to be a Data Scientist?

According to a CrowdFlower survey of 3,490 LinkedIn jobs, “Apache Hadoop has been ranked as the second most important skill for a data scientist with 49% rating”. A data scientist’s job is not to build or administer a Hadoop cluster; they should know how to glean valuable insights from data. Different people use different tools for different things. Data science is a general term, and Hadoop is widely used, but it is not the only platform capable of managing and manipulating data, even large-scale data. A data scientist should be familiar with concepts like Spark, Hive, Pig, MapReduce, distributed systems, distributed file systems, and the like, but I wouldn’t judge someone for not knowing about such things.

Let’s look at the roles available for machine learning professionals across all domains:

  • Data Scientist
  • Machine learning Engineer
  • Data Engineer
  • Data Analyst
  • Decision Scientist
  • Software Developer
  • Data Architect

What are the objectives of the Machine Learning Certification Training using Python?

After completing this course, you should be able to:

  • Work on real-time data
  • Automate data analysis tasks using Python
  • Split data into training and test sets.
  • Understand machine learning algorithms
  • Evaluate machine learning algorithms.

What are the prerequisites to learn Machine learning?

This course can be taken by anyone with a working knowledge of a modern programming language like C/C++/Java/Python.


Please feel free to leave your comments in the comment box so that we can improve the guide and serve you better. Also, follow sevenmentor.com to get updates on new blogs.

If you wish to learn machine learning algorithms such as linear regression, logistic regression, decision trees, kNN, random forests, and SVMs, and build a career in machine learning, check out our course on sevenmentor.com.

Name – Harshal Patil
Designation – Python/Data Science Trainer

Call the Trainer and Book your free demo Class now!!!

© Copyright 2019 | Sevenmentor Pvt Ltd.
