Machine Learning: Data Preprocessing

  • By Aniket Kulkarni
  • March 31, 2023
  • Machine Learning

Data preprocessing is the process of transforming raw data into a form that machine learning models can use. It is one of the most important steps in building a model. Are you interested in machine learning, one of the hottest fields in technology today? Our Machine Learning Course in Pune is the perfect way to dive deep into this exciting field and gain hands-on experience with cutting-edge tools and technologies.

Data preprocessing involves the following steps:

  1. Data Import 
  2. Data Exploration
  3. Data Wrangling
  4. Data Manipulation
  5. Encoding Categorical Data
  6. Splitting Data Set
  7. Feature Scaling


For free demo classes, call: 7507414653
Registration Link: Click Here!


Methods of Machine Learning: Data Preprocessing


  1. Data Import

Before starting with the dataset, the first step is to load the dataset.

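Code:

import pandas

df = pandas.read_csv("filepath") : to load a .csv file

df = pandas.read_excel("filepath") : to load a .xlsx file

df = pandas.read_json("filepath") : to load a .json file

df = pandas.read_sql("SELECT * FROM table_name", connection) : to load the result of a SQL query, where connection is an open database connection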
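A minimal sketch of loading data, assuming pandas is installed. To keep it self-contained, the CSV text is held in memory with io.StringIO instead of being read from a real file, and read_sql is run against a temporary in-memory SQLite database. The values and the students table are hypothetical.

```python
import io
import sqlite3

import pandas

# A small CSV held in memory stands in for a real file path (hypothetical data).
csv_text = "name,age,city\nAsha,28,Pune\nRavi,35,Mumbai\n"
df_csv = pandas.read_csv(io.StringIO(csv_text))

# read_sql runs a query against an open database connection.
# Here an in-memory SQLite database with a hypothetical "students" table is used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [("Asha", 28), ("Ravi", 35)])
conn.commit()
df_sql = pandas.read_sql("SELECT * FROM students", conn)
conn.close()  # the query result is already stored in df_sql

print(df_csv.shape)  # (2, 3)
print(df_sql.shape)  # (2, 2)
```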


  2. Data Exploration Techniques
    1. Dimensionality Check : df.shape
    2. Type of Dataset : type(df)
    3. Slicing and Indexing : df.iloc[ : , : ]
    4. Mean : df.mean()
    5. Median : df.median()
    6. Mode : df.mode()
    7. Identifying Unique Elements : df["column"].unique()
    8. Value Extraction : df.values
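The exploration functions above can be tried on a small, hypothetical DataFrame. This sketch passes numeric_only=True to mean() and median(), because recent pandas versions raise an error when these are computed over text columns such as city.

```python
import pandas

# Hypothetical sample data for illustration.
df = pandas.DataFrame({
    "age": [25, 32, 32, 47],
    "salary": [30000, 45000, 45000, 80000],
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],
})

print(df.shape)                      # (4, 3): rows, columns
print(type(df))                      # DataFrame
print(df.iloc[0:2, 0:2])             # first two rows, first two columns
print(df.mean(numeric_only=True))    # mean of each numeric column
print(df.median(numeric_only=True))  # median of each numeric column
print(df.mode())                     # most frequent value(s) in each column
print(df["city"].unique())           # distinct values in the "city" column
print(df.values)                     # underlying data as a NumPy array
```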


  3. Data Wrangling

Data wrangling cleans the raw data so that a more accurate model can be developed. It deals with:
    1. Missing values and missing-value treatment
    2. Inconsistent data
    3. Noisy data
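A minimal sketch of treating the problems listed above with pandas, on hypothetical data. The plausible age range (0 to 100) and the choice to fill gaps with the median and the mode are illustrative assumptions; the right treatment depends on the dataset.

```python
import numpy as np
import pandas

# Hypothetical raw data: a missing age, an implausible age (200),
# inconsistent spellings of "Pune", and a missing city.
raw = pandas.DataFrame({
    "age": [25, np.nan, 31, 29, 200],
    "city": ["Pune", "pune ", "Mumbai", None, "Pune"],
})
df = raw.copy()

# Inconsistent data: trim spaces and fix capitalisation so "pune " becomes "Pune".
df["city"] = df["city"].str.strip().str.title()

# Noisy data: treat ages outside the assumed plausible range as missing.
df.loc[~df["age"].between(0, 100), "age"] = np.nan

# Missing values: fill numeric gaps with the median, categorical gaps with the mode.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```

The text is cleaned before the mode is computed, so "Pune" and "pune " are counted as the same category.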



  4. Data Manipulation


A pandas DataFrame is a two-dimensional data structure on which the following functions can be applied:

  1. Returns the first n rows (5 by default) : df.head(n)
  2. Returns the last n rows (5 by default) : df.tail(n)
  3. Returns the actual data as a NumPy array : df.values
  4. Groups the DataFrame by one or more columns : df.groupby("column")
  5. Concatenation combines two or more data structures along an axis : pandas.concat([df1, df2])
  6. Merging performs database-style joins on DataFrame objects : pandas.merge(df1, df2, on="key")
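The functions above can be sketched on two small, hypothetical tables that share a student_id key. Concatenation uses the top-level function pandas.concat, while head, tail, values and groupby are called on the DataFrame itself.

```python
import pandas

# Hypothetical tables: student marks and student details sharing a "student_id" key.
marks = pandas.DataFrame({
    "student_id": [1, 2, 3, 4],
    "subject": ["Maths", "Maths", "Science", "Science"],
    "score": [78, 85, 62, 91],
})
details = pandas.DataFrame({
    "student_id": [1, 2, 3, 4],
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],
})

print(marks.head(2))  # first 2 rows
print(marks.tail(2))  # last 2 rows
print(marks.values)   # underlying data as a NumPy array

# Group rows by subject and compute the mean score of each group.
print(marks.groupby("subject")["score"].mean())

# Concatenate: stack DataFrames with the same columns on top of each other.
more_marks = pandas.DataFrame({"student_id": [5], "subject": ["Maths"], "score": [70]})
all_marks = pandas.concat([marks, more_marks], ignore_index=True)

# Merge: an inner join on the shared key, similar to a SQL JOIN.
merged = pandas.merge(marks, details, on="student_id", how="inner")
print(merged)
```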


  5. Encoding Categorical Data


Machine learning models work entirely with numbers, so a categorical variable in the dataset (for example, a column of city names) creates trouble while building the model unless it is first converted into numbers.

To do this, we apply label encoding (for example, scikit-learn's LabelEncoder()) and one-hot encoding.
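A minimal sketch of both encodings on a hypothetical city column, assuming scikit-learn is installed for LabelEncoder. The one-hot step uses pandas.get_dummies here; scikit-learn's OneHotEncoder is an alternative.

```python
import pandas
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column.
df = pandas.DataFrame({"city": ["Pune", "Mumbai", "Delhi", "Pune"]})

# Label encoding: each category gets an integer code (classes are sorted alphabetically).
encoder = LabelEncoder()
df["city_label"] = encoder.fit_transform(df["city"])
print(encoder.classes_.tolist())  # ['Delhi', 'Mumbai', 'Pune']
print(df["city_label"].tolist())  # [2, 1, 0, 2]

# One-hot encoding: one 0/1 column per category.
one_hot = pandas.get_dummies(df["city"], prefix="city", dtype=int)
print(one_hot)
```

Label encoding implies an order between categories (Delhi < Mumbai < Pune here) that has no real meaning, so one-hot encoding is usually the safer choice for nominal input features; scikit-learn's documentation describes LabelEncoder as intended for target labels.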


  6. Splitting the Dataset

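In machine learning data preprocessing, we divide our dataset into a training set and a test set. The model is trained on the training set and then evaluated on the test set, which it has not seen during training, to estimate how well it will perform on new data.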
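A minimal sketch of the split, assuming scikit-learn's train_test_split function. The dataset, the 80/20 ratio and random_state=42 are illustrative choices.

```python
import pandas
from sklearn.model_selection import train_test_split

# Hypothetical dataset: two features and a binary target.
df = pandas.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "attendance": [50, 55, 60, 65, 70, 75, 80, 85, 90, 95],
    "passed": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X = df[["hours_studied", "attendance"]]  # independent variables
y = df["passed"]                         # dependent variable

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```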




  7. Feature Scaling


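Feature scaling is a technique to bring the independent variables of the dataset into a specific range, so that features with large numeric values (such as salary) do not dominate features with small values (such as age). Two common methods are standardization, which rescales each feature to zero mean and unit variance, and normalization (min-max scaling), which rescales each feature to the range [0, 1].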
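A minimal sketch of both scaling methods, assuming scikit-learn's StandardScaler and MinMaxScaler and hypothetical age and salary data. The scalers are fitted on the training data only and then applied to the test data, so no information from the test set leaks into preprocessing.

```python
import pandas
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical training and test data.
X_train = pandas.DataFrame({"age": [20, 30, 40, 50], "salary": [20000, 40000, 60000, 80000]})
X_test = pandas.DataFrame({"age": [35], "salary": [50000]})

# Standardization: (x - mean) / std, computed per column on the training data.
standard = StandardScaler()
X_train_std = standard.fit_transform(X_train)
X_test_std = standard.transform(X_test)  # reuses the training mean and std

# Normalization (min-max scaling): (x - min) / (max - min), giving values in [0, 1].
minmax = MinMaxScaler()
X_train_mm = minmax.fit_transform(X_train)
X_test_mm = minmax.transform(X_test)  # reuses the training min and max

print(X_train_std.mean(axis=0))  # close to [0, 0]
print(X_test_mm)                 # close to [[0.5, 0.5]]
```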



Aniket Kulkarni

Call the trainer and book your free demo class for Machine Learning now!

© Copyright 2021 | SevenMentor Pvt Ltd.
