# Exploratory Data Analysis

• January 2, 2023
• Data Science

## Data Exploration Using Scatter Plots

• Before modeling the relationship between two continuous variables, you need to identify the pattern that relates them. This pattern is called the “functional form” of the relationship between the two variables.
• Check whether the relationship between the two variables is linear.
• Examine how the two variables are related to each other.
• Both tasks can be done either with a scatter plot or with a correlation statistic for the two variables.
• Scatter plots were introduced earlier as a way to picture a distribution.

Correlation measures the degree of linear association between two continuous variables. Scatter plots, in turn, are useful for the following:
– explore the relationship between two variables
– locate unusual observations
– identify the possible trend
– identify the basic range of X and Y
– communicate the data analysis
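
The points above can be sketched with a small plotting example. This is a minimal sketch, assuming `matplotlib` is installed; the data, the planted outlier, and the file name `scatter.png` are made up for illustration and do not come from any real dataset.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: a roughly linear trend with random noise.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(100)]
y = [2.0 * xi + random.gauss(0, 2) for xi in x]

# One deliberately unusual observation, far below the general trend.
x.append(9.0)
y.append(-5.0)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Y versus X")
fig.savefig("scatter.png")

# The basic range of each variable can also be read off numerically.
x_range = (min(x), max(x))
y_range = (min(y), max(y))
print("X range:", x_range)
print("Y range:", y_range)
```

In the saved figure, the upward trend and the single point lying well below it should both be visible at a glance, which is the kind of reading the list above describes.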

A scatter plot lets us identify the possible trend between two continuous variables. Based on that trend, we can either use a variable as it is or transform it before using it in our modeling work.
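
As a rough illustration of transforming a variable once the trend turns out to be curved, the sketch below uses made-up, noise-free data in which Y grows exponentially with X. The helper `pearson_r` and the data are assumptions for this example only. Taking the logarithm of Y is one possible transformation; the right choice depends on the trend seen in the actual plot.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: Y grows exponentially with X, so the raw trend is curved.
x = list(range(1, 11))
y = [3.0 * math.exp(0.8 * xi) for xi in x]

r_raw = pearson_r(x, y)                         # linearity of Y against X
r_log = pearson_r(x, [math.log(v) for v in y])  # linearity of log(Y) against X
print(f"raw: {r_raw:.3f}  log-transformed: {r_log:.3f}")
```

Here the log-transformed variable is exactly linear in X, so its correlation with X is higher than that of the raw variable.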

## Correlation Analysis

• Correlation measures the degree of linear association between two variables. A common correlation statistic used for continuous variables is the “Pearson correlation”. Its value is as follows:
– between -1 and 1
– closer to either extreme if there is a high degree of linearity
– close to 0 if there is no linear association
– greater than 0 if there is a positive linear association
– less than 0 if there is a negative linear association
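
The properties listed above can be checked on small hand-made examples. The following is a minimal sketch that computes the Pearson correlation directly from its textbook definition (covariance divided by the product of the standard deviations); the function name `pearson_r` and the three toy datasets are illustrative assumptions.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation computed directly from its definition."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]

r_positive = pearson_r(x, [2 * v + 1 for v in x])     # perfectly linear, increasing
r_negative = pearson_r(x, [-0.5 * v + 4 for v in x])  # perfectly linear, decreasing
r_curved = pearson_r(x, [v ** 2 for v in x])          # related, but not linearly

print("increasing line:", r_positive)
print("decreasing line:", r_negative)
print("parabola:       ", r_curved)
```

The parabola case is worth noting: Y is completely determined by X, yet the correlation is 0 because the association is not linear. A value close to 0 therefore rules out a linear association, not every kind of relationship.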

The null hypothesis for a test of the correlation coefficient is ρ = 0. Rejecting the null hypothesis only means that you can be confident that the true population correlation is not 0. Small p-values can occur simply because the sample size is very large: even a correlation coefficient of 0.01 can be statistically significant with a large enough sample. Therefore, it is important to look at the value of r itself to see whether it is meaningfully large or not…
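
To make the sample-size effect concrete, the sketch below holds r fixed at 0.01 and varies only the sample size n. It uses the Fisher z-transformation with a normal approximation, which is one common approximate test of ρ = 0 and not necessarily the exact test a given statistics package reports. The function name `approx_p_value` is an assumption for this example.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z-transformation)."""
    # Under H0, atanh(r) * sqrt(n - 3) is approximately standard normal.
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))

r = 0.01  # a practically negligible correlation
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  r = {r}  p = {approx_p_value(r, n):.3g}")
```

With 100 observations the p-value is nowhere near significance, while with a million observations the same r = 0.01 gives an extremely small p-value. The strength of the association has not changed, only the amount of data.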