# Statistics for Data Science (Sets and Probability)

• February 9, 2021
• Data Science, Python

Greetings, eager learners!

Statistical concepts deepen your understanding of data, and statistics is another milestone on the way to becoming a data analyst or data scientist. This post focuses on probability. Before moving to the mathematical formulas, we will look at the concept behind probability and where it can be used. Probability describes events and the ways they interact with each other.

Every event has a set of outcomes.

E.g. 1) Checking whether a value is even or odd.

2) The result of tossing a coin.

There are many real-world scenarios where we can compute a probability.

Here are some characteristics of sets:

1. A set is not limited to numeric values; it can hold strings as well.
2. A set is either empty (the empty set) or non-empty.
3. A non-empty set can be finite or infinite.

If the element x is part of the set A, we write: x ∈ A

This is read as "x is an element of set A".

The same relationship can also be written as: A ∋ x

which is read as "set A contains the element x".

1. x ∉ A: x is not an element of A.
2. A ∌ x: A does not contain x.
3. Generalized statements about multiple elements:

∀ means "for all" (or "for any") elements in the set.

∀ x ∈ A

This expression reads "for all elements x in set A".

1. ∀ with a colon (:): the colon introduces the statement we want to make about a specific group of elements within a set, and is commonly read as "such that". For example, "for all x in set A, x is even" is written as ∀ x ∈ A : x is even.
2. Subset: if every element of A is also an element of B, then A is a subset of B. This is written as: A ⊆ B
3. Every non-empty set has at least two subsets:

A ⊆ A (A is a subset of itself) and ∅ ⊆ A (the empty set is a subset of A).
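The notation above maps closely onto Python's built-in `set` type. The following is a minimal sketch; the sets `A` and `B` and the value `x` are made-up placeholders chosen for illustration, not data from this post.

```python
# Illustrative sets (hypothetical values, chosen only for this example)
A = {2, 4, 6, 8}
B = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
x = 4

x_in_A = x in A                          # x ∈ A
five_not_in_A = 5 not in A               # 5 ∉ A
all_even = all(n % 2 == 0 for n in A)    # ∀ x ∈ A : x is even
A_subset_B = A <= B                      # A ⊆ B (same as A.issubset(B))
A_subset_A = A <= A                      # A ⊆ A
empty_subset_A = set() <= A              # ∅ ⊆ A

print(x_in_A, five_not_in_A, all_even)             # True True True
print(A_subset_B, A_subset_A, empty_subset_A)      # True True True
```

Sets of strings work the same way, for example `"heart" in {"heart", "diamond"}`.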

Now we will look at the different ways multiple events can relate to each other:

1. Never touch: sets A and B do not intersect, so the two events can never happen simultaneously. If event A occurs, event B does not, and vice versa. E.g. if the card we draw is a diamond, it cannot be a heart at the same time.
2. Intersect: sets A and B intersect, so the two events can occur at the same time. E.g. if set A is the set of all diamonds and set B is the set of all queens, the intersection of A and B is the outcome that the card is the queen of diamonds.
3. Complete overlap (subset): set B is a subset of set A. E.g. set A is the set of all red cards and set B is the set of all diamonds. We have only looked at two sets here, but a set can have many subsets.

Intersection:

Whether A and B can happen at the same time is checked using their intersection. Every outcome in the intersection is favorable for both A and B.

A ∩ B

The intersection can also be the empty set.

E.g. the intersection of all diamonds and all hearts is the empty set: no outcome satisfies both sets simultaneously. This is written as: A ∩ B = ∅

If A is the set of all red cards and B is the set of all diamonds, the intersection of A and B is B itself:

A∩B=B
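The card examples above can be checked directly with Python sets. This is a small sketch that models a standard 52-card deck as `(rank, suit)` tuples; the variable names and the tuple representation are assumptions made for illustration.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) tuples

diamonds = {card for card in deck if card[1] == "diamonds"}
hearts = {card for card in deck if card[1] == "hearts"}
queens = {card for card in deck if card[0] == "Q"}
reds = {card for card in deck if card[1] in ("hearts", "diamonds")}

# Never touch: no card is both a diamond and a heart, so the intersection is empty
print(diamonds & hearts)             # set()
print(diamonds.isdisjoint(hearts))   # True

# Intersect: exactly one card is both a diamond and a queen
print(diamonds & queens)             # {('Q', 'diamonds')}

# Complete overlap: diamonds is a subset of reds, so reds ∩ diamonds = diamonds
print(reds & diamonds == diamonds)   # True
```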

Union:

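The union contains every outcome that belongs to set A, to set B, or to both. It is written as: A ∪ B

The size of the union depends on how the two sets relate, which gives three cases. Here |A| denotes the number of elements in A.

Case 1: sets A and B do not meet at all. No element is common to both sets, so the sizes simply add up: |A ∪ B| = |A| + |B|

e.g. the set of red cards is the union of the diamonds and the hearts.

Case 2: the two sets intersect. The shared elements would be counted twice, so we subtract them once: |A ∪ B| = |A| + |B| − |A ∩ B|

Case 3: set B is a subset of set A. Then A ∩ B = B, so the formula from Case 2 reduces to |A ∪ B| = |A|, i.e. A ∪ B = A.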
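The counting rules for the union can be verified on the same card examples. The sketch below assumes a standard 52-card deck stored as `(rank, suit)` tuples; it covers the disjoint and intersecting cases, plus the subset relationship (red cards and diamonds) described earlier.

```python
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}

diamonds = {card for card in deck if card[1] == "diamonds"}
hearts = {card for card in deck if card[1] == "hearts"}
queens = {card for card in deck if card[0] == "Q"}
reds = {card for card in deck if card[1] in ("hearts", "diamonds")}

# Disjoint sets: the sizes simply add up
print(len(diamonds | hearts), len(diamonds) + len(hearts))   # 26 26
print((diamonds | hearts) == reds)                           # True

# Intersecting sets: subtract the shared element (queen of diamonds) once
overlap = len(diamonds & queens)
print(len(diamonds | queens), len(diamonds) + len(queens) - overlap)  # 16 16

# Subset: diamonds is contained in reds, so the union is just reds
print((reds | diamonds) == reds)                             # True
```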

Independent and dependent events:

Independent events: the probability of an event is unaffected by other events. E.g. flipping a fair coin: the previous flip does not affect the result of the next one, so the flips are completely independent, and every flip has a 50% chance of coming up heads.

Dependent events: the probability of a dependent event changes with the condition it is evaluated under. E.g. the probability of drawing the queen of diamonds from the full deck of cards is P ( Q ) = 1 / 52

The probability of drawing the queen of diamonds from the set of all diamonds is P ( Q ) = 1 / 13

The probability of drawing the queen of diamonds from the set of all queens is P ( Q ) = 1 / 4
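The three values above can be reproduced by counting outcomes. This is a minimal sketch that assumes every card is equally likely to be drawn; the helper `prob` is a hypothetical name introduced here, and `fractions.Fraction` is used so the results stay exact.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

diamonds = {card for card in deck if card[1] == "diamonds"}
queens = {card for card in deck if card[0] == "Q"}
queen_of_diamonds = {("Q", "diamonds")}


def prob(event, sample_space):
    """Share of equally likely outcomes in sample_space that belong to event."""
    return Fraction(len(event & sample_space), len(sample_space))


p_from_deck = prob(queen_of_diamonds, deck)          # drawn from all 52 cards
p_from_diamonds = prob(queen_of_diamonds, diamonds)  # drawn from the 13 diamonds
p_from_queens = prob(queen_of_diamonds, queens)      # drawn from the 4 queens

print(p_from_deck, p_from_diamonds, p_from_queens)   # 1/52 1/13 1/4
```

Restricting the sample space is what changes the probability: the event stays the same, but the set of possible outcomes shrinks.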

The probability of A given that B has occurred is written P ( A | B ) and read as "A given B".

If P ( A ) = P ( A | B ), then knowing that B has occurred does not change the probability of A, and the two events are independent.

For two independent events: P ( A ∩ B ) = P ( A ) * P ( B )
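The product rule gives a simple way to test independence by counting. The sketch below is illustrative only: the events (queens, diamonds, face cards) and the helper names `prob` and `independent` are assumptions chosen for this example, and every card is assumed equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

queens = {card for card in deck if card[0] == "Q"}
diamonds = {card for card in deck if card[1] == "diamonds"}
face_cards = {card for card in deck if card[0] in ("J", "Q", "K")}


def prob(event):
    """Probability of an event when every card in the deck is equally likely."""
    return Fraction(len(event), len(deck))


def independent(a, b):
    """True when P(A ∩ B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


# Rank and suit do not influence each other: 1/52 == 1/13 * 1/4
print(independent(queens, diamonds))     # True

# Every queen is a face card, so these events are dependent: 1/13 != 1/13 * 3/13
print(independent(queens, face_cards))   # False
```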

In a survey of pet owners, we want to find the probability that a randomly selected person is male, given that the person owns a pet.

The formula used is:

P ( Male | Pet Owner ) = P ( Male ∩ Pet Owner ) / P ( Pet Owner )

Read P ( Male ∩ Pet Owner ) from the table. The intersection of male and pet owner in the table is 0.41, or 41%.

Read P ( Pet Owner ) from the table. The “Total” shows that 86% (0.86) of individuals own a pet.

Insert the derived values into the formula:

P ( Male | Pet Owner ) = P ( Male ∩ Pet Owner ) / P ( Pet Owner )

= 0.41 / 0.86 = 0.477, or 47.7%.

Probability can be explored further through the multiplication rule and Bayes' theorem. We will discuss more about probability in the next blog.
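The pet owner calculation can be reproduced in a few lines. This sketch uses only the two values quoted from the survey table (0.41 and 0.86); the variable names are chosen here for illustration.

```python
# Values quoted from the survey table in the text
p_male_and_pet_owner = 0.41   # P(Male ∩ Pet Owner)
p_pet_owner = 0.86            # P(Pet Owner)

# Conditional probability: P(Male | Pet Owner) = P(Male ∩ Pet Owner) / P(Pet Owner)
p_male_given_pet_owner = p_male_and_pet_owner / p_pet_owner

print(round(p_male_given_pet_owner, 3))   # 0.477
```

Note that the denominator is the probability of the condition (owning a pet), not the probability of being male.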