Data Science Course Certification For Freshers
What is meant by Domain in Data Analysis?
Data science is a big umbrella that includes many technologies, tools and administrative parts. Data is the new gold of the current era. Data generation has increased tremendously since 2012, and every click generates data. Handling, storing and analyzing this data is the new challenge. This blog focuses on domain analysis of the data.
So what is analysis?
Data analysis always starts with a problem statement. For example, suppose company “XYZ” wants to launch a new product. Before launching it, the company has to conduct a market survey. A market survey is based on collecting data, which can range from hundreds to billions of records. Analyzing each case across various permutations and combinations can lead to the right decision, but repeating the same process again and again is a time-consuming and tedious task. Analysis tools help by giving better insight through visualization of the data.
Skills required for Data Analysis:
Statistics/Mathematics: There are some prerequisites for becoming a data scientist or data analyst. A good knowledge of statistics is mandatory. There are various mathematical formulas and techniques for understanding the data better. The job of a data analyst is to interpret the results in the proper way.
Domain: What is a domain? A domain is the field or industry in which the analysis is applied. The major reason for the popularity of data science is that data science or data analysis can be used in every domain.
- Stock market: Forecasting stock prices for the next few days or months and finding out which stocks will give better returns.
- Business: The right business decisions help to achieve targets and revenue. They also help to meet customer expectations while reducing expenses. Analyzing complaints helps to minimize risk factors in the business.
- Marketing: Where can we use data analysis skills? Finding the best price for a product, analyzing marketing campaigns, and content marketing.
- Market mix modelling: Market Mix Modelling (MMM) is a statistical analysis of the multiple factors affecting sales and marketing over time.
- Supply chain: Managing the flow of goods and services. Raising a replenishment request before a product goes out of stock keeps the inventory supplied continuously without affecting daily business.
- HR analysis: Payroll analysis, performance analysis and attrition rate analysis.
- Agricultural research: Studying the effect of environmental interventions such as irrigation, pest control, crop rotation and fertilizer use. Finding clusters of environmental conditions suited to cultivating a particular crop. Sensor-based monitoring of animals and crops on farms. Monitoring of sericulture.
- Weather forecasting: Collecting temperature, humidity and wind data. Day-to-day data collection helps to track environmental evolution and changes in the atmosphere. Accurate weather prediction helps farmers plan crop cultivation, and aeronautical engineers may use it to decide flight routes.
- Oil and gas refinery: The digital transformation of the oil and gas industry is based largely on process automation. Analyzing asset performance with predictive analytics helps to reduce unplanned downtime and increase asset utilization.
- Health care: Computerizing medical records for drug discovery, studying genetic disorders, finding trends in diseases and lifestyles affecting health, and reducing expenses on pharmaceutical products.
- Fashion: In the fashion industry, data can be collected from social media or e-commerce sites using web scraping. This data helps designers to create trending items.
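The supply-chain point about raising a request before stock runs out is often expressed as a reorder point. The sketch below assumes the common textbook rule reorder point = average daily demand × lead time + safety stock; the function names and all numbers are hypothetical and only illustrate the idea.

```python
def reorder_point(avg_daily_demand: float, lead_time_days: float, safety_stock: float) -> float:
    """Stock level at which a replenishment request should be raised (textbook rule, assumed)."""
    # Demand expected while waiting for the delivery, plus a buffer for uncertainty.
    return avg_daily_demand * lead_time_days + safety_stock


def needs_reorder(on_hand: float, rop: float) -> bool:
    """Raise a request once stock on hand falls to or below the reorder point."""
    return on_hand <= rop


# Hypothetical values: 40 units/day, 5-day lead time, 50 units of safety stock.
rop = reorder_point(avg_daily_demand=40, lead_time_days=5, safety_stock=50)
print(rop)                                 # 250
print(needs_reorder(on_hand=230, rop=rop))  # True
```

Real systems usually estimate demand and safety stock from historical data rather than fixing them by hand.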
In short, whatever the domain, data analysis plays a vital role.
Steps in analysis:
Define the problem statement: A data analyst should have a clear idea of the problem statement before starting the analysis.
Collection of the data: Data can be collected in various forms. Information can be retrieved from databases, websites or any raw source. Data can be stored and analyzed in Microsoft Excel, spreadsheets and databases.
Database: There are two main types of database: RDBMS (relational database management systems) and NoSQL databases. These databases store information on a server, where it can be accessed by clients. Relational databases store data in tables, and Structured Query Language (SQL) is used to retrieve information from an RDBMS. NoSQL databases store data in non-tabular forms, for example as key-value pairs in a dictionary-like format. ETL (extract, transform, load) tools can be used to move data from one database into another.
Preprocessing of the Data: Before starting preprocessing, we need to explore the data. This exploration is based on measures such as the central tendency of the data.
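A small ETL flow into a relational table, followed by an SQL query, can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the dictionary records stand in for data from a NoSQL-style source, and the `orders` table, its columns and its values are hypothetical.

```python
import sqlite3

# Extract: hypothetical source records stored as dictionaries (like documents in a NoSQL store).
raw_orders = [
    {"order_id": 1, "region": "north", "amount": "120.50"},
    {"order_id": 2, "region": "south", "amount": "80.00"},
    {"order_id": 3, "region": "north", "amount": "45.25"},
]

conn = sqlite3.connect(":memory:")  # in-memory relational database for the example
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Transform: normalize region names and convert amounts from text to numbers.
rows = [(r["order_id"], r["region"].title(), float(r["amount"])) for r in raw_orders]

# Load: insert the cleaned rows into the table.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
conn.commit()

# Query with SQL: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 165.75), ('South', 80.0)]
conn.close()
```

In practice the extract step would read from a real source and the load step would target a server database, but the three stages stay the same.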
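One way to use central tendency during preprocessing is to compare the mean and median before deciding how to fill missing values. The sketch below assumes missing entries are marked with `None` and that filling them with the median is acceptable; this is a common choice, not one the text prescribes, and the data is hypothetical.

```python
import statistics

# Hypothetical survey column with missing entries (None) and one outlier (120).
ages = [23, 25, None, 31, 29, None, 120]

# Explore: compute central tendency on the observed values only.
observed = [a for a in ages if a is not None]
mean_age = statistics.mean(observed)
median_age = statistics.median(observed)
print(mean_age, median_age)

# The outlier pulls the mean well above the median, so the median is the safer fill value here.
cleaned = [a if a is not None else median_age for a in ages]
print(cleaned)  # [23, 25, 29, 31, 29, 29, 120]
```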
Analysis of the data:
Tools: Microsoft Excel provides add-ins for data analysis. Power BI and Tableau are visualization tools. Python and R are open-source programming languages, and VBA is the scripting language built into Microsoft Office. Many companies have developed their own analysis tools, e.g. IBM Watson.
Types of Statistical analysis of the data:
Data is described with the help of statistical measurements. A population is the complete set of similar items or events, and a sample is a part (subset) of the population.
Descriptive Statistics: Presenting, organizing and summarizing the data. It describes the system in terms of central tendency, spread of the data and visualization.
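A simple random sample can be drawn with Python's standard `random` module. The population below is synthetic (normally distributed values generated with a fixed seed), so this is only an illustration of the population/sample relationship: the sample mean should land close to, but not exactly on, the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical population: monthly spend of 10,000 customers.
population = [random.gauss(500, 100) for _ in range(10_000)]

# Simple random sample of 200 customers, drawn without replacement.
sample = random.sample(population, k=200)

print(round(statistics.mean(population), 1))
print(round(statistics.mean(sample), 1))
```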
Central Tendency is measured with:
- Mean: the average (sum divided by count) of all values of the same type.
- Median: the middle value, which divides the sorted data into two equal halves.
- Mode: the most frequently occurring value in the data.
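The three measures of central tendency are available in Python's standard `statistics` module. A minimal sketch on hypothetical data:

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 9]  # hypothetical data

print(statistics.mean(scores))    # sum / count = 43 / 7, about 6.14
print(statistics.median(scores))  # 6  (middle of the sorted list [3, 4, 5, 6, 8, 8, 9])
print(statistics.mode(scores))    # 8  (appears twice, more than any other value)
```

For an even number of values, `statistics.median` returns the average of the two middle values.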
Spread of the data is measured as dispersion:
- Variance: the average of the squared deviations of the data values from the mean; it measures how spread out the data is.
- Standard deviation: the square root of the variance.
- Range: the difference between the maximum and minimum values.
- Inter-quartile range: the data is summarized by its minimum, 25% (Q1), 50% (median), 75% (Q3) and maximum values. The IQR covers the middle 50% of the data:
IQR = Q3 - Q1
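The spread measures can also be computed with the `statistics` module. In this sketch on hypothetical data, note the distinction between population variance (divide by n) and sample variance (divide by n - 1). Quartile conventions also differ between tools; `statistics.quantiles` uses its default "exclusive" method here, so Excel or other libraries may report slightly different Q1 and Q3.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

pop_var = statistics.pvariance(data)  # population variance: 32 / 8 = 4
samp_var = statistics.variance(data)  # sample variance: 32 / 7, about 4.57
std_dev = statistics.pstdev(data)     # square root of the population variance: 2.0
data_range = max(data) - min(data)    # 9 - 2 = 7

# Quartile cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(pop_var, std_dev, data_range)
print(q1, q2, q3, iqr)  # 4.0 4.5 6.5 2.5
```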
Data Visualization: Visualization is the graphical presentation of data. A graph gives insight into the data and presents the information at a glance. A few common graphs are listed below:
- Bar chart
- Pie chart
- Scatter plot
- Box-and-whisker plot
- Stack plot
- Heat map
And there are many more graphs. A dashboard is a type of graphical user interface that often provides a bird's-eye view of the company.
Predictive Analysis: Machine learning algorithms are used for predictive analysis. More about predictive analysis is explained at https://www.sevenmentor.com/blog/category/data-science/
Prescriptive Analysis: Prescriptive analysis is more oriented towards risk handling. It depends on the predictions produced by predictive analysis, and domain knowledge is especially important for it.
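In Python, charts are usually drawn with a plotting library such as matplotlib. To keep the idea visible without any dependency, the sketch below uses a hypothetical helper, `text_bar_chart`, that draws a plain-text bar chart: each bar's length is proportional to its value, which is the core of any bar chart. The sales figures are hypothetical.

```python
def text_bar_chart(values: dict[str, int], width: int = 30) -> list[str]:
    """Draw each value as a row of '#', scaled so the largest value is `width` characters."""
    peak = max(values.values())
    rows = []
    for label, value in values.items():
        bar = "#" * round(value / peak * width)
        rows.append(f"{label:<4}{bar} {value}")
    return rows


# Hypothetical monthly sales figures.
sales = {"Jan": 120, "Feb": 90, "Mar": 150, "Apr": 60}

for row in text_bar_chart(sales):
    print(row)
```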
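One of the simplest predictive models is a straight-line fit (simple linear regression). The sketch below fits a line by ordinary least squares to hypothetical monthly sales and extrapolates one month ahead. It is an illustration of the idea only, not the method described at the linked page; real forecasting would also check how well the model fits.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: month number -> units sold.
months = [1, 2, 3, 4, 5, 6]
units = [102, 110, 119, 131, 138, 149]

slope, intercept = fit_line(months, units)
forecast_month_7 = slope * 7 + intercept
print(round(slope, 2), round(forecast_month_7, 1))
```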
Designation: Software Trainer
Company: Seven Mentor Pvt. Ltd.
© Copyright 2019 | Sevenmentor Pvt Ltd.