As a machine learning engineer, you will often need to find out whether the variables in your data are correlated. “Correlation is not causation” is a statistics mantra: wherever statistics appear, correlation and causation appear too.
In this article, we will see how correlation and causation affect machine learning.
"Correlation does not imply causation" means that just because two things are correlated, it does not follow that one causes the other. Let's look at an example to understand it better.
We will use a famous example: "Ice-cream sales correlated with homicides in New York City."
Researchers in New York studied ice-cream sales and homicide rates. They found that as ice-cream sales increase, homicide rates increase too, and vice versa.
Here, ice-cream sales are correlated with homicides.
But statistics tells us that correlation does not imply causation. Ice-cream sales and homicides are correlated, but there is no causal relationship between them.
Let's prove it.
Statement: Ice-cream does not cause deaths.
From the given data alone, we can only say that the two are correlated, because we cannot see what lies beneath the surface. The more limited the information we have, the more we are forced to rely on correlations. Likewise, the more information we have, the more transparent things become, and the better we can see the actual causal relationships.
So what do both activities actually depend on?
If you have sufficient data, you can find the common cause of both activities. In this example, the cause behind both is the weather. Yes, the weather.
After observing everything carefully, we can say that 'sunny' weather fits this situation.
As the temperature rises, more people buy ice-cream, and more people go outside. With more people outside, there are more opportunities for deaths from many causes, such as accidents and health issues.
In many cases, there is a third factor, called a "confounder", that affects both things.
In our case, the weather is the hidden factor that affects both things.
Conclusion:
There is a correlation between ice-cream sales and the homicide rate, but no causal relationship between them. Sunny weather affects both at once, and each of them has its own causal relationship with the weather.
But here is the thing: don’t conclude too fast. Take time to analyze your data and look for other underlying hidden factors. After finding each factor, verify it, and only then conclude.