Logistic regression is one of those non-intuitive terms. For normal people (as opposed to data scientists), logistics is a familiar word. In general, people understand it to mean the movement of goods. For those with a software engineering background, regression means rerunning earlier test cases against new code to check that none of them fail.

Turns out logistic regression has nothing to do with these notions of logistics or regression.

In the world of statistics, the word regression is used to mean prediction when applied to practical problems. If you have many input variables, and they are combined with a mathematical equation to calculate the value of an output variable, you are doing regression. Once you have an equation or a formula, you can just plug in any values of the input variables and derive the value of the output variable. That means you can predict. Thus, regression = prediction.
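To make "plug in the inputs and get the output" concrete, here is a minimal sketch of a linear formula used for prediction. The function name `predict` and the coefficient values are made up for illustration; in practice the coefficients would be estimated from data.

```python
def predict(inputs, weights, intercept):
    """Combine the input variables with a linear equation to get an output value."""
    return intercept + sum(w * x for w, x in zip(weights, inputs))

# Hypothetical coefficients, chosen by hand for illustration (not fitted to data).
weights = [2.0, 0.5]
intercept = 1.0

prediction = predict([3.0, 4.0], weights, intercept)
print(prediction)  # 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
```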

If the prediction is to decide whether a given entity belongs to one of two classes, then the method is binary classification. Typically, you would have an entity and have to decide whether it belongs to a "type" or its mutually exclusive "not-type".

In linear regression, the output is numeric. But in binary classification, you can't use that mathematical formula directly because the output values are categories (text) like "spam" / "not-spam" or "sick" / "not-sick". So, what do we do? We take the help of probability theory -- we calculate the odds.

Odds is the ratio where the numerator is the probability of an event of interest and the denominator is 1 - probability of the event. Statistics is not a happy place with just ratios; we need logarithms to simplify things. Enter logit, which is the natural logarithm of the odds. This can be any real number, positive or negative. Once more, we are not happy, so we simplify again: we transform it into the range 0 - 1.
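The two definitions above can be sketched in a few lines of Python. The function names `odds` and `logit` are my own choices for this illustration, and the probabilities in the loop are arbitrary example values.

```python
import math

def odds(p):
    """Odds of an event: probability of the event divided by (1 - probability)."""
    return p / (1.0 - p)

def logit(p):
    """Logit: the natural logarithm of the odds."""
    return math.log(odds(p))

# Arbitrary example probabilities (p must be strictly between 0 and 1).
for p in (0.1, 0.5, 0.9):
    print(p, odds(p), logit(p))
```

Note how a probability of 0.5 gives odds of 1 and a logit of 0, while probabilities below and above 0.5 give negative and positive logits respectively.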

To do the transformation, the numbers are passed through a function called the inverse logit function, also known as the logistic function. The inverse logit function takes a real number and transforms it to a value in the range 0 - 1. So you input the log-odds to the inverse logit function and you get a probability in the range 0 - 1.
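As a rough sketch, the inverse logit can be written as 1 / (1 + e^(-z)). The name `inverse_logit` is my own, and the two-branch form below is just one common way to avoid overflow for large negative inputs.

```python
import math

def inverse_logit(z):
    """Map any real number (a log-odds value) to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)

# Round trip: start from a probability, take the log-odds, and map it back.
p = 0.8
log_odds = math.log(p / (1.0 - p))
print(inverse_logit(log_odds))  # approximately 0.8
```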

Our classification problem is binary. So you decide on a threshold, say 0.5. If the output probability is 0.5 or above, then the entity belongs to the "type". Otherwise, we classify it as belonging to the "not-type".
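Putting the last two steps together, a minimal sketch of the decision rule might look like this. The function name `classify` and the example log-odds values are assumptions for illustration; the "type" and "not-type" labels follow the wording used above.

```python
import math

def classify(log_odds, threshold=0.5):
    """Turn a log-odds value into a probability, then apply the threshold."""
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return "type" if probability >= threshold else "not-type"

# Hypothetical log-odds values, as might come out of a fitted equation.
print(classify(2.0))                  # probability is about 0.88
print(classify(-1.0))                 # probability is about 0.27
print(classify(2.0, threshold=0.9))   # a stricter threshold changes the answer
```

The threshold is a choice, not a fixed rule: raising it makes the classifier more reluctant to say "type".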

So there we are. The process is a logarithmic transformation of a ratio, the log-odds, but the word used is logistic. In fact, we should be calling "logistic regression" something like "binary category log-odds classification". To my mind, this is more intuitive.
