Hatrack River Forum   
Author Topic: Statistics Moguls: Logarithmic Transformations and Linear Regression (I Was Right!)
Dagonee
Member
Member # 5818

We had an exam question in statistics about a study that compared savings rate and GDP growth for 113 countries over a 35-year period. We were supposed to use linear regression to check correlation and determine if a model was feasible. The growth was given as total growth over 35 years (e.g., 5000%), not as an annual growth rate.

I could not remember the formula for converting such a number to an annual growth rate. I know for a fact this isn't something we were taught for the class, nor were we expected to know it.
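For reference, the conversion in question is the usual compound annual growth rate. A minimal sketch, assuming the total figure is quoted as a percentage increase (so 5000% means the final value is 51 times the starting value); the function name is made up for illustration and is not from the exam:

```python
def annual_growth_rate(total_growth_pct: float, years: int) -> float:
    """Convert total percentage growth over `years` into a compound annual rate.

    Assumption: total_growth_pct is the percentage *increase*, so 5000 means
    the final value is 1 + 5000/100 = 51 times the starting value.
    """
    growth_factor = 1.0 + total_growth_pct / 100.0
    # Solve (1 + r) ** years == growth_factor for r.
    return growth_factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = annual_growth_rate(5000.0, 35)
    print(f"annual rate: {rate:.2%}")
```

Under that assumption, 5000% over 35 years works out to a little under 12% a year. Note that ln(1 + total growth) is just 35 times ln(1 + annual rate), so regressing on the log of total growth is the same, up to a constant factor, as regressing on the log of the annual growth factor.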

I did a linear regression that had residuals that were uncorrelated but fanning out as the savings rate increased. I checked a quadratic and cubic, and they did the same thing. When the residuals fan out like that, it means inferences from the sample to the population become unreliable, but that the model is sound for this particular sample.

I remembered that taking the natural log of the growth rate could help in that situation, so I went ahead and did it. The residuals stopped fanning out and fit into an almost perfect rectangle, with appropriate random variation throughout the full range of data. Basically, it totally solved every problem with the data: it gave p-values of significance at 10 to the -6 and 10 to the -92 for the coefficient and the intercept, it created residuals with no correlation, and gave me a good r-squared value.
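The before-and-after effect can be reproduced on synthetic data. This is only a sketch: the numbers below are invented stand-ins (113 points with multiplicative noise), not the exam data, and the helper names are hypothetical. It compares how much the residual spread grows from the low-savings half to the high-savings half, with and without the log.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the exam data (NOT the real study): 113 "countries",
# a savings rate x, and total growth y with multiplicative (lognormal) noise.
n = 113
x = rng.uniform(5.0, 40.0, size=n)
y = np.exp(1.0 + 0.08 * x + rng.normal(0.0, 0.4, size=n))


def residuals(x, y):
    """Residuals from an ordinary least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


def spread_ratio(x, resid):
    """Residual std. dev. in the upper half of x divided by that in the lower half."""
    upper = x > np.median(x)
    return resid[upper].std() / resid[~upper].std()


raw_ratio = spread_ratio(x, residuals(x, y))
log_ratio = spread_ratio(x, residuals(x, np.log(y)))

print(f"raw response: upper/lower residual spread = {raw_ratio:.2f}")
print(f"log response: upper/lower residual spread = {log_ratio:.2f}")
```

A ratio well above 1 is the "fanning out"; a ratio near 1 is the "rectangle". A residual-versus-fitted plot shows the same thing visually.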

However, I know that just because the answer looks good in statistics, it doesn't mean you did it right. Is the logarithmic transformation valid in this scenario? My assumption was that it was OK, and I found an example on population growth that suggested it was, but I'm relying on others' views, not a deep conceptual understanding.

Dagonee
P.S., I've turned in the exam, you won't be helping me cheat. [Smile]

[ June 24, 2005, 09:41 AM: Message edited by: Dagonee ]

Posts: 26071 | Registered: Oct 2003
Papa Moose
Member
Member # 1992

You posted this just to make Jonathan feel better about not getting replies, didn't you....
Posts: 6213 | Registered: May 2001
Dagonee
Member
Member # 5818

[Grumble]
Posts: 26071 | Registered: Oct 2003
Bob_Scopatz
Member
Member # 1227

Ha!

Okay, it's been a while since I did logarithmic transformations of data, but basically, it is used for variance reduction when the data span a wide range of values. I'm not an economist, but it seems like it ought to apply in a situation like GDP over decades.

The result you describe with residuals "fanning out" is exactly what you'd expect with a set of numbers that has variance in proportion to the absolute magnitude of the numeric value.

If variance is running about 10% of the mean at every annual data point, as inflation increases the overall magnitude of the mean, the variance is going to be larger in proportion.

By taking the log (or natural log -- both transforms do the same basic thing, but some people prefer using LN because it has the word "natural" in it), you reduce this growth in the variance.
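The point about the base can be checked directly: log10(y) is just ln(y) divided by ln(10), so a straight-line fit has the same r-squared either way and the coefficients differ only by that constant factor. A small sketch on made-up data (the data and the `r_squared` helper are illustrative assumptions, nothing from the exam):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data with multiplicative noise, purely for illustration.
x = rng.uniform(1.0, 10.0, size=200)
y = np.exp(0.5 * x + rng.normal(0.0, 0.3, size=200))


def r_squared(x, y):
    """Coefficient of determination for a least-squares straight-line fit."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()


# Same fit quality whichever base is used for the log.
r2_ln = r_squared(x, np.log(y))
r2_log10 = r_squared(x, np.log10(y))

# The slopes differ only by the constant factor ln(10).
slope_ln = np.polyfit(x, np.log(y), 1)[0]
slope_log10 = np.polyfit(x, np.log10(y), 1)[0]

print(f"r-squared (ln):    {r2_ln:.6f}")
print(f"r-squared (log10): {r2_log10:.6f}")
print(f"slope ratio: {slope_ln / slope_log10:.6f}  vs ln(10) = {np.log(10.0):.6f}")
```

The practical reason many people still prefer LN is interpretation: with natural logs, a small change in the fitted value reads approximately as a proportional change in the original variable.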

Your model should fit better, especially if the underlying data had an exponential increase.

Because inflation is a compounding function (next year's inflation rate is multiplied by this year's already inflated number, which was inflated from the year before, etc...), it does behave like an exponentially increasing function.
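A quick way to see the compounding argument: under a constant annual rate, the yearly increase keeps growing in absolute terms, but the yearly increase in the log is constant, which is why the log-transformed series is a straight line. A minimal sketch, where the starting value of 100 and the 3% rate are arbitrary assumptions:

```python
import math


def compound(start: float, rate: float, years: int) -> list:
    """Values for year 0 through `years` under a constant annual rate."""
    values = [start]
    for _ in range(years):
        values.append(values[-1] * (1.0 + rate))
    return values


# Arbitrary illustrative inputs: start at 100, grow 3% a year for 35 years.
values = compound(100.0, 0.03, 35)
log_values = [math.log(v) for v in values]

# Year-over-year steps on the original scale and on the log scale.
abs_steps = [b - a for a, b in zip(values, values[1:])]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print(f"first and last absolute step: {abs_steps[0]:.2f}, {abs_steps[-1]:.2f}")
print(f"first and last log step:      {log_steps[0]:.5f}, {log_steps[-1]:.5f}")
```

Every log step equals ln(1.03), so any noise that scales with the level on the original scale becomes roughly constant-width noise around a line after the transform.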

I've forgotten the real shape, but a LN transform would seem highly appropriate to me.

Now, I'm betting there are statisticians and economists who have gotten PhDs for proving that it isn't the right transformation to use, and you probably should've used Poisson regression or stood on your head while reciting Bayes' theorem instead.

But the fact that it worked means that at least over the period of time in your data set, it was a reasonable fit to the data.

If I were teaching it, even if you were wrong, I'd give you points for thinking of it.

Posts: 22497 | Registered: Sep 2000
Dagonee
Member
Member # 5818

quote:
The result you describe with residuals "fanning out" is exactly what you'd expect with a set of numbers that has variance in proportion to the absolute magnitude of the numeric value.
I wish I'd put that in my answer - it was an essay question where we had to insert our calculations at various points.

quote:
If I were teaching it, even if you were wrong, I'd give you points for thinking of it.
Whew. That makes me feel better, although your discussion above makes me even more confident it's right. Which is good, because I totally screwed up at least one problem - and there were only 4.

Thanks for shoring up my confidence and sharing your expertise.

Dagonee

Posts: 26071 | Registered: Oct 2003
Bob_Scopatz
Member
Member # 1227

You're welcome. It's been a LONG time since I did that stuff, but I'm pretty certain it's correct.

I was taught the technique for psychology experiment data. All they said is "use this to reduce variance." Turns out, you should probably have a better reason for doing it than just "lots of variance floating around."

Your excuse sounds a lot more substantial than the ones we used.

Good luck!

Posts: 22497 | Registered: Sep 2000
Dagonee
Member
Member # 5818

I guess I didn't screw up one of the problems. I just got my grade: A+.

*does the happy dance*

Posts: 26071 | Registered: Oct 2003
Scott R
Member
Member # 567

Head a'splode.
Posts: 14554 | Registered: Dec 1999
fugu13
Member
Member # 2859

I don't remember seeing this thread before!

(that's very rare . . . probably too rare)

What you did was theoretically sound. Oddly enough, I'd prolly have to think for a decent bit before doing the actual problem, as my prob&stats course was almost all theory, but given a description of the problem and your steps I can see the reasonable distribution for the sample as you considered it, and that using a logarithm wouldn't lose any essential properties.

Posts: 15770 | Registered: Dec 2001
Dagonee
Member
Member # 5818

I didn't think of it until after I wrote my conclusion. I had to run the regression and do the write-up and I had already used the allotted time for the problem. Which is why I thought I messed up a question worth 30% of the grade. I only had 15 minutes to do it. I identified one fallacy, explained it, and that was it. No analysis. Evidently the fallacy was all he cared about.
Posts: 26071 | Registered: Oct 2003
   


Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

