Hi,
I'm trying to find a simple definition of the t-test and of linear regression (Wald test).
I know it's a left-field question, but other websites are way above my head in their descriptions.
Also, if anyone would like to help me get my head around critiquing a research paper, I would be ever so grateful.
Thanks
x_LoUiSe_x
12-04-2005, 10:33
I did an A-Level in stats and I recognise the names of the tests you're talking about, but I can't remember what they are. I'll see if I can dig out my notes and I might be able to help you :thumbsup:
P.S. I haven't had a chance to look through it yet, but this looks like a good site...
>>>here<<< (http://members.aol.com/johnp71/javastat.html)
Linear regression:
a way of describing the relationship between variables when the regression equation is linear, e.g. y = ax + b, where a is the slope of the line and b is the intercept.
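To make y = ax + b a bit more concrete, here is a rough sketch of fitting a straight line by ordinary least squares and then dividing the slope by its standard error, which is the usual form of a Wald statistic for "is the slope different from zero?". The function name fit_line and the numbers are made up for illustration, and this is only one simple way to set it up.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b. Returns (a, b, wald)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                # slope
    b = mean_y - a * mean_x      # intercept
    # Leftover scatter around the fitted line (n - 2 degrees of freedom)
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    se_a = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    wald = a / se_a              # slope measured in units of its own uncertainty
    return a, b, wald

# Made-up data that roughly follows y = 2x
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
a, b, wald = fit_line(xs, ys)
print("slope:", a, "intercept:", b, "Wald statistic:", wald)
```

A Wald statistic far from zero suggests the slope is unlikely to be zero, i.e. that y really does change with x.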
Lucy_Smith
12-04-2005, 11:52
A t-test is used to figure out whether there is a statistically significant difference between the means of two separate groups.
Speedy_Jim
12-04-2005, 12:27
A t-test is used to judge whether two sets of data come from two distinct groups with different means.
If you had two infinitely large sets of data, then the problem is trivial (conceptually...) - you just take the average of each set and see if they're different. But, of course, you don't have infinitely large data sets. You can't ask (for example) all the women and all the men in the world how tall they are. So you have a small sample of data which you hope is a reasonable representation of the total set of data.
Each data point in your datasets has some random deviation from the average value. The smaller this random deviation is, the fewer individual points you need to be confident that the average of your sample is close to the average of the entire data set. If the individual data points vary a lot, you need many more of them to be confident that your average is close to the 'real' average for each group.
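One common way to put a number on "how close is my sample average likely to be to the real average" is the standard error of the mean: the sample standard deviation divided by the square root of the number of points. A small sketch with made-up heights (the names tight and spread are just for illustration):

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Two made-up samples of six heights (cm) with similar averages
tight = [170, 171, 169, 170, 172, 168]    # points vary only a little
spread = [150, 190, 165, 180, 158, 177]   # points vary a lot

se_tight = standard_error(tight)
se_spread = standard_error(spread)
print("standard error, tight sample: ", se_tight)
print("standard error, spread sample:", se_spread)

# Collecting more points at the same variability shrinks the standard error
se_more_points = standard_error(spread * 4)
print("standard error, spread sample with 4x the points:", se_more_points)
```

The widely varying sample has the larger standard error, and it only comes down when more points are collected.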
A t-test is a way of comparing the averages of your two datasets, which takes into account the number of data points you have and the variability between them. It generates a probability estimate which can be stated something like:
"if your two sample datasets had been taken from the SAME underlying set of data, there would be an X% probability of seeing a difference in averages at least as big as the one you found". So a small value of this probability tells you that your result would be surprising if the two groups DIDN'T really differ.
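As a rough sketch of the arithmetic, the t statistic is the difference between the two averages divided by a measure of how uncertain that difference is. The version below is Welch's form, which does not assume the two groups have the same variability; that choice, the function name welch_t and the heights are my own assumptions for illustration. Turning t into the probability needs the t distribution, which a stats package or a printed table will give you.

```python
import math
import statistics

def welch_t(a, b):
    """Difference in means divided by its estimated standard error (Welch's form)."""
    var_a = statistics.variance(a) / len(a)   # uncertainty in the mean of a
    var_b = statistics.variance(b) / len(b)   # uncertainty in the mean of b
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)

# Made-up heights in cm
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 168, 162, 170, 166, 163, 167, 164]

t = welch_t(men, women)
print("t statistic:", t)
```

The bigger t is (in either direction), the smaller the probability that comes out of the test.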
Even if the averages of each dataset appear to be very different, there's still a chance that this could be a random occurrence. For example, your height measuring experiment might have included a couple of unusually short women who skewed the averages.
To be "significant", it's usually said that there must be less than a 5% chance of seeing a difference this big if the data from the two groups had actually come from the same underlying group. This is the definition of statistical significance in this context. In papers, it's usually written something like "group A is taller than group B, with P<0.05 (or P<5%)". "P" is the number that drops out of the end of the t-test.
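This is not a t-test, but a shuffling (permutation) test shows quite directly what P means: pool all the heights, deal them out at random into two groups many times, and count how often the random groups differ by at least as much as the real ones. The function name permutation_p_value, the number of shuffles and the heights are all made up for illustration.

```python
import random
import statistics

def permutation_p_value(a, b, n_shuffles=2000, seed=0):
    """Fraction of random regroupings whose difference in means is at
    least as large as the difference actually observed."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        if abs(statistics.mean(new_a) - statistics.mean(new_b)) >= observed:
            hits += 1
    return hits / n_shuffles

# Made-up heights in cm
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 168, 162, 170, 166, 163, 167, 164]

p = permutation_p_value(men, women)
print("P:", p, "-> significant at the 5% level" if p < 0.05 else "-> not significant")
```

If P comes out below 0.05, random regrouping almost never produces a gap as big as the real one, which is what "significant" is getting at.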
When I do my experiment measuring men and women, if I collect enough data in my samples and do a t-test, I'd hope to say that there is less than a 5% chance that a difference as big as the one between my men's and women's heights would turn up as a random occurrence caused by person-to-person variations in height within the sample I measured. Strictly speaking, that isn't the same as a 95% chance that men really are taller than women on average, but it is taken as good evidence that they are.
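To close (phew...) it's also necessary to start with one (and only one) hypothesis, and to settle on it before looking at the data. In my height experiment, I should start out by saying which of the sexes I expect to be taller (a one-tailed test), or simply that the two differ (a two-tailed test), then do my t-test to see whether the data support this statistically.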
As for regression tests, no idea :)