**INTRODUCTION**

IBM SPSS is a software package used mainly by social science academics, business statisticians and students like some of us to analyze data. What does analyzing data mean? Put simply, it means checking whether the patterns in your data are statistically significant: whether one variable's pattern fits or syncs with another variable or data set. Then you measure how strong that relationship is and quantify it. And then...you arrive at an equation for predicting future values.

You know all the hype in the science journals about people validating their data with all those confusing values like p<0.01 or r=0.95? Well, these values represent how strongly your data backs up the claims in your research.

OK. Statistics is just a tool to make your data objective and back up your research with legit validation. The tests I'll be talking about may sound foreign, but all of them are about fitting your data into a relationship between variables: a linear model.

In its simplest form, a linear model is a straight-line relationship between two continuous variables, the kind you'd see on a bivariate plot. So, basically, you're just pairing up two sets of data one-to-one.
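To make the idea of a linear model a bit more concrete, here is a minimal Python sketch of fitting a straight line through paired readings by least squares. This is only an illustration outside SPSS: the function name `fit_line` and the numbers in `x` and `y` are made up, not taken from any real dataset.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up paired readings (hypothetical, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

Each x value is paired with exactly one y value, which is the "one-to-one" part. The slope tells you how much y changes for every one-unit change in x.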

The general linear model (or equation, or relationship) is the foundation of many statistical tests: the t-test, ANOVA, ANCOVA, regression analysis and also multiple factor analysis. This requires a lot of reading and processing and reading again, so I'm not going to go there. Science and social science students have, at some point in their basic degree years, learned statistics and probability. Here, we started learning those when we were 13 (correct me if I'm wrong). So, relax and understand that you got this.

You wanna start analyzing your data but you don't know where to start: how to arrange your data, what test to use, what 'p' even means, what the hell regression analysis is, etc.

**WHAT TYPE OF DATA DO YOU HAVE?**

Well, start by identifying the type of data you have. Just sort it into one of these 2 types: categorical or continuous.

Categorical data just labels which group something belongs to, without any scale or number. For example:

You're analyzing the effect of plants on slope erosion. So, to do that, you decide to compare a slope without plants, with one type of plant and with mixed types of plants. This is a categorical variable. Each of these different 'treatments' is a category.

Continuous data consists of interval or ratio values, the plain numbers that you obtain from equipment: 212.3 mg/L of oxygen.....2.1 psu....18% mercury....

Say you're taking soil samples from differently treated soil: fertilized, burnt and natural soil. You analyze their pseudo-total nutrients (after digestion...etc), and the readings you get reflect their origin and the conditions they were in before you sampled them. So, just how much of this variation in nutrients can be connected to their 'setting' or land management?

Well, with this data, you can use analysis of variance (ANOVA). It is used to compare the means of different groups and establish whether they are, in fact, different in a statistically significant way. In other words, the means (averages) of the groups are different enough to warrant a 'eureka': yes, the grouping and the readings are connected, and the data did not come out like that by pure chance.

*ANOVA is a hypothesis test to compare the means of more than two populations.* **From the case above, the total nutrient data you obtained from each place can be understood as total nutrients from 3 different populations. Hence, you're going to be comparing their means to see if they are significantly different from each other!**
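For anyone curious about what happens under the hood when the means of 3 populations are compared, here is a rough Python sketch of the one-way ANOVA F statistic computed by hand. The readings are made up for illustration and `one_way_anova_f` is just a name I picked. SPSS reports the same kind of F value together with its p-value, which this sketch does not compute.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of readings."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups (populations)
    n_total = len(all_values)  # total number of readings

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the readings around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up total nutrient readings (hypothetical values, not real data)
fertilized = [12.0, 14.0, 13.0]
burnt = [8.0, 9.0, 10.0]
natural = [10.0, 11.0, 12.0]

f_stat, df_b, df_w = one_way_anova_f([fertilized, burnt, natural])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means are spread far apart compared with the scatter inside each group. Whether it is large enough to be significant depends on the two degrees of freedom and the significance level you choose.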

Does your data look like this?

If so, then power up that SPSS, and separate your data into different columns.

How you guys holding up? Still here?

So, here I have land-use, depth and the concentration of K. Since I am an environmental geologist, we look at soil and analyze its content. LAND USE and DEPTH are my categorical variables. K CONCENTRATION is my continuous variable. Since I am trying to see the effect of land-use on my soil total K, I will be using ONE-WAY ANOVA. Why? Cause there's only one factor: land-use against total K. Land-use is the factor that affects total K, so land-use is the independent variable and total K is the dependent variable.
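If it helps to picture that column layout outside SPSS, here is a small Python sketch with one row per sample and one column per variable. Everything in it is hypothetical: the land-use labels are borrowed from the soil example earlier, and the depth labels and K values are made up.

```python
from collections import defaultdict
from statistics import mean

# One row per soil sample: (land_use, depth, k_concentration)
# land_use and depth are categorical, k_concentration is continuous.
# All values are made up for illustration.
rows = [
    ("fertilized", "0-10 cm", 12.0),
    ("fertilized", "10-20 cm", 14.0),
    ("burnt", "0-10 cm", 8.0),
    ("burnt", "10-20 cm", 10.0),
    ("natural", "0-10 cm", 10.0),
    ("natural", "10-20 cm", 12.0),
]

# One-way ANOVA uses a single factor, so group K by land use only
k_by_land_use = defaultdict(list)
for land_use, depth, k in rows:
    k_by_land_use[land_use].append(k)

group_means = {name: mean(values) for name, values in k_by_land_use.items()}
for name, value in group_means.items():
    print(f"{name}: mean K = {value:.1f}")
```

Depth is still sitting in the data, but it is ignored here because land-use is the only factor being tested.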

Well...let's just go straight to the analysis. Follow these instructions:

ANOVA doesn't tell you which group means differ or by how much; it just tells you whether there is a significant difference somewhere. So, that's where the post-hoc test comes in. Here, I am using Tukey HSD as my post-hoc test. It's commonly used and easier for me to understand, although some may prefer LSD. Here, if you notice, the test will be conducted at a significance level of 0.05...which is quite common. You can go for 0.01 too. I take both into consideration.

So far....for the people who have the basics of ANOVA but are novices at SPSS, are the instructions clear?
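To give a feel for what a pairwise post-hoc comparison looks at, here is a rough Python sketch of the studentized range statistic q that Tukey HSD is built on. It assumes equal group sizes, uses made-up readings, and `tukey_q` is a hypothetical helper name. It does not look up critical values or p-values, which SPSS does for you.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def tukey_q(groups):
    """Studentized range statistic q for every pair of equal-sized groups."""
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    k = len(groups)

    # Within-group mean square, the same error term used in the ANOVA
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_within = ss_within / (n * k - k)
    standard_error = sqrt(ms_within / n)

    return {
        (a, b): abs(mean(groups[a]) - mean(groups[b])) / standard_error
        for a, b in combinations(groups, 2)
    }

# Made-up readings (hypothetical values, not real data)
groups = {
    "fertilized": [12.0, 14.0, 13.0],
    "burnt": [8.0, 9.0, 10.0],
    "natural": [10.0, 11.0, 12.0],
}

q_values = tukey_q(groups)
for (a, b), q in q_values.items():
    print(f"{a} vs {b}: q = {q:.2f}")
```

Each q is then compared against a critical value from the studentized range distribution for your chosen significance level (0.05 or 0.01), the number of groups and the within-group degrees of freedom. Pairs whose q exceeds that value are the ones that differ significantly.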

Feel free to ask if there are things you're not too sure about :3

Interpretation of the data will be shown in the next post which will be...soon-ish. Below, I have attached links to some quick reviews of what ANOVA is in case you can't wait for the next post.

All the best to research students nationwide and I hope this post opened up some understanding of...ANOVA.

References & Recommendations:

Prof. Serna

Laerd Statistics