The major steps involved in statistical analysis are collecting and entering the data, examining the data, summarizing the data and reporting the findings.
In some cases, data will already be available for the problem under investigation because you keep it as a routine task. A teacher who records student grades on class work, tests and homework assignments, for example, already has the data needed for an analysis. In other cases, you must collect your own data. Once you have collected your data, you might need to alter its format to meet your analytical needs. Responses to a customer satisfaction survey, for example, have to be numerically coded before you can analyze them. You can then enter the data into a spreadsheet, such as Excel.
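As a rough illustration of numerically coding survey responses, the sketch below maps text answers to numbers. The answer labels, the 1-to-5 scale and the function name code_responses are assumptions made for this example, not part of any particular survey.

```python
# Hypothetical coding scheme: the labels and the 1-5 scale are assumptions
# chosen for illustration only.
SATISFACTION_CODES = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}


def code_responses(responses):
    """Convert text responses to numeric codes.

    Answers are trimmed and lowercased before lookup; anything that is
    not in the coding scheme becomes None so it can be reviewed later.
    """
    return [SATISFACTION_CODES.get(r.strip().lower()) for r in responses]


raw = ["Satisfied", "very satisfied ", "Neutral", "no answer"]
print(code_responses(raw))  # [4, 5, 3, None]
```

Marking unrecognized answers as None, instead of guessing a code, keeps data-entry problems visible when you examine the data.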
Even trained statisticians sometimes do not take the time to examine their data before conducting analyses. At this stage, it is often useful to produce a visual display or graph that tells you more about the data you have collected. The most appropriate type of graph depends on the type of data. Pie charts, for example, work well for showing how a total, such as a budget, divides into parts. Bar graphs suit comparisons between categories, and line charts suit values that change over time.
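A quick way to look at data before analyzing it is to count how often each value occurs. The sketch below prints a simple text bar chart using only the Python standard library. The data and the function name text_bar_chart are made up for this example, and a spreadsheet chart would serve the same purpose.

```python
from collections import Counter


def text_bar_chart(values):
    """Return one line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value}: {'#' * counts[value]}" for value in sorted(counts)]


# Hypothetical coded survey responses on a 1-5 scale.
codes = [4, 5, 3, 4, 4, 2, 5, 4]
for line in text_bar_chart(codes):
    print(line)
# 2: #
# 3: #
# 4: ####
# 5: ##
```

Even a display this crude shows where responses cluster and can expose values that should not be there, such as a code outside the expected scale.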
The purpose of summarizing the data is to arrive at one or two numbers that describe the characteristics of a much larger set of data. A classroom teacher, for example, might calculate an average grade for each student to summarize the quality of each student's work over a semester grading period. Key summaries in basic statistical analyses include measures of central tendency and measures of dispersion, or spread.
Measures of central tendency are generally known as averages and include the mean and the median. The mean is calculated by summing the values in a set of data and dividing the total by the number of values. If the data are arranged in order from the highest value to the lowest, the median is the middle value: half of the values are higher and the other half are lower. When the number of values is even, the median is the mean of the two middle values.
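The two calculations can be sketched in a few lines of Python. The grades below are invented for illustration. For an even number of values, this sketch follows the common convention of averaging the two middle values.

```python
def mean(values):
    """Sum the values and divide the total by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Return the middle value of the sorted data.

    With an even number of values there is no single middle value, so
    the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


grades = [88, 92, 75, 64, 95]  # hypothetical grades for one student
print(mean(grades))    # 82.8
print(median(grades))  # 88
```

Here the median (88) is higher than the mean (82.8) because the single low grade of 64 pulls the mean down but does not move the middle value.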
Measures of spread or dispersion include the range, which is the difference between the highest and lowest values in the data, and the standard deviation. The latter is more complex to calculate and generally requires a computer or at least a calculator. The standard deviation is the square root of the variance, which is the mean of the squared deviations from the mean. When the data are a sample from a larger population, the sum of squared deviations is usually divided by one less than the number of values instead.
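The following sketch works through these measures step by step. The scores are invented, and the sample flag is an assumption added for this example: it switches the divisor from the number of values to one less than that number, the usual choice when the data are a sample from a larger population.

```python
import math


def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


def variance(values, sample=False):
    """Mean of the squared deviations from the mean.

    With sample=True the sum of squared deviations is divided by n - 1
    instead of n (sample variance).
    """
    m = sum(values) / len(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    divisor = len(values) - 1 if sample else len(values)
    return sum(squared_deviations) / divisor


def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean is 5
print(value_range(scores))         # 7
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```

In this data set the squared deviations sum to 32, so dividing by the 8 values gives a variance of 4 and a standard deviation of 2.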
You can present the results of your statistical analyses in the form of tables or graphs. Spreadsheet programs such as Excel can perform most basic statistical analyses, and some advanced ones, as well as present the findings in tables or graphs. Spreadsheet programs, however, are not specifically designed for more complicated analyses. For these, many scientists and university researchers use specialized statistical software packages such as SPSS and SAS.