In science, a *hypothesis* is an informed conjecture or statement of what might be true. Most philosophers (and scientists) hold that we do not know anything with absolute certainty. What we call facts are hypotheses that have acquired so much supporting evidence that we act as if they were true. A statistical test assesses the evidence provided by data against some claim. In this simulation you will gather data in order to test hypotheses.

In this made-up example, a small fish (*Sonoras werlei*) lives in freshwater as well as in brackish estuaries. In each environment, a small percentage of *Sonoras werlei* individuals have spotted fins. Some observers have reported that individuals with spotted fins are more common in freshwater. Other observers have suggested that they are more common in brackish water. How can we test these claims?

One way to assess these claims is to sample individual fish from a freshwater body of water and an estuary, and compare the percentages of fish with spots from the two localities. That is what we will do in this exercise.

In the simulation below, enter the number of individuals to sample from the population and click the Sample button. As the individual fish are collected, the number of fish with spotted fins is counted.

You will first sample 40 fish from each of the two sites: the freshwater body and the brackish estuary.

**Question 1**

How many spotted fish are present in your freshwater and estuary samples? What percentage of fish from each body of water have spots?

**Question 2**

What conclusion can you draw about the relative frequencies of spotted individuals in the two bodies of water? Are you sure? How can you be sure?

Yes, how can you be sure whether spotted individuals are more common in one of the bodies of water? Let us suppose you obtained more spotted individuals in freshwater than you did in the brackish water. Could that difference be real or simply due to chance?

To determine whether differences obtained in samples are real or just due to chance, scientists employ statistical tests. One such test is the chi-square test on a 2 × 2 contingency table. In this test, the numbers (not frequencies) of spotted and unspotted individuals from both the freshwater and brackish water sites are entered (as in the table). These are the observed data. From these observed data, the expected values for the spotted and unspotted individuals are calculated with the assumption that the proportions of spotted and unspotted individuals are the same in freshwater and brackish estuaries. These are the expected data (values).
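The calculation of expected values described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator used in this exercise: the function name `expected_counts` and the counts in `observed` are hypothetical, chosen only to show the arithmetic. Each expected value is the row total times the column total, divided by the grand total.

```python
def expected_counts(observed):
    """Return expected counts for a 2 x 2 table under the null hypothesis.

    observed is [[spotted_fresh, unspotted_fresh],
                 [spotted_brackish, unspotted_brackish]].
    """
    row_totals = [sum(row) for row in observed]          # fish sampled per site
    col_totals = [sum(col) for col in zip(*observed)]    # spotted, unspotted overall
    grand_total = sum(row_totals)
    # If both sites share the same proportions, each cell is expected to hold
    # (row total * column total) / grand total individuals.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts, for illustration only: 40 fish sampled per site.
observed = [[8, 32],   # freshwater: spotted, unspotted
            [4, 36]]   # brackish:   spotted, unspotted
expected = expected_counts(observed)
print(expected)
```

With these made-up counts, 12 of the 80 fish are spotted overall, so each sample of 40 is expected to contain 6 spotted and 34 unspotted fish.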

This statistical test is designed to test the hypothesis that there is no real difference in the proportion of spotted fish between the freshwater site and the brackish estuary, and that any difference observed between the samples is simply due to chance. A hypothesis like this one, which posits no real difference, is known as the “null hypothesis.” The chi-square value summarizes how far the observed values depart from the expected values. The larger it is, the less likely it is that a difference of that size would arise by chance alone, and the stronger the case for rejecting the null hypothesis. The *p*-value is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true. For a 2 × 2 table, which has one degree of freedom, a chi-square value greater than 3.84 corresponds to a *p*-value of less than 5%. Biologists usually use a 5% cut-off point to determine statistical significance, the point at which the null hypothesis is rejected: *p*-values below 5% are deemed statistically significant and *p*-values above it are deemed not significant.

Keep in mind, however, that there is nothing magical about the 5% significance level; in reality, there is little difference between a *p*-value of 0.046 and a *p*-value of 0.054. In addition, some fields and studies use more stringent cut-off points, rejecting the null hypothesis only when the *p*-value is much less than 0.05 (for instance, 0.01 or even 0.001); other fields and studies are less stringent and will reject the null hypothesis at *p*-values greater than 0.05 (e.g., 0.1).
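For readers who want to see the arithmetic behind the test, here is a minimal sketch in Python using only the standard library. It assumes the plain Pearson chi-square statistic with no continuity correction; the calculator used in this exercise may apply a correction and give slightly different values. The function name `chi_square_2x2` and the counts are hypothetical.

```python
import math


def chi_square_2x2(observed):
    """Pearson chi-square test on a 2 x 2 table of counts (no continuity correction).

    observed is [[spotted_fresh, unspotted_fresh],
                 [spotted_brackish, unspotted_brackish]].
    Returns (chi-square value, p-value).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be greater than zero")

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the proportions were the same at both sites.
            exp = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - exp) ** 2 / exp

    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only: 40 fish sampled per site.
observed = [[8, 32],   # freshwater: spotted, unspotted
            [4, 36]]   # brackish:   spotted, unspotted
chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

For these made-up counts the chi-square value falls well below 3.84, so the null hypothesis would not be rejected at the 5% level, even though twice as many spotted fish turned up in the freshwater sample.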

The chi-square test cannot prove the null hypothesis; it can only reject or not reject it. Even if the samples show identical proportions of spotted individuals in freshwater and brackish water sites, there may still be an underlying difference between the sites. If the null hypothesis is rejected, then the data are consistent with an alternative hypothesis: There is a difference in the proportion of spotted fish between the two sites.

**Question 3**

Enter the values that you obtained from Question 1 into the chi-square calculator. What is the chi-square value? What is the *p*-value?

Now sample nine more times (for a total of ten). In each attempt, sample 40 fish from each of the two sites. After completing each attempt, write down the numbers of spotted and unspotted individuals in each locality. Enter those numbers into the chi-square calculator and record the chi-square values and *p*-values.

**Question 4**

Are the results from any of the sampling attempts statistically significant?

Now add up the counts from all ten sampling runs to obtain cumulative totals. You should have 400 individuals from each of the freshwater and brackish water sites. Enter these totals into the chi-square calculator and record the chi-square value and *p*-value.
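The pooling step can be sketched as follows. The per-run counts below are hypothetical placeholders, not results from the simulation; substitute the numbers you recorded. The helper name `pooled_table` is also an assumption made for this illustration.

```python
FISH_PER_RUN = 40  # fish sampled from each site in every run

# Hypothetical numbers of spotted fish in each of ten runs (replace with your own).
spotted_fresh = [8, 6, 9, 7, 5, 10, 8, 6, 7, 9]
spotted_brackish = [4, 5, 3, 6, 4, 2, 5, 4, 3, 5]


def pooled_table(spotted_a, spotted_b, fish_per_run):
    """Sum the per-run counts into one 2 x 2 table of spotted and unspotted fish."""
    total_a = fish_per_run * len(spotted_a)
    total_b = fish_per_run * len(spotted_b)
    sum_a = sum(spotted_a)
    sum_b = sum(spotted_b)
    # Unspotted fish are whatever remains of each site's total sample.
    return [[sum_a, total_a - sum_a],
            [sum_b, total_b - sum_b]]


pooled = pooled_table(spotted_fresh, spotted_brackish, FISH_PER_RUN)
print(pooled)
```

Each row of the pooled table sums to 400, matching the cumulative sample size for each site.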

**Question 5**

Can you reject the null hypothesis? What conclusions can you draw?

**Question 6**

Discuss why the chi-square test is performed on the observed numbers of individuals in each of the sites and not the frequencies.