Let's assume you've measured gene expression for 10,000 genes in two different conditions X and Y and found 300 genes differentially
over-expressed in condition X. In the entire gene set, 2,000 are known to be associated with a particular biological function F. You've noticed that there
are quite a few of these F-associated genes in your list of differentially expressed genes, 60 to be precise.
The hypergeometric test might help you to assess whether your observation is indeed statistically significant, i.e. whether function F is
enriched in condition X beyond what might be expected by chance.
From our little story above we can extract the following numbers to feed into the test:
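Population size: 10,000
(total number of genes)
Number of successes in population: 2,000
(all F-associated genes)
Sample size: 300
(genes differentially over-expressed in condition X)
Number of successes in sample: 60
(F-associated genes among those over-expressed in condition X)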
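This should result in a probability of p ~ 0.52 of drawing 60 or more F-associated genes when 300 genes are selected at random from the full list, which is not significant at all! That is no surprise: 2,000 of the 10,000 genes (20%) are F-associated, so about 60 of any 300 randomly chosen genes would be expected to be F-associated by chance alone.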
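
To check the number by hand, the upper-tail probability can be computed directly from the hypergeometric distribution. The following is a minimal sketch in Python using only the standard library; the function name hypergeom_upper_tail and its parameter names are made up for this illustration and are not part of any particular tool.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """Return P(X >= k) for a hypergeometric variable X.

    N: population size (total number of genes)
    K: number of successes in the population (F-associated genes)
    n: sample size (differentially over-expressed genes)
    k: number of successes observed in the sample
    """
    # Number of ways to draw any n genes out of N.
    total = comb(N, n)
    # X cannot exceed the sample size or the number of successes available.
    upper = min(n, K)
    # Count the draws containing exactly i successes, for every i >= k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic up to here; convert to float only at the end.
    return favourable / total


# Numbers from the example: 10,000 genes, 2,000 F-associated,
# 300 over-expressed, 60 of which are F-associated.
p = hypergeom_upper_tail(10_000, 2_000, 300, 60)
print(f"p = {p:.3f}")
```

The printed value should agree with the p ~ 0.52 quoted above. The sum starts at k = 60 because the test asks for the probability of seeing 60 or more F-associated genes, not exactly 60.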