For a stretch of DNA of unknown sequence, calculate how often one would expect to find a restriction endonuclease recognition site when the site allows alternative bases at some of its positions.
Alternatives are indicated by ambiguity symbols: N (any base), R (purine) and Y (pyrimidine).
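To make the ambiguity symbols concrete, the following Python sketch lists every specific sequence that an ambiguous site stands for. The names `ALLOWED` and `expand_site` are hypothetical, chosen only for this illustration, and the table covers just the symbols used in this section (A, C, G, T, N, R and Y).

```python
from itertools import product

# Bases allowed at a site position for each symbol used in this section.
# Only A, C, G, T and the ambiguity symbols N, R and Y are covered here.
ALLOWED = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine: A or G
    "Y": "CT",    # pyrimidine: C or T
    "N": "ACGT",  # any base
}

def expand_site(site):
    """Return every specific sequence that an ambiguous site stands for."""
    choices = [ALLOWED[symbol] for symbol in site.upper()]
    return ["".join(bases) for bases in product(*choices)]

print(expand_site("GRYC"))  # ['GACC', 'GATC', 'GGCC', 'GGTC']
```

So GRYC is satisfied by 4 of the 256 possible tetranucleotides.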
Assume that each base — A, T, G or C — occurs equally frequently in the DNA.
If the probability of finding a restriction site at any given position in the DNA is 1-in-x, one can expect to find that site on average once every x base pairs. This overall probability is the product of the probabilities of finding the required base at each individual position in the site. Since each base occurs with probability 1/4, it is simplest to multiply the reciprocals of these probabilities, which gives x directly: multiply by 4 for a specific base (A, T, G or C), by 2 for R (A or G) or Y (T or C), and by 1 for N. The following is a specific example:
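The reciprocal-multiplication rule can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the names `CHOICES` and `expected_spacing` are hypothetical, and, besides equal base frequencies, it assumes that the bases at different positions are independent of one another, which the product rule relies on.

```python
from math import prod

# Number of the four bases (A, C, G, T) that satisfy each symbol.
CHOICES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}

def expected_spacing(site):
    """Return x, where the site is expected about once every x base pairs.

    Assumes equally frequent bases at independent positions, so a position
    matched by k of the 4 bases has probability k/4 and contributes a factor
    4/k to x: 4 for A, C, G or T; 2 for R or Y; 1 for N.
    """
    return prod(4 // CHOICES[symbol] for symbol in site.upper())

print(expected_spacing("GRYC"))    # 64
print(expected_spacing("GAATTC"))  # 4096
```

A fully specified hexanucleotide such as GAATTC gives 4 to the sixth power, while each R or Y halves that contribution and each N removes it entirely.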
Take the case of the frequency of occurrence of the tetranucleotide GRYC.
As in the elementary example, the chance that the first base in an unknown stretch of DNA will be G is 1-in-4.
For the second base, the chance that it will be the required R is 1-in-2, as A and G represent 2 of the 4 possible bases.
Likewise, the chance that the third base will be the required Y (T or C) is 1-in-2.
Finally, the chance that the fourth base will be the required C is 1-in-4.
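The overall chance of finding GRYC is the product of these individual probabilities, (1/4) × (1/2) × (1/2) × (1/4) = 1/64, or 1-in-64 (equivalently, multiplying the reciprocals gives 4 × 2 × 2 × 4 = 64). A GRYC site would therefore be expected on average once every 64 base pairs.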
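As an independent check on the product rule, this Python sketch enumerates all 4 × 4 × 4 × 4 = 256 possible tetranucleotides and counts those that match GRYC. The names `PATTERN` and `matches` are illustrative only, and the calculation assumes, as above, equally frequent bases at independent positions, so that every tetranucleotide is equally likely.

```python
from fractions import Fraction
from itertools import product

# Bases allowed at each position of GRYC (R = A or G, Y = C or T).
PATTERN = ["G", "AG", "CT", "C"]

def matches(seq, pattern):
    """True if every base in seq is allowed at the corresponding position."""
    return all(base in allowed for base, allowed in zip(seq, pattern))

# Enumerate all 4**4 = 256 tetranucleotides; each is equally likely under
# the assumption of equal, independent base frequencies.
all_tetramers = ["".join(p) for p in product("ACGT", repeat=4)]
hits = [s for s in all_tetramers if matches(s, PATTERN)]
probability = Fraction(len(hits), len(all_tetramers))

print(len(hits), len(all_tetramers), probability)  # 4 256 1/64
```

Counting matches directly gives 4/256 = 1/64, the same result as multiplying the per-position probabilities.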