Restriction Sites 2

in Biology

For a stretch of DNA of unknown sequence, calculate how frequently one would expect to find a restriction endonuclease recognition site *that may include alternative possible bases at some of its positions.*

Alternatives are indicated by ambiguity symbols: N (any base), R (purine) and Y (pyrimidine).

Assume that each base — A, T, G or C — occurs equally frequently in the DNA.

One site every *x* base-pairs

If the probability of finding a restriction site at any position in DNA is ‘1 in *x*’, one can expect to find that site once every *x* base-pairs. This overall probability is calculated by multiplying together the probabilities of occurrence of the required base at each individual position in the site. Equivalently, *x* is built up by multiplying one factor per position: 4 for an A, T, G or C (probability 1-in-4), 2 for an R (A or G) or a Y (T or C) (probability 1-in-2), and 1 for an N (any base, so no restriction). The following is a specific example:

- Take the case of the frequency of occurrence of the tetranucleotide GRYC.
- As in the elementary example, the chance that the first base in an unknown stretch of DNA will be G is 1-in-4.
- For the second base, the chance that it will be the required R will be 1-in-2 as A and G represent 2 of the 4 possible bases.
- Likewise the chance that the third base will be the required Y will be 1-in-2.
- Finally the chance that the fourth base will be the required C will be 1-in-4.
- The overall chance of finding GRYC is obtained by multiplying these individual probabilities: 1/4 × 1/2 × 1/2 × 1/4 = 1/64. Since 4 × 2 × 2 × 4 = 64, the site is expected 1-in-64, or once every 64 base-pairs.
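
The same calculation can be sketched in a few lines of Python. This is only an illustration of the rule described above, assuming equal base frequencies and only the symbols A, T, G, C, R, Y and N; the names `FACTORS` and `expected_spacing` are made up for this example and do not come from any library.

```python
# Factor that each symbol contributes to x in "1 in x",
# assuming A, T, G and C are equally frequent.
FACTORS = {
    "A": 4, "T": 4, "G": 4, "C": 4,  # one specific base: 1-in-4
    "R": 2, "Y": 2,                  # purine or pyrimidine: 1-in-2
    "N": 1,                          # any base: no restriction
}


def expected_spacing(site: str) -> int:
    """Return x such that the site is expected once every x base-pairs."""
    x = 1
    for symbol in site.upper():
        if symbol not in FACTORS:
            raise ValueError(f"unsupported symbol: {symbol!r}")
        x *= FACTORS[symbol]
    return x


print(expected_spacing("GRYC"))    # 4 x 2 x 2 x 4 = 64
print(expected_spacing("GAATTC"))  # six fixed bases: 4**6 = 4096
print(expected_spacing("GGNCC"))   # the N contributes a factor of 1: 256
```

Working with the whole-number factors rather than the fractional probabilities keeps the arithmetic exact and gives the "one site every *x* base-pairs" figure directly.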

David P. Leader