MAS in breeding
Marker-assisted selection
Marker-assisted selection (MAS) uses the genotype of a marker instead of, or in combination with, the phenotype of a plant in selection during a breeding program. This is especially helpful for traits that are highly influenced by the environment (e.g. yield, protein contents of seeds), or that are expensive or difficult to test, for example drought tolerance, fruit characteristics in crops with a long juvenile phase such as apple or vine, malting quality in barley or the resistance to a quarantine organism. MAS has the additional advantage that it can be performed off-season and off-site and in the very early stages of plant development since only a small amount of DNA (usually from a leaf sample) is needed. However, MAS also has disadvantages and limitations. The efforts needed to find markers and to create linkage maps are large and not justified for each crop or trait, and many traits may be more easily or just as easily selected using conventional visual selection on the trait of interest. Moreover, the selection result by MAS may be less good than in conventional breeding. This has to be kept in mind when studying the contents of this chapter.
Next to selection purposes, genetic markers can also be used, e.g.:
Summary
→ MAS has advantages but also important disadvantages and limitations
Towards the end of this chapter, use of MAS in specific steps in breeding programs will be treated, such as marker-assisted backcrossing and marker-assisted gene pyramiding. However, before markers can be used for MAS in practical breeding programs, often some work needs to be done first.
Not all markers that are discovered in scientific research are suitable for MAS-programs in practice. This could be due to a number of reasons, for example, they might be:
Therefore, markers have to be tested and often further developed to make sure they reliably predict phenotypes and can be applied in high throughput screening. To develop markers suitable for MAS, usually the following steps are performed:
Mapping studies may reveal linkage between genes and markers. However, in most cases, this linkage is not tight enough to be useful in marker-assisted breeding programs. This 'loose' linkage between a marker and a gene results in a too high percentage of recombination events between gene and markers, resulting in unreliable selection. Preferably, the mapping distance between marker and a gene has to be reduced to less than 5 cM, or rather < 1cM. This can be done by using larger mapping populations and a larger numbers of markers. This process is called high-resolution mapping.
The best marker is based on the DNA sequence of the desired allele of the gene itself. Variations in all traits are caused by variations in the DNA, which can sometimes also be made visible in markers.
|
|
Suppose Q is the target gene of interest. Markers R and P are too far away to make them full-proof as indicator of the Q allele. Saturating the region with more markers results in markers that are more closely linked with Q than P and R were.
Note: The first picture in this module, in Markers, page 1 'What is a genetic marker', shows a gene of interest which is flanked by two markers, which are located right at the begin and the end of the gene. This was an illutration to make things clear. However, the probability of finding markers at those exact positions is very low. |
If the position of a gene and two flanking markers is already known, one can use these two markers instead of high-resolution mapping. Even when the flanking markers are not very tightly linked to the gene of interest, they can be used for MAS, because double recombination within a circa 6 cM region is very rare. We discussed this in the chapter on QTL analysis in the section on Simple-interval mapping.
Sometimes, a group of markers always segregates together in a mapping population. In that case, their order on the map is not clear. When a larger mapping population is used, some recombinations among those markers may occur, and their relative positions on the map may become clear. Such a further resolution will not be possible if the DNA stretch is not or insufficiently homologous to the DNA of the other parent (see chapter on Linkage mapping, paragraph on "Other reasons for large gaps in genetical maps: non-polymorphic DNA regions".
|
|
Co-segregating polymorphic markers C, D and E (left), may segregate and be mapped on the chromosome when a larger mapping population is used (right). |
To estimate the location of a gene or QTL on the genomic map very precisely, large mapping populations are needed. The QTL is discovered on the basis of the phenotyping of the mapping population for the trait of interest. That trait is measured under a certain experimental error, and there is only a very limited probability of recombination across the chromosome, so that many linked positions have almost the same association to the trait. In addition, other unlinked or linked QTLs may also influence the trait. This results in generally large support intervals for QTLs. No general minimum number of individuals for a population size can be given to perform high-resolution mapping, but in a larger population, the probability of finding recombinants is higher than in a small population and the distance between marker and gene can be estimated with more accuracy. In order to find a marker within 1 cM of a desirable gene or another marker locus, population sizes of more than 1000 individuals have been used. 1 cM of map distance corresponds to 1% of recombinants, so theoretically 10 recombinants in a population of 1000 individuals. A large population (for example n=1000 RILs) will imply phenotypic comparison of about 500 RILs with the AA alleles of the gene, and 500 RILs with the contrasting aa alleles, which should give a better estimation of the phenotypic contrast caused by the two alleles than in the case of 100 RILs. However, phenotyping 1000 RILs is not always technically or economically feasible. The breeder has to compromise here.
Normally, fine mapping a gene of interest is done on the basis of screening of progeny of a plant that is heterozygous for the QTL region, but rather homozygous for the rest of the genome. A suitable plant is the last backcross before reaching the homozygous NIL-status. The 1000-s of individuals and their progeny are only genotyped for markers in the QTL region.
Show/hide comprehension question...
Mapping additional markers may saturate linkage maps in the region where also genes of interest are located. Techniques that produce many markers per reaction, such as DaRT or SNP platforms, are usually preferred to increase marker density (see chapter on markers). In the following pages, several other techniques are described for high-resolution mapping.
Summary
→ High-resolution mapping is often needed to find markers that are more closely linked (< 1 - 5 cM) to the gene of interest than the markers that were initially identified near the mapped gene.
→ Large segregating populations are needed for high-resolution mapping
Show/hide comprehension question...
Show/hide comprehension question...
There are two possibilities to obtain a higher resolution (a higher density of markers) of a particular chromosome region of interest:
A bulked-segregant analysis (BSA) approach can be applied to target specific chromosomal regions (between target marker loci V and W). We have seen BSA in the previous chapter, where it was used to identify markers linked to genes, by means of grouping individuals of the phenotype classes corresponding to the genotypes for the gene of interest. In this case, however, phenotypic groups are not needed, since we are trying to find markers linked to other markers, not to a gene for a phenotypic trait:
If a gene for a trait of interest was calculated to be located between marker loci V and W in a mapping study, in the BSA approach DNA is pooled from VW plants and from vw plants. The DNA from these two pools, is tested for many markers. A marker that has one specific allele in the VW pool and another allele in the vw pool, thus co-segregates, must be located very closely to marker loci V and W.
To find out whether these newly found markers are located more closely to the gene of interest than V and W, individual members of the mapping population will be genotyped with those new markers, and the position of those markers (likely near V and W) will be determined. Next, the association of those markers with the trait value will be determined to see which will appear to be the peak marker.
Also near-isogenic lines (NILs) can be used to construct high-resolution maps for specific chromosomal regions (see figure below), because NILs differ from each other only with respect to small portions of the genome. Once a QTL has been identified and its approximate position is known from mapping studies, NILs can be created that contain only a single QTL by repeated backcrossing with the recurrent parent. During this process, there may or may not be active selection against donor parent marker alleles in the background (marker-assisted backcrossing – speeds up the selection process). High-resolution mapping can sometimes also reveal the presence two closely-linked genes, which are seen as one QTL in a small mapping population. One of the difficulties in precisely locating a QTL on a linkage map for a quantitative trait is that other QTLs influence the trait as well and therefore the error in the estimated map position of a given QTL can be quite large. By creating NILs that (preferably) each only have a single segregating QTL this effect is reduced and therefore the map position can be established more accurately.

In this example, a near-isogenic line (NIL) contains a gene of interest which is situated between markers M1 and M2.
Example of fine mapping of a QTL, starting from a heterozygous Near-isogenic line containing only one segregating QTL. The blue chromosome segments represent the DNA of the + allele donor of the trait. Logical analysis shows that the contributing gene should be to the left of additional marker 2, since the fourth recombinant has still the high trait value. The low trait value of the ninth recombinant (23) also suggests that the gene should be to the left of marker 2. In this example, the gene could also be left of marker 1, so we conclude that the gene is located between M1 and marker 2. Note that we should test preferably more than one recombinant per marker interval and phenotype several replicates for every recombinant genotype if possible, to reduce environmental effects on the mean trait value of each recombinant.
Suppose that the gene of interest is located in between marker 1 and marker 2, recombinants between marker 1 and 2 may either have the higher or the lower trait value, depending on the precise location of the gene and the precise location of the recombination event. If the marker 1 – marker 2 interval is largely blue, and the gene is half-way, the trait value will be high, if it is largely white, the trait value will be low. Finding both trait value classes in the same recombinant group is evidence that the gene is in that particular marker interval.
Next, the NIL and recurrent parent are crossed to develop an F2 between them (it may also be the last backcross towards a near-isogenic line). In the F2 recombinants between M1 and M2 may be identified. Running the additional markers between them (1, 2, 3, and 4) will tell where the recombination occurred approximately. The recombinants contain different parts of the original donor fragment and can all be recognized by recombination between M1 and M2. In the figure, 10 different progeny lines are shown and 4 additional markers between M1 and M2 are needed to distinguish them. Phenotyping of the plant will show in which interval (M1 - 1, 1 - 2, etc.) the donor allele of the gene of interest is located. When a QTL is situated between M1 and M2 it can be mapped more accurately (assigned to a smaller region) using markers 1 – 4.
Note that the length and end points of each fragment cannot be known exactly from the additional markers. In figures, the fragment ends are always drawn in the middle between two markers but it may in fact be anywhere in the interval.
The homozygous NIL and the recurrent parent are identical, except for the region a little beyond the interval between M1 and M2.
Show/hide comprehension question...
DNA of the NIL and the recurrent parent can be submitted to a search for polymorphism, for example by SNP array. All polymorphism identified should in principle correspond to marker polymorphism in the region of interest. By this way, the area between M1 and M2 may be filled with more markers.
Show/hide comprehension question...
Show/hide comprehension question...
Show/hide comprehension question...
Summary
→ High-resolution mapping of specific chromosomal segments can be performed by using BSA or NILs. BSA is aimed at filling an already identified marker interval with more markers, whereas NILs are used for the actual fine-mapping, improving the estimate of the position of the gene, improving the resolution.
Show/hide comprehension question...
I want to try extra questions | Move on to: Marker validation
Show/hide extra comprehension question 1
Show/hide extra comprehension question 2
Show/hide extra comprehension question 3
Show/hide extra comprehension question 4
Markers that are found to be linked to a gene only appear if the marker is polymorphic (in fact di-morphic) between both parents. However, even if they are closely linked to the gene of interest they have to be tested ('validated') in other populations, such as breeding material, in order to be useful in marker-assisted breeding. Many markers found to be associated with a gene of interest in a mapping population are not useful in breeding material. This can be due to several reasons, e.g.:
Reliability of a marker can be predicted from its distance to a QTL. To see to what extent a marker is capable in predicting the desired phenotype, the marker should be tested in other populations. This can be done by crossing one of the parents used in the initial mapping population with a different cultivar, to develop another mapping population. In the resulting population, the ability of a marker to predict the phenotype can be evaluated and compared to the original mapping result. However, this is quite an investment, and implies that marker-assisted selection is not just a quick and easy strategy!
The best possible markers are markers that are part of the actual functional gene sequence or so tightly linked that they show no recombination with the target gene. Also markers that are consistently found to be absolutely linked to the target gene and unique in the germplasm. Such markers are highly predictive for the target gene and have a good change to be 'diagnostic' as well.
Summary
→ Markers need to be validated to make sure they are indeed tightly linked to genes of interest in breeding material
→ Markers need to be diagnostic to be followed if the donor of the gene is crossed to elite breeding lines of various origin
Show/hide comprehension question...
Show/hide comprehension question...
Show/hide comprehension question...
I want an alternative explanation | Move on to: Haplotype
Marker validation is the determination whether or not markers can predict a certain trait in the wider breeding material.
Markers are developed in mapping populations. However, a marker that is tightly linked to a gene of interest is not necessarily perfect in predicting a trait in breeding material. For example, when the genes affecting a trait are located on different parts of the genome and a specific marker is only tracking one of the genes on one chromosome, while in other material a different gene is segregating for the same trait, on a different chromosome is responsible. In other breeding lines, the genes for that trait will be associated with different markers than the one that was discovered in the mapping population.
Also, marker alleles might look the same, but are in fact different. The following SSR (simple sequence repeat) marker, consists of five ACC repeats. The number of repeats determines the size of the amplification fragment. During normal pairing, fragment length does not change. However, sometimes pairing is unequal and shifted (see figure). Recombination (cross) causes the number of repeats to change to four or six ACC motifs: new alleles are formed. This may happen on several occasions resulting in alleles that appear identical (e.g. with 6 repeats) but are formed during different events.

Therefore, even markers that are very tightly linked in a mapping population have to be validated in the relevant breeding material.
A haplotype is a specific combination of alleles that occur in linkage. In the case of markers, if such a string of closely linked markers surround a gene of interest, the chance is considerable that the haplotype is "diagnostic" (unique), allowing the breeder to follow the presence of the gene of interest in descendants of the donor that contributes the gene.
In this example, the haplotype M1m2M3m4M5M6 is typical for the desirable allele A of the gene; haplotype m1M2m3M4m5m6 is typical for the undesirable allele of the gene. Note that descendants that carry the A allele but have a recombination between for example M3 and m4 will probably form a novel and aberrant haplotype, and cannot be recognized for sure to carry allele A. If the constituent markers are tightly linked, the chance for such a recombination is very small.
The advantage is that, across a range of possible accessions, the multi-marker haplotype has a much larger chance to be diagnostic for the favorable allele than a single marker or a pair of markers flanking the gene of interest. The haplotype may be used to screen the elite lines or cultivars of a breeding program for having the desired allele A and subsequently use these as parents in crosses. If we use a parent carrying allele A and the marker haplotype as donor in a complex crossing programme involving various other parents, we may monitor the presence of allele A by scoring all six markers. It is extremely unlikely that another accession of the crop would be by chance M1m2AM3m4M5M6, without carrying allele A. If another genotype for example has M1m2M3M4m5m6, then this is a third haplotype. For this third haplotype it is not known whether it carries the A or a allele; only the known haplotype provides sufficient certainty.
A prerequisite for selection on the basis of haplotypes is that you need to be able to run large numbers of markers (e.g. on SNP platforms).
Summary
→ Haplotyping addresses several specific marker alleles to a desirable allele of a gene and diagnostic for the gene that they are used for
.
Show/hide comprehension question...
- For information about different markers, please refer to the chapter on Markers -
There are situations where markers need to be converted into other types of markers:
The exact procedure to convert markers goes beyond the scope of this course, but an example is given below that will mention several necessary steps. The method is also used for convert gene sequences into convenient markers.
To convert RAPD or AFLP marker or a gene sequence into a SNP marker for use in MAS we need to:
Summary
→ For use in practice, markers need to be converted into markers that are highly reproducible, easy to use, quick and cheap.
I want an alternative explanation | Move on to: Selection of QTLs
Marker conversion is changing markers that are unreliable, technically difficult to operate, or not polymorphic between a new pair of parents. They are converted into reliable markers that are simple to operate and relatively cheap.
To create a linkage map, the first aim is to get as many closely linked markers as possible. In this process, also markers are used that are expensive and complicated to use. For marker-assisted breeding, two flanking markers per gene may be sufficient and because of the scale of use, it is important that the operation of this marker is simple, reliable and cheap. Therefore, markers that are poorly reproducible (like RAPDs), relatively expensive and/or relatively difficult to operate (like for instance AFLP markers or CAPS markers that require an additional digestion step) have to be converted to markers that can be run at high throughput, such as SNPs.
Quantitative traits are often controlled by several to many genes. Theoretically, all markers in QTL regions could be used for MAS. However, some QTLs might only explain a very small portion of the total variation. It might not be economically feasible to include such QTLs in the breeding program. Breeders should focus on the QTLs that account for the largest proportion of phenotypic variance for the trait.
All QTLs selected for by MAS should be phenotypically expressed across environments and effective in different genetic backgrounds and in relevant growth stages of the crop. Therefore, we have to check whether the QTL found in the mapping population is also effective in other genetic backgrounds. To do this, first a NIL is developed containing only a single QTL. Then, this QTL-NIL is compared with the recurrent parent to see whether the QTL has the expected effect in the new genetic background. This is known as QTL validation.
Summary
→ In marker-assisted breeding, focus should be on QTLs with a major effect (and only after QTL validation).
Show/hide comprehension question...
By discarding most genotypes that lack certain desirable traits at an early stage, the number of plants tested in the next generation in the field is reduced (or kept the same, but with a higher proportion of promising genotypes). This can be achieved using MAS.
|
|
Early generation selection scheme using MAS (right) as compared to the conventional pedigree method (left). With MAS, only desirable F3 lines are planted in the field. Since fewer F3 lines are planted, more individual plants may be planted per F3 line. This further increases the probability of selecting superior individuals and allows more crosses to be evaluated at the same time. (Source: B. Collard & D. Mackill (2006).)
Summary
→ In early generation selection, MAS may help to discard many plants that lack important traits that are difficult to test phenotypically.
I want extra explanation | Move on to: Marker-assisted backcrossing

MAS scheme for early generation selection in a typical breeding program for disease resistance. A susceptible (S) parent is crossed with a resistant (R) parent and the F1 plant is self-pollinated to produce an F2 population.
In this example, a robust SNP marker (allele T) has been developed for a major effect QTL contributing to disease resistance. By using a marker to assist selection, plant breeders may significantly reduce the size of field trials by eliminating many unwanted genotypes (marker genotypes CC and CT) and retain only those plants possessing the desirable homozygous genotypes for this locus (highlighted cells in the F2).
Note that 75% of the plants may be discarded after one cycle of MAS. This is important because plant breeders typically use large populations (e.g. 200 to 800 F2 plants per crossing combination) and hundreds of crossing combination in a single year.
Backcrossing implies repeated crossing with a (recurrent) parent. The recurrent parent is often a modern line or variety, with a desirable overall genotype, but lacking one or several important genes. Using conventional breeding methods, it typically takes 6–8 backcrosses to achieve a sufficiently high proportion of the recurrent parent genome. This requires a long time, during which the variety used as recurrent parent might have become outdated. The objective is to eliminate the undesirable genes in the genetic background in the first place, rather than to obtain a near-isogenic copy of a certain modern cultivar. Therefore, often not the same variety is used as recurrent parent but different modern varieties may be used successively, from BC3 or BC4 onwards.
|
Generation |
Recurrent parent genome (%) |
|
F1 BC1 BC2 BC3 BC4 BC5 BC6 |
50 75 87.8 93.75 96.88 98.44 99.22 |
Note that the expected percentage for the entire BC1 population of the recurrent parent genome is 75%, but by chance, some individuals possess more of the recurrent parent genome than 75% and others less than 75%.
Marker-assisted backcrossing may accelerate the backcrossing process by selecting against as many as the marker alleles of the donor in the background (i.e. away from the locus of the desired gene to be incorporated). In each BC generation, some plants have by chance more and others less than the statistically expected proportion of donor background. Running of the genome-wide markers. will identify individuals that happen to have a relatively low proportion of donor genotype in the background. Selection against the donor background may be done with or without a linkage map being available. See more about this subject under Principles of Plant Breeding, section Selection Methods.
Foreground selection (i.e. selection for the target gene to be incorporated) requires two flanking markers, to prevent selection of the donor marker, but losing the linked gene because of inadvertent recombination.
|
|
|
Graph of computer simulations of the percentage of recurrent parent genome (RPG recovery) in the total genome, using marker-assisted backcrossing (MAB) and conventional backcrossing. The use of markers can reduce the number of generations required to achieve the desired proportion of the recurrent parent genome (here 95%, indicated by the dotted line). (Source: Collard et al., 2005). |
Considerable time savings can be made with MAB compared to conventional backcrossing, which could lead to economic benefits, resulting from the accelerated release of an improved variety. On the other hand, the initial cost involved in marker-assisted backcrossing would be higher than with conventional backcrossing and large investments are sometimes needed:
Summary
→ MAB can be used to minimize linkage drag (see Principles of Plant Breeding, Backcross Breeding) and to maximize recovery of the recurrent parent
Show/hide comprehension question...
Show/hide comprehension question...
I want an alternative explanation | Move on to: Marker-assisted gene pyramiding
Study the following example >> (this is an external link).
Gene pyramiding or gene stacking is the process of combining multiple genes for one trait into a single genotype. The most widespread application for pyramiding has been for combining multiple disease resistance genes to a same pathogen species in order to develop durable disease resistance. When several resistance genes are combined, the probability that this resistance is broken is small, since the pathogen should undergo simultaneous mutations or recombinations for the corresponding genes for avirulence. Because each resistance gene on its own provides complete resistance, it is difficult to test phenotypically whether plants have either one (e.g. R1R1r2r2) or more (e.g. R1R1R2R2) effective resistance genes. In that case, MAS may be of help.
Without MAS, for example, to discriminate the desirable genotype R1R1R2R2 (resistant) from genotypes R1R1r2r2 and r1r1R2R2 (also resistant), one needs to do several testcrosses:

With MAS, using reliable markers, plants can be selected directly in the F2:

Marker-assisted pyramiding of two dominant disease resistance genes for the same pathogen.
Each resistance gene provides 100% resistance against the pathogen. Combining two different resistance genes in a variety is thought to make this resistance more durable, because the probability of simultaneously overcoming both types of resistances presumably is small. During conventional phenotypic evaluation, no difference can be observed between Rr and RR plants. Simultaneous testing of resistance genes R1 and R2 can be done using phenotypic evaluation, for example removing several leaves from the plant and inoculating the leaves individually with different isolates (one for R1 and one for R2) of the pathogen ('detached leaf assay'). However, it will remain uncertain whether plants are R1r1 or R1R1 (Adapted from B. Collard & D. Mackill (2006).)
Summary
→ MAS may be useful in gene pyramiding when each of the individual genes result in the same phenotype
Show/hide comprehension question...
Show/hide comprehension question...
MAS can also be used to bring together multiple traits in one genotype, e.g. genes for high quality, genes for high yield, resistance genes etc. If all genes are tagged by markers, we can calculate which (very small) fraction of a pedigree derived from a complex cross (with many different grandparents) combines all desirable genes.
The cost of using 'tools' in breeding programs is a major consideration. Factors that influence the cost-benefit ratio of MAS versus conventional phenotypic selection include the following aspects.
A + indicates that MAS could be efficient
a - indicates that phenotypic selection would be more efficient
a ~ indicates that the objective and population type determines whether MAS or phenotypic selection is more efficient
In many cases, phenotypic screening is cheaper compared to MAS. Moreover, phenotypic screening remains necessary to determine which marker is associated with the trait of interest. Where phenotypic screening requires time-consuming and expensive assays, MAS, with all its pros and cons might be an attractive option. An important consideration for MAS is the large initial cost of marker development. A mapping population should be created, phenotypes, genotyped with many markers to find relation between a marker and the trait gene. This is a substantial investment, especially for traits with a low heritability. Costs for running a known marker range from several cents to € 5 and up per sample, and also depend on many factors, such as technique used and efficiency of use of lab facilities.
Benefits of MAS generally include the ability of testing more individuals and more efficient selection, resulting in a quicker release of cultivars.
Summary
→ Initial costs of MAS are high, compared to conventional selection
→ MAS can save time and money, depending on many factors
- END OF THIS MODULE -