Two approaches for calculating female fetal DNA fraction in noninvasive prenatal testing based on size analysis of maternal DNA fragments

The concentration of cell-free fetal DNA should be determined before noninvasive prenatal testing (NIPT), because the fetal DNA fraction strongly influences the overall performance of NIPT and its clinical interpretation. However, there is still little research on how to calculate the fetal DNA fraction for female fetuses. We propose two estimation approaches, a fragment size-based approach and an aneuploid-based approach, both of which are based on chromosome fragments. Using high-throughput sequencing data, the two approaches were first tested on male fetuses, and the estimated values were close to the actual values. The correlation coefficient of the fragment size-based approach was 0.9243 (P < 0.0001), and that of the aneuploid-based approach was 0.9339 (P < 0.0001). We then applied both approaches to calculate the fetal DNA fraction of female fetuses. The results provide a theoretical basis for estimating the fetal DNA fraction of female fetuses in future clinical diagnosis.


Introduction
Cell-free fetal DNA (cffDNA) has been confirmed to be present in the plasma of pregnant women. Many studies have shown that the detection of cffDNA in maternal plasma has significant clinical potential for the noninvasive prenatal diagnosis of fetal genetic disorders and pregnancy-associated diseases (Jahr et al., 2001). cffDNA accounts for about 5-20% of the total cell-free DNA in maternal plasma (Lo et al., 1999). It has been reported to account for a mean of about 3.4% in early gestation and 6.2% in late gestation (Lo et al., 1998). The concentration of cffDNA increases with gestational age and rises sharply in the last 8 weeks of pregnancy (Lo et al., 2007).
Noninvasive prenatal testing (NIPT) has been widely used in clinical practice. When NIPT is used to detect fetal chromosomal aneuploidy, a low cffDNA concentration can lead to a false-negative result. Therefore, a cffDNA concentration of ≥4% is considered an important quality control indicator for NIPT (Liu et al., 2018). In addition, accurate estimation of the fetal DNA concentration is particularly important in the noninvasive detection of single-gene diseases (Canick et al., 2013; Nygren et al., 2010), where the cell-free fetal DNA concentration is an important parameter for analyzing site variation. Therefore, the concentration of cffDNA must be determined before NIPT (Tsui et al., 2011). Cell-free DNA fragments in maternal plasma are mainly shorter than 200 bp (Chan et al., 2004), and the cell-free fetal DNA fragments are much shorter than the maternally derived fragments. The fractional fetal DNA concentration is a paramount factor in determining the overall performance of NIPT based on the analysis of DNA in maternal plasma (Sun et al., 2019). Because fetal and maternal cell-free DNA fragments have different length distributions, the relationship between the two can be explored on the basis of this difference.
The use of massively parallel sequencing (MPS) of cell-free DNA in maternal plasma for noninvasive prenatal testing has become widely adopted in prenatal care (Yu et al., 2017). MPS is an approach for quantifying DNA sequences for the noninvasive prenatal diagnosis of fetal chromosomal aneuploidy (Fan et al., 2010). It is a counting-based method that analyzes the DNA fragments in maternal plasma mapped to different regions of the genome. Count-based methods of fetal fraction determination calculate the proportion of reads mapped to chromosomes that differ between the maternal and fetal genotypes. They are quite reliable and can be used on samples with male or trisomic fetuses (Gazdarica et al., 2019). With appropriate bioinformatics analysis, such methods can estimate the fetal DNA concentration and detect fetal chromosomal abnormalities and other related diseases. To determine the size distribution of DNA fragments in maternal plasma, Chan et al. (2004) developed nine real-time PCR assays that amplify different-sized amplicons targeting the leptin gene. Their data indicate that the maternally derived DNA fragments in the plasma of pregnant women are significantly longer than the fetal-derived ones, so that fetal DNA, with its shorter size distribution, becomes detectable (Chan et al., 2004). Li et al. (2004) likewise found that most maternally derived DNA molecules are considerably larger than fetal DNA molecules. One explanation may be that cffDNA appears to be derived exclusively from the placenta, whereas most maternal cell-free DNA is of hematopoietic origin (Guibert et al., 2003; Lui et al., 2002). Fetal DNA sequences can therefore be selectively enriched by size-dependent separation.
There are a number of existing methods for determining the fetal DNA fraction, such as estimation based on chromosome Y sequences (Hudecova et al., 2014). Determining the fetal DNA fraction from the proportion of chromosome Y sequences in maternal plasma samples is simple and accurate for male fetuses but is not suitable for pregnancies with female fetuses (Chen et al., 2019).
If the genotype of the biological father is available, the fetal DNA fraction can be determined from the ratio of fetal-specific alleles to the total alleles in maternal plasma DNA (Liao et al., 2011; Lo et al., 2010). However, the genotype of the biological father is not obtainable in many cases. FetalQuant, an approach for estimating the fetal DNA fraction, does not need parental genotype information (Jiang et al., 2012; Jiang et al., 2016), but its limitation is that it requires a very high sequencing depth. A cell-free fragment size ratio-based approach has been developed to determine the fetal fraction directly from plasma sequencing data, because maternal and fetal DNA in maternal plasma exhibit different fragmentation patterns (Chen et al., 2019; Yu et al., 2014). In this paper, we present two approaches for calculating the female fetal DNA fraction that are based on the sizes and counts of DNA fragments. These approaches can estimate the fetal DNA fraction directly from the high-throughput sequencing data of NIPT without additional operations.

Materials
Two sample sets, collected by the team of Y. M. Dennis Lo (Yu et al., 2014), were used in this study. The first sample set includes 69 maternal plasma samples with a male fetus: 9 cases with a trisomy 13 fetus, 7 cases with a trisomy 18 fetus, 17 cases with a trisomy 21 fetus, and 36 cases with a euploid fetus. The second sample set includes 68 maternal plasma samples with a female fetus: 8 cases with a trisomy 13 fetus, 18 cases with a trisomy 18 fetus, 19 cases with a trisomy 21 fetus, and 23 cases with a euploid fetus. Maternal peripheral blood was collected in EDTA-containing tubes before any invasive obstetric procedure was performed (Sun et al., 2019).
All maternal plasma DNA samples were analyzed by paired-end massively parallel sequencing. For every chromosome (autosomes and sex chromosomes), the sample data contain the counts of fragments at each length from 36 to 600 bp. The data are described in Appendix 1.
Calculating the most suitable range of ratio
To calculate the ratio, we first determined the range of fragment lengths used in the denominator. Generally, maternal cell-free DNA fragments are longer than cell-free fetal DNA fragments. We found that cell-free DNA fragments are mainly concentrated between 40 and 200 bp. We determined the overall plasma DNA size distribution for the 69 maternal plasma DNA samples with male fetuses. Four samples with different fetal DNA fractions are shown in Fig. 1. The rest of the samples are shown in Appendix 2.
In Fig. 1, the plasma sample with a higher fetal DNA fraction had a higher proportion of short fragments from 80 bp to 150 bp. The size distribution also shows a major peak at 166 bp, and the plasma sample with a higher fetal DNA fraction had a lower proportion of 166 bp fragments than the sample with a lower fetal DNA fraction. Therefore, we initially set the calculation range of the ratio at 80-150 bp. We gradually narrowed the interval with a traversing algorithm and finally settled on a range of DNA fragment lengths. Based on biological principles and existing research, the range of the ratio should not be too small. According to the calculation results, the maximum value of the lower boundary was set to 115 bp, and the minimum value of the upper boundary was set to 125 bp. We extended the maximum value of the upper boundary to 160 bp to show that the correlation decreases as the range increases.
We divided the first sample set (69 maternal plasma samples with a male fetus) into a training set of 35 samples and a test set of 34 samples. We calculated the size ratio for each combination of a lower boundary from 80 to 115 bp and an upper boundary from 125 to 160 bp, both in 5 bp increments (Suppl. Tab. S1). We found a correlation between the size ratios and the fetal DNA fractions for the 35 male samples in the training group, and the test results were consistent with the training results. Tab. 1 shows that the fetal DNA fraction was highly consistent with the fetal concentration determined from the proportion of chromosome Y sequences (the correlation coefficients were greater than 0.85). The calculated ratio was fitted linearly to the fetal concentration, and the best fitting range was 115-140 bp. The highest correlation coefficient in Tab. 1 is 0.924 (P < 0.0001, linear regression). In each row of Tab. 1, the value first increases with the upper fragment length and then decreases once a certain length is reached. In the first five rows, the maximum value of each row corresponds to 150 bp. This implies that fragments longer than 150 bp are more likely to be derived from maternal DNA in maternal peripheral blood.
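The boundary search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the data layout (one dictionary of fragment length to count per sample), the inclusive window boundaries, and the function names are all assumptions, and the denominator window is passed in as a parameter because its choice is a separate step.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def window_sum(counts, lo, hi):
    """Number of fragments with lo <= length <= hi (inclusive, in bp)."""
    return sum(counts.get(bp, 0) for bp in range(lo, hi + 1))


def scan_windows(samples, y_fractions, denom_window):
    """Return (r, lower, upper) for every candidate numerator window.

    samples      : list of {fragment length in bp: count}, one per plasma sample
    y_fractions  : chromosome Y-based fetal fractions, in the same order
    denom_window : (lo, hi) window used as the denominator (assumed input)
    """
    results = []
    for lower in range(80, 116, 5):       # 80, 85, ..., 115
        for upper in range(125, 161, 5):  # 125, 130, ..., 160
            ratios = [
                window_sum(c, lower, upper) / window_sum(c, *denom_window)
                for c in samples
            ]
            results.append((pearson_r(ratios, y_fractions), lower, upper))
    return results


def best_window(samples, y_fractions, denom_window):
    """Candidate window with the highest correlation coefficient."""
    return max(scan_windows(samples, y_fractions, denom_window))
```

Running `scan_windows` on the training samples and again on the test samples would give two tables of correlation coefficients that can be compared row by row, in the spirit of Tab. 1.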
The higher the fetal DNA fraction, the more cell-free fetal fragments are present in maternal plasma; conversely, the lower the fetal DNA fraction, the more maternal cell-free DNA fragments are present. We computed the difference in fragment counts at each length between two samples with different fetal concentrations (Fig. 2). To make the effect obvious, we selected samples with fetal concentrations of 5% and 21%. In Fig. 2, the curve shows a large peak roughly between 160 and 170 bp, which represents the range of maternal cell-free DNA fragments. The optimal interval was determined by further analysis: using a cutoff value of >0.0035, we obtained a maternal DNA fragment size range of 163 to 170 bp. Combining this with the above calculation results, we determined the calculation formula of the ratio.
$$\text{size ratio} = \frac{\text{sum(lower-upper)}}{\text{sum(163-170)}} \qquad (1)$$
where size ratio denotes the ratio of the two fragment counts, sum(lower-upper) denotes the number of fragments with lengths of 115-140 bp, and sum(163-170) denotes the number of fragments with lengths of 163-170 bp.
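The two steps above, selecting the maternal window from the difference curve and forming the size ratio, can be sketched as follows. Several details are assumptions made for illustration only: the size profiles are normalized to proportions before subtraction (which is what a cutoff of 0.0035 suggests), the difference is taken as the low-fraction sample minus the high-fraction sample, the windows are inclusive, and the toy counts are invented.

```python
def normalized(counts):
    """Convert raw counts per fragment length into proportions of all fragments."""
    total = sum(counts.values())
    return {bp: n / total for bp, n in counts.items()}


def lengths_above_cutoff(low_ff_counts, high_ff_counts, cutoff=0.0035):
    """Lengths where the low-fetal-fraction profile exceeds the high one by > cutoff."""
    low, high = normalized(low_ff_counts), normalized(high_ff_counts)
    lengths = sorted(set(low) | set(high))
    return [bp for bp in lengths if low.get(bp, 0.0) - high.get(bp, 0.0) > cutoff]


def size_ratio(counts, short=(115, 140), long=(163, 170)):
    """Fragments in the short window divided by fragments in the long window."""
    short_sum = sum(counts.get(bp, 0) for bp in range(short[0], short[1] + 1))
    long_sum = sum(counts.get(bp, 0) for bp in range(long[0], long[1] + 1))
    if long_sum == 0:
        raise ValueError("no fragments in the long window")
    return short_sum / long_sum


# Toy histograms with three fragment lengths (invented numbers, not study data).
low_ff = {120: 100, 166: 300, 200: 600}   # sample with a low fetal fraction
high_ff = {120: 200, 166: 200, 200: 600}  # sample with a high fetal fraction

print(lengths_above_cutoff(low_ff, high_ff))  # [166]
print(size_ratio(low_ff))                     # 100 / 300
print(size_ratio(high_ff))                    # 200 / 200 = 1.0
```

In this toy case the sample with the higher fetal fraction also has the higher size ratio, which is the direction of the relationship the approach relies on.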

Calculation methods of chromosome proportion
We calculated the proportions of chromosome 13, chromosome 18, and chromosome 21 in each sample using three methods. In the first method (M1), the proportion is the number of fragments from the target chromosome divided by the total number of fragments from all chromosomes, as in Eq. (2):
$$P_1 = \frac{\mathrm{Sum}_{chr_x}}{\mathrm{Sum}_{chr_{all}}} \qquad (2)$$
where $P_1$ denotes the proportion of target chromosome fragments among all chromosome fragments using the first method, $\mathrm{Sum}_{chr_x}$ denotes the number of target chromosome fragments in the maternal plasma sample, and $\mathrm{Sum}_{chr_{all}}$ denotes the number of fragments from all chromosomes in the maternal plasma sample. The second method (M2) first calculates the product of the size of the target chromosome fragments and the corresponding number. Eq. (3) gives the proportion for each chromosome:
$$P_2 = \frac{\mathrm{Size}_{chr_x} \times \mathrm{Sum}_{chr_x}}{\sum \left( \mathrm{Size}_{chr_x} \times \mathrm{Sum}_{chr_x} \right)} \qquad (3)$$
where $P_2$ denotes the proportion of target chromosome fragments among all chromosome fragments using the second method, $\mathrm{Size}_{chr_x} \times \mathrm{Sum}_{chr_x}$ denotes the product of the size of the target chromosome fragments and the corresponding number, and $\sum ( \mathrm{Size}_{chr_x} \times \mathrm{Sum}_{chr_x} )$ denotes the sum of these products over all chromosomes in the maternal plasma sample. The third method (M3) calculates the proportion of each chromosome from a restricted size range, as in Eq. (4):
$$P_3 = \frac{\mathrm{Sum}_{chr_x}(80-155)}{\mathrm{Sum}_{chr_{all}}(80-155)} \qquad (4)$$
where $P_3$ denotes the proportion of target chromosome fragments among all chromosome fragments using the third method, $\mathrm{Sum}_{chr_x}(80-155)$ denotes the number of target chromosome fragments with lengths of 80 to 155 bp, and $\mathrm{Sum}_{chr_{all}}(80-155)$ denotes the number of fragments from all chromosomes with lengths of 80 to 155 bp. The proportions calculated by the three methods were compared, and the first method gave the best results. Therefore, in the following calculations, the chromosome proportion is calculated by the first method.
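A compact sketch of the three proportion calculations is given below. The data layout (a dictionary of per-chromosome size profiles), the helper names, and the toy counts are hypothetical, and the second method is interpreted here as fragment length multiplied by fragment count, summed over lengths; that reading of M2 is an assumption.

```python
def total_fragments(profile):
    """M1 measure: number of fragments on a chromosome."""
    return sum(profile.values())


def total_bases(profile):
    """M2 measure (assumed reading): sum of fragment length times count."""
    return sum(bp * n for bp, n in profile.items())


def fragments_in_window(profile, lo=80, hi=155):
    """M3 measure: number of fragments with lo <= length <= hi bp."""
    return sum(n for bp, n in profile.items() if lo <= bp <= hi)


def proportion(profiles, chrom, measure):
    """Share of `chrom` in the chosen measure summed over all chromosomes."""
    return measure(profiles[chrom]) / sum(measure(p) for p in profiles.values())


# Toy data: two chromosomes, two fragment lengths each (invented numbers).
toy = {
    "chr1": {100: 40, 166: 120},
    "chr21": {100: 20, 166: 20},
}

p1 = proportion(toy, "chr21", total_fragments)      # 40 / 200
p2 = proportion(toy, "chr21", total_bases)          # 5320 / 29240
p3 = proportion(toy, "chr21", fragments_in_window)  # 20 / 60
print(p1, p2, p3)
```

Passing the measure as a function keeps the three methods directly comparable, since only the quantity being summed changes between them.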

Results
Estimating female fetal concentration by the size of fragments
From the calculations above, we concluded that the optimal ranges for the ratio are 115-140 bp for the fetal DNA fragments and 163-170 bp for the maternal DNA fragments. The fetal DNA fractions of the 69 male fetuses calculated from the size ratio were highly concordant with the fetal DNA fractions determined from the proportion of chromosome Y sequences in the same maternal plasma samples. In both male and female fetuses, cell-free fetal DNA fragments are shorter than maternal cell-free DNA fragments, and the fetal and maternal fragments are each distributed over a characteristic range of lengths. Therefore, this fragment size-based approach can also be used to calculate the concentration of female fetuses.
The approach based on the size of fragments was used to calculate the fetal DNA fraction in the 68 female cases. For the male samples, we found a positive correlation between the size ratio and the fetal DNA fraction determined from the proportion of chromosome Y sequences in the maternal plasma samples (Fig. 3); the correlation coefficient is 0.9243 (P < 0.0001, linear regression). The regression equation (Eq. (5)) obtained from the 69 maternal plasma samples with male fetuses was then used to calculate the fetal DNA fraction in the 68 maternal plasma samples with female fetuses. Fig. 4 shows the estimated concentrations of the 68 female fetal samples, most of which range from 5% to 30%. In Fig. 4, the first 8 samples are the maternal plasma samples with a trisomy 13 female fetus, the 18 blue bars are the samples with a trisomy 18 female fetus, the 19 green bars are the samples with a trisomy 21 female fetus, and the last 23 samples are the samples with a euploid female fetus.
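The transfer from male to female samples amounts to fitting a straight line on the male data and evaluating it on the female size ratios. The sketch below illustrates that idea with an ordinary least-squares fit; the function names and the toy numbers are hypothetical, and the fitted slope and intercept are not the coefficients of Eq. (5).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_fraction(ratio, slope, intercept):
    """Fetal DNA fraction predicted from a size ratio by the fitted line."""
    return slope * ratio + intercept


# Toy training data (invented): size ratios and chromosome Y-based fractions
# for samples with male fetuses.
male_ratios = [0.2, 0.4, 0.6]
male_y_fractions = [0.05, 0.10, 0.15]
slope, intercept = fit_line(male_ratios, male_y_fractions)

# Apply the male-derived line to size ratios from samples with female fetuses.
female_ratios = [0.3, 0.5]
female_estimates = [predict_fraction(r, slope, intercept) for r in female_ratios]
print(female_estimates)
```

The design assumes that the relationship between size ratio and fetal fraction does not depend on fetal sex, which is the premise stated in the text.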

Calculating fetal DNA fraction by aneuploid
The three most common autosomal aneuploidies are trisomy 21 (Down syndrome), trisomy 13 (Patau syndrome), and trisomy 18 (Edwards syndrome). Normally, the fetal genome is diploid, meaning that each autosome is present in two copies. If the fetus is trisomic, it carries one more copy of the affected chromosome than a normal fetus. We can use this property to calculate the fetal DNA fraction of an aneuploid fetus. The 33 maternal plasma samples with aneuploid male fetuses included 9 cases with a trisomy 13 fetus, 7 cases with a trisomy 18 fetus, and 17 cases with a trisomy 21 fetus. We used Eq. (6) to calculate the fetal DNA fraction; the results are shown in Fig. 5.
In Eq. (6), T chrx is the sequence tag density of chromosome 13, 18, or 21 in a sample with an aneuploid fetus, and D mean chrx is the mean sequence tag density of the same chromosome in the 36 pregnancies with a euploid male fetus. We calculated fetal concentrations for the 33 male samples with aneuploid fetuses using Eq. (6). The analysis verified that there was a high correlation between the fetal concentration calculated using chromosome 21 and the fetal concentration calculated using chromosome Y. We calculated the relationship between the concentration of chromosome 21 and the fetal DNA fraction in male fetuses (Chu et al., 2009) (Fig. 5). We found a positive correlation between the fetal DNA fraction from the fragment size-based approach and the fetal DNA fraction from the aneuploid-based approach (r = 0.9339, P < 0.0001, linear regression). When a fetus has an autosomal aneuploidy, the proportion of the corresponding chromosome changes: if there are extra or missing chromosomes, the number of corresponding fetal DNA fragments increases or decreases. The fetal DNA fractions calculated from aneuploidy are highly accurate, so we can use this principle to predict fetal concentration.
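As a hedged illustration of the aneuploid-based estimate, the sketch below assumes the commonly used relation fetal fraction = 2 × (T − D_mean) / D_mean. It follows if the mother contributes two copies of the chromosome and a trisomic fetus contributes three, so that the tag density of the affected chromosome is scaled by (1 + f/2) relative to euploid pregnancies. This form is an assumption and may differ in detail from Eq. (6); the function name and the toy densities are hypothetical.

```python
from statistics import fmean


def fetal_fraction_from_trisomy(t_chr, d_mean_chr):
    """Fetal fraction implied by the excess tag density of a trisomic chromosome.

    t_chr      : tag density of the affected chromosome in the trisomy sample
    d_mean_chr : mean tag density of that chromosome in euploid pregnancies
    Assumes T / D_mean = 1 + f / 2, so f = 2 * (T - D_mean) / D_mean.
    """
    return 2.0 * (t_chr - d_mean_chr) / d_mean_chr


# Toy reference densities for one chromosome in euploid samples (invented).
euploid_densities = [0.0129, 0.0130, 0.0131]
d_mean = fmean(euploid_densities)

# A trisomy sample whose density is 5% above the euploid mean.
estimate = fetal_fraction_from_trisomy(0.01365, d_mean)
print(round(estimate, 4))  # 0.1
```

Because the excess is only half of the fetal fraction, small errors in the tag density are doubled in the estimate, which is one reason a stable euploid reference mean matters.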
We used the aneuploid-based method to calculate the fetal DNA fraction of the 45 female cases with aneuploid fetuses. The results are shown in Fig. 6. The scatter diagram shows that the fetal concentration is mainly concentrated at 5-20%, which is consistent with early studies (Lo et al., 1999). The black squares represent fetal concentrations determined by the fragment size-based approach, and the red squares represent fetal concentrations determined by the aneuploid-based approach. In Fig. 7A, Pearson's correlation coefficient of the two sets of data is 0.8263 (P < 0.0001, linear regression). Fig. 7B shows that the residuals are normally distributed and centered at zero, with a standard deviation of 0.0504.
Among the 45 samples, the 19 female samples with a trisomy 21 fetus gave the best results with Eq. (6). We compared these results with the size-based results for the same 19 samples (Section 3.2), and the two sets of data are very similar. Fig. 8 shows that the relative shares of the two results are each close to 50%, which indicates that the two results are close to each other. The aneuploid-based method can therefore be applied to calculate the fetal fraction of an aneuploid female fetus, while the fetal DNA fraction of a euploid female fetus is calculated from the size of fragments.

Discussion
The use of cffDNA could in theory reduce the number of invasive prenatal diagnostic procedures by 50%. For pregnant women, this greatly reduces the risk of miscarriage and other problems. In general, the concentration of cell-free fetal DNA should be measured before NIPT is performed. Detection of fetal aneuploidy and autosomal recessive diseases with cffDNA is particularly challenging because only a small proportion of cell-free DNA in maternal plasma is derived from the fetus (Fan et al., 2010). The maternal DNA background places significant practical limits on the sensitivity of almost all prenatal diagnostic assays, and thus the proportion of fetal DNA in maternal plasma is a key parameter. Owing to the development of multiple detection methods, fetal DNA in maternal plasma has many potential clinical applications (Lui et al., 2002). Fetal DNA in maternal plasma is a convenient source of fetal genetic material, which can be safely obtained by collecting maternal peripheral blood samples (Li et al., 2004). An efficient method for estimating the fetal DNA fraction is necessary for NIPT, and it is particularly important for the future clinical application of NIPT to single-gene diseases (Peng and Jiang, 2017). Cell-free DNA in maternal plasma consists of a mixture of fetal and maternal DNA. The counting approach enumerates both fetal and maternal DNA molecules in a maternal plasma sample. By using the difference in the length distributions of cffDNA and maternal DNA, chromosomal abnormalities such as Down syndrome can be distinguished. For instance, the ratio of the concentration of sequences from the Y chromosome to that of an autosome has been used to determine the fetal DNA fraction (Mckanna et al., 2019; Straver et al., 2016).
In the context of NIPT using massively parallel sequencing, the proportion of all sequence reads from the Y chromosome can be translated to the fetal DNA fraction.
[Figure legend: orange columns represent the fetal fraction calculated using aneuploidy; blue columns represent the fetal fraction calculated using the size ratio.]
The concentration of cffDNA can affect the accuracy of NIPT results and has potential clinical diagnostic value. In this paper, we drew on various cffDNA concentration estimation methods (Bestor, 2000; Chu et al., 2010). Many methods have been used to calculate the fetal fraction of male fetuses, but there are few methods for female fetuses. In this paper, we verified two methods for calculating the female fetal DNA fraction: a cell-free DNA size ratio-based approach and an aneuploid-based approach. The results of the two methods agree well with each other.
In this study, we calculated the proportions of different chromosome fragments. We found that the length of cell-free DNA fragments was mainly concentrated between 40 and 200 bp. Through data analysis, we concluded that fetal fragment length was mainly concentrated in the range of 115-140 bp and maternal fragment length was mainly concentrated in the range of 163-170 bp. This provided a theoretical basis for calculating the female fetal concentration, which we then estimated with the fragment size-based approach and the aneuploid-based approach.
The determination of female fetal concentration is still an urgent problem. We verified the accuracy of our approaches by calculating the concentration of male fetuses, which is the premise for applying them to female fetuses. The approaches calculate the concentration of female fetuses from chromosome fragments and are simple to operate. However, the amount of cell-free fetal DNA in maternal plasma is extremely small, and the gestational age may affect the accuracy of the fetal concentration assessment. The accuracy of our method is also affected by the fetal concentration itself. The amount of data is limited, and further verification with more data and clinical experiments will be required to improve our method.

Conclusions
This paper not only calculates and verifies the DNA fraction of male fetuses but also puts forward methods for calculating the DNA fraction of euploid and aneuploid female fetuses. Research on the fetal DNA fraction is important for the performance of NIPT and its clinical assessment. Before NIPT is performed, a certain fetal DNA concentration is generally required to ensure the accuracy of the results. The fetal concentration can be estimated from cell-free DNA fragments in maternal plasma, which provides a valuable reference for fetal clinical diagnosis.