TY - EJOU
AU - Imoto, Tomoaki
AU - Shieh, Grace S.
AU - Shimizu, Kunio
TI - Discrete Circular Distributions with Applications to Shared Orthologs of Paired Circular Genomes
T2 - Computer Modeling in Engineering & Sciences
PY - 2020
VL - 123
IS - 3
SN - 1526-1506
AB - For structural comparisons of paired prokaryotic genomes, an important topic in
synthetic and evolutionary biology, the locations of shared orthologous genes (henceforth
orthologs) are observed as binned data. These and other data, e.g., wind directions recorded
at monitoring sites and intensive care unit arrival times on the 24-hour clock, are counted
in binned circular arcs, so modeling them with discrete circular distributions (DCDs) is
required. We propose a novel method to construct a DCD from a base continuous circular
distribution (CCD). The probability mass function is defined to take the normalized values
of the probability density function at some pre-fixed equidistant points on the circle. Five
families of constructed DCDs which have normalizing constants in closed form are
presented. Simulation studies show that DCDs outperform the corresponding CCDs in
modeling grouped (discrete) circular data, and minimum chi-square estimation outperforms
maximum likelihood estimation for parameters. We apply the constructed DCDs, the invariant
wrapped Poisson, and the wrapped discrete skew Laplace distributions to compare the structures of paired
bacterial genomes. Specifically, the discrete four-parameter wrapped Cauchy (nonnegative
trigonometric sums) distribution models multi-modal shared orthologs in Clostridium (Sulfolobus)
better than the others considered, in terms of AIC and Freedman’s goodness-of-fit test. The result
that different DCDs fit the shared orthologs is consistent with the fact that they belong to two kingdoms.
Nevertheless, these prokaryotes have a common favored site around 70° on the unit circle; this
finding is important for building synthetic prokaryotic genomes in synthetic biology. These DCDs
can also be applied to other binned circular data.
KW - Bacterial genomes
KW - circular distribution
KW - goodness-of-fit test
KW - modeling
KW - synthetic and evolutionary biology
DO - 10.32604/cmes.2020.08466