Predictive models for cumulative confirmed COVID-19 cases by day in Southeast Asia

The coronavirus disease 2019 (COVID-19) outbreak has spread as a pandemic since the end of 2019. This situation has caused many problems for people, such as economic and health problems. Forecasts of the number of infectious people are required by the authorities of all countries, including Southeast Asian countries, to make decisions and control the outbreak. This research investigates suitable forecasting models for the number of infectious people in Southeast Asian countries. A comparison between the logistic growth curve, which is symmetric, and the Gompertz growth curve, which is asymmetric, based on the maximum Coefficient of Determination and the minimum Root Mean Squared Percentage Error is also proposed. The parameters of the forecasting models are estimated by the least squares method. In addition, the spreading of the outbreak is estimated by the derivative of the number of cumulative cases. The findings show that the Gompertz growth curve is a suitable forecasting model for Indonesia, Philippines, and Malaysia, and the logistic growth curve suits the other countries in Southeast Asia. © 2020 Tech Science Press. All rights reserved.


Introduction
Coronavirus disease 2019, abbreviated COVID-19, is an emerging infectious disease. The lungs are the main part of the human body attacked by the coronavirus because they contain many receptors that match its spike proteins. The coronavirus then damages the lungs, which is the main cause of severe acute respiratory syndrome in the human respiratory system. The first known patient (patient zero) with coronavirus disease 2019 was found at the end of 2019 in Wuhan, China. Subsequently, the disease has spread to all areas and has caused health and economic problems for all countries in the world [1,2]. Southeast Asia is one of the areas affected by the COVID-19 outbreak. It consists of 11 countries: Brunei, Cambodia, East Timor, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam. It is located in the tropical zone, and the disease spread during the summer season of Southeast Asia (around March to May 2020). As of May 30, 2020, COVID-19 transmission in Southeast Asia appears to be declining. However, one of the causes of COVID-19 spreading in Southeast Asia is the return of migrant workers, who are a vulnerable population, to their hometowns when countries in Southeast Asia suddenly imposed lockdown policies. They cannot abide by the International Health Regulations. For example, social and physical distancing cannot be practiced by this group because many of them live together in migrant workers' dormitories. The migrant workers then become disease carriers and transmit the disease to their home countries. However, the authorities of each country have directed this group into the state quarantine process in order to control the spreading of the disease [3][4][5].
To control the outbreak, forecasting the number of infectious people is very important for the authorities' decision making and policy development. Much research has been conducted on estimating population growth curves and on predictive models. Forecasting of future and distant changes of new processes and products by using the logistic curve was studied [6]. Bacterial growth experiments and population dynamics equations based on the logistic growth curve were investigated [7]. The logistic growth curve was fitted to data to develop an empirical description of plant growth [8]. Researchers have also analyzed and fitted the Gompertz growth curve to study bacterial growth and tumor growth, and the parameters of the logistic and Gompertz growth curves have been estimated by the least squares method. The Gompertz growth curve was analyzed by using mathematical and statistical analysis [9]. The Gompertz growth curve was used to estimate bacterial growth [10,11]. The Gompertz growth curve was initially fitted to tumor growth data [12]. The parameters of the logistic growth curve were approximated by using the least squares method [13]. The logistic growth curve was fitted to data, with its parameters estimated by the least squares method [14]. The least squares method was adopted to estimate the parameters of the Gompertz growth curve [15]. Remarkably, the number of COVID-19 infectious cases looks like an S-curve, and both the logistic and Gompertz growth curves form S-curves that, with different parameters, can represent the cases. Furthermore, much research has studied the spreading of COVID-19. The spreading of COVID-19 in 11 European countries was studied from the start of the outbreak until 4 May 2020, after the large outbreak in China. The effects of non-medical measures, such as the closure of academic institutions and lockdown policies, were investigated.
The proposed models back-calculate the spreading from the death cases, allowing for the lag between infection and death. The data were pooled across countries to estimate the time-varying reproduction number. The estimation reported that the non-medical measures drove the reproduction number below one, with a 99% confidence interval. Briefly, the 11 European countries were estimated to have 12-15 million infected individuals by 14 May, 2020. Non-medical measures, especially lockdowns, were effective in decreasing the spreading of COVID-19 [16]. After the spreading of COVID-19 in Wuhan, China, the outbreak threatened health systems all over the world. A conceptual dynamical-system model of the COVID-19 outbreak in Wuhan, China was studied, taking into account actions by the Chinese authorities such as establishing special hospitals and limiting the movement of people across borders. Individual behaviors were restricted, and they were modelled in correspondence with the authorities' policies, such as holiday extensions and quarantine measures. The models were estimated with two main components based on the 1918 influenza pandemic and the trend of the pandemic with the reporting ratio [17]. The outbreak of COVID-19 has been transmitted to all countries in the world, and the number of infectious cases in the world is increasing. Events, festivals, and tournaments have been canceled or postponed, and academic institutions have been closed for more than 14 days, the incubation period of COVID-19. This outbreak affects the economy and health systems of the world. A mathematical model (dynamical system) of COVID-19 spreading based on differential equations, the SIR compartmental model, was built to describe the behavior of the COVID-19 outbreak. The parameters in the model were estimated to analyze the COVID-19 outbreak situation in China in order to develop efficient measures to control the spreading [18].
An outbreak model of COVID-19 and an analysis of the effects of lockdown policy in India were conducted. The model assumptions are based on lockdown policy, quarantine, and infectious cases. The trend and rate of COVID-19 spreading in 18 states of India were forecasted. The results showed that the peak time and ending time of the spreading in India would be July 2020 and March 2021, respectively. In addition, the number of infectious cases in India was estimated at over 19 lakhs (1.9 million) [19]. A time-dependent dynamical model of the COVID-19 outbreak in China, Italy, and France was proposed and analyzed. The simple susceptible-infected-recovered-deaths model used in that research indicated that the kinetic parameter did not affect the analysis of the outbreak in each country, whereas the infection and death rates appear to be more variable [20]. The reproductive number of COVID-19 was estimated and presented together with the differences and parameter selections that reflect the transmission dynamics. The risk of transmission in the COVID-19 outbreak was also estimated. Methods for detecting infectious cases and assessing the risk of the COVID-19 outbreak improved rapidly through sensor technology, which shortened detection time and led to quicker diagnosis of cases. Moreover, the diagnosis of cases through a dynamical system based on the reproductive number was proposed. Suitable measures, namely self-quarantine and isolation, were then applied to control the spreading and to evaluate the risk of COVID-19 transmission [21].
However, forecasting models for estimating the growth curve of COVID-19 infectious people are a novel topic, and it is of interest to compare the results from two forecasting models, the logistic and Gompertz growth curves. Therefore, the main purpose of this research is to construct forecasting models for predicting the cumulative number of COVID-19 cases by day in Southeast Asia. The logistic curve, an S-curve that is symmetric with respect to its inflection point, and the Gompertz curve, an asymmetric S-curve, are selected as the fitting functions for the cumulative number of cases, which looks like an S-curve, with the least squares method used for parameter estimation. Furthermore, the suitable model is chosen by comparing the models on the Coefficient of Determination and the Root Mean Squared Percentage Error. The next section presents the mathematical and statistical background as the materials and methods of this research. The results and discussion, and then the conclusion, follow.

Materials and Methods
This section presents the data collection and the forecasting models (the logistic growth curve model and the Gompertz growth curve model) for predicting the number of daily cumulative confirmed COVID-19 cases of each country in Southeast Asia. In addition, the algorithm for the forecasting models of COVID-19 cases is provided. The two growth curve models are derived from differential equations as follows.

Data Collection
Secondary data on the cumulative daily confirmed COVID-19 cases are gathered from the website Worldometer [22]. Worldometer is operated by overseas developers, researchers, and volunteers to provide statistics collected from around the world. The website also provides daily, real-time data on COVID-19. The cumulative daily confirmed COVID-19 cases for each country in Southeast Asia are collected from February 15, 2020 (t = 0) to May 20, 2020 (t = 95). The collected data are complete, with no missing values.

Forecasting Models and Its Analytics
Let $l(t)$ be the number of individuals based on the logistic growth curve of daily cumulative confirmed COVID-19 cases of each country in Southeast Asia at time $t \ge 0$. The logistic growth curve is a solution of the differential equation initiated by Verhulst [23],
$$\frac{dl}{dt} = r\,l\left(1 - \frac{l}{C}\right), \tag{1}$$
where $l_0 > 0$ is the initial condition for the initial population size of cumulative COVID-19 cases, $r > 0$ is the growth rate parameter, and $C$ is the carrying capacity parameter. Eq. (1) can be solved by partial fractions and separation of variables [24] as
$$\ln\frac{l}{C - l} = rt + k,$$
where $k$ is the constant of integration. Taking the exponential of both sides, the result is
$$\frac{l}{C - l} = e^{k} e^{rt}.$$
With the initial condition $l(t = 0) = l_0$, the result is obtained as
$$l(t) = \frac{C}{1 + \dfrac{C - l_0}{l_0}\,e^{-rt}}.$$
Let $g(t)$ be the number of individuals based on the Gompertz growth curve of daily cumulative confirmed COVID-19 cases in Southeast Asia, with growth rate $\gamma(t)$, which is the intrinsic growth rate and represents the growth rate per capita at time $t \ge 0$. Physically, this rate is what shapes the Gompertz growth curve into an S-curve, which represents the COVID-19 curve of the number of infectious cases. The Gompertz growth curve is also a solution of a differential equation [25],
$$\frac{dg}{dt} = \gamma(t)\,g(t), \tag{3}$$
in which the growth rate itself follows an exponential differential equation with rate parameter $D > 0$,
$$\frac{d\gamma}{dt} = -D\,\gamma(t).$$
With the initial condition $\gamma(t = 0) = \gamma_0$, the result is
$$\gamma(t) = \gamma_0 e^{-Dt}. \tag{4}$$
By substitution of Eq. (4) into Eq. (3) and integration with $g(t = 0) = g_0$, the result is obtained as
$$\ln\frac{g(t)}{g_0} = \frac{\gamma_0}{D}\left(1 - e^{-Dt}\right).$$
By taking the exponential of both sides, the result is obtained as
$$g(t) = g_0 \exp\left[\frac{\gamma_0}{D}\left(1 - e^{-Dt}\right)\right].$$
Next, an analysis of the logistic and Gompertz growth curve models is essential for applying the models to the COVID-19 pandemic. Three aspects of the forecasting models are analyzed: asymptotic behavior, maximum growth rate behavior, and symmetry behavior.
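As an illustrative sketch, and not the authors' code, the two closed-form curves can be checked numerically against their differential equations. The closed forms below are the standard logistic and Gompertz solutions, written under the assumption that the Gompertz per capita rate decays as $\gamma_0 e^{-Dt}$; all parameter values are hypothetical and chosen only for illustration.

```python
import math

def logistic(t, l0, C, r):
    """Logistic growth curve with l(0) = l0 and carrying capacity C."""
    return C / (1.0 + (C - l0) / l0 * math.exp(-r * t))

def gompertz(t, g0, gamma0, D):
    """Gompertz growth curve with g(0) = g0 and initial per capita rate gamma0."""
    return g0 * math.exp(gamma0 / D * (1.0 - math.exp(-D * t)))

def central_difference(f, t, h=1e-5):
    """Numerical estimate of the first derivative of f at time t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

# Hypothetical parameter values (assumptions, not estimates from the paper).
l0, C, r = 5.0, 1000.0, 0.12
g0, gamma0, D = 5.0, 0.30, 0.05
t = 30.0

# Logistic: dl/dt should equal r * l * (1 - l / C).
lhs_logistic = central_difference(lambda s: logistic(s, l0, C, r), t)
rhs_logistic = r * logistic(t, l0, C, r) * (1.0 - logistic(t, l0, C, r) / C)

# Gompertz: dg/dt should equal gamma0 * exp(-D * t) * g.
lhs_gompertz = central_difference(lambda s: gompertz(s, g0, gamma0, D), t)
rhs_gompertz = gamma0 * math.exp(-D * t) * gompertz(t, g0, gamma0, D)

print(lhs_logistic, rhs_logistic)
print(lhs_gompertz, rhs_gompertz)
```

In each printed pair the two numbers should agree closely, which is a quick consistency check on the closed forms before they are used for fitting.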
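Asymptotic behavior of the forecasting models occurs when time tends to infinity. That is, the cumulative number of COVID-19 infectious people is asymptotically estimated by the steady state, which is the state of the system far in the future.
For the logistic growth curve model,
$$\lim_{t \to \infty} l(t) = C.$$
For the Gompertz growth curve model,
$$\lim_{t \to \infty} g(t) = g_0\,e^{\gamma_0 / D}.$$
The maximum spreading behavior of the cumulative number of COVID-19 infectious people is the peak point of the growth rate of COVID-19 cases.
For the logistic growth curve model, to find the maximum growth rate with its peak time $t^*$, the inflection point (the point of maximum growth rate) is computed as
$$\frac{d^2 l}{dt^2} = 0 \;\Rightarrow\; l(t^*) = \frac{C}{2}, \qquad t^* = \frac{1}{r}\ln\frac{C - l_0}{l_0}, \qquad \left.\frac{dl}{dt}\right|_{t = t^*} = \frac{rC}{4}.$$
For the Gompertz growth curve model, to find the maximum growth rate with its peak time $t^*$, the inflection point is computed as
$$\frac{d^2 g}{dt^2} = 0 \;\Rightarrow\; \gamma(t^*) = D, \qquad t^* = \frac{1}{D}\ln\frac{\gamma_0}{D}, \qquad g(t^*) = g_0\,e^{\gamma_0 / D - 1}, \qquad \left.\frac{dg}{dt}\right|_{t = t^*} = D\,g_0\,e^{\gamma_0 / D - 1}.$$
Symmetry behavior of the forecasting models concerns whether the fitted data, the cumulative number of COVID-19 infectious people, are symmetric about the peak or not.
For the logistic growth curve model, the curve is symmetric about its inflection point.
For the Gompertz growth curve model, the curve is asymmetric.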
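The peak time, the maximum growth rate, and the symmetry property can be illustrated with a short numerical sketch. The formulas coded below follow from setting the second derivative of the standard closed-form curves to zero; the function names and parameter values are hypothetical and are not taken from the paper's tables.

```python
import math

def logistic(t, l0, C, r):
    """Logistic growth curve with l(0) = l0 and carrying capacity C."""
    return C / (1.0 + (C - l0) / l0 * math.exp(-r * t))

def gompertz(t, g0, gamma0, D):
    """Gompertz growth curve with g(0) = g0 and initial per capita rate gamma0."""
    return g0 * math.exp(gamma0 / D * (1.0 - math.exp(-D * t)))

def logistic_peak(l0, C, r):
    """Return (peak time, size at peak, maximum growth rate) for the logistic curve."""
    t_star = math.log((C - l0) / l0) / r
    return t_star, C / 2.0, r * C / 4.0

def gompertz_peak(g0, gamma0, D):
    """Return (peak time, size at peak, maximum growth rate) for the Gompertz curve."""
    t_star = math.log(gamma0 / D) / D
    size = g0 * math.exp(gamma0 / D - 1.0)
    return t_star, size, D * size

# Hypothetical parameter values (assumptions, for illustration only).
l0, C, r = 5.0, 1000.0, 0.12
g0, gamma0, D = 5.0, 0.30, 0.05

t_l, size_l, rate_l = logistic_peak(l0, C, r)
t_g, size_g, rate_g = gompertz_peak(g0, gamma0, D)
asymptote_g = g0 * math.exp(gamma0 / D)

# Symmetry check: sizes at equal distances s before and after the peak.
s = 10.0
logistic_pair = logistic(t_l + s, l0, C, r) + logistic(t_l - s, l0, C, r)
gompertz_pair = gompertz(t_g + s, g0, gamma0, D) + gompertz(t_g - s, g0, gamma0, D)

print("logistic peak:", t_l, size_l, rate_l)
print("gompertz peak:", t_g, size_g, rate_g)
print("logistic pair sum vs C:", logistic_pair, C)
print("gompertz pair sum vs 2 * size at peak:", gompertz_pair, 2.0 * size_g)
```

For the logistic curve the two sizes sum to the carrying capacity, which is the symmetry about the peak. For the Gompertz curve the peak occurs at a fraction 1/e of the asymptotic size, so the rise to the peak is shorter than the decline after it.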
Generally, the first derivative of a function $f(t)$, which represents the logistic or Gompertz growth curve, at a time point $t^*$ is defined as
$$f'(t^*) = \lim_{h \to 0} \frac{f(t^* + h) - f(t^*)}{h}.$$
Also, it can be estimated by
$$f'(t^*) \approx \frac{f(t^* + h) - f(t^*)}{h}$$
for a small step $h > 0$.

Parameter Estimation
Let $f(t; a, b, c)$ denote the logistic or Gompertz growth curve function with three parameters $a, b, c$ for any time $t \ge 0$. The parameters are estimated by fitting the logistic or Gompertz growth curve to the actual total COVID-19 cases $x_t$ for any time $t \ge 0$. Here, the least squares method is applied to estimate the parameters by minimizing the square of deviation function $d(a, b, c)$,
$$d(a, b, c) = \sum_{t = 0}^{n} \left[x_t - f(t; a, b, c)\right]^2,$$
where $n$ is the last observed day, $a$ represents $l_0$ or $g_0$ for the logistic or Gompertz growth curve, respectively, $b$ represents $C$ or $\gamma_0$ for the logistic or Gompertz growth curve, respectively, and $c$ represents $r$ or $D$ for the logistic or Gompertz growth curve, respectively.
To minimize the square of deviation function, its partial derivatives are taken and set equal to zero. This yields the system of equations to solve for the parameters $a, b, c$,
$$\frac{\partial d}{\partial a} = 0, \qquad \frac{\partial d}{\partial b} = 0, \qquad \frac{\partial d}{\partial c} = 0.$$
Because $f$ is nonlinear in the parameters, this system is solved numerically.
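A minimal sketch of the least squares step is given below. It assumes a Gauss-Newton iteration with a numerically estimated Jacobian and step halving, which is one common way to minimize a sum of squared deviations; the source does not state which solver was used. The data are synthetic, generated from known hypothetical parameters rather than taken from Worldometer, and the helper names `deviation`, `solve3`, and `fit_least_squares` are illustrative.

```python
import math

def logistic(t, l0, C, r):
    """Logistic growth curve with l(0) = l0 and carrying capacity C."""
    return C / (1.0 + (C - l0) / l0 * math.exp(-r * t))

def deviation(model, params, times, cases):
    """Sum of squared deviations between observed cases and the model."""
    try:
        return sum((x - model(t, *params)) ** 2 for t, x in zip(times, cases))
    except (OverflowError, ZeroDivisionError):
        return math.inf  # reject parameter values the model cannot evaluate

def solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v] for row, v in zip(A, y)]
    for i in range(3):
        p = max(range(i, 3), key=lambda k: abs(M[k][i]))
        M[i], M[p] = M[p], M[i]
        for k in range(i + 1, 3):
            factor = M[k][i] / M[i][i]
            for j in range(i, 4):
                M[k][j] -= factor * M[i][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

def fit_least_squares(model, guess, times, cases, iterations=50):
    """Gauss-Newton iteration with a numerical Jacobian and step halving."""
    p = list(guess)
    for _ in range(iterations):
        residuals = [x - model(t, *p) for t, x in zip(times, cases)]
        jac = []
        for t in times:
            base = model(t, *p)
            row = []
            for k in range(3):
                h = 1e-6 * max(abs(p[k]), 1e-6)  # forward-difference step
                q = list(p)
                q[k] += h
                row.append((model(t, *q) - base) / h)
            jac.append(row)
        # Normal equations: (J^T J) step = J^T residuals.
        A = [[sum(row[a] * row[b] for row in jac) for b in range(3)] for a in range(3)]
        y = [sum(row[a] * res for row, res in zip(jac, residuals)) for a in range(3)]
        step = solve3(A, y)
        current = sum(res * res for res in residuals)
        scale = 1.0
        while scale > 1e-8:
            trial = [pk + scale * sk for pk, sk in zip(p, step)]
            if deviation(model, trial, times, cases) < current:
                p = trial
                break
            scale *= 0.5  # halve the step until the deviation decreases
        else:
            break  # no improving step found: treat as converged
    return p

# Synthetic data for t = 0, ..., 95 from hypothetical "true" parameters.
times = list(range(96))
true_params = (5.0, 1000.0, 0.12)
cases = [logistic(t, *true_params) for t in times]

guess = (6.0, 900.0, 0.11)
fitted = fit_least_squares(logistic, guess, times, cases)
print("fitted l0, C, r:", fitted)
print("deviation at guess:", deviation(logistic, guess, times, cases))
print("deviation at fit:", deviation(logistic, fitted, times, cases))
```

The same routine applies to the Gompertz curve by passing a three-parameter Gompertz function and an initial guess for $g_0$, $\gamma_0$, and $D$. With real data, a library routine such as SciPy's `curve_fit` would normally be preferred over a hand-written solver.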

Accuracy of Forecasting Models
The accuracy of the forecasting model is assessed in three ways: validation of the forecasting model by comparing the forecasted and actual values via the Root Mean Squared Percentage Error, appropriateness of the forecasting model via the Coefficient of Determination, and the trustworthiness of the forecasting model via the confidence interval. Let $f(t)$ be the actual value and $\hat{f}(t)$ be the forecasted value.

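Root Mean Squared Percentage Error (RMSPE)
$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{t}\left(\frac{f(t) - \hat{f}(t)}{f(t)}\right)^2} \times 100\%,$$
where $n$ is the number of observations.

Coefficient of Determination ($R^2$)
$$R^2 = 1 - \frac{\sum_{t}\left(f(t) - \hat{f}(t)\right)^2}{\sum_{t}\left(f(t) - \bar{f}\right)^2},$$
where $\bar{f}$ is the mean of the actual values.
The confidence interval of the forecasted value at confidence level $1 - \alpha$ is
$$\hat{f}(t) \pm t_{\alpha/2,\,df}\,SE,$$
where $SE$ is the standard error of $\hat{f}(t)$ and $df$ is the degrees of freedom.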
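The two selection criteria can be computed as in the following sketch, which uses the standard definitions of RMSPE and $R^2$. The small data set and the two sets of fitted values are made up for illustration and are not results from the paper.

```python
import math

def rmspe(actual, forecast):
    """Root mean squared percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    total = sum(((a - f) / a) ** 2 for a, f in zip(actual, forecast))
    return 100.0 * math.sqrt(total / n)

def r_squared(actual, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up actual values and fitted values from two hypothetical models.
actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 400.0]
model_b = [120.0, 220.0, 360.0]

rmspe_a, rmspe_b = rmspe(actual, model_a), rmspe(actual, model_b)
r2_a, r2_b = r_squared(actual, model_a), r_squared(actual, model_b)

print("model A: RMSPE = %.3f%%, R^2 = %.5f" % (rmspe_a, r2_a))
print("model B: RMSPE = %.3f%%, R^2 = %.5f" % (rmspe_b, r2_b))
```

In this example model A has both the smaller RMSPE and the larger $R^2$, so it would be selected under the stated criteria. RMSPE is undefined when an actual value is zero, so days with zero cumulative cases would need to be excluded or handled separately.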

Algorithm for Forecasting Model of COVID-19 Cases
Let θ be a set of parameters for logistic or Gompertz growth curve. 934 CMES, 2020, vol.125, no.3

Algorithm 1: Parameter estimation and Comparison of the predictive model
Step 1: Setting actual values
    Population ← the actual number of cumulative COVID-19 cases
    Time ← 0 to length(Population) − 1 with step 1
Step 2: Setting initial values for the predictive models
    r, D ← rate of change at the present time with respect to the previous time
    θ_l ← initial setting of l_0, C, r for the logistic growth curve
    θ_g ← initial setting of g_0, γ_0, D for the Gompertz growth curve
Step 3: Applying the least squares method for parameter estimation
    Fitted value ← Leastsquare(growth curve, θ, Population, Time)
Step 4: Calculating the error between the actual and fitted values to compare the predictive models; the suitable model has the minimum root mean squared percentage error and the maximum coefficient of determination
Step 5: Forecasting the future values and their confidence intervals
Step 6: Comparing the spreading of COVID-19 under the suitable predictive model by using the derivative

Results and Discussion
This section presents the results, which consist of the estimated parameters of the forecasting models and the validation of the forecasting models. The results are also interpreted and discussed to analyze the total number of COVID-19 cases and its spreading in Southeast Asia.

Estimated Parameters and Forecasting Models
This section provides the estimated parameters of the forecasting models, the logistic growth curve and the Gompertz growth curve, for the countries in Southeast Asia. Tab. 1 shows the estimated parameters of both forecasting models. The selected model is based on the maximum Coefficient of Determination, which indicates how well the independent variable (time) explains the dependent variable (the total number of COVID-19 cases), and the minimum Root Mean Squared Percentage Error. The results show the suitable forecasting model for each country in Southeast Asia as follows: The suitable forecasting model for Indonesia, Philippines, and Malaysia is the Gompertz growth curve, which is asymmetric. The suitable forecasting model for Singapore, Thailand, Vietnam, Cambodia, Brunei, Myanmar, East Timor, and Laos is the logistic growth curve, which is symmetric.

Spreading of COVID-19 and Forecasting the Number of Total Cases in Southeast Asia
The behavior and estimated spread of COVID-19 cases in Southeast Asian countries are provided in this section.
Tab. 2 shows the asymptotic infectious size, the peak time of spreading, which is calculated at the inflection point, and the maximum growth rate, which is calculated by the derivative at the peak time of COVID-19 spreading in Southeast Asia. The asymptotic infectious size is the approximate total infectious population size during the outbreak. The time of the fastest spreading (peak time) of COVID-19 in Southeast Asia occurs at the inflection point, and the spread of the disease starts to slow down after the peak time. Indonesia passed its peak time only approximately 15 days ago, Singapore approximately 25 days ago, and Philippines approximately 35 days ago, with respect to the period from February 15, 2020 to May 25, 2020. Peak time is important because the spread slowly decreases after it. The two countries in Southeast Asia with the highest maximum growth rates of spreading are Singapore and Indonesia, in that order. Indonesia took longer to reach its peak time than the other countries in Southeast Asia. Singapore has the largest asymptotic infectious size, about 0.526%. The time of the fastest spreading of COVID-19 in Singapore is about 74.366 days after the beginning of the outbreak (t = 0). At the peak of the pandemic, approximately 912.194 new people get the disease per day in Singapore. In the same way, the time of the fastest spreading of COVID-19 in Indonesia is about 86.173 days after the beginning of the outbreak (t = 0). At the peak of the pandemic, approximately 421.487 new people get the disease per day in Indonesia. On the other hand, the spreading in Philippines fluctuates even though it has passed the peak time. After the peak time, the spreading in Philippines moves sideways rather than declining, and the suitable forecasting model for Philippines is the Gompertz growth curve, which is asymmetric.
The outbreak in Singapore is very severe within Southeast Asia because of its large number of total cases, but it has passed the peak time and the spreading has gradually declined since then. It is likely that the number of new cases in Singapore will continue to decrease because the suitable forecasting model for Singapore is the logistic growth curve, which is symmetric.
On the other hand, Indonesia is one of the countries in which the spreading of COVID-19 is severe because it is continually increasing. However, the suitable forecasting model for Indonesia, the Gompertz growth curve, which is asymmetric, shows that the spreading in Indonesia passed the peak time only approximately 15 days ago. Indonesia passed its peak time later than the other countries in Southeast Asia. It is possible that the spreading in Indonesia is still increasing because the suitable forecasting model is asymmetric.
Moreover, the forecasted number of total COVID-19 cases and the 95% confidence interval of total COVID-19 cases in Southeast Asia are shown in Tab. 3. To sum up, the three countries that should carefully monitor the situation of COVID-19 spreading are Indonesia, Singapore, and Philippines.

Conclusion
This research studies forecasting models to estimate the number of cumulative COVID-19 cases in Southeast Asian countries. Two forecasting models, the logistic growth curve, which is symmetric, and the Gompertz growth curve, which is asymmetric, are adopted to estimate the COVID-19 cases in Southeast Asian countries. The suitable forecasting model, logistic or Gompertz, is selected by the maximum Coefficient of Determination and the minimum Root Mean Squared Percentage Error. The spreading, based on the estimated derivatives of COVID-19 cases, is compared among the countries in Southeast Asia. Furthermore, the forecasted number of cumulative COVID-19 cases and its confidence interval are also presented. The findings show that the suitable forecasting model for Indonesia, Philippines, and Malaysia is the Gompertz growth curve, which is asymmetric. These countries are in the severe spreading group in Southeast Asia. On the other hand, the number of cumulative infectious people in the other countries is estimated by the logistic growth curve, which is symmetric and is the suitable forecasting model for them. The countries fitted by the logistic growth curve model, except Singapore, form the slight spreading group in Southeast Asia. The results imply that a symmetric forecasting model such as the logistic growth curve can support the authorities' decisions on controlling the spreading of COVID-19 more easily than an asymmetric one such as the Gompertz growth curve. That is, by the symmetry property of the logistic growth curve, the spreading peaks when the cumulative cases reach half of the final size and then slows after the peak time in the same way that it rose. As for Singapore, the number of cumulative infectious people is estimated by the logistic growth curve, which is symmetric. After the peak time, the number of new cases in Singapore will slowly decline by the symmetry property of the logistic growth curve.
Comparing the derivatives among all countries in Southeast Asia, Singapore, Indonesia, Philippines, and Malaysia are the countries in which COVID-19 spreading is quite severe. Singapore passed its peak time approximately 25 days ago, Philippines approximately 35 days ago, and Malaysia approximately 55 days ago. However, Indonesia passed its peak time only approximately 15 days ago. Also, Indonesia took longer to reach its peak time than the other countries in Southeast Asia. This implies that Singapore, Indonesia, Philippines, and Malaysia should carefully monitor the COVID-19 spreading situation. Furthermore, the forecasted infectious cases of all countries in Southeast Asia for t = 96 to t = 99 lie within the 95% confidence interval.
However, external factors can increase the number of cases in each country, because some cases are infected outside the country and can be detected by the authorities at the airport. The forecasting models do not include such cases and rely on past information to forecast the future. Future research should focus on adding exogenous (external) variables to the models in order to establish better forecasting models. Also, active infected cases should be a focus of estimation because they relate directly to medical planning and to controlling the COVID-19 outbreak, such as planning the number of hospital beds and ventilators and the need for personal protective equipment.