Open Access
ARTICLE
Comparison of Physical, Gaussian Process, and Physics-Informed Gaussian Process Models for Wind Turbine Power Curve Estimation
1 Department of Digitalization, Area of Systems Engineering and Automatic Control, University of Burgos, Avda. Cantabria, s/n, Burgos, Spain
2 Department of Informatics, University of Valladolid, P.° de Belén, 15, Valladolid, Spain
* Corresponding Author: Samuel Martínez-Gutiérrez. Email:
(This article belongs to the Special Issue: Intelligent Control and Machine Learning for Renewable Energy Systems and Industries)
Computer Modeling in Engineering & Sciences 2026, 147(3), 25 https://doi.org/10.32604/cmes.2026.081247
Received 26 February 2026; Accepted 06 May 2026; Issue published 30 June 2026
Abstract
Accurate modelling of power production in wind power systems is essential for optimizing their real-time operation and meeting technical or economic objectives. However, the precise modelling of wind turbine power output remains challenging, particularly when relying on conventional parametric models, which often struggle to capture complex or non-linear behaviors. This paper compares three modelling approaches to estimate the power produced by a real wind turbine (a Senvion MM82/2050 located in France): one parametric, based on analytical expressions of the power coefficient CP(λ, β); another nonparametric, which uses Gaussian processes (GP) to probabilistically model the relationship between operating variables and the power generated; and a third semiparametric approach, which uses a physics-informed GP that explicitly incorporates the wind conversion model based on the power coefficient CP(λ, β) within the Gaussian process as a mean function. Parametric models are efficient, interpretable, and useful when the underlying system model is known; however, they exhibit less predictive power in the face of complex behavior. In contrast, GPs offer greater flexibility, quantify uncertainty, and adapt to complex patterns in the data; however, their extrapolation outside the training range is limited and can lead to erroneous or even physically impossible predictions. The physics-informed GP integrates physical knowledge about the conversion of wind speed to power, improving the estimations outside the training range. 
Four model estimation procedures were performed using real data obtained from the SCADA system: the first estimates the parameters of the power coefficient CP in the physical model; the second estimates the hyperparameters of the GP; the third simultaneously estimates both the Gaussian process hyperparameters and the power coefficient parameters of the physics-informed GP; and the fourth computes only the hyperparameters of the physics-informed GP, keeping the optimal power coefficient parameters obtained in the first procedure. The fitting results were analyzed using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as metrics, as well as the time required for fitting/training. The results show that the parametric approach has a lower predictive capacity than the GP and physics-informed GP. The latter has an RMSE that is slightly lower than that of the standard GP and makes more accurate predictions in regions with limited or no data availability. The results also show a trade-off between accuracy and computational efficiency: the physics-informed GP has a considerably longer training time than the other two models; nevertheless, it is a valuable tool when prediction robustness is a priority. Finally, the results highlight the need to include additional explanatory variables to better capture the observed dispersion and the effect of the high short-term variability of the 1-min SCADA measurements on model fitting.
The power curve of a wind turbine is defined as the relationship between the wind speed incident on the rotor and the electrical power generated by the turbine. This relationship is essential for wind resource assessment, design and sizing of wind energy-based systems, and monitoring of turbine operational performance. Manufacturers typically derive this curve following the IEC 61400-12-1 standard [1], which provides only discrete points, either in tabular or graphical form, under standardized and controlled conditions. However, the actual power produced by a wind turbine often deviates from the manufacturer-provided curve due to real operating conditions. Consequently, a research field has emerged dedicated to the advanced modeling of these curves, with the aim of more accurately representing turbine behavior and its variations under real-world operating conditions.
Sohoni et al. [2] present a critical review of the different techniques used for power curve modeling and their applications in wind-energy systems. These approaches can be grouped into categories such as discrete, deterministic or probabilistic, parametric or nonparametric, stochastic, and those based on either manufacturer-specified data or real operational data. These classifications can be combined to create a wide variety of model types. In parametric models, the relationship between the inputs and outputs is prescribed using a set of mathematical equations containing a finite number of parameters, whereas nonparametric models do not assume the functional form of the underlying phenomenon. In the context of system identification, these categories are often associated with white-box and black-box modelling paradigms, respectively. For a detailed and systematic review of the classifications and methods used for modelling the wind turbine power curve, refer to [2,3].
Numerous models have been reported in the literature based on parametric approaches, including logistic [4] and polynomial [5] models. One of the most comprehensive models is based on the physical model of wind-to-power conversion, in which the power coefficient CP, a variable representing the fraction of incident wind power that a turbine can convert into useful energy, is explicitly incorporated [6,7]. These models assume a predefined mathematical expression for the power coefficient and are typically effective when limited information is available, such as during early stage design or for general estimation purposes. More complex physics-based models can be obtained using computational fluid dynamics (CFD) to simulate the interaction between the turbine and flow. These models are highly informative and physically interpretable; however, they often involve significant computational costs [8].
Conversely, nonparametric models, such as those based on neural networks [9,10], support vector machines [11,12], or Gaussian processes (GP) [13], can capture more complex relationships among variables and are better suited to real operational contexts with multiple sources of uncertainty and noise. GPs provide a flexible probabilistic regression tool that can capture non-linear relationships and quantify predictive uncertainty. Prior work has shown that incorporating additional operational variables (e.g., rotor speed and blade pitch) [14] can improve GP-based power curve models compared to using wind speed alone, and heteroscedastic GP formulations [15] have been explored to better represent the noise variability in SCADA data. However, like many purely data-driven methods, standard GP predictions can deteriorate when the operating conditions fall outside the support of the training data, which may result in unrealistic extrapolation and inflated predictive uncertainty. In addition, unlike models with predefined formulations, they require a large amount of data for training.
Semiparametric models are obtained by combining parametric and nonparametric approaches and are commonly referred to as grey-box approaches in system-identification terminology. In these models, the parametric component defines the overall behavior of the power curve, whereas the nonparametric component rectifies the local deviations from the parametric model. Depending on the type of parametric model used, the resulting semiparametric model may or may not be considered physically informed. Physics-informed learning involves incorporating established physical knowledge, such as theoretical equations or conservation laws, directly into a machine learning model [16]. To further contextualize these modelling categories within wind energy applications, representative turbine- and wind-farm-level hybrid studies are briefly reviewed next.
Regarding non-physically informed semiparametric models, [17] proposes a hybrid GP formulation that uses a non-physical mean function, such as a logistic trend. In contrast, physics-informed semiparametric models explicitly embed the physical structure in the parametric component. For instance, [18] proposes an additive physics-informed hybrid model in which a physics-inspired wind-to-power conversion equation provides the backbone, while neural networks are used (i) to approximate the power coefficient CP (with an explicit Betz-limit constraint) and (ii) to learn a residual correction term. In that formulation, CP is learned via a neural approximation rather than through an explicit analytical CP(λ, β) parameterization, as in our approach. At the wind farm scale, other studies have focused on aggregated power prediction and wake effects, often relying on predominantly black-box modelling approaches [19].
Focusing specifically on physics-informed Gaussian-process (GP) models, a review is provided in [20], including several engineering examples. In the wind energy domain, a mean-function-based grey-box GP approach is presented in [21], where wind speed and temperature are used to define the mean function. However, that study is demonstrated in a simplified setting using synthetic data and a logistic trend. In contrast, our work derives the GP mean function directly from the fundamental wind-to-power conversion equations and validates the approach using high-resolution real-world SCADA data.
The objective of this study is to compare three representative approaches for estimating the power curve of a wind turbine: a parametric model based on the physical wind-to-power conversion model, a nonparametric Gaussian process model that makes no physical or structural assumptions, and a physics-informed Gaussian process that incorporates the physical wind-to-power conversion model. All models are fitted using the same real dataset obtained from the SCADA system of an operating wind farm. The comparison focuses not only on model accuracy but also on practical applicability relative to the intended purpose. This analysis provides a critical overview of the advantages and limitations of each approach. The main contributions are as follows: (i) a systematic comparison and evaluation of the proposed modelling approaches using identical inputs, data, and a consistent validation protocol across all models; (ii) a physics-informed GP that uses a wind-to-power conversion physical model as an informative mean function, thereby guiding predictions toward physically plausible behavior, especially in sparsely sampled or out-of-domain regions; and (iii) an explicit treatment of parameter identification of the CP(λ, β) surface, including sequential and joint training strategies, to assess accuracy, extrapolation behavior, and physical consistency.
The remainder of this paper is organized as follows: Section 2 presents the formulation of the three models; Section 3 describes the procedure used to fit each model to the measured data; Section 4 provides information on the dataset employed; and Section 5 compares and discusses the obtained results. Finally, Section 6 summarizes the main conclusions and outlines future lines of research.
This section presents the formulations of the three models proposed for estimating the power produced by a wind turbine. Section 2.1 presents the theoretical parametric model of the power curve of a wind turbine, and Section 2.2 presents two versions of the use of Gaussian processes to model the same power curve: a classical Gaussian process and a physics-informed Gaussian process.
2.1 Parametric Model Based on Physical Principles
The generated mechanical power, Pm, for a given turbine that is fully aligned with the wind direction can be expressed as shown in Eq. (1).
$$P_m = \frac{1}{2}\,\rho\,\pi R^2 v^3\, C_P(\lambda, \beta) \tag{1}$$
where ρ is the air density (kg/m³), R is the rotor radius (m), v is the wind speed (m/s), and CP is the power coefficient (dimensionless). To obtain the generated electrical power P, the generator efficiency must be included in Eq. (1). For simplicity, an ideal efficiency of 1 was assumed. All variables, except for the power coefficient, are known: the electrical power and wind speed can be obtained from historical data corresponding to a specific commercial turbine model. The rotor radius was obtained from the manufacturer’s datasheet, and a constant air density of 1.225 kg/m³ (standard density at sea level and 15°C) was used to reduce the number of variables in the problem.
The turbine studied in this paper has a pitch angle control system. For this type of turbine, the power coefficient depends on two variables: the pitch angle β and the tip-speed ratio λ. The pitch angle is the angle of the turbine blades relative to the plane of rotation. It is used to modify the aerodynamic lift and drag forces generated by the blades and, thereby, the power produced. Increasing the pitch angle reduces the generated power, and vice versa. The tip-speed ratio, Eq. (2), is a dimensionless number obtained as the ratio between the linear speed at the blade tip, ωrR (which depends on the rotor angular speed ωr in rad/s), and the wind speed v incident on the turbine rotor. Considering that the turbine operates under steady-state conditions and assuming that there is no torsion in the shaft between the rotor and the generator, the generator rotational speed ωg is the rotor rotational speed ωr times the gear ratio N (ωg = Nωr).
$$\lambda = \frac{\omega_r R}{v} = \frac{\omega_g R}{N v} \tag{2}$$
The relationship between the power coefficient and the variables β and λ has been addressed in numerous studies, and the most common choices are polynomial, sinusoidal, or exponential functions. In this case, we used the exponential function in Eq. (3) dependent on nine parameters Ck (k = 1, …, 9). The exponential expression adopted for the power coefficient surface was chosen because this formulation has been widely used in the modelling of modern variable-speed, pitch-regulated wind turbines. It naturally produces a smooth, unimodal, and physically meaningful function, avoids the oscillatory behavior often observed in high-order polynomial models, and ensures numerical stability across the operating range. Moreover, previous studies have shown that this parameterization provides an excellent balance between flexibility and interpretability, successfully capturing the influence of both the tip-speed ratio and pitch angle in turbines similar to the Senvion MM82 considered in this study [22,23].
Therefore, the power estimation obtained using the physical model depends on the wind speed v, pitch angle β, and tip-speed ratio λ. In fact, it is the turbine control system, by acting on the pitch angle and selecting the rotor rotational speed, that achieves the extraction of the desired power at each wind speed, which manufacturers report in the power curve.
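The chain from measured signals to estimated power described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation. The nine-coefficient exponential form in `power_coefficient` is a parameterization commonly found in the literature and is assumed here; it may differ from Eq. (3). The coefficient values and the gear ratio are placeholders rather than fitted or datasheet values, and the rotor radius assumes an 82 m rotor diameter.

```python
import math

RHO = 1.225      # air density in kg/m^3 (constant value stated in the text)
R = 41.0         # rotor radius in m (assumption: half of an 82 m rotor diameter)
N_GEAR = 100.0   # gear ratio (placeholder, not a datasheet value)

# Placeholder coefficients C1..C9 (illustrative only, not the fitted values).
C = (0.5, 116.0, 0.4, 0.0, 2.0, 5.0, 21.0, 0.08, 0.035)


def tip_speed_ratio(omega_g, v, radius=R, gear_ratio=N_GEAR):
    """lambda = omega_r * R / v, with omega_r = omega_g / N (speeds in rad/s)."""
    omega_r = omega_g / gear_ratio
    return omega_r * radius / v


def power_coefficient(lam, beta, c=C):
    """Assumed exponential C_P(lambda, beta) surface with nine coefficients."""
    c1, c2, c3, c4, c5, c6, c7, c8, c9 = c
    inv_lam_i = 1.0 / (lam + c8 * beta) - c9 / (beta ** 3 + 1.0)
    bracket = c2 * inv_lam_i - c3 * beta - c4 * beta ** c5 - c6
    return c1 * bracket * math.exp(-c7 * inv_lam_i)


def electrical_power(v, beta, omega_g):
    """Eq. (1) with ideal generator efficiency, returned in W."""
    lam = tip_speed_ratio(omega_g, v)
    return 0.5 * RHO * math.pi * R ** 2 * v ** 3 * power_coefficient(lam, beta)
```

With these placeholder values, `power_coefficient(8.0, 0.0)` stays below the Betz limit of 16/27, which is a quick sanity check worth applying to any fitted CP surface.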
2.2 Models Based on Gaussian Processes
Gaussian processes are a nonparametric Bayesian technique widely used in supervised learning problems. Their main advantages are their ability to model complex relationships between variables without requiring a specific functional form and to provide an estimate of uncertainty. This flexibility allows GPs to be easily adapted to different types of data, making them versatile tools for a wide variety of applications.
From a more formal perspective, a Gaussian process is defined as a collection of random variables such that any finite set of them follows a joint Gaussian distribution. One of the most intuitive ways to understand GPs is through the function space view, where a GP represents a Gaussian probability distribution over functions rather than over the parameters of a given model [24]. The formulation of a GP is commonly expressed using Eq. (4).
$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\right) \tag{4}$$
where m(x) is the mean function and k(x, x′) is the covariance (kernel) function.
However, in practice, the vector of observations of the process y does not correspond directly to f(x); instead, it is affected by additive noise. Assuming that this noise is Gaussian, independent, and identically distributed, the model of the actual observations is defined by Eq. (5).
$$y = f(\mathbf{x}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\left(0, \sigma_n^2\right) \tag{5}$$
Thus, the vector of observations continues to follow a multivariate Gaussian distribution, Eq. (6), that depends on the mean function m(x) and the total covariance matrix Ky, Eq. (7), where σn² is the noise variance, I is the identity matrix, and K is the kernel matrix obtained from k(x, x′).
$$\mathbf{y} \sim \mathcal{N}\left(m(X),\, K_y\right) \tag{6}$$
$$K_y = K + \sigma_n^2 I \tag{7}$$
In the particular case of this study, the Gaussian process was used to predict the power generated by a wind turbine. To make it comparable with the physical model described in Section 2.1, the same input variables v, β, and λ are used, thus creating the vector x = [v, β, λ]ᵀ. With regard to the kernel function, this study uses the Radial Basis Function (RBF), also known as Squared Exponential (SE), which is often recommended in the literature on Gaussian processes as a basic kernel to try, due to its ability to model continuous, smooth, and non-linear relationships between input variables. Its expression is defined as (8), where σf² is the variance of the process and
Finally, the mean function m(x) of the Gaussian process must be selected. A common value in the literature is m(x) = 0; however, other mean functions can also be used when prior knowledge of the shape of the response curve is available. For example, in modelling the power curves of wind turbines, as mentioned in the introduction, a logistic function was used as the mean function. However, in this work, a physics-informed learning approach was adopted to set the mean function. Defining the GP mean function as equal to the physical power model allows the central prediction of the process to comply with the physical laws described in Eqs. (1)–(3), such that the GP only learns the residual discrepancy between the physical prediction and observed data.
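Therefore, for comparison purposes, two different GP models are proposed: one with m(x) = 0, Eq. (9), and another with m(x) = 0.5ρπR²v³CP(λ, β), Eq. (10).
$$f(\mathbf{x}) \sim \mathcal{GP}\left(0,\, k(\mathbf{x}, \mathbf{x}')\right) \tag{9}$$
$$f(\mathbf{x}) \sim \mathcal{GP}\left(\tfrac{1}{2}\rho\pi R^2 v^3 C_P(\lambda, \beta),\, k(\mathbf{x}, \mathbf{x}')\right) \tag{10}$$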
When working with Gaussian processes, it is recommended to normalize each variable to a common range, for example, between 0 and 1, especially when using multiple input variables with different units or scales. This normalization facilitates the fitting. To recover the real value of the predicted variable, the inverse of the normalization is applied. In this paper, normalization was applied only to the zero-mean Gaussian process. For the Gaussian process with a physics-informed mean, the variables were not normalized: they enter the physical mean function directly in their physical units, and the scaling problem was considered less relevant in this case.
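The role of the mean function in the two GP models can be illustrated with the standard GP posterior mean, m(x*) + K* Ky⁻¹ (y − m(X)). The sketch below is a one-input toy example written with NumPy, not the GPflow models used in the study; `toy_physical_mean`, the kernel hyperparameters, and the synthetic data are hypothetical. It shows that far from the training data the kernel term vanishes and the prediction reverts to the mean function: zero for the zero-mean GP, and the physical curve for the physics-informed GP.

```python
import numpy as np


def rbf_kernel(xa, xb, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel for one-dimensional inputs."""
    diff = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def zero_mean(x):
    return np.zeros_like(x)


def toy_physical_mean(x, rated=1.0, v_rated=12.0):
    """Hypothetical normalized power curve: cubic in wind speed, capped at rated."""
    return np.minimum(rated * (x / v_rated) ** 3, rated)


def gp_posterior_mean(x_train, y_train, x_test, mean_fn, noise_var=1e-2):
    """Posterior mean m(x*) + K_* K_y^{-1} (y - m(X)) via a Cholesky factorization."""
    k_y = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_y)
    residual = y_train - mean_fn(x_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, residual))
    return mean_fn(x_test) + k_star @ alpha


# Synthetic training data covering only 4-10 m/s (normalized power).
x_train = np.linspace(4.0, 10.0, 20)
y_train = 0.95 * toy_physical_mean(x_train)
x_far = np.array([20.0])  # well outside the training range

pred_zero = gp_posterior_mean(x_train, y_train, x_far, zero_mean)
pred_phys = gp_posterior_mean(x_train, y_train, x_far, toy_physical_mean)
```

In this toy setting `pred_zero` is essentially zero at 20 m/s, a physically implausible power for a turbine in operation, whereas `pred_phys` returns the rated value given by the assumed mean function.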
In this section, the optimization problems to be solved and the software used to estimate the parameters of the physical model and hyperparameters of the Gaussian and physics-informed Gaussian models are formulated.
3.1 Physical Model: Least Squares
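To obtain the nine parameters Ck (k = 1, …, 9) that define the CP(λ, β) surface, an optimization problem was formulated. The objective function in Eq. (11) minimizes the quadratic error between the actual measured power Pi and the power estimated by the model, P̂i, over the n fitting samples.
$$\min_{C_1,\dots,C_9}\ \sum_{i=1}^{n} \left(P_i - \hat{P}_i\right)^2 \tag{11}$$
The estimated power P̂i is obtained from the physical model of Eqs. (1)–(3), evaluated at the measured values of each sample, Eq. (12).
$$\hat{P}_i = \frac{1}{2}\,\rho\,\pi R^2 v_i^3\, C_P(\lambda_i, \beta_i), \qquad \lambda_i = \frac{\omega_{g,i} R}{N v_i} \tag{12}$$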
Therefore, the measured variables at each point i are the electrical power Pi, rotational speed ωg,i, wind speed vi, and pitch angle βi. This optimization problem was posed and solved in Python 3.11.5 [25], using the Pyomo 6.7.1 modelling and optimization tool [26], and the IPOPT 3.11.1 solver [27].
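The least-squares fit described above can be sketched on synthetic data. This example uses `scipy.optimize.least_squares` rather than the Pyomo/IPOPT setup reported by the authors, fits only two coefficients of an assumed exponential CP form at β = 0 (the remaining coefficients are fixed placeholders), and generates noise-free data from known values. It therefore illustrates the procedure and does not reproduce the study.

```python
import numpy as np
from scipy.optimize import least_squares

RHO, R = 1.225, 41.0  # air density (kg/m^3); rotor radius in m (assumed)


def cp_beta0(lam, c1, c7):
    """Assumed exponential C_P form at zero pitch; other coefficients are placeholders."""
    inv_lam_i = 1.0 / lam - 0.035
    return c1 * (116.0 * inv_lam_i - 5.0) * np.exp(-c7 * inv_lam_i)


def model_power_kw(params, v, lam):
    """Estimated power from the wind-to-power equation, in kW."""
    c1, c7 = params
    return 0.5 * RHO * np.pi * R ** 2 * v ** 3 * cp_beta0(lam, c1, c7) / 1e3


def residuals(params, v, lam, p_meas_kw):
    """Estimated minus measured power at every sample."""
    return model_power_kw(params, v, lam) - p_meas_kw


# Synthetic "measurements" generated with known coefficients (c1 = 0.5, c7 = 21).
v = np.linspace(4.0, 11.0, 40)
lam = np.linspace(11.0, 6.0, 40)
p_meas_kw = model_power_kw((0.5, 21.0), v, lam)

# least_squares minimizes half the sum of squared residuals.
fit = least_squares(residuals, x0=(0.4, 18.0), args=(v, lam, p_meas_kw))
c1_hat, c7_hat = fit.x
```

On real SCADA data the residuals do not vanish at the optimum, and with all nine coefficients free the result can depend on the starting point, so several initial guesses are worth trying.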
3.2 Gaussian Process: Maximum Likelihood
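To fit the hyperparameters of a Gaussian process, θ, the marginal likelihood of the observations is maximized or, equivalently, the negative log marginal likelihood is minimized:
$$-\log p(\mathbf{y} \mid X, \theta) = \frac{1}{2}\left(\mathbf{y} - \mathbf{m}\right)^{\top} K_y^{-1} \left(\mathbf{y} - \mathbf{m}\right) + \frac{1}{2}\log\left|K_y\right| + \frac{n}{2}\log 2\pi$$
where m is the vector of mean-function values at the training inputs and n is the number of training samples.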
In contrast to the physical model, where it was necessary to manually implement the optimization problem in Python to fit the parameters, the GPflow 2.9.1 library [28] for Gaussian processes fits the model’s hyperparameters automatically, using built-in functions that minimize the negative log marginal likelihood with the L-BFGS-B optimizer from the SciPy library [29].
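The objective minimized during training can be written out directly. The sketch below is a NumPy illustration of the negative log marginal likelihood for given mean and covariance values; it is not GPflow's internal implementation, and the function name is hypothetical.

```python
import numpy as np


def negative_log_marginal_likelihood(y, mean, k_y):
    """0.5 * r^T K_y^{-1} r + 0.5 * log|K_y| + 0.5 * n * log(2*pi), with r = y - mean."""
    residual = y - mean
    chol = np.linalg.cholesky(k_y)                 # K_y = L L^T
    whitened = np.linalg.solve(chol, residual)     # L^{-1} r
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # log|K_y| from the Cholesky factor
    n = len(y)
    return 0.5 * whitened @ whitened + 0.5 * log_det + 0.5 * n * np.log(2.0 * np.pi)
```

The Cholesky factorization gives both the quadratic term and the log-determinant in a numerically stable way. Its cost grows cubically with the number of training samples, and it is repeated at every optimizer iteration.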
3.3 Physics-Informed Gaussian Process: Maximum Likelihood
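Two alternatives exist for training the physics-informed Gaussian process, both of which have been implemented. The first and simplest method is to establish Eq. (1) as the mean function with the nine parameters Ck fixed to the values previously obtained for the physical model, so that only the hyperparameters θ are optimized. The second method optimizes the parameters Ck and the hyperparameters θ jointly.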
The data used in this study were obtained from the open-access repository Zenodo [31] and correspond to the French SMARTEOLE project, which studied the Sole du Moulin Vieux wind farm [32]. The data extracted from the wind farm’s SCADA system consist of measurements of a series of variables for turbine 1 (a Senvion MM82 with a rated power of 2050 kW), taken between 17 February and 25 May 2020. The dataset comprises 134,661 points sampled at 1-min intervals, with measurements of the generated electrical power P, wind speed v, pitch angle β, and generator rotation speed ωg. These data were filtered according to the following four criteria: (i) the misalignment between the turbine rotor and the wind direction must be less than 0.1°, so that only situations in which the turbine is fully aligned with the wind are considered; (ii) the generator rotation speed must be positive; (iii) the measured wind speed must be within the operating range of the turbine (vcut-in < v < vcut-out); and (iv) the generated power must be positive. In addition, to avoid problems with the calculation of the power coefficient, the pitch angle β was set to zero at all points where it was negative (values slightly below zero owing to measurement errors). After filtering, 1131 samples were retained (Fig. 1). Since the modelling task is formulated as a static input-output mapping between operating variables (v, β, ωg) and generated power, the dataset was divided into two disjoint subsets in order to evaluate generalization to unseen operating conditions. Specifically, 80% of the samples (905 points) were used for model fitting, while the remaining 20% (226 points) were reserved exclusively for testing and were not used during parameter estimation or hyperparameter optimization.
The resulting test set is sufficiently large to provide statistically meaningful error metrics and spans comparable ranges of the observed variables, ensuring a representative and non-trivial validation of the proposed models. To verify that the consecutive 80/20 split does not introduce an artificial distribution shift, we compared the empirical distributions of the main input variables (wind speed, rotor speed, and pitch angle) in the training and test subsets; the distributions were found to be broadly consistent.
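The filtering criteria and the consecutive 80/20 split can be sketched in plain Python. The record field names are hypothetical, the cut-in and cut-out speeds are left as arguments because the text does not state them, and taking the absolute value of the misalignment is an assumption.

```python
def filter_samples(records, v_cut_in, v_cut_out, max_misalignment_deg=0.1):
    """Apply the four filtering criteria and set negative pitch angles to zero."""
    kept = []
    for rec in records:
        if abs(rec["misalignment_deg"]) >= max_misalignment_deg:
            continue  # (i) rotor not aligned with the wind
        if rec["omega_g"] <= 0.0:
            continue  # (ii) generator speed must be positive
        if not (v_cut_in < rec["v"] < v_cut_out):
            continue  # (iii) wind speed outside the operating range
        if rec["power_kw"] <= 0.0:
            continue  # (iv) generated power must be positive
        kept.append({**rec, "beta": max(rec["beta"], 0.0)})
    return kept


def consecutive_split(samples, train_fraction=0.8):
    """First block for fitting, remaining block for testing (no shuffling)."""
    n_train = round(train_fraction * len(samples))
    return samples[:n_train], samples[n_train:]
```

Applied to 1131 samples, `consecutive_split` yields 905 fitting points and 226 test points, matching the sizes reported above.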

Figure 1: Measured values of generated power, pitch angle, and rotor speed. The black dotted line shows the power curve provided by the manufacturer of the wind turbine.
The results section is organized into the following subsections: Section 5.1 shows the results obtained for the physical model, Section 5.2 shows the results for the zero-mean Gaussian process, Section 5.3 shows the results of the physics-informed Gaussian process, and finally, Section 5.4 compares the three models in terms of fitting time and goodness of fit, using RMSE and MAE as metrics.
The values of the parameters Ck, obtained by solving the optimization problem defined in Eqs. (11) and (12), are listed in Table 1.

Fig. 2 shows, as a scatter plot of power against wind speed, the power estimated from Eqs. (1)–(3) with the parameters in Table 1. The same graph compares this estimated power with the measured power, which has been classified into two groups: fitting data, which were used to solve the optimization problem and obtain the parameters of the CP surface, and test data, which were not used to obtain the model but served to validate it.

Figure 2: Power estimation vs. wind speed using a physical model for fitting data (left) and the test data (right).
In general, the estimates are reasonably accurate for most points, although a greater error is observed for wind speeds between 4 and 7 m/s. The discrepancy between the actual and modelled power curves in this operating region may be attributed to several factors: air density variations at different wind speeds; changes in turbine efficiency with rotor speed, and therefore with wind speed; and the turbine’s braking control logic when the turbine is turned on or off at low wind speeds. It is important to highlight that this discrepancy at low speeds also appears between the measured power and the power curve provided by the turbine manufacturer (the black line in Fig. 1); therefore, this behavior can be attributed to the particular operation of that turbine. In contrast, the dispersion around the estimated power at every wind speed is clearly due to transient effects and to meteorological conditions acting through air density, neither of which has been modelled.
For a given wind speed, the power generated may be slightly higher or lower due to the turbine’s inertia and the behavior imposed by its control system. Moreover, the measured power values are 1-min averages that show high variability, with up to 400 kW between the minimum and maximum values within each minute (Fig. 3).

Figure 3: Measured power and its variability each minute for different wind regions: low wind (4–8 m/s), mid winds (8–11 m/s), and high winds (11–24 m/s).
The same wind speed can also correspond to different air densities; unfortunately, neither online air density measurements nor pressure data are available to estimate it. However, the meteorological conditions vary considerably, as shown in Fig. 4, where the temperature ranges from 5°C to 25°C for the same wind speed.

Figure 4: Temperature at each wind speed.
The optimal values of the hyperparameters of the Gaussian process, defined in Eq. (8), are shown in Table 2.

Fig. 5 shows the prediction of the mean value of the Gaussian process over the entire dataset, plotted as power vs. wind speed together with the measured point cloud. Visually, the Gaussian process reproduces the power dispersion better than the physical model, although how well it captures the variability in power depends on the wind speed range. To interpret the model’s behavior across wind speed ranges, Fig. 1, which displays the wind speed, rotation speed, and pitch angle, should be used. The tip-speed ratio was not plotted directly because it is computed from the wind speed and rotation speed. A comparison of Figs. 1 and 5 reveals three regimes.

Figure 5: Power estimation vs. wind speed using a Gaussian process for fitting data (left) and test data (right).
In the low-speed regime (4–8 m/s), the Gaussian process reproduces the observed dispersion in power. Fig. 1 indicates that both wind speed and rotation speed vary in this range, while the pitch remains close to 0°. The simultaneous variation of these inputs provides the model with sufficient information to learn their joint influence on power and to represent the observed variability.
In the intermediate regime (8–11 m/s), the mean prediction shown in Fig. 5 collapses toward the cloud mean and fails to capture full variability. Fig. 1 shows that in this region, the rotor speed and pitch are essentially constant, leaving the wind speed as the only input that changes significantly. With a single effective input, the Gaussian process has limited explanatory information and therefore tends to predict the local mean.
In the high-speed regime (11–24 m/s), the fit once again reflects the dispersion. Fig. 1 shows that wind speed and pitch vary together in this band (even if the rotor speed is relatively stable), which enables the model to learn their combined effect on power and thus capture the variability once more. Overall, the joint comparison of Fig. 5 (model mean) and Fig. 1 (input data behavior) indicates that the model’s ability to reproduce power dispersion depends on the effective variability of the input data within each operating regime.
The limitation observed in the intermediate range suggests that additional explanatory variables, such as air density, or a reassessment of the hypothesis of data stationarity may be necessary, similar to what was explained previously for the physical model.
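The argument about effective input variability can be checked numerically by computing the spread of each input within each wind-speed regime. The sketch below is a minimal illustration with hypothetical field names; the half-open bin boundaries are an assumption.

```python
from statistics import pstdev

REGIMES = {"low": (4.0, 8.0), "mid": (8.0, 11.0), "high": (11.0, 24.0)}


def input_spread_by_regime(samples, regimes=REGIMES):
    """Population standard deviation of each input within each wind-speed regime."""
    spread = {}
    for name, (lo, hi) in regimes.items():
        subset = [s for s in samples if lo <= s["v"] < hi]
        if len(subset) < 2:
            continue  # not enough points to measure a spread
        spread[name] = {
            key: pstdev([s[key] for s in subset])
            for key in ("v", "beta", "omega_g")
        }
    return spread
```

A regime in which only `v` has a non-negligible spread corresponds to the intermediate case discussed above, where the GP effectively sees a single varying input.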
5.3 Physics-Informed Gaussian Process
As mentioned in Section 3.3, there are two alternatives for training a physics-informed Gaussian process. The first one consists of fixing the previously obtained Ck parameters and optimizing only the hyperparameters θ. The second involves optimizing the parameters and hyperparameters together. Table 3 compares the values of the fixed Ck parameters, which were previously obtained for the physical model, with the Ck values obtained by jointly fitting the parameters and the hyperparameters. Table 4 compares the hyperparameters of both models.


Although the parameters and hyperparameters differed between the two approaches, both models obtained similar predictive results. In the training set, Table 5 shows that the RMSE was 48.49 kW for the model with fixed Ck values and 48.87 kW for the model that jointly fitted the values of Ck and θ. In the test set, the RMSE values were 46.58 and 48.92 kW, respectively. Since both physics-informed Gaussian processes produce similar results, the variant with fixed Ck values is preferable, given its shorter training time (26,372 s compared with 38,775 s). For this reason, it is the variant used from this point on for the graphs and for comparisons with the other models.

Fig. 6 shows the predictions of the mean value of the physics-informed Gaussian process trained with fixed Ck parameters for the complete dataset. As in the zero-mean Gaussian process, the physics-informed model adequately reproduces the dispersion of power values, except in the speed range between 8 and 11 m/s, where a greater discrepancy can be seen. This occurs for the same reason as explained for the zero-mean GP.

Figure 6: Power estimation vs. wind speed using a physics-informed Gaussian process for fitting data (left) and test data (right).
Table 5 shows the RMSE on the training and test sets for the four models studied, together with their fitting or training times, which differ widely: from 2.68 s to fit the physical model to 10.77 h (38,775 s) to train the physics-informed Gaussian process with jointly fitted parameters. There is also a clear difference in predictive performance between the physical model and the Gaussian process variants. Regarding the different Gaussian processes, incorporating information on the expected physical behavior slightly improved the training results but considerably improved the power prediction in the test set, suggesting greater robustness of the model against combinations of input values not seen in training.
At first glance, both the GP models outperform the physical model. For a comprehensive analysis, the RMSE and MAE indicators were calculated for each of the three wind speed regimes previously explained, and the results are listed in Table 6. Figs. 7 and 8 compare the forecasts of the physical model on the test set with the Gaussian process and physics-informed Gaussian process with fixed Ck parameters, respectively. Although the series shown in the figures are plotted in chronological order, the indices (for example, observation 1000) refer to positions within the filtered dataset and do not imply consecutive time stamps in the original time series.


Figure 7: Comparison between the measured power (test data) and the estimated power using a physical model and a Gaussian process with a 95% confidence interval for the three wind speed regions.

Figure 8: Comparison between the measured power (test data) and the estimated power using a physical model and physics-informed Gaussian process with a 95% confidence interval for the three wind speed regions.
Both physics-informed Gaussian processes provided practically identical results across all indicators and wind conditions.
In the low-wind region (4–8 m/s), in both training and prediction, the Gaussian and physics-informed Gaussian processes model the effect of wind and rotor speed on power much more accurately than the physical model. Their fit quality is very similar, and all their indices are considerably lower than those of the physical model.
In the intermediate wind region (8–11 m/s) during training, the three methods provided very similar fits, with the Gaussian process and physics-informed Gaussian process being slightly better. In this region none of the models has any freedom: the pitch angle β is 0° and the rotor speed ωr is constant, so the generated power depends only on the wind speed.
For high-wind regions (11–24 m/s) and in the training set, both Gaussian and physics-informed Gaussian processes performed significantly better than the physical model across all indices. However, in the test set, the physical model performed better in terms of RMSE (56.59 kW) than the pure Gaussian process (95.54 kW). This occurs because there are no training data in that region (there are five test points without any nearby training samples) and the RMSE strongly penalizes these five outliers with large fitting errors, as shown in the red circle in Fig. 7. Nevertheless, the physics-informed Gaussian process with fixed parameters substantially improved (74.27 kW) over the pure Gaussian process (95.54 kW), demonstrating the advantage of combining both methodologies, as the physical model extrapolates well in regions where there are hardly any experimental training data, as shown in the red circle in Fig. 8. In contrast, if an indicator such as MAE is used, where large and small errors are penalized equally, the differences are minimal because there are very few data points with large errors compared to the rest.
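The different sensitivity of RMSE and MAE to a handful of large errors can be illustrated with synthetic numbers. The sketch below does not use the paper's data: the error magnitudes (40 kW spread, 600 kW outliers) and the sample sizes are assumptions chosen only to show the effect.

```python
import numpy as np

def rmse(err):
    """Root mean squared error: squares each error, so large ones dominate."""
    return float(np.sqrt(np.mean(np.square(err))))

def mae(err):
    """Mean absolute error: every error contributes in proportion to its size."""
    return float(np.mean(np.abs(err)))

# Synthetic prediction errors in kW (illustrative, not the paper's data).
rng = np.random.default_rng(0)
typical = rng.normal(0.0, 40.0, size=500)   # well-fitted test points
outliers = np.full(5, 600.0)                # five badly extrapolated points
with_outliers = np.concatenate([typical, outliers])

print("without outliers: RMSE", rmse(typical), "MAE", mae(typical))
print("with outliers:    RMSE", rmse(with_outliers), "MAE", mae(with_outliers))
```

Adding the five points raises the RMSE much more, in relative terms, than the MAE, which is the pattern described for the high-wind test set.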
Gaussian processes have the added advantage of providing a confidence interval for each prediction, which is more difficult to obtain with the physical model. In this specific case, a 95% confidence interval is shown for both Gaussian processes. The physics-informed Gaussian process shows much less uncertainty than the classical Gaussian process because its mean function already captures a significant part of the actual behavior of the system. The GP therefore only has to adjust for small deviations around that physical trend, which reduces the variability of its predictions. In contrast, the zero-mean GP must learn both the general trend and the local variations, resulting in wider confidence intervals. However, in several cases, for example at observation 906 (Fig. 9), when the predictions of the Gaussian processes deviate significantly from the actual value, the confidence interval of the zero-mean GP contains the observed data, while the interval of the physics-informed GP does not. This can be explained by two complementary reasons. The first, already mentioned, is that the physics-informed GP incorporates the trend imposed by the physical model and therefore has less freedom to deviate from it. The second is that the physical model used as a mean function does not account for several relevant sources of variability, such as air-density fluctuations, transient aerodynamic effects, or control-system dynamics. As a result, its mean function may not fully capture the true variability of the system; in fact, the estimates from all methods for observation 906 fall within the range of the 1-min measured data. The zero-mean GP, which is not constrained by a prior physical mean, has greater freedom in its predictions.

Figure 9: Detailed view of the first 25 test samples under medium wind speed conditions. (a) Comparison between physical model and Gaussian process and (b) comparison between physical model and physics-informed Gaussian process.
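The role of the mean function in the intervals discussed above can be made explicit with the standard GP regression equations. Apart from m(x), which the text uses for the mean function, the notation is an assumption introduced for illustration: X and y are the training inputs and measured powers, x_* is a new input, k is the covariance function, K = k(X, X), k_* = k(X, x_*), and σ_n² is the noise variance.

```latex
\begin{aligned}
\mu(x_*) &= m(x_*) + k_*^{\top}\left(K + \sigma_n^{2} I\right)^{-1}\left(y - m(X)\right) \\
\sigma^{2}(x_*) &= k(x_*, x_*) - k_*^{\top}\left(K + \sigma_n^{2} I\right)^{-1} k_* \\
\text{95\% interval for a new observation:}\quad & \mu(x_*) \pm 1.96\,\sqrt{\sigma^{2}(x_*) + \sigma_n^{2}}
\end{aligned}
```

Written this way, the GP only models the residual y − m(X). The predictive variance does not depend on m directly, so narrower intervals for the physics-informed GP would arise from the hyperparameters fitted to smaller residuals, for instance a smaller signal variance. Whether the plotted intervals include the noise term σ_n² is not stated here, so the last line should be read as one common convention.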
Fig. 10 illustrates the power estimation results obtained in the high winds region in terms of wind speed and pitch angle. In the figure, it can be observed that the five test points identified previously lie clearly outside the training region. This helps explain why the standard Gaussian process is unable to accurately predict these unseen operating conditions; with no nearby training samples, GP predictions deteriorate. In contrast, the physical model can better reproduce the behavior in this area, likely by extrapolating from the available pitch angle information. The physics-informed GP exhibits an intermediate behavior.

Figure 10: Comparison between the measured power (training and test data) and the estimated power using (a) physical model, (b) Gaussian process and (c) physics-informed Gaussian process for high winds (11–24 m/s).
Fig. 11 shows how the generated power changes as a function of λ for different wind speeds, keeping the angle β constant at a value of 0°. Fig. 11a represents the physical model and Fig. 11b represents the results obtained with the Gaussian process. The Gaussian process reproduces the structure of the physical model quite well in areas with training data (λ between 7.5 and 11), but shows notable differences in areas without measured data, such as at high wind speeds or low and high tip-speed ratios. Furthermore, the variability in these areas increases to a level where it becomes uninformative. For instance, when λ = 2 and v = 8 m/s, the power can range from approximately −3500 to 3500 kW with a 95% confidence level. This highlights a key limitation of Gaussian processes: when the models extrapolate beyond the training range, they converge towards the mean and increase the predictive variance. This limitation is overcome by the physics-informed Gaussian process whose results are shown in Fig. 11c: the model can adapt to the training data from v = 4 m/s to 10 m/s. However, in areas where no data are available, the model is mainly guided by the established mean function m(x). Consequently, this approach also produces more realistic confidence intervals across the entire combination of variables.

Figure 11: Generated power vs. tip-speed ratio for different wind speeds and fixed β = 0° using (a) physical model, (b) Gaussian process, and (c) physics-informed Gaussian process.
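The reversion to the prior mean outside the training range can be reproduced with a one-dimensional toy example. The sketch below is not the paper's model: it uses a hand-written GP with a squared-exponential kernel in NumPy, hand-picked (not optimized) hyperparameters, a single input (wind speed), and a hypothetical capped cubic law, `toy_physical_mean`, standing in for the CP(λ, β)-based model. The 2050 kW cap simply mirrors the turbine's rated power.

```python
import numpy as np

def rbf(a, b, signal_std, length):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_std**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_tr, y_tr, x_te, mean_fn, signal_std, length, noise_std):
    """Posterior mean and std of the latent function for prior mean mean_fn."""
    K = rbf(x_tr, x_tr, signal_std, length) + noise_std**2 * np.eye(len(x_tr))
    Ks = rbf(x_te, x_tr, signal_std, length)
    # The GP is fitted to the residual between the data and the prior mean.
    mu = mean_fn(x_te) + Ks @ np.linalg.solve(K, y_tr - mean_fn(x_tr))
    var = signal_std**2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))

def zero_mean(v):
    return np.zeros_like(v)

def toy_physical_mean(v):
    # Hypothetical stand-in for the physical model: cubic law capped at 2050 kW.
    return np.minimum(2.0 * v**3, 2050.0)

# Synthetic training data between 4 and 10 m/s (illustrative only).
rng = np.random.default_rng(0)
v_train = np.linspace(4.0, 10.0, 30)
p_train = 0.95 * toy_physical_mean(v_train) + rng.normal(0.0, 20.0, v_train.size)

# One point inside the training range and one far outside it.
v_test = np.array([7.0, 20.0])
mu_zero, sd_zero = gp_predict(v_train, p_train, v_test, zero_mean, 1500.0, 1.5, 20.0)
mu_phys, sd_phys = gp_predict(v_train, p_train, v_test, toy_physical_mean, 150.0, 1.5, 20.0)

for v, mz, sz, mp, sp in zip(v_test, mu_zero, sd_zero, mu_phys, sd_phys):
    print(f"v={v:5.1f} m/s  zero-mean: {mz:8.1f} ± {1.96 * sz:7.1f} kW"
          f"  physics mean: {mp:8.1f} ± {1.96 * sp:7.1f} kW")
```

At 7 m/s both variants follow the data. At 20 m/s the zero-mean GP falls back to 0 kW with an interval as wide as its prior, whereas the variant with the physical mean falls back to the capped cubic law. The smaller signal standard deviation assumed for the second variant reflects that it only has to model residuals.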
If the same graphs are created for other angles β, for example 5° and 15°, clearly different behaviors appear, and the limitations of the zero-mean Gaussian process in predicting outside the training range become even more evident. For β = 5° (Fig. 12), where limited training data are available, the zero-mean Gaussian process (Fig. 12b) no longer resembles the behavior of the physical model (Fig. 12a), and the uncertainty of its predictions increases considerably across the entire range of values. Furthermore, for certain speed values, for example v = 10 m/s, the zero-mean Gaussian process predicts nonzero power when λ = 0, which is physically impossible since λ = 0 implies that the turbine is stopped. In contrast, the physics-informed Gaussian process shown in Fig. 12c does not exhibit such physically impossible behavior. It behaves essentially like the physical model in areas without data, adding only a range of uncertainty; in areas with data, because these are very limited, the GP does not significantly correct the behavior of the physical model, which continues to predominate.

Figure 12: Generated power vs. tip-speed ratio for different wind speeds and fixed β = 5° using (a) physical model, (b) Gaussian process, and (c) physics-informed Gaussian process.
For β = 15° (Fig. 13), where there are no training data in the speed range v from 4 to 14 m/s, the limitations of the zero-mean Gaussian process are even more evident. The variability of the GP increases significantly across the entire range, and the model again predicts physically impossible behaviors in regions outside the training range. These problems are clearly corrected when physical information is incorporated (Fig. 13c): because there are no training data for that combination of variables, the physics-informed model behaves identically to the physical model, with an added band of uncertainty.

Figure 13: Generated power vs. tip-speed ratio for different wind speeds and fixed β = 15° using (a) physical model, (b) Gaussian process, and (c) physics-informed Gaussian process.
This paper presents a systematic comparison of three different modelling approaches for estimating the power curve of a wind turbine, specifically the Senvion MM82/2050, using operational SCADA data. The models compared were a physical energy-conversion model, a standard Gaussian process, and a physics-informed Gaussian process. By analyzing their behavior across different wind regimes, both inside and outside the training domain, this study provides a clear assessment of the strengths and limitations of each approach.
The results show that the standard Gaussian process achieves the best fit in regions where sufficient training data are available, thereby successfully capturing the variability inherent in real turbine operation. However, its extrapolation capability is limited, leading to physically inconsistent predictions in domains that are not represented in the training set. In contrast, the physical model delivers stable and physically plausible estimates across the entire operating range, including regions without data, although at the expense of reduced accuracy within the training domain. By combining the two models in the physics-informed Gaussian process, it is possible to combine the advantages of both: good fit in areas where training data exist and extrapolation capability outside of them.
The results show a clear trade-off between accuracy and computational efficiency. While the physical model and standard GP have fitting times on the order of seconds, the physics-informed Gaussian process requires several hours of training. At first glance, the reduction in RMSE compared to conventional GP may seem modest. However, this improvement, although small in numerical terms, has a fundamental implication: the model avoids predicting physically impossible values. In this sense, the benefit of the physics-informed Gaussian process is not limited to a slight decrease in the overall error but lies in ensuring consistency with the physical constraints of the problem. Therefore, although the computational cost is considerably higher, the model offers a key qualitative advantage by producing physically consistent predictions, which may be more relevant than the mere marginal reduction in the RMSE.
The three proposed models estimate power as a function of the main turbine operating variables: wind speed, generator rotational speed, and blade pitch angle. The Cp-based physical model is a simple analytical formulation that captures wind power extraction and can be embedded into more complex dynamic wind turbine models to simulate operation and assess different control and operational strategies. Owing to their good extrapolation capabilities, both the physical model and the physics-informed Gaussian process can be integrated into the turbine control system, particularly within the maximum power point tracking (MPPT) module, to determine the optimal rotational speed in the partial-load region and thus extract the maximum power. In contrast, purely data-driven Gaussian processes are not suitable for direct use in a dynamic simulation, but they are well suited for performance monitoring, enabling the detection of malfunctions and operational anomalies when the measured power deviates from the model predictions. However, reliable performance requires the model to be trained across all operating regions because extrapolation outside the training domain is limited.
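As a rough illustration of the MPPT use mentioned above, the sketch below finds the tip-speed ratio that maximizes a power coefficient at β = 0° and converts it into a rotor speed set-point. Several elements are assumptions and not taken from this paper: the CP expression is a widely used generic form with textbook coefficients rather than the fitted Ck values, the rotor radius of 41 m is inferred from the turbine designation, the function names are hypothetical, and the result is a rotor speed, so a gearbox ratio would still be needed to obtain a generator speed.

```python
import numpy as np

def cp_generic(lam, beta_deg):
    """Generic power coefficient CP(lambda, beta) with textbook coefficients.

    These coefficients are NOT the fitted Ck values of the paper; they are
    a commonly used default set, adopted here only for illustration.
    """
    inv_lam_i = 1.0 / (lam + 0.08 * beta_deg) - 0.035 / (beta_deg**3 + 1.0)
    return (0.5176 * (116.0 * inv_lam_i - 0.4 * beta_deg - 5.0)
            * np.exp(-21.0 * inv_lam_i) + 0.0068 * lam)

ROTOR_RADIUS_M = 41.0  # assumed value (82 m rotor diameter)

# Grid search over the tip-speed ratio at zero pitch (partial-load region).
lam_grid = np.linspace(1.0, 15.0, 1401)
cp_vals = cp_generic(lam_grid, 0.0)
i_best = int(np.argmax(cp_vals))
lam_opt, cp_max = float(lam_grid[i_best]), float(cp_vals[i_best])

def optimal_rotor_speed(wind_speed):
    """Rotor speed in rad/s that keeps lambda = omega * R / v at lam_opt."""
    return lam_opt * wind_speed / ROTOR_RADIUS_M

print("optimal tip-speed ratio:", lam_opt, "maximum CP:", cp_max)
print("rotor speed set-point at 8 m/s [rad/s]:", optimal_rotor_speed(8.0))
```

With the physics-informed GP, the same search would be run on the predicted power instead of on an analytical CP, which is only meaningful if the model extrapolates sensibly over the λ range being searched.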
The study also highlights the high short-term variability present in 1-min SCADA measurements and its impact on model fitting and evaluation. In the medium wind region, where the turbine operates with nearly constant pitch and rotational speed, the limited explanatory power of the available variables becomes evident. This suggests that additional operational or meteorological variables, such as air density estimates, may be needed to explain the inherent process variability. In addition, the modelling approach adopted in this work treats the wind power relationship as a static mapping, neglecting the intrinsic dynamics of the system. However, the actual interaction between wind speed, rotor behavior, and generated power is inherently dynamic, involving transient effects and control-system responses that unfold over time. Ignoring these dynamics reduces the models’ ability to fully reproduce the variability observed in real operation, particularly in regions where static variables provide limited information.
For completeness, the main limitations of each approach can be summarized as follows: The physical model relies on simplifying assumptions (e.g., constant air density and idealized conversion efficiency) and cannot represent unmodelled environmental and operational effects, which limits accuracy in some regimes. The standard GP is highly effective for interpolation when the data coverage is sufficient; however, its predictions can degrade in sparsely sampled or out-of-domain regions and may become physically implausible. The physics-informed GP improves physical plausibility by relating predictions to a mechanistic mean function, but it remains affected by missing explanatory variables and entails a substantially higher computational cost. Moreover, all approaches are formulated as static input–output mappings and therefore do not explicitly capture the turbine dynamics and short-term transients present in one-minute SCADA data.
Acknowledgement: The authors acknowledge the Spanish Ministry of Science, Innovation and Universities, the Spanish State Research Agency, and the European Union (FSE+).
Funding Statement: This research was financed by the Spanish Ministry of Science, Innovation and Universities, the Spanish State Research Agency, and co-funded by the European Union (FSE+), through the projects ‘Control and Planning of Processes Subject to High Variability and Uncertainty’ (CyPVar) PID2024-157718OB-C33, and ‘Optimal real-time management under uncertainty for digital twins (OptiDit)’, PID2021-123654OB-C33, and ‘Advanced Learning for Improving Productivity in Smart Factories’ (PID2021-126659OB-I00). This work was also funded through Samuel Martínez-Gutiérrez's pre-doctoral contract for University Teacher Training (FPU), call 2022, awarded by the Spanish Ministry of Science, Innovation and Universities.
Author Contributions: The authors confirm contribution to the paper as follows: Conceptualization, Samuel Martínez-Gutiérrez, Daniel Sarabia and Alejandro Merino; methodology, Samuel Martínez-Gutiérrez and Carlos Gutiérrez; software, Samuel Martínez-Gutiérrez and Carlos Gutiérrez; formal analysis, Samuel Martínez-Gutiérrez, Carlos Gutiérrez and Diego García-Álvarez; investigation, Samuel Martínez-Gutiérrez and Carlos Gutiérrez; data curation, Samuel Martínez-Gutiérrez and Carlos Gutiérrez; writing—original draft preparation, Samuel Martínez-Gutiérrez and Carlos Gutiérrez; writing—review and editing, Samuel Martínez-Gutiérrez, Carlos Gutiérrez, Daniel Sarabia, Alejandro Merino and Diego García-Álvarez; visualization, Samuel Martínez-Gutiérrez and Carlos Gutiérrez; supervision, Daniel Sarabia and Alejandro Merino; project administration, Daniel Sarabia and Alejandro Merino; funding acquisition, Daniel Sarabia and Alejandro Merino. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: Data available on request from the authors.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. IEC 61400-12-1. Wind Turbines-Part 12-1: power performance measurements of electricity producing wind turbines. Geneva, Switzerland: International Electrotechnical Commission; 2012. [Google Scholar]
2. Sohoni V, Gupta SC, Nema RK. A critical review on wind turbine power curve modelling techniques and their applications in wind based energy systems. J Energy. 2016;2016(10):8519785. doi:10.1155/2016/8519785. [Google Scholar] [CrossRef]
3. Bilendo F, Meyer A, Badihi H, Lu N, Cambron P, Jiang B. Applications and modeling techniques of wind turbine power curve for wind farms—a review. Energies. 2023;16(1):180. doi:10.3390/en16010180. [Google Scholar] [CrossRef]
4. Villanueva D, Feijóo A. Comparison of logistic functions for modeling wind turbine power curves. Electr Power Syst Res. 2018;155(1–3):281–8. doi:10.1016/j.epsr.2017.10.028. [Google Scholar] [CrossRef]
5. Deshmukh MK, Deshmukh SS. Modeling of hybrid renewable energy systems. Renew Sustain Energy Rev. 2008;12(1):235–49. doi:10.1016/j.rser.2006.07.011. [Google Scholar] [CrossRef]
6. Castillo OC, Andrade VR, Rivas JJR, González RO. Comparison of power coefficients in wind turbines considering the tip speed ratio and blade pitch angle. Energies. 2023;16(6):2774. doi:10.3390/en16062774. [Google Scholar] [CrossRef]
7. Reyes V, Rodríguez JJ, Carranza O, Ortega R. Review of mathematical models of both the power coefficient and the torque coefficient in wind turbines. In: Proceedings of the 2015 IEEE 24th International Symposium on Industrial Electronics (ISIE); 2015 Jun 3–5; Buzios, Brazil. p. 1458–63. doi:10.1109/ISIE.2015.7281688. [Google Scholar] [CrossRef]
8. Moussa MO. Experimental and numerical performances analysis of a small three blades wind turbine. Energy. 2020;203(1):117807. doi:10.1016/j.energy.2020.117807. [Google Scholar] [CrossRef]
9. Pelletier F, Masson C, Tahan A. Wind turbine power curve modelling using artificial neural network. Renew Energy. 2016;89(2):207–14. doi:10.1016/j.renene.2015.11.065. [Google Scholar] [CrossRef]
10. Neshat M, Nezhad MM, Abbasnejad E, Mirjalili S, Groppi D, Heydari A, et al. Wind turbine power output prediction using a new hybrid neuro-evolutionary method. Energy. 2021;229(1):120617. doi:10.1016/j.energy.2021.120617. [Google Scholar] [CrossRef]
11. Astolfi D, Castellani F, Lombardi A, Terzi L. Multivariate SCADA data analysis methods for real-world wind turbine power curve monitoring. Energies. 2021;14(4):1105. doi:10.3390/en14041105. [Google Scholar] [CrossRef]
12. Veena R, Mathew S, Petra MI. Artificially intelligent models for the site-specific performance of wind turbines. Int J Energy Environ Eng. 2020;11(3):289–97. doi:10.1007/s40095-020-00352-2. [Google Scholar] [CrossRef]
13. Zhou J, Guo P, Wang XR. Modeling of wind turbine power curve based on Gaussian process. In: Proceedings of the 2014 International Conference on Machine Learning and Cybernetics; 2014 Jul 13–16; Lanzhou, China. p. 71–6. doi:10.1109/ICMLC.2014.7009094. [Google Scholar] [CrossRef]
14. Pandit RK, Infield D, Kolios A. Gaussian process power curve models incorporating wind turbine operational variables. Energy Rep. 2020;6(5):1658–69. doi:10.1016/j.egyr.2020.06.018. [Google Scholar] [CrossRef]
15. Rogers TJ, Gardner P, Dervilis N, Worden K, Maguire AE, Papatheou E, et al. Probabilistic modelling of wind turbine power curves with application of heteroscedastic Gaussian Process regression. Renew Energy. 2020;148(10):1124–36. doi:10.1016/j.renene.2019.09.145. [Google Scholar] [CrossRef]
16. Neuer MJ. Physics-informed learning. In: Machine learning for engineers. Berlin/Heidelberg, Germany: Springer; 2024. p. 173–208. doi:10.1007/978-3-662-69995-9_6. [Google Scholar] [CrossRef]
17. Virgolino GCM, Mattos CLC, Magalhães JAF, Barreto GA. Gaussian processes with logistic mean function for modeling wind turbine power curves. Renew Energy. 2020;162:458–65. doi:10.1016/j.renene.2020.06.021. [Google Scholar] [CrossRef]
18. Gijón A, Pujana-Goitia A, Perea E, Molina-Solana M, Gómez-Romero J. Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification. Eng Appl Artif Intell. 2026;164(1):113331. doi:10.1016/j.engappai.2025.113331. [Google Scholar] [CrossRef]
19. Howland MF, Dabiri JO. Wind farm modeling with interpretable physics-informed machine learning. Energies. 2019;12(14):2716. doi:10.3390/en12142716. [Google Scholar] [CrossRef]
20. Cross EJ, Rogers TJ, Pitchforth DJ, Gibson SJ, Zhang S, Jones MR. A spectrum of physics-informed Gaussian processes for regression in engineering. Data Centric Eng. 2024;5:e8. doi:10.1017/dce.2024.2. [Google Scholar] [CrossRef]
21. Zhang S, Cross EJ. Grey-box modelling via Gaussian process mean functions for mechanical systems. Data Sci Eng. 2025;9(1):161–8. doi:10.1007/978-3-030-76004-5_19. [Google Scholar] [CrossRef]
22. Heier S. Grid integration of wind energy conversion systems. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 1998. [Google Scholar]
23. Manyonge AW, Ochieng RM, Onyango FN, Shichikha JM. Mathematical modelling of wind turbine in a wind energy conversion system: power coefficient analysis. Appl Math Sci. 2012;6(91):4527–36. [Google Scholar]
24. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, MA, USA: MIT Press; 2006. [Google Scholar]
25. Van Rossum G, De Boer J. Interactively testing remote servers using the Python programming language. CWI Q. 1991;4(4):283–303. [Google Scholar]
26. Hart WE, Watson JP, Woodruff DL. Pyomo: modeling and solving mathematical programs in Python. Math Program Comput. 2011;3(3):219–60. doi:10.1007/s12532-011-0026-8. [Google Scholar] [CrossRef]
27. Wachter A. An interior point algorithm for large-scale nonlinear optimization with applications in process engineering [dissertation]. Pittsburgh, PA, USA: Carnegie Mellon University; 2002. [Google Scholar]
28. Matthews AG, Van Der Wilk M, Nickson T, Fujii K, Boukouvalas A, León-Villagrá P, et al. GPflow: a Gaussian process library using TensorFlow. J Mach Learn Res. 2017;18(40):1–6. [Google Scholar]
29. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Meth. 2020;17(3):261–72. doi:10.1038/s41592-019-0686-2. [Google Scholar] [PubMed] [CrossRef]
30. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467. 2016. doi:10.48550/arXiv.1603.04467. [Google Scholar] [CrossRef]
31. Duc T, Simley E. Zenodo. SMARTEOLE wind farm control open dataset. 2022 [cited 2026 May 6]. Available from: https://zenodo.org/records/7342466. [Google Scholar]
32. Simley E, Fleming P, Girard N, Alloin L, Godefroy E, Duc T. Results from a wake-steering experiment at a commercial wind plant: investigating the wind speed dependence of wake-steering performance. Wind Energ Sci. 2021;6(6):1427–53. doi:10.5194/wes-6-1427-2021. [Google Scholar] [CrossRef]
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

