Computing intrinsic noise of the genetic regulation modeled by Hill functions

Gene regulation is among the most important yet intricate biological processes. Mathematical modeling of gene regulatory networks has provided helpful insights in understanding natural genetic networks and designing synthetic genetic circuits. One type of widely accepted approach for modeling gene regulation is through deterministic rate equations (i.e. ordinary differential equations) [1,2]. In the formulation of complex transcriptional events regulated by transcription factors, Hill-type rate law has been shown to successfully approximate the multibinding-site mechanism and cooperativity [3-5]. In particular, the application of Hill functions to represent activating or inhibitory regulation by transcription factors greatly reduces the complexity of deterministic modeling of gene regulation [6,7]. Nevertheless, it is noteworthy that the deterministic models using ordinary differential equations only reflect the macroscopic behavior of genetic networks averaged over many cells, while neglecting the intrinsically random nature of gene regulation in a single cell, namely the stochasticity inherent in gene expression [8,9]. Introduction Abstract

applied to describe intrinsic noise by constructing the probability evolution equation (aka master equation) from rate equations, but solving the CME is difficult in general. Gillespie developed a stochastic simulation algorithm (SSA), based on a Monte Carlo numerical procedure, that can exactly simulate stochastic trajectories characterized by the CME [20]. Using SSA, intrinsic noise of a biochemical network at single-cell level can be calculated accurately. The SSA approach implements the Monte Carlo simulation of each reaction at each time step for an individual cell, and such procedure needs to be repeated N times if one wishes to obtain the probability distribution of the intrinsic noise among N number of cells. This leads to high computational demands associated with SSA. As a practical alternative, the linear noise approximation (LNA) of CME is analytically tractable and computationally efficient. Specifically, LNA uses van Kampen's size-expansion of the CME to separate macroscopic concentration (i.e. mean) from the fluctuations around it [21]. As a result, the dynamics of mean, variances and covariances of molecular species can be described by a set of ordinary differential equations. It is shown that LNA accurately approximates intrinsic stochasticity when the molecular numbers are relatively large [22,23].
Notably, the standard methods of SSA and LNA require the theoretical model to be composed of elementary chemical reactions. Hill-type rate law, however, is not elementary in its apparent form. How the existing methods can be used to accurately and efficiently calculate the intrinsic noise of the gene expression described by Hill functions, a well-accepted representation that considerably simplifies genetic models, remains unclear. Efforts have been made to heuristically apply SSA and LNA to the Hilltype dynamics (termed as hSSA and hLNA respectively), whereby the Hill rate is simply treated as a single reaction despite of the non-elementary nature [24][25][26]. The heuristic approach successfully captures the time trajectory of the mean value of the concentration, but it seems to neglect higher order statistics in concentration fluctuations [27]. Similarly, Thomas et al. showed that, when the heuristic representation of the system with Hill propensity functions is used in the hLNA, the contribution to the overall noise levels from the fast reacting species is underestimated [28]. Indeed, a series of analysis of stochastic simulation methods suggest that further computational modification beyond heuristic approach is needed for Hill-type dynamics to improve the estimation of system noise [15,27,29].
The slow-scale LNA (ssLNA) has recently been introduced as a compromise between the need for network simplification and the precise representation of the system's noise levels. The ssLNA approach reduces the network size when calculating the concentration means, while still accounting for the intrinsic noise generated by the fast reacting species that is neglected by hLNA [28].
In this study we investigate the intrinsic stochasticity of the genetic models corresponding to second-order activating and inhibitory Hill functions, using five different computational methods. First, we apply SSA and LNA to the full-scale elementaryreaction models that are expanded from the two Hill functions. In comparison, the heuristic methods hSSA and hLNA are applied directly to the (non-elementary) Hill functions. Furthermore, the ssLNA is also implemented for Hill genetic activation and inhibition. While all three LNA methods (LNA, hLNA, and ssLNA) show much faster computational speed when compared to the SSA counterparts, only the full-scale LNA and the ssLNA compute the correct levels of intrinsic noise when compared to the result of the full-scale SSA method, which is considered exact. In addition, we simulate the chemical Langevin equations for the Hill functions developed under the ssLNA framework and show that they can produce stochastic trajectories with the similar noise accuracy as SSA, yet requiring much less computational expenses. The investigation indicates that ssLNA together with the associated chemical Langevin equations may provide a relatively accurate as well as efficient computational platform to delineate intrinsic stochasticity of the Hill-type genetic models.

Results and Discussions
In the present study, we focus on exploring the intrinsic noise of two theoretically basic and useful genetic models, namely the second-order Hill-type gene activation and gene inhibition networks. For both cases, the genetic reactions include the cooperative binding of two transcription factor molecules (denoted as T) to the gene promoter (denoted as G), where the T molecules form a transcription dimer (denoted as T2) first, and subsequently the binding of dimer T2 to the promoter G respectively initiates or inhibits the mRNA transcription. The mRNA molecule is then translated into protein product.
We seek to understand the performance, including computational accuracy and efficiency, of the five existing stochastic modeling methods in calculating intrinsic noise of the Hill-type gene regulations. To this end, we apply two Gillespie algorithm-based methods, namely SSA and heuristic SSA (hSSA), as well as three LNA-based methods, namely LNA, heuristic LNA (hLNA) and slow-scale LNA (ssLNA), to the aforementioned second-order Hill-type gene activation and gene inhibition networks, via calculating the stochastic molecular abundances of mRNA and protein. Due to its exact implementation of CME, the results of the SSA method are used as the gold standard. The average-cell level behavior, represented by mean mRNA and protein molecule count, and the single-cell level intrinsic noise, represented by coefficient of variation (CV), defined as the ratio between standard deviation and mean of mRNA and protein molecules [30][31][32], are obtained using each method and compared with the results of the SSA method.
The Gillespie algorithm-based approach differs from the LNA-based approach in that the former needs to simulate multiple molecule-number trajectories for noise analysis across a population of single cells. To generate meaningful statistical data for the characterization of model stochasticity (mean and CV) using the SSA and the hSSA methods, we compute 10 2 -10 3 single-cell trajectory samples, the number used in previous computational as well as experimental noise studies [30,33,34].
The LNA framework uses the van Kampen's Ω-expansion method to approximate the Chemical Master Equation, where Ω represents the system volume. Specifically, the CME is Taylor-expanded about the macroscopic dynamics in powers of 1 Ω . The first-order terms give the rate equations governing macroscopic dynamics (mean abundance), and the next-order terms give the Linear Noise Approximation (LNA) for dynamics of fluctuations [21]. The intrinsic fluctuation approximated by the solution of LNA has been shown to be multivariant Guassian, whereby the ordinary differential equation (ODE) of its correlation matrix can be also obtained [21,35]. By solving these ODEs for the means and the variances of the stochastic variables (i.e. molecular numbers of mRNA and protein), LNA allows for direct calculation of the intrinsic noise (i.e. CV) of the models of Hill-type genetic activation and inhibition.
The standard SSA and LNA methods are rigorously developed for biochemical models consisting of elementary reactions. For the models that incorporate nonlinear non-elementary (loosely called 'phenomenological') terms such as Michaelis-Menten kinetics and Hill functions, the SSA and the LNA method have been applied heuristically, by assuming quasi-steady-state of the fastreacting species and using the nonlinear terms as approximate propensity functions [24,28,36]. To compare with the standard approach of SSA and LNA, we also explore the performance of hSSA and hLNA in quantifying the intrinsic stochasticity of Hilltype gene regulations, where the fast binding/unbinding processes are assumed to reside at quasi-steady states.
The slow-scale LNA is a new general method derived based on standard LNA, which promises to rigorously quantify the statistics of intrinsic noise of the reduced models under quasi-steady-state approximation (QSSA), without being expanded into their elementary-reaction full models [28]. A major goal of this study is to evaluate the performance of ssLNA in describing the noise statistics of the Hill-type gene regulations. We seek to explore how QSSA together with the associated time-scale separation can be applied within the framework of LNA to improve the computational efficiency of noise analysis of the class of compact models containing 'phenomenological' terms while maintaining accuracy.
The detailed procedures of LNA, ssLNA, as well as hLNA and hSSA methods are described in the Methods section and Supplementary Materials.

Intrinsic Stochasticity of Hill-type Gene Activation
We first explore the intrinsic stochasticity of a Hill-type gene regulation process represented by a transcription rate The remaining three elementary reactions in the gene regulation process include translation from mRNA M to protein P and degradation of M and P molecules as listed in equation (4)- (6).
Since binding and unbinding reactions in general occur at much faster rate than transcription, translation and degradation, we can apply QSSA to assume that the fast variables T 2 and GT 2 reach equilibrium for modeling simplification. As a result, the five elementary reactions in the equations (1)-(3) are reduced to a 'phenomenological' step of transcription, whose rate is proportional to the steady-state portion of the GT 2 complex among total gene (denoted as G T ) as given in equation (7) (see Supplemental Materials for the derivation).
(2) . d k k k k k = We use five methods to calculate the intrinsic noise of the above gene activation process. First, the standard SSA and LNA methods are applied to the full model consisting of all the 8 elementary reactions in equations (1) through (6). Second, heuristically using the 'phenomenological' Hill function (7) as the propensity function for the simplified transcription reaction, together with the remaining elementary reactions in equations (4)-(6), we apply hSSA and hLNA. Lastly, we apply ssLNA to the gene regulation described by equations (1)- (6), assuming that G, T, T 2 and GT 2 are fast variables reaching quasi-steady state, and M and P are slow variables [28]. Each method is used to compute the time trajectories of macroscopic mean and intrinsic noise of mRNA (M) and protein (P) in the gene regulation. The intrinsic noise is quantified by the coefficient of variation (CV), defined as standard deviation divided by mean. For the Gillespie-based methods, SSA and hSSA, we compute >10 3 number of stochastic samples to obtain means and standard deviations that are statistically meaningful.
We seek to evaluate the accuracy in computing the trajectories of mean and intrinsic noise, as well as the computational efficiency of computing the noise. To this end, the standard SSA of the full gene activation model is used as the gold standard for accuracy evaluation. In Figures 1-3, trajectories of the means and CVs for the activating Hill regulation obtained by five methods are compared with the results of the standard SSA method. Each of the figures corresponds to the same model of gene activation, with parameters calibrated to evaluate performance under different ranges of steady-state copy numbers. Specifically, we simulate three scenarios where the mean molecular number of proteins at steady state is respectively in the range of ~10 2 (Figure 1 Figures 1-3, all the trajectories of the macroscopic means of M and P converge toward the results of standard SSA (note that the LNA and ssLNA trajectories are hidden beneath the hLNA trajectory), indicating that all the five method give rise to similar degree of accuracy in describing the average-cell behavior of the activating gene regulation. Notably, using Hill functions directly as propensity functions in the heuristic methods, hSSA and hLNA, do not seem to affect the accuracy of computing the mean behavior, which is consistent with previous reports [24,25]. Such agreement persists across all the three ranges of molecule number, albeit the higher number of molecules, the less degree of stochasticity in the mean trajectories computed by the Gillespie-based SSA and hSSA methods, which is expected for stochastic simulations.
With regard to the precision of noise calculation, however, we find some discrepancies between the results of the five methods. Figure 1-3 (c) and (d) show that, for all the three ranges of molecule numbers, the CV trajectories computed by SSA, LNA and ssLNA converge together (note that the LNA trajectory is hidden beneath the ssLNA trajectory). But the CV trajectories computed by the heuristic methods hSSA and hLNA converge to another steady-state value smaller than that computed by the standard SSA, the ground truth. For instance, when the number of protein molecules is in the hundreds at steady state, the CV of mRNA and protein is respectively underestimated by ~60% and ~30% at steady state. The degree of underestimation is alleviated as the copy number increases. In particular, when the number of protein molecules is in the 10 4 at steady state, the CV of mRNA and protein is respectively underestimated by ~10% and ~4% at steady state. On the other hand, the results of the standard LNA and ssLNA agree well with the ground truth under all the three copy-number scenarios. That is, the ssLNA method based on reduced model of slow variables is able to achieve the same precision of noise calculation as that of the standard SSA and standard LNA methods, which are based on the full model of all elementary reactions.  Although the SSA method provides the highest accuracy in quantifying intrinsic noise, it has significantly high demand for computation times, evaluated by CPU times (see the Methods section for details), especially when the number of molecules increase (Table 1). For instance, the CPU time used for SSA to obtain the CVs of 1,000 samples increases from 4.375 seconds for ~10 2 protein molecules to 125.25 seconds for ~10 4 protein molecules, a 28 fold increment. Note that the CVs reported above are calculated using 2000-3000 runs to produce relatively smooth mean trajectory, indicating that high computational demand is required to obtain reliable statistical estimation of CV using SSA. The LNA approach yields considerably greater computational efficiency, especially for high-molecule case, as it directly computes population-level variances without sampling individual stochastic trajectories. Specifically, each of the three copy-number cases takes only ~14 seconds. As expected, the computation time of the heuristic methods hSSA and hLAN is greatly reduced in comparison with their full-model counterpart by about 20 and 300 fold respectively. The computation efficiency of ssLNA is also very good whereby the time is reduced by ~300-400 fold comparing to the standard LNA. The above results show that, among the five methods, the ssLNA method achieves not only high accuracy but also fast computation in quantifying the intrinsic noise of the model of Hill-type gene activation.    While the ssLNA method calculates variances efficiently without running stochastic simulations, the chemical Langevin equation (CLE) provides a complementary solution to approximate stochastic individual trajectories. The CLE formulates as a stochastic differential equation (Methods), and thus is much faster to solve than SSA, which exactly simulates the trajectories of the CME. In particular, we apply the slow-scale CLE method that has been developed under the timescale-separation framework of ssLNA [28]. The computation time of stochastic trajectories using CLE indeed reduces considerably (Table 1), especially for the ~10 4 copynumber case (36-fold reduction compared with SSA). To validate accuracy of the stochastic simulation by CLE under the ssLNA framework, we further compute the macroscopic mean and CVs of multiple CLE simulation samples. For the three different molecular copy numbers, they all agree well with the results of the standard SSA method (Figures 4-7). These results demonstrate that, for the noise analysis employing QSSA-based reduced model, CLE could serve as a well-behaved single-cell simulation tool in complement to ssLNA that only generates macroscopic mean trajectory rather than individual sample paths. Intrinsic Stochasticity of Hill-type Gene Inhibition ms K G G+M  → (8) As a result, under the QSSA condition, the transcription rate is proportional to the fraction of free DNA (G) to total gene at steady state (equation (9)).
The noise level computed by SSA of the full model is again used as the gold standard for accuracy evaluation. We simulate with steady-state protein copy-number in three ranges (again ~10 2 , ~10 3 , and ~10 4 ) by varying the amount of transcription factor. Figure 7-9 (a) and (b) show that the mean dynamics of mRNA and protein computed by hSSA, LNA, hLNA and ssLNA agree well with the result by SSA (again the LNA and ssLNA trajectories are hidden beneath the hLNA trajectory). The cellular noise in Figure 7-9 (c) and (d), however, show that the population CV trajectory computed by the heuristic methods (hSSA and hLNA) is slightly smaller compared to that computed by the full-model methods (SSA and LNA) as well as ssLNA (again the LNA trajectory is hidden beneath the ssLNA trajectory). That is, the intrinsic noise of inhibitory gene model is still underestimated by the heuristic methods, although the degree of underestimation is much less (up to ~1%) compared to that for the activation gene model (up to 60%). Note that the overall noise levels are relatively small in the gene inhibition model, because large numbers of initial molecules have to be assumed for the model to decay toward the desired steady-state copy number levels, which are in the ranges consistent with the activating gene regulation case. This could be the main reason why the degree of underestimation by the heuristic methods is much lower for the inhibitory Hill model. On the other hand, the LNA and ssLNA methods yield population CV paths comparable to those produced by SSA for all the three copy-number cases, proving again their reliable accuracy of noise computation. In terms of simulating individual stochastic trajectory, the slow-scale CLE method again results in considerable accuracy in the mean and CV paths, as demonstrated by their good agreement with the results by the full-model SSA method (Figure 10-12).
For the calculation of intrinsic noise among a cell population, standard SSA and LNA are applied to the full model accounting for the elementary reactions (1)-(2), (8) and (4)- (6), while hSSA and hLNA are applied to the reduced model of the reactions (4)-(6) associated with the non-elementary transcription rate (9). In addition, intrinsic noise is also computed by ssLNA, and single-cell stochastic trajectories are obtained using the slow-scale CLE method.  (Table 1). In particular, the ssLNA method reduces the computation time by at least 10 3 fold comparing to SSA and by at least 150 fold comparing to LNA. Model reduction in the heuristic methods (hSSA and hLNA) leads to a computational time ~1-3 orders of magnitude smaller than their full-model counterpart at the price of noise underestimation. For simulating individual stochastic path, the slow-scale CLE method reduces the computation time of the standard SSA method by at least 20 times, showing its computational efficiency in addition to accuracy.
In summary, we evaluate the performance of SSA-based methods (SSA, hSSA) and LNA-based methods (LNA, hLNA, ssLNA) in computing intrinsic noise of a gene inhibition model. The results are similar to those of the gene activation model in that heuristically integrating the reduced model (i.e. Hill function) to SSA or LNA impairs the accuracy of noise computation. In addition, using the slow-scale framework, we obtain considerable accuracy as well as computational efficiency in quantifying intrinsic noise via ssLNA, as well as in delineating individual paths via CLE.    With respect to cellular dynamics, the macroscopic concentrations represents mean molecular behavior averaged over a cell population. Due to intrinsic stochastic fluctuations, the microscopic state of each individual cell is random variable: x=X+φ, where φ represents the linear-term fluctuations and is Gaussian centered around X within the LNA framework. If the matrix S denotes the stoichiometry matrix, where each column represents the variation in molecule counts of the species for a certain elementary reaction, and f is reaction propensity vector,then the governing deterministic ODE of the means X is written as:

Standard linear noise approximation (LNA) method
The ODE of the correlation matrix , containing the variances and covariances for the fluctuations of all the components in the system, is given by  is the Jacobian of the rate equation (10), and is called the diffusion matrix [21]. Using the covariance trajectories computed from equation (11) and the means modeled by (10), we can compute the time courses of the coefficient of variation (CV = standard deviation divided by mean), which is commonly used to characterize the intrinsic noise across a cell population.

Slow-scale LNA (ssLNA) method and slow-scale chemical Langevin equation
Suppose the species in a model can be time separated into fast and slow species, where the fast species are those assumed to reach pseudo equilibrium under the quasi-steady state approximation (QSSA). We partition the state vector as X = (X s , X f ), where X s and X f denote the vector of fast and slow variables, respectively. Under QSSA of fast species, the deterministic model can be reduced to a slow-scale rate equation for X s , S h h X S f =   (12), which contains 'phenomenological' terms such as Michaelis-Menten and Hill functions.
The ssLNA method can be implemented in such a slow-scale framework by first partitioning the J F and S matrices of the full model as follows: s, slow species f, fast species (13) The variances of the fluctuations around the means of X s can be solved using the ODE of the slow-scale correlation matrix , where J s is shown to be the same as the Jacobian matrix J h of the reduced rate equation (12) of slow species [28], and the diffusion matrix is given as: The time courses of the coefficient of variation (CV) are obtained using the covariance trajectories computed from equation (14) and the means modeled by the slow-scale rate equations (12).
In the framework of LNA, chemical Langevin equation (CLE) has been proposed to effectively approximate single-cell stochastic time trajectory of the species in CME. In this study, we use the CLE developed under ssLNA, 16), where each element of the Γ(t) vector is a temporally uncorrelated, independent Gaussian white noise [21,28]. Euler-Maruyama method is used to numerically solve the stochastic differential equation (16). Same as the SSA method, one run of simulation of CLE represents a single-cell path sample. We use similar sample size as SSA (10 2 -10 3 ) to obtain the statistics of cellpopulation level noise (i.e. standard deviation and mean) for the ssLNA-based CLE.

Metrics for stochastic computation
To compute the means and intrinsic noise of cell-population dynamics, ~2000-3000 runs of SSA, hSSA, and CLE trajectories are used to obtain a converging mean trajectory. The CV is used to measure the population-level intrinsic noise. For the comparison of computational efficiency, CPU time is used to evaluate each method's performance. The CPU time, obtained using the cputime function in MATLAB, evaluates the time used for each method. The time is intended as used by a function for the computation, excluding the possible setup time. LNA, hLNA and ssLNA times are averaged over a set of ~100 runs. SSA, hSSA, and CLE times ( ) , , are also averaged over ~100 runs, but the time taken for 1000 runs is reported in Table 1 for comparison, since several runs are required to approximate the solution of the CME. All computations are performed using Dell Precision T3500 with Intel® Xeon® W3530 (8MB Cache, 2.80 GHz, 4.80 GT/s Intel® QPI) and 6 GB Memory.

Conclusions
Gene regulation is a molecular process that could present high intrinsic noise. For the gene activation and the gene repression systems studied here, we apply five different stochastic analysis methods to provide a comparison on the quantitative measure of the intrinsic noise. The comparisons of the performance of the five methods are summarized in Figure 13 (normalized CV) and Figure  14 (normalized CPU time). In both model cases, while the SSA can provide the most accurate noise representation, it requires large computational resources (Figure 14), which could make the realization of large models infeasible. The heuristically reduced SSA (hSSA) provides a good characterization of the mean molecule counts, but fails to faithfully reproduce fluctuations around the mean ( Figure 13). The linear noise approximation (LNA) approach significantly reduces computational times ( Figure 14) while preserving noise accuracy ( Figure 13, overlapped with ssLNA). The heuristic reduction of LNA models (hLNA), however, presents the same degree of inaccuracies as the hSSA in noise calculation ( Figure 13). The slow-scale approach (ssLNA) not only performs