Stochastic optimization methods are now being used in a multitude of applications, ranging from circuit design on silicon wafers to airline flight scheduling. In these and many other applications the objective is to minimize a given cost function that depends on a large number of discrete or continuous variables [BOW01,WS99]. In analogy to physical problems, the cost function describes a potential energy surface (PES) in the parameter space, and its global minimum optimizes the desired objective. Stochastic optimization methods are applied when enumerative methods are too costly. This is generically the case in high-dimensional optimization problems, where the total number of possible configurations grows exponentially with the number of variables.
Stochastic optimization methods successively improve one or several configurations of the underlying model to obtain an approximation of the global optimum of the PES. The optimization process thus maps onto a fictitious dynamical process of one or several configurations that move in configuration space. The process stops either when a previously defined amount of computational resources has been spent or when the dynamical process terminates in a stable configuration. In either case there is no guarantee that the stochastic process has found the global optimum of the PES; indeed, due to the stochastic nature of the process there can be no such guarantee. In the absence of perfection it is important to differentiate two possible goals in stochastic optimization. In many applications (e.g. circuit design) the quality of the solution is measured only by its energy difference to the global optimum; the ``distance'' of the configuration obtained to the global optimum is completely irrelevant. In other problems, such as protein structure prediction or drug discovery, this distance is crucial: we do not seek to ``optimize'' the folding energy, but to derive useful information from the three-dimensional structure obtained, and a low-lying metastable state with a large RMSD to the true native state may contain virtually no useful information.
The computational challenge in stochastic optimization depends strongly on the number of degrees of freedom and the complexity of the PES. The latter is determined by the total number of low-lying metastable states, the ability to efficiently explore the configuration space, and the average height of the transition states that separate low-lying metastable states. Here we discuss and compare several stochastic optimization methods that we have recently used for protein structure prediction and receptor-ligand docking. Little is presently known about the relative efficiency of these methods for protein structure prediction at the all-atom level.
The key problem of all stochastic optimization methods is the difficulty of balancing uphill and downhill moves. On a rugged potential energy surface with many competing metastable states, methods that move only downhill are trapped in the closest local minimum. To locate the global optimum, the optimization method must also move uphill. In the simulated annealing method (SA) [KVG83] this is achieved by exploring the potential energy surface in a stochastic random walk at a finite temperature, which is gradually reduced during the simulation. At the beginning of the simulation, when the temperature is high, many uphill moves are accepted; towards the end, only the local minimum is explored. While it is possible to show that SA converges for some potential energy surfaces in the limit of infinitesimal cooling rates, the method still fails for very rugged potential energy surfaces, an observation that has inspired many alternative optimization strategies.
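As an illustration, the following is a minimal sketch of a simulated annealing loop, assuming a generic energy function, a user-supplied move proposal, and a geometric cooling schedule; these choices are illustrative and do not reproduce the original protocol of [KVG83].

import math
import random

def simulated_annealing(energy, x0, propose, t_start=1.0, t_end=1e-3,
                        cooling=0.99, steps_per_t=100):
    # Generic simulated annealing: 'energy' maps a configuration to its cost,
    # 'propose' generates a trial configuration from the current one.
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            x_new = propose(x)
            e_new = energy(x_new)
            # Metropolis criterion: downhill moves are always accepted,
            # uphill moves with probability exp(-dE / T).
            if e_new <= e or random.random() < math.exp(-(e_new - e) / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # geometric cooling schedule
    return best_x, best_e

# Example: a one-dimensional double-well cost function.
f = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
move = lambda x: x + random.gauss(0.0, 0.5)
print(simulated_annealing(f, 2.0, move))

The role of the temperature schedule is evident in the acceptance test: at high temperature nearly all uphill moves pass, while at low temperature the walk reduces to a local descent.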
In the Monte-Carlo with minimization (MCM) technique, also known as the basin hopping technique (BHT), the dynamical process is mapped onto an artificial potential energy surface in which the energy of each point is replaced by the energy of a nearby metastable conformation. High-energy transition states that separate metastable conformations are thus eliminated. Applications vary in the implementation of the minimization that maps a point to its local minimum and in the stochastic method by which the resulting PES is explored. MCM/BHT is successful for many optimization problems in which the cost of the minimization procedure is outweighed by the gain in efficiency of the subsequent stochastic optimization process on the simplified landscape. An alternative approach to simplifying the underlying PES is used in the stochastic tunneling method, where the dynamical process explores a transformed potential energy surface. The transformed PES, which adapts in the course of the simulation, compresses high-energy regions of the PES that are no longer relevant for the local search, but must be traversed to explore different regions of the PES.
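The sketch below illustrates the two landscape simplifications just described: a basin-hopping step in which every trial point is locally minimized before the Metropolis test (using scipy.optimize.minimize purely as an illustrative local minimizer), and one common form of the stochastic tunneling transformation, E -> 1 - exp(-gamma (E - E_best)). The function names and all parameter values are ours and serve only to make the ideas concrete.

import math
import random
import numpy as np
from scipy.optimize import minimize

def basin_hopping(energy, x0, n_iter=200, step=0.5, temperature=1.0):
    # Monte-Carlo with minimization / basin hopping: every trial point is
    # mapped to a nearby local minimum before the Metropolis acceptance test,
    # so the walk moves on a staircase of basin energies.
    x = minimize(energy, x0, method="L-BFGS-B").x
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_iter):
        # Random perturbation followed by local minimization.
        x_trial = minimize(energy, x + np.random.normal(0.0, step, size=x.shape),
                           method="L-BFGS-B").x
        e_trial = energy(x_trial)
        if e_trial <= e or random.random() < math.exp(-(e_trial - e) / temperature):
            x, e = x_trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def stun_energy(e, e_best, gamma=1.0):
    # Stochastic tunneling transformation (one common form): compresses the
    # PES above the best energy found so far while preserving the location
    # of the low-lying minima.
    return 1.0 - math.exp(-gamma * (e - e_best))

In an all-atom protein application the generic local minimizer would be replaced by a gradient-based minimization of the force-field energy, which is where most of the computational cost per step is spent.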
In other methods, such as parallel tempering or replica exchange, concurrent simulations at different temperatures are performed and exchange conformational information. While simulations at high temperatures explore the coarse-grained structure of the PES, simulations at low temperature refine the best local minima found. The parallel tempering technique can be used on distributed computational architectures, because relatively little information is exchanged during the simulation. However, our present experience with applications to protein folding (trp-cage protein, HIV accessory protein) suggests that the parallel tempering method ceases to gain efficiency when more than 30 nodes are used for proteins of modest size.
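The exchange step at the heart of parallel tempering can be summarized by the following sketch, which uses the standard acceptance criterion for swapping two replicas at neighboring temperatures; the function name and data layout are ours and only illustrate how little information (one energy per replica) must be communicated between nodes.

import math
import random

def attempt_swap(configs, energies, temps, i, j):
    # Standard replica-exchange criterion: accept the swap of replicas i and j
    # with probability min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)]), which
    # preserves the Boltzmann distribution at every temperature.
    delta = (1.0 / temps[i] - 1.0 / temps[j]) * (energies[i] - energies[j])
    if delta >= 0.0 or random.random() < math.exp(delta):
        configs[i], configs[j] = configs[j], configs[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False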
Since protein folding is among the most challenging computational problems today, the use of the most powerful computational architectures is advisable to facilitate its solution. In the foreseeable future, the most cost-effective computers will consist of many nodes with distributed memory. The BlueGene architecture of IBM and the distributed GRID resources used by the Folding@Home team are prototypical examples of such architectures. Presently, however, we know little about the efficiency of various optimization methods for protein folding on such computer networks. After initial efforts with genetic algorithms showed mixed results, we have focused on distributed basin hopping and evolutionary strategies as possible avenues to exploit distributed architectures with different degrees of connectivity.