A task-farm parallelization framework has been implemented in the ChemShell computational chemistry environment to provide a facility for parallelizing common chemical calculations, including finite-difference Hessian evaluation, the nudged elastic band method for reaction path optimization, and population-based methods for global optimization. The optimization methods are provided by a parallel interface to the DL-FIND optimization library. As ChemShell can already exploit parallel external programs for energy and gradient evaluations, the new methods result in a two-level approach to parallelization that gives significantly improved performance for massively parallel calculations. For typical systems, speed-up factors of five to eight times have been observed compared with non-task-farmed calculations. The task-farming version of ChemShell has been used to study the heterolytic dissociation of a hydrogen molecule over a polar oxygen-terminated surface of aluminium-doped zinc oxide using an embedded cluster hybrid QM/MM approach. We calculate a 42 kcal mol−1 heat of reaction and a 30 kcal mol−1 activation energy, which is equivalent to a high backward reaction barrier of 72 kcal mol−1 per H2 molecule, in close agreement with temperature programmed desorption experiments. The dissociation path includes a stable intermediate comprising a hydride ion in an oxygen vacancy and physisorbed atomic hydrogen.
Recent progress in high performance computing, in particular, the rapid expansion of the size of multiprocessor platforms, allows us to address previously intractable problems in materials science. In this article, we focus on the quantum mechanical (QM) description of reaction paths in heterogeneous catalysis, but problems on a similar scale arise in other applications of functional materials including electronics, energy production, conversion and storage, optical and magnetic devices. To address these issues, we are pursuing the development of efficient and robust tools for the simulation of materials, of which the ChemShell computational chemistry environment (www.chemshell.org; Sherwood et al. 2003) is one of the most promising.
ChemShell provides a means of integrating QM and molecular mechanical (MM) software packages to perform combined QM/MM calculations. The external packages are used solely for energy and gradient evaluations, while ChemShell routines are used to form the combined QM/MM gradient and to handle higher-level tasks such as geometry optimization or molecular dynamics. ChemShell calculations are controlled using scripts written in the Tcl language (www.tcl.tk), with generic commands that hide as far as possible the details of the external programs from the user.
The QM/MM method (Warshel & Levitt 1976; Field et al. 1990; Lin & Truhlar 2007) is widely used for modelling biological systems (Senn & Thiel 2009), and has also proved useful in solid state applications (Sherwood et al. 1997; Sauer & Sierka 2000). To model solid state systems, ChemShell supports embedded cluster calculations (Sherwood et al. 2003; Sokol et al. 2004) in which the bulk material or the surface is represented by a finite MM cluster model, optionally with additional point charges to mimic the effect of bulk electrostatics. The active site is modelled by a (higher-level) QM calculation with electrostatic embedding, where the cluster environment is represented by point charges in the QM Hamiltonian. This setup is intended to give an optimal balance between accuracy at the active site and computational expense. The cluster approach in ChemShell has been used to study zeolite structure and reactivity (de Vries et al. 1999; Sherwood et al. 2003; To et al. 2006a,b, 2007, 2008), the physical properties and catalytic activity of zinc oxide (French et al. 2001a, 2003a, 2008; Bromley et al. 2003; Sherwood et al. 2003; Sokol et al. 2004, 2007; Catlow et al. 2008), the generation of point defects in photographic materials (Wilson et al. 2005) and the formation of water on interstellar dust grains (Goumans et al. 2009).
ChemShell may be built as a serial or parallel program, although the current release does not contain any parallel algorithms of its own. The parallel version instead is intended to exploit any parallelism contained in the packages used to evaluate the energy and gradient. Usually the evaluation step of a ChemShell calculation is by far the most computationally expensive, and so the overall calculation would be expected to scale well even if only this step is parallelized. However, on high performance computing resources where calculations using thousands of processor cores are routine, the parallel performance of the external programs often does not continue to scale efficiently. In this domain, we expect there to be significant gains from adding a second layer of parallelism at the ChemShell level.
In this paper, we describe how a parallel framework has been added to ChemShell using a task-farming approach. In §2, the details of the implementation are discussed. In §3, the task-farming approach is applied to three types of chemical calculation and the performance gains assessed. Finally, in §4, we describe how the new capabilities of ChemShell have been used in a study of hydrogen dissociation over the oxygen terminated surface of aluminium-doped zinc oxide, a material of fundamental and industrial interest.
2. Task-farm parallelization
Task-farming parallelism is useful in situations where a set of independent calculations have to be performed (van Dam et al. 2006). The ‘task farm’ consists of all available parallel processes (usually one process per CPU core), which are then divided into subsets of processes called workgroups. The tasks are split between the workgroups which work on them in parallel. As the calculations are independent, no information needs to be exchanged between workgroups during this time, and sharing of results can be postponed until all the tasks have completed.
To implement task farming in ChemShell, a number of modifications to the code and external programs were necessary.
ChemShell is implemented in parallel using MPI (message passing interface). The parallel framework used before task farming was added is illustrated in figure 1a. One MPI process acts as a ‘master’, on which a Tcl interpreter runs and executes the input script. The other processes are ‘slaves’ that remain idle until a parallel input command is executed. This would typically be a request for a gradient evaluation using an external program. To achieve maximum efficiency, the external program is linked to the ChemShell executable so that it can be called directly as a library routine. The external program therefore shares the same MPI environment as ChemShell and can make use of all the MPI processes for its own parallel calculations. When the command has completed, control returns to the master process.
Task-farming parallelism adds an extra layer of organization to the processes, which are grouped into independent workgroups using MPI communicators. Communicators are handles that are used to identify groups of processes that can exchange information with each other. In the original parallel framework, the processes were grouped into a single communicator (the default, MPI World). To create independent workgroups the MPI World communicator is split into smaller sets of processes, each with their own Workgroup communicator. The user specifies the number of workgroups to be created using a command-line argument when ChemShell is executed.
In each workgroup one process acts as a master and the rest are slaves (figure 1b). They operate in the same way as in the original parallel framework, except that there are now multiple master processes (if the number of workgroups is set to one, the new framework reduces to the original). Each master runs a copy of the Tcl interpreter and independently executes the same ChemShell input script. A number of Tcl commands have been implemented to report workgroup information, such as how many workgroups exist and which workgroup the script is running in. These commands may be used to make parts of the script conditional on the workgroup ID and this provides a mechanism to distribute tasks between workgroups.
To prevent file conflicts, a scratch working directory is created for each workgroup. This is important to prevent conflicts between ChemShell data objects, which can be saved to disk. By default ChemShell objects will be loaded from the working directory, but if not present the common parent directory will also be searched. This makes it possible to create globally accessible objects. A Tcl command is provided to make local objects globally accessible by moving them to the parent directory. This command first synchronizes the workgroups using an MPI barrier to ensure that the objects are made available at a well-defined stage of each workgroup’s calculation.
DL-FIND is a geometry optimization library (ccpforge.cse.rl.ac.uk/projects/dl-find/; Kästner et al. 2009) that is included in the ChemShell distribution as the standard optimization driver. ChemShell and DL-FIND communicate via a well-defined interface which is used for passing options, geometries, energies and gradients between the two codes.
DL-FIND has its own implementation of task-farming parallelism (Kästner et al. 2009). The parallel facilities consist of a collection of wrapper functions around MPI library calls to share data between processes and a parallel interface to pass details of the task farm to or from the calling program. DL-FIND is capable of setting up the task farm (i.e. the MPI communicators) itself, but this feature is not used when it is linked to ChemShell because in this case DL-FIND operates within the task farm environment set up by ChemShell. To make the ChemShell environment accessible to DL-FIND, interface subroutines were added to ChemShell to provide the required information.
The way that DL-FIND operates within the ChemShell task farm environment is illustrated in figure 2. Note that DL-FIND communicates directly between workgroups using MPI calls. ChemShell provides an MPI communicator grouping the master processes for this purpose.
There is a subtle difference between the ChemShell and DL-FIND task-farming environments which has to be reconciled by the interface. On the ChemShell side only the master processes (one per workgroup) execute ChemShell commands such as DL-FIND calls. However, the DL-FIND parallel routines assume that they are running on all processes. To work around this ChemShell only provides information on the master processes to DL-FIND so that DL-FIND sees only one process per workgroup. DL-FIND’s parallel algorithms can then run unchanged in the ChemShell environment.
(c) External programs
All parallel work below the level of the ChemShell interpreter must be carried out within a workgroup. Therefore, if an external program is called for an energy and gradient evaluation it must use the Workgroup MPI communicator, not the World communicator. An interface function has been added to ChemShell to pass the Workgroup communicator to external programs.
We have modified the GAMESS-UK (http://www.cfs.dl.ac.uk/gamess-uk/; Guest et al. 2005) QM package and GULP (Gale & Rohl 2003) MM package to accept a Workgroup communicator from ChemShell. These programs can be compiled as libraries into the ChemShell executable so a simple function call is used to pass the information.
In the case of GAMESS-UK, a significant amount of computational time is spent evaluating matrix eigenvectors and eigenvalues and it is therefore important to use a parallel diagonalizer such as PeIGS (http://www.emsl.pnl.gov/docs/global/peigs.html) in order to make efficient use of a massively parallel platform. As the PeIGS library also contains MPI calls, these were modified so that PeIGS could work in the task-farmed environment. This involved replacing instances of the MPI World communicator with the Workgroup communicator passed from ChemShell.
In principle, any parallel external program that has been interfaced to ChemShell can be used in a task-farmed calculation providing it can be modified to accept the Workgroup communicator and use it instead of the World communicator. Alternatively, the parallelism at the ChemShell level could be exploited while only using a serial external program. In this case, the task farm would be set up so that there is only one MPI process per workgroup. No modifications would be required for the external program but the potential for scaling up the calculation would obviously be more limited.
3. Performance of parallel methods
We have modified three types of chemical calculation in ChemShell to take advantage of the task-farming implementation. They each involve independent energy or gradient evaluations and should therefore benefit from high-level parallelism.
All calculations were performed on the Phase 2A HECToR service (www.hector.ac.uk) using quad core 2.3 GHz AMD Opteron processors with 8 Gb memory per processor.
(a) Finite difference Hessian
An obvious choice for parallelization is the calculation of a finite difference Hessian. Each entry in the Hessian matrix is calculated using the difference of two gradients. Using a forward difference algorithm an N-atom system requires 3N+1 independent gradient evaluations. With a central difference algorithm this rises to 6N evaluations.
In the original ChemShell implementation, the Hessian calculation is performed using a single Tcl command. In the task-farmed version this command has been split up to separate the calculation of the gradients from the evaluation of the Hessian. The individual gradient calculations can then be easily parallelized by dividing them between the workgroups. The intermediate results are stored as ChemShell objects, which are then made globally accessible and used to evaluate the Hessian.
The 57-atom silicate–VO3 cluster shown in figure 3 was used to assess the performance of the task-farmed implementation. Energies and gradients were calculated using GAMESS-UK with the B3LYP functional (Stephens et al. 1994). Two basis sets were used: the LANL2 effective core potential basis (Hay & Wadt 1985; giving 413 basis functions) and the TZVP (Dunning 1971; McLean & Chandler 1980) all-electron basis (1032 basis functions). The larger basis test is present to fully assess the PeIGS build of GAMESS-UK, as PeIGS is only used to diagonalize matrices whose dimensions are greater than the total number of MPI processes.
To give an indication of the time required for the full Hessian calculation, single-point calculations were carried out with differing numbers of cores, with one MPI process per core. The results are shown in figure 4. If perfect scaling were achieved the calculation time would halve with each doubling of the number of cores (and a second level of parallelism would be unnecessary). For both basis sets reasonable scaling is achieved up to approximately 128 cores, but for larger core counts the gains are very small. This suggests that large efficiency gains should be possible using task-farmed calculations.
The full forward difference Hessian was evaluated using a set of 1024-core calculations with differing numbers of workgroups. The tasks were parallelized using a simple static load-balancing scheme where as far as possible an equal number of gradient calculations were assigned to each workgroup. As each gradient calculation should take approximately the same amount of time (apart from the first where no wavefunction guess is provided), no major gains would be expected from a more sophisticated load-balancing mechanism.
The results are shown as wall clock times in table 1. To correctly interpret the results, it is important to keep in mind that all calculations run with the same number of cores and only the division into workgroups is changed. As the number of workgroups increases, the number of cores in each workgroup falls proportionally. The calculation with the highest speed-up factor therefore gives the best balance between parallelization of individual gradient evaluations and parallelization of the Hessian as a whole. This is different to the benchmarking of single-level parallelism where scaling of calculation time with number of cores is used to measure efficiency. There is no scaling in this sense in table 1, as the change in the speed-up with the number of workgroups approaches zero at peak efficiency. If the number of workgroups is too high the speed-up will begin to fall again.
Speed-up factors are calculated by comparison with the single workgroup calculation, as it is the slowest. For the LANL2 ECP basis set substantial speed-ups are seen up to a maximum of 64 workgroups (with 16 cores per workgroup), where a speed-up factor of almost 7 is achieved. The task-farmed approach is therefore considerably more efficient than using the parallel routines in GAMESS-UK alone. Further gains were not achieved by going beyond 64 workgroups. This is firstly because a larger number of workgroups means that a larger proportion of the calculations do not benefit from an initial wavefunction guess (although for a non-benchmark calculation this could be provided using a preliminary single-point evaluation step). Secondly, only 172 gradient evaluations in total are required and therefore the load is not efficiently balanced in the 128 workgroup calculation. For larger systems it may be advantageous to use 128 or more workgroups.
For the TZVP basis set calculations were performed using a single workgroup and 64 workgroups. Similar results are seen, with the 64 workgroup calculation again achieving a speed-up of approximately 7. This indicates that the efficiency gains remain even when large matrix diagonalizations are carried out during the gradient evaluation.
(b) Nudged elastic band optimization
The nudged elastic band (NEB) algorithm is a method for finding a minimum energy path between two structures. Typically, it is used to characterize a reaction path including an energy barrier. The improved-tangent variant of NEB (Henkelman & Jónsson 2000) is available in DL-FIND (Kästner et al. 2009).
Each NEB optimization cycle consists of energy and gradient evaluations for a sequence of structures (images) with geometries that sit along a path between the two endpoints. The final NEB gradient is constructed using spring forces that connect the images. However, the gradient calculations for the images are independent and therefore can be evaluated in parallel.
The NEB algorithm in DL-FIND has been parallelized using static load-balancing. An image is assigned to a workgroup if the image number modulo the number of workgroups is equal to the workgroup ID. This method ensures that a particular image is always assigned to the same workgroup, which is important if the external program uses restart files (as is the case for a GAMESS-UK calculation). At the end of each cycle the energies and gradients are shared between all workgroups by an MPI call within DL-FIND.
The first NEB cycle is different from the others in that the workgroups each calculate the gradients in serial along the whole path. This is to help convergence of the external QM program, as the wavefunction of the previous image can be used as a guess for the next.
The benchmark system for the NEB method is shown in figure 5. It is a 3207-atom QM/MM cluster consisting of CO2 and 2H2 adsorbed onto an Al-doped zinc oxide surface. The NEB method was used to find the barrier for the interchange of H in H–CO2 between two different sites. The QM region consisted of 32 atoms with a PVDZ basis used for all elements except Zn, where a Stuttgart ECP was used. (Igel-Mann, G. ECP28SDF see http://www.theochem.uni-stuttgart.de/pseudopotentials/clickpse.en.html). This gives a total of 194 basis functions. Following Sokol et al. (2004), this region is partitioned into an inner 19 atoms treated fully by quantum mechanics and a surrounding QM/MM interface region with pseudopotentials placed on 13 Zn atoms. The interface region provides a localized embedding potential that prevents the electrons from the inner region spilling out onto the positively charged centres in the MM region.
GAMESS-UK was used for the QM calculations with the B97-1 functional. GULP was used to provide MM energies and gradients using the shell model interatomic potential of Whitmore et al. (2002). Ten images are used to describe the path, with the two endpoints frozen, giving eight gradient evaluations in total per cycle.
A single-point energy and gradient evaluation for the test system is actually an iterative cycle of QM and MM calculations. This is because the QM region is polarized by the MM atoms as point charges and the shells of the MM system are polarized in turn by the QM region. The QM/MM gradient must therefore be iterated until converged each time it is calculated.
Timings for the single-point calculation are shown in figure 6. For large core counts the scaling is very poor, with a 1024-core calculation taking a longer time to complete than a 256-core calculation. This means that the overhead of message passing between so many MPI processes outweighs any advantage from the extra computing power. Although the full iterated single-point calculation takes a significant amount of time, the individual QM and MM calculations are quite modest and do not benefit from such large core counts. Increasing the size of the QM region would make larger core counts useful, but this would have made running a full benchmark too computationally expensive. The results for this system indicate that the number of workgroups should be set as high as possible (i.e. to 8).
The NEB benchmark calculations were performed over 50 cycles. This results in an effectively converged path. Continuing on to full convergence was not desired, as small numerical differences can lead to variation in the total number of cycles for the optimization and this would not reflect the intrinsic performance of the parallelization. The most basic form of the NEB method was used, with no climbing image (Henkelman et al. 2000) and no freezing of intermediate images during the optimization.
The results are shown in table 2. Speed-up factors are calculated relative to the full single workgroup run of 1024 cores and also a run of 256 cores. The 256 core run represents the best performance achievable using a single workgroup.
The improvement offered by the task-farming approach is significant, and as expected using the maximum number of workgroups gives the highest performance. When compared with a single workgroup 1024-core run a speed-up of over 8 is found, which is only possible because the single-point 128-core calculation is faster than the 1024-core equivalent. When compared to the 256-core NEB run, the speed-up is lower than 8 but still very substantial.
(c) Global optimization methods
Two intrinsically parallel optimization methods are available in DL-FIND (Kästner et al. 2009), namely a genetic algorithm and stochastic search. These methods are typically used for finding the global minimum on a potential energy surface. Both methods involve calculations on a population of structures during each cycle, which can be run in parallel.
Following Al-Sunaidi et al. (2008), benchmark calculations were performed on ZnO nanoclusters. In Al-Sunaidi et al. (2008), MM calculations were used but for the purpose of benchmarking a much more demanding QM calculation was set up using GAMESS-UK. Timing calculations were performed on a (ZnO)28 cluster. The B97-1 functional was again used with a PVDZ basis (560 basis functions) and the Stuttgart ECP for the Zn atoms. A population of 32 structures was used, which is a typical size for these methods and is an efficient number for task-farm parallelization.
The scaling properties of the genetic algorithm and stochastic search methods are expected to be identical. For the benchmark tests stochastic search was used. A single energy and gradient evaluation is very quick and so to obtain reliable timings a full cycle of 32 evaluations was performed. The results are shown in figure 7. Again, the best performance is given by 256 cores with a slowdown observed for higher core counts. This again suggests that substantial gains can be made by splitting the MPI processes into workgroups.
Normally, a genetic algorithm or stochastic search optimization would be run for hundreds or thousands of cycles in order to have a good chance of finding the global minimum. However, owing to computational expense it was not feasible to run a baseline calculation of this length using a single workgroup. To obtain a benchmark the stochastic search algorithm was run for 20 cycles instead. This is expected to give a good representation of the scaling behaviour as each cycle contains the same number of evaluations and should therefore take approximately the same amount of time.
The results are shown in table 3. Surprisingly, of the two single workgroup runs, the 1024-core run is faster than 256 cores. This could have been due to natural variation in SCF convergence owing to the random element to the geometries created. To test this we ran the benchmark twice, but the same trend was found in both runs. This implies that there is some overhead to the large process count in the first cycle that is not present in subsequent cycles.
Speed-up factors for the task-farmed calculations are therefore given compared to the 1024-core run. Gains in performance are again substantial, with the best performance as expected given by maximizing the number of workgroups, although the performance of the 16 workgroup calculation is very nearly as good, with speed-ups of over 5 achieved in both cases.
(d) Overall performance
Substantial gains in performance were observed in implementing task-farming for all three types of calculation. The highest gains were obtained by maximizing the number of workgroups (up to the limit where the load becomes unbalanced). This result is expected because the ChemShell level of parallelism involves very little overhead compared with parallel evaluation of the energy and gradient. Single-point calculations were good predictors of the final performance and this initial step is recommended as a method for determining the optimal number of workgroups in advance.
Having benchmarked the new approach, we now apply it to study the catalytic behaviour of zinc oxide, making use of the efficiency gains achieved to increase the accuracy of the QM/MM model used.
4. Hydrogen dissociation over oxygen terminated polar surface of Al-doped ZnO
Zinc oxide (ZnO) finds applications in a large number of current research and industrial areas including electronics, electrical devices and catalysis. Pure ZnO is an n-type semiconductor, and its conductivity is strongly affected by molecular adsorbates, which is exploited in gas sensing. The non-centrosymmetric wurtzite structure adopted by ZnO along with the ionic nature of bonding in the material yield useful strong piezoelectric properties. The optical applications in blue and ultraviolet laser and light-emitting diodes rely on the wide direct band gap of about 3.37 eV at room temperature. ZnO has also been considered for spintronics applications, for which it is doped with 1–10% of transition metal ions (Mn, Fe, Co, V, etc.). The most important application of ZnO in catalysis is in the synthesis of methanol from syn-gas, a mixture of H2, CO and CO2. Both preparation and operation of the catalyst include its hydrogenation as an important step (Waugh 1992; French et al. 2001a, 2003b). The complex atomistic processes involved in sorption, reaction and then desorption of hydrogen from ZnO have captured the interest of both experimentalists and theoreticians (Taylor & Sickman 1932; Taylor & Strother 1934; Taylor & Liang 1947; Kubokawa & Toyama 1954; Low 1959, 1965; Kubokawa 1960; Eischens et al. 1962; Nagarjunan et al. 1963; Peshev 1966; Dent & Kokes 1969; Morrison 1969; Narayana et al. 1970a,b; Baraski & Galuszka 1976; Boccuzzi et al. 1978; Watanabe 1978; Bowker et al. 1981; Griffin & Yates 1982; Maslov et al. 1982; Denisenko et al. 1983; Roberts & Griffin 1986, 1988; Burch et al. 1990; Idriss & Barteau 1992; Waugh 1992; Ghiotti et al. 1993; Nyberg et al. 1996; Gines et al. 1997; Mitra et al. 1998; Lu et al. 1999; Spencer 1999; Zapol et al. 1999; Scarano et al. 2001; Kazanskii & Pid’ko 2002; Kunat et al. 2003; Rozovskii & Lin 2003; Losurdo & Giangregorio 2005; Tabatabaei et al. 2006; Bruno et al. 2009; Dierre et al. 2009; Kiss et al. 2009; Rossmuller et al. 2009; Doh et al. 2010; Weber et al. 2010) in finding a mechanism and probing the active sites in ZnO.
Our previous hybrid QM/MM study on hydrogenated ZnO has confirmed the heterolytic character of hydrogen dissociation over ZnO and identified most prominent hydroxyl and hydride vibrational modes (French et al. 2003a). In particular, the potential sites for the physi- and chemisorption of hydrogen were considered on the catalytically important polar zinc (0001) and oxygen terminated surfaces of ZnO, while the nonpolar surface was used for comparison. The adsorption of hydrogen on the ZnO surface comprises two major modes (Eischens et al. 1962; Dent & Kokes 1969; French et al. 2003a): type I adsorption is the reversible dissociative adsorption of H2 on Zn−O dimers (exposed at zinc oxide surfaces) that results in a pair of Zn−H and O−H species; type II is also dissociative adsorption of H2 but is irreversible and yields bridging Zn−H−Zn and O−H…O species (analogous centres in the bulk of ZnO have recently been erroneously characterized as multi-centred bond, which of course is not consistent with their ionic nature). The vacant oxygen and zinc interstitial surface sites have an important role in rationalizing the observed spectra, and the most noteworthy is the Type I mode at 1710 cm−1 for the zinc hydride stretching mode (French et al. 2003a). This work along with a number of complementary studies (French et al. 2001a,b, 2003a,b, 2004, 2008; Bromley et al. 2003; Catlow et al. 2003, 2005a,b, 2007, 2008, 2010; Phala et al. 2005; Sokol et al. 2007) provided the first comprehensive computational survey of the atomistic and electronic mechanisms involved in methanol synthesis over the surfaces of pure ZnO and metal clusters supported on ZnO. Analogous methods by other groups have proved to be successful and accurate in the study of localized states and processes in a wide range of recent fundamental and applied studies (Nygren et al. 1994; Mejias et al. 1995; Llusar et al. 1996; Rösch et al. 1996; Seijo & Barandiarán 1996; San Miguel et al. 1998; de Carolis et al. 1999; Kantorovich 1999, 2000a,b; Sushko et al. 1999; Nasluzov et al. 2001, 2010; Karlsen et al. 2003; Rivanenkov et al. 2003; Danyliv & Kantorovich 2004; Mysovsky et al. 2004; Seijo & Barandiarán 2004; Di Valentin et al. 2006; Wang & Kantorovich 2006; Danyliv et al. 2007; Giordano et al. 2007; Sanz & Márquez 2007; Shor et al. 2007, 2009; Kimmel et al. 2008; Ramo et al. 2008; Swerts et al. 2008; He et al. 2009; Shluger et al. 2009; Abarenkov et al. 2010).
The first industrial catalyst used for methanol synthesis was based on ZnO doped with chromium. The modern leading industrial catalyst is Cu supported on ZnO doped with aluminium (Waugh 1992), and the interaction of hydrogen with the oxide catalyst is the initial reaction for the hydrogenation of carbon oxides. In this study, we extend our work to a more realistic model of the industrial catalytic system and calculate both reaction and activation energies of hydrogenation. Moreover, developments in the new parallelization algorithms implemented in the ChemShell code along with the increase in the computational power of HPC platforms allowed us to expand fivefold the region of the catalyst studied explicitly, using QM techniques, and now including an Al impurity.
(b) Structural model and methodological approach
In this work as elsewhere we postulate the active site for hydrogen dissociation as a vacant oxygen interstitial surface site (VOISS), illustrated in figure 8. Approximately a quarter of oxygen sites are vacant on the ()-O terminated polar surface of ZnO, which is a natural way of ridding ZnO crystallites of the physically unsustainable macroscopic dipoles. Our preliminary MM surface studies using the two-dimensional periodic code MARVIN (Gay & Rohl 1995) have shown that the oxygen terminated surface on reconstruction and relaxation experiences a very large rumpling associated with the voltage of approximately 1.35 V. Thus, a natural electron sink is formed at this surface, which results in electron accumulation in ZnO at the oxygen terminated surface, where a significant fraction of electrons would localize at the anionic vacant sites. Moreover, the aluminium doping in the material is typically achieved by substitution of Zn ions for Al, which forms very shallow donor centres that practically auto-ionize, with the electrons lost to the conduction band. In our calculations, we have placed one such centre in the uppermost surface layer of ZnO to observe its interaction with the electron localized naturally on the available VOISS centres.
Thus, a QM/MM embedded cluster model has been built around a VOISS centre with a second-shell neighbour Al impurity on the pre-optimized semi-infinite molecular model of ZnO. The embedded surface quantum region consists of 61 atoms. The interface comprises 63 Zn ions bearing large-core energy-consistent pseudopotentials (the same as used in the nudged elastic band benchmark). Atoms in the QM region, interface and the further 306 atoms of the MM region situated within a 13 Å radius from the centre are all considered as active, i.e. subject to geometry optimization to the energy minima or transition states. The active region is in turn embedded in the frozen MM region that consists of 2501 atoms surrounded by 269 fixed point charges. Overall, there are 3200 atoms in the system.
As in the benchmarking calculations, the QM and MM parts of the calculation were carried out with GAMESS-UK and GULP, respectively. Density functional theory was used as the QM method using the B97-2 hybrid exchange-correlation functional (Wilson et al. 2001), which was trained to reproduce not only equilibrium properties of atoms and molecules but also reaction barriers. A TZV2P basis set was used for the QM calculations (Weigend & Ahlrichs 2005). The MM region is treated using polarizable shell-model ionic interatomic potentials (Whitmore et al. 2002; Catlow et al. 2008) as before.
The parallel nudged elastic band method was employed to find reaction barriers and transition states in two stages. First, we built a crude approximation to the reaction path connecting reactants and products with a small number of images(starting with eight images and then doubling this number to obtain a finer detail of the path). Next, a refinement to the barrier found by the initial run was achieved by a new NEB search on a finer mesh, zooming in close to the transition state, for which the neighbouring points on either side of the crude transition state were taken as the new end points. In the second stage, we have used 10 images per individual path (see below) and one workgroup per image. The final run has been done using only five images.
(c) Reaction path for hydrogen dissociation
Our study of the dissociation of an H2 molecule over the Al/ZnO surface starts by considering its physisorption at the active site. The physisorption (or van der Waals adsorption) of molecular hydrogen on zinc oxide was characterized in the early studies of Taylor and co-workers who reported a heat of adsorption of less than 1 kcal mol−1 at −78°C (Taylor & Strother 1934), which has been later refined to a value of 0.6 kcal mol−1 (Taylor & Liang 1947). We also calculate a small value of 0.3 kcal mol−1, in agreement with experiment, but should note that the long-range van der Waals interactions are not accounted for by the hybrid DFT employed here but, we expect, the largest contribution to this calculated energy is owing to the electrostatic interaction between the ionic surface and the hydrogen quadrupole. On adsorption, the H2 molecule is found on a very shallow potential energy surface where it floats over ZnO at about 3 Å above the uppermost O ions, as shown in figure 8.
We previously reported the H2 dissociation energy over the bare polar surface of pure ZnO to be approximately 46.6 kcal mol−1, which significantly decreases with hydrogen loading onto the surface (see fig. 7, Catlow et al. 2008). In close agreement with our previous studies and experiment, we find the dissociation step in the reaction of chemisorption to be strongly exothermic, with the total heat of adsorption from vacuum of approximately 42 kcal mol−1 (or 21 kcal mol−1 per H atom). The high activation energy explains the slow initial uptake of hydrogen in adsorption experiments. As in the case of pure ZnO, on dissociation, one hydrogen atom occupies the vacant oxygen site and is hydridic (H−) whereas the other (as a proton) forms a hydroxyl group with one of the nearest oxide ions. While the hydride ion lies just about 0.2 Å below the uppermost oxide surface plane, the hydroxyl group stands proud of the surface. The full reaction energy profile is shown in figure 9 with relevant data summarized in table 4 (where the positive sign indicates the exothermicity).
Our first calculation of the reaction path for the heterolytic dissociation of a hydrogen molecule over zinc oxide yielded an energy profile with two maxima, which indicated the existence of a stable intermediate state as shown in figure 9. Unconstrained geometry optimization starting from the suspected energy minimum structure has confirmed this finding and allowed us to refine the intermediate structure that comprises a hydrogen atom floating over the VOISS, which is in turn occupied by a hydride ion. The electron initially trapped at the VOISS is now localized on the floating hydrogen atom, which, on dissociation of the H2 molecule, returns to the ‘high’ position, about 3 Å above the surface. In the final stage of the reaction the hydrogen atom is seen to ‘dive’ (see description of the transition states below) to the surface forming a hydroxyl group while an electron is now passed to the nearest VOISS. In this process, an electron donated by Al can be seen as a seed catalysing the reaction of H2 chemisorption, in analogy with a polymerization process.
Next, we performed a series of NEB searches to identify transition states of interest: one between the physisorbed hydrogen molecule and intermediate and another between the intermediate and the chemisorbed hydrogen species. The first transition state is characterized by a significantly weakened H−H bond of 1.41 Å (cf. the equilibrium value of 0.74 Å) with the molecular axis making an acute angle to the plane of the surface where the lower hydrogen is in the vacancy and the upper hydrogen about 2.7 Å from two surface oxide ions. In the second transition state, the atomic hydrogen is at a distance of 1.95 Å from the oxide ion, with which it is about to make a hydroxyl group, thus has moved inwards, while the hydride ion again remains at the vacant site as a spectator. The first transition state proved to be higher in energy than the second, at approximately 30 kcal mol−1, and thus rate determining, whereas the second provides a forward barrier of only 4.5 kcal mol−1, which implies a very short life time for the atomic hydrogen intermediate. The high backward barrier energy of approximately 72.1 kcal mol−1 for single H2 dissociation in this work is very close to the experimental value of approximately 71.7 kcal mol−1 reported by Bowker et al. for clean ZnO surfaces and comparable with the experimental value of ca. 77.9 kcal mol−1 reported by Kunat et al. for the hydroxylated surface. Our calculations corroborate the experimental characterization of this reaction as irreversible, i.e. it is difficult to remove surface hydrogen from Al/ZnO to form gaseous H2. However, on increasing hydrogen loading, a type I reversible adsorption occurs. The resultant hydrogen species would then take part in other catalytic reactions on the Al/ZnO, including co-adsorption of CO2 (from syn-gas), which picks up the hydrogen from the surface of the catalyst in steps and through various intermediates to produce methanol (Waugh 1992; French et al. 2001a).
In summary, in this work, we have obtained activation energies of adsorption within 1–2 kcal mol−1 of the experiment, which provides further evidence that the solid-state computational techniques used have now matured and can provide the chemical accuracy previously achieved only for molecular systems (cf. Catlow et al. 2005a,b).
T.W.K. was supported by the HECToR Distributed CSE project ‘Efficient Massively-Parallel Tools for the Study of Catalytic Chemistry’. G.D., A.A.S. and C.R.A.C. acknowledge support by an EPSRC Portfolio Partnership (grant no. ED/D504872) and membership of the UK’s HPC Materials Chemistry Consortium, which is funded by EPSRC (grant no. EP/F067496). We would like to thank Julian Gale (Curtin University, Australia) for helping us to prepare GULP for use in the task-farming framework. Huub van Dam (PNNL, USA) provided the silicate structure used for the Hessian benchmark calculations. We are grateful to Aron J. P. Walsh and Scott M. Woodley for useful discussions. We would like to thank our collaborators on the previous work on hydrogen adsorption, S. A. French, S. T. Bromley and S. C. Rogers.
One contribution of 16 to a Special feature ‘High-performance computing in the chemistry and physics of materials’.
- Received November 22, 2010.
- Accepted January 6, 2011.
- This journal is © 2011 The Royal Society