.. _chemSearch: Chemical searches ================= Workflows for chemical searches differ from the ones for :ref:`confSearch` in that they aim to find the most stable "chemical entity" for a molecule under certain conditions. This approach is different from simply finding the lowest energy conformer. For example, one can ask questions such as "What is the most stable protomer when a proton is added or removed from the molecule?" The workflows for answering such questions may also involve conformational searches, which will be covered in several examples below. .. _anionSearch: Anion Search - conjugate bases of lactic acid ---------------------------------------------- First steps ........... Let's start with a very simple example: the lactic acid. It has `two protons that are relatively easy to be removed `_ in solution, one from the alcohol group and one from the carboxylic acid. .. figure:: chemSearch/latic_acid.png :align: center :width: 400 Lactic acid and the two most acidic groups. We can try using the **deprotonation** workflow from WEASEL to find which deprotonation would lead to the most stable structure. Simply :download:`download ` the ``lactic.xyz`` file with the structure to the directory in which you want to run your calculation and run: .. prompt:: bash $ weasel lactic.xyz -W ChemSearch-Deprot Alternatively, the SMILES string can be used: .. prompt:: bash $ waesel -smiles C[C@H](O)C(=O)O -W ChemSearch-Deprot Steps of the WEASEL deprotonation workflow ............................................ The WEASEL deprotonation workflow consists of the following steps: .. figure:: chemSearch/workflow_deprot.jpg :align: center :width: 650 **Step 1**: Generation of deprotonated structures (conjugate bases) - First, all possible anions are generated within an energy range of 30 kcal/mol. This is important because the XTB method used in PreOpt has a larger error when comparing different protomers (or conformers of a single molecule in general), but it can be changed by setting the ``ChemSpaceGen_MaxEn`` variable in the workflow file or passing it on the command line as ``-chemspace-gen-enrange``. .. important:: Unless you have very specific reasons, don't make this number smaller than the default otherwise some major errors might occur. **Step 2**: Conformer ensemble generation for each conjugate base - After this first step, all conjugate bases selected will be subjected to :ref:`confSearch`, following to the same options and flags described in the respective page. This is important because one generated anion might be more stable than the other if one considers the whole conformational distribution, including H-bonds, intermolecular interactions and other means to stabilize the extra charge. The initial output looks like:: =================================== 1. Generating Chemical Ensemble =================================== (...) 2 anion(s) were found. Action 1. completed successfully! ==================================== 2. Generating Conformer Ensemble ==================================== Resource allocation: 2 conformer search(es), 8 total core(s), 1 calculation(s) at a time. ConfSearch 1: Starting conformer search. ConfSearch 1: Conformer search successful. ConfSearch 2: Starting conformer search. ConfSearch 2: Conformer search successful. (...) Action 2. completed successfully! In this case, only the two anions obtained by deprotonating the alcohol and acid are selected. Deprotonating a CH3 for instance,is immediately recognized as a too high energy process and thus discarded. .. note:: Of course, in this particular case the molecule is small and rather rigid. There is no real need for a conformational search and it can be turned off by using ``-no-confsearch`` in the input line. But that is not the general case as we will see below! Assembling, optimization and ranking .................................... With the conformational search done for each protomer, the whole group of molecules is assembled into a single set and optimized using XTB (or any other ``-preopt-method`` given in the input). By default here we also use a ``ConfGen_EnRange_PreOpt`` (default energy range for the PreOpt step) of 30 kcal/mol, to keep consistent with the expected error of the method. This is followed by an optimization of all structures with r2SCAN-3c, and the final energy ranking is done with the higher-level method wB97X-V. .. important:: As with the other methods, WEASEL always uses water as a default solvent! Please check the :ref:`solvation` page for how to change this. .. note:: If you want to change the optimization or the energy ranking method, please change the respective workflow files or use different ``-preopt-method`` and ``-opt-method`` in the workflow file. In the very end, a summary is printed:: ************************* SUMMARY ANION CONFORMATIONAL SEARCH ************************* Final set of 1 Anion conformers: Conformer 1 / Anion 1: 0.00 kcal/mol Original Conformer-ID: Anion / 1-ID: 1 ***************************************************************************************** and in this case, only one anion was found after the final energy filtering of up to 3 kcal/mol. This is the anion obtained by removing a proton from the acid group. As the pKa for the alcohol is much higher, that is exactly what we expected: .. figure:: chemSearch/lowestAnion.png :align: center :width: 400 The lowest energy anion obtained for the lactic acid. .. _filestruc: The file structure .................. In terms of the file structure, the chemical searches produce very similar data as for the :ref:`confSearch`, except that the workflow file name is added at the end. In this case, it looks like:: . ├── lactic.xyz └── lactic_anionSearch ├── lactic_lowestAnion.xyz ├── lactic.anionEnsemble.xyz ├── lactic_anionSearch.report ├── lactic_anionSearch.summary └── AnionSearch ├── CREST job files └── ORCA job files +--------------------------------+----------------------------------+ | File | Description | +================================+==================================+ | lactic_lowestAnion.xyz | lowest-energy anion | +--------------------------------+----------------------------------+ | lactic.anionEnsemble.xyz | anions + conformers ensemble | +--------------------------------+----------------------------------+ | lactic_anionSearch.report | Output file of the WEASEL run | +--------------------------------+----------------------------------+ | lactic_anionSearch.summary | Summary file for the WEASEL run | +--------------------------------+----------------------------------+ .. _protomerSearch: Protomer Search - the case of histidine --------------------------------------- Initial steps ............. The amino acid histidine, at neutral pH, has three nitrogen atoms susceptible to protonation. In this case the situation is also much more complicated than for the lactic acid, due to the flexibility of this molecule. .. figure:: chemSearch/histidine.png :align: center :width: 400 The amino acid histidine and its three protomers. Again, after :download:`downloading ` the ``histidine.xyz`` file into the directory where you want to run the calculation, you can find the protomers by using: .. prompt:: bash $ weasel histidine.xyz -W ChemSearch-Prot Steps of the WEASEL protonation workflow ............................................ The WEASEL protonation workflow consists of the following steps: .. figure:: chemSearch/workflow_prot.jpg :align: center :width: 650 Conformational ensemble ....................... Now at first, four structures are generated, as shown below: .. figure:: chemSearch/his_initial.gif :align: center :width: 400 The initial four histidine protomers generated using XTB. The two most stable structures make chemical sense, the last two are somewhat off, such as the one obtained by adding a proton to a carbon of the imidazole ring. Anyway, the algorithm follows with a conformational search on each of these protomers:: =================================== 1. Generating Chemical Ensemble =================================== (...) 4 protomer(s) were found. Action 1. completed successfully! ==================================== 2. Generating Conformer Ensemble ==================================== Resource allocation: 4 conformer search(es), 8 total core(s), 1 calculation(s) at a time. (...) Chemical space 1: 28 protomer(s) were found. Chemical space 2: 4 protomer(s) were found. Chemical space 3: 9 protomer(s) were found. Chemical space 4: 36 protomer(s) were found. Action 2. completed successfully! and a total of 77 conformers of these four protomers are found. The process follows as described before with XTB pre-optimization, followed by a more accurate r2SCAN-3c optimization and finally the energy ranking is done using wB97X-V. Results after ranking ..................... The final results are then:: ************************* SUMMARY PROTOMER CONFORMATIONAL SEARCH ************************* Final set of 14 Protomer conformers: Conformer 1 / Protomer 2: 0.00 kcal/mol Original Conformer-ID: Protomer / 1-ID: 2 Conformer 2 / Protomer 1: 0.62 kcal/mol Original Conformer-ID: Protomer / 1-ID: 1 Conformer 3 / Protomer 2: 0.68 kcal/mol Original Conformer-ID: Protomer / 3-ID: 2 Conformer 4 / Protomer 1: 0.85 kcal/mol Original Conformer-ID: Protomer / 3-ID: 1 Conformer 5 / Protomer 2: 1.48 kcal/mol Original Conformer-ID: Protomer / 2-ID: 2 Conformer 6 / Protomer 1: 1.57 kcal/mol Original Conformer-ID: Protomer / 2-ID: 1 Conformer 7 / Protomer 1: 1.87 kcal/mol Original Conformer-ID: Protomer / 6-ID: 1 Conformer 8 / Protomer 1: 2.02 kcal/mol Original Conformer-ID: Protomer / 9-ID: 1 Conformer 9 / Protomer 2: 2.06 kcal/mol Original Conformer-ID: Protomer / 4-ID: 2 Conformer 10 / Protomer 1: 2.24 kcal/mol Original Conformer-ID: Protomer / 23-ID: 1 Conformer 11 / Protomer 1: 2.38 kcal/mol Original Conformer-ID: Protomer / 12-ID: 1 Conformer 12 / Protomer 1: 2.52 kcal/mol Original Conformer-ID: Protomer / 11-ID: 1 Conformer 13 / Protomer 3: 3.09 kcal/mol Original Conformer-ID: Protomer / 5-ID: 3 Conformer 14 / Protomer 1: 3.21 kcal/mol Original Conformer-ID: Protomer / 7-ID: 1 ******************************************************************************************** Please note that here, in contrast to the :ref:`confSearch`, on the right side there are Conformer-IDs and Protomer-IDs. The later relates to the first step, where different protomers were generated, while the former relates to a conformer number within that protomer set. The first conformer numbers on the left side are then related to the energy ordering. The actual structures obtained are: .. figure:: chemSearch/his_final.gif :align: center :width: 400 The final fourteen histidine protomers, ranked using wB97X-V. Looking closely, the most stable structures belong to protomers 2 and 1, in which the amine or the histidine ring are protonated. The final energy difference between the first two is surprisingly close to the `experimental value of 0.39 kcal/mol `_, considering that no explicit solvation method was used here! :ref:`filestruc` is analogous to what was described above for the anion case, with a simple name change for protomerSeach, as it will also be for the other methods. .. _tautomerSearch: Tautomer Search - ESIPT molecules --------------------------------- We also have a workflow designed to find tautomers of initial guess structures. In particular, we can find here the tautomers that can be generated by a proton transfer, that usually interconvert quickly via tunneling. For instance, let's investigate the 2-(2'-hydroxyphenyl)benzimidazole (HBI), which is important in Intramolecular Excited State Proton Transfer applications and is known to have `three different tautomers in water `_: .. figure:: chemSearch/HBI.png :align: center :width: 400 The three main tautomers/conformers from HBI in neutral water. The so-called *cis-enol* conformer (Ec) is known to be the most stable one in neutral water, while the *ketone* (K) is usually not observed experimentally in the ground state. Let's :download:`download ` the structure file HBI.xyz and call WEASEL to do the **Tautomer Search**, simply using the respective workflow: .. prompt:: bash $ weasel HBI.xyz -W ChemSearch-Tautomer Steps of the WEASEL tautomer workflow ............................................ The WEASEL tautomer workflow consists of the following steps: .. figure:: chemSearch/workflow_tautomer.jpg :align: center :width: 650 As said before, this calculation automatically runs in water as a solvent, and the same steps described above are done. Our final results is then:: ************************* SUMMARY TAUTOMER CONFORMATIONAL SEARCH ************************* Final set of 1 Tautomer conformers: Conformer 1 / Tautomer 3: 0.00 kcal/mol Original Conformer-ID: Tautomer / 1-ID: 3 ******************************************************************************************** And only the tautomer with initial ID 3 is found to be within the final 3 kcal/mol range, after computing the energies using wB97X-V. The structure of the isomer saved in the *HBI2_lowestTautomer.xyz* is exactly what was expected, the Ec tautomer: .. figure:: chemSearch/HBI_final.png :align: center :width: 400 Final structure found after the tautomer search, corresponding exactly to the experimental result. .. note:: These results might be quite sensitive to the solvation model. Also consider using the SMD model (``-smd``), or even adding some explicit solvent molecules at hot spots in the molecule where they might be strongly interacting. HAT Search - finding the weak spots of methylcyclohexane -------------------------------------------------------- Yet another important chemical search is the Hydrogen Atom Abstraction (HAT) search, which can be used to find which bonds in a given molecule are more likely to suffer HAT. As an example, we will will the textbook case of methylcyclohexane (mch). It is known from experiments with radical reactions that the bond most likely to suffer HAT there is the one hydrogen bound to the tertiary carbon atom. .. figure:: chemSearch/HAT.png :align: center :width: 400 Reaction between methylcyclohexane and a bromine radical. Let's check that using quantum chemistry! The initial geometry can be :download:`downloaded ` (``mch.xyz``) and the HAT workflow can be called: .. prompt:: bash $ weasel mch.xyz -W ChemSearch-HAT Steps of the WEASEL HAT workflow ................................... The WEASEL HAT workflow consists of the following steps: .. figure:: chemSearch/workflow_hat.jpg :align: center :width: 650 It will again will go over a cascade of methods, as just described for the other workflows, except that there is **no** conformational search step after the generation of the HAT species, and the final printing is:: ************************* SUMMARY H-ATOM ABSTRACTION SITE SEARCH ************************* Final set of 4 HAT Sites: HAT Site 1: 0.00 kcal/mol Original HAT Site-ID: 1 HAT Site 2: 2.34 kcal/mol Original HAT Site-ID: 3 HAT Site 3: 2.39 kcal/mol Original HAT Site-ID: 4 HAT Site 4: 2.65 kcal/mol Original HAT Site-ID: 6 ******************************************************************************************** .. important:: A conformational search on each generated radical can be turned on by add ``-confsearch`` to the main input! Here, the most stable molecule found is precisely what is also observed experimentally, with the H being removed from the tertiary carbon: .. figure:: chemSearch/HAT_final.png :align: center :width: 400 Final structure found after the HAT workflow The other closely lying structures are the ones obtained by removing hydrogen atoms from the other three positions in the ring. The removal of a proton from the primary carbon, as expected, is too high in energy and does not even show up here. At least, it should not be below the final 3 kcal/mol range and is thus not relevant at room temperature. Ion Binding Search - where do cations bind? --------------------------------------------- The last chemical search we will show here is what we call the **Ion Binding Search**. The idea is to find how certain molecules can interact with certain atomic cations. The workflows runs first through a protonation step, to find the sites more likely to bind H+, and those are replaced by the ion of choice. This is very useful when computing ion-binding energies, for instance when investigating `lithum-ion batteries `_. Let's try to find, from the reference above, at which position is more likely a Li+ ion to bind to a sulfuryl solvent, and how is the distribution of species, e.g.: .. figure:: chemSearch/sulf.png :align: center :width: 400 Sulfuryl used in this study. In this case, since the molecule is rather rigid anyway and there are not too many different conformations, we will start with a simple smiles string, saved in a sulfuryl.smi file:: S1(OCCO1)([O])[O] and the command to run the workflow is: .. prompt:: bash $ weasel sulfuril.smi -W ChemSearch-IonBinding -ion Li+ .. important:: Here we **have** to add the ``-ion`` flag, followed by a cation and its charge (or Ni2+, Al3+, and so on), so that WEASEL knows what to add. Steps of the WEASEL HAT workflow ................................... The WEASEL HAT workflow consists of the following steps: .. figure:: chemSearch/workflow_ionbinding.jpg :align: center :width: 650 A sequence of similar steps is done as usual, and the final result is printed:: ************************* SUMMARY ION BINDING SITE SEARCH ************************* Final set of 2 Ion Binding Sites: Ion Binding Site 1: 0.00 kcal/mol Original Ion Binding Site-ID: 1 Ion Binding Site 2: 1.88 kcal/mol Original Ion Binding Site-ID: 5 ************************************************************************************* Here the two structures found in the end are: .. figure:: chemSearch/sulf_final.png :align: center :width: 400 The two structure found as most stable, with their relative energies. This means that, at least under this conditions, there should be two thermally accessible structures, however one would be in less quantity (about 4% only). .. important:: Here a conformational search is also **not** done by default after the generation of the initial structures. It can be turned on by setting ``-confsearch`` in the main input. Be aware that could disrupt the molecule-cation structure however. .. note:: The clustering option from the :ref:`confSearch` also applies here! Specific Chemical Search Options -------------------------------- Besides the flags related to :ref:`confSearch` and :ref:`arguments`, the specific options that can be given related to these chemical searches are: Energy range used for the initial generation of different chemical entities :keyword: ``-chemspace-gen-enrange REAL`` :description: An energy value in kcal/mol. The default is 30 kcal/mol and this value should not be reduced without good reasons. If you want to include most or all possible structures in the next steps, you need to increase it even further.