.. _CCScalc:

Ion Mobility Collision Cross Sections
======================================

The `ion mobility collision cross section (CCS) `_ is a value that refers to the area of a particle (atom or molecule) where interactions with another particle can take place. CCS values are routinely measured using `ion mobility spectrometry (IMS) `_ and are of significant interest in areas like `metabolomics `_, where they are used to obtain information about the lowest-energy conformers.

.. _CCS run:

How to run the calculation
--------------------------

Inside WEASEL, all necessary steps to compute CCS values are done by one of three `CCS workflows`_. Choosing the right workflow depends on the chemical properties of the input structure. In the following, we will use *Ethiprole* as an example, for which we took the structure from `PubChem `_ (as an *SDF* file). If you want to run the calculation yourself, you can also use its SMILES string with the keyword ``-smiles CCS(=O)C1=C(N(N=C1C#N)C2=C(C=C(C=C2Cl)C(F)(F)F)Cl)N``.

The CCS is usually computed for charged forms of neutral molecules, obtained either by protonation/deprotonation or by adding external cations/anions. In these cases, protonation and the subsequent CCS calculations can be done using:

.. prompt:: bash $

    weasel Ethiprole.sdf -W CCS-Prot-Ensemble

Deprotonation and the subsequent CCS calculations can be done using:

.. prompt:: bash $

    weasel Ethiprole.sdf -W CCS-Deprot-Ensemble

If the input structure is already charged, a CCS calculation can be started by:

.. code::

    weasel Ethiprole.sdf -W CCS -c

.. important::

    The ``-W CCS`` workflow conducts an optimization (GFN2-xTB) and a molecular charge calculation (B3LYP/def2-TZVP), but skips the protomer and conformer search. The value following ``-c`` is the total charge of the ion, e.g. ``1`` for a single positive charge and ``-1`` for a single negative charge.

Because of the flexible nature of *Ethiprole*, we used the ``CCS-Prot-Ensemble`` workflow for this example.
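The choice between the three workflows described above can be sketched as a small helper that assembles the command line. This is only an illustration: the function ``build_weasel_command`` and its ``mode`` names are hypothetical and not part of WEASEL, and the rule that a nonzero charge is required for the ``CCS`` workflow is an assumption based on the description above. Only the ``-W`` and ``-c`` options and the workflow names are taken from this section.

```python
from typing import List, Optional

# Workflow names as listed in this section. The mode keys are hypothetical
# labels introduced only for this sketch.
WORKFLOWS = {
    "protonate": "CCS-Prot-Ensemble",
    "deprotonate": "CCS-Deprot-Ensemble",
    "charged": "CCS",
}


def build_weasel_command(
    structure_file: str, mode: str, charge: Optional[int] = None
) -> List[str]:
    """Return the argument list for one of the three CCS workflows."""
    if mode not in WORKFLOWS:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(WORKFLOWS)}")

    command = ["weasel", structure_file, "-W", WORKFLOWS[mode]]

    if mode == "charged":
        # Assumption: an already charged input needs its total charge via -c.
        if charge is None or charge == 0:
            raise ValueError("the CCS workflow needs a nonzero total charge")
        command += ["-c", str(charge)]
    elif charge is not None:
        # The ensemble workflows generate the charged species themselves.
        raise ValueError("a charge is only passed to the plain CCS workflow")

    return command


if __name__ == "__main__":
    print(" ".join(build_weasel_command("Ethiprole.sdf", "protonate")))
    print(" ".join(build_weasel_command("Ethiprole.sdf", "charged", charge=1)))
```

The returned list can be passed to ``subprocess.run`` if WEASEL is installed. Keeping the arguments as a list avoids shell quoting issues with file names.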
When the calculation is finished, the final results are parsed into the ``.report`` and the ``.summary`` file. The most relevant parts are shown below; for more details, see the description of :ref:`the output files `.

The report file will look something like this, where the IDs of the structures as well as their weights are reported:

.. figure:: ccs/CCS_weasel_report.svg
    :align: center
    :width: 100%

    Part of the report file printing the CCS values. The weights are the contributions of each structure to the total result, calculated inside WEASEL according to their energies.

In the summary file, only the most relevant information is written:

.. figure:: ccs/CCS_weasel_summary.svg
    :align: center
    :width: 100%

    Part of the summary file printing the CCS values.

If the calculation takes a while, don't worry! As a rough estimate, it should take about 1.5 hours on 32 cores. There are a lot of steps going on, so to understand how the final CCS values are computed, let's take a more detailed look at the individual workflow steps.

.. _CCS workflows:

The steps of the workflow
-------------------------

The steps of the ``CCS-Prot-Ensemble`` and ``CCS-Deprot-Ensemble`` workflows are conducted as follows:

.. _CCS workflow overview:

.. figure:: ccs/CCS_weasel_workflow.png
    :align: center
    :width: 80%

    Overview of the CCS-(De-)Prot-Ensemble workflow steps inside WEASEL. The numbers of structures produced/reduced by each step for *Ethiprole* are added for reference.

1. Starting with an input structure (e.g. a SMILES string, XYZ file, etc.), the workflow follows a series of steps.
2. Protomer search of all possible (de-)protonation sites using the :ref:`protomer search workflow `.
3. Conformer search of all (de-)protonated structures with the :ref:`conformer search tool `. The preoptimization uses xtb at the **GFN2-xTB** level and filters out structures with energies > 30 kcal/mol. Finally, clustering and single point calculations at the **B3LYP/def2-TZVP** level of theory filter out structures with an energy difference > 4 kcal/mol.
4. A special `geometrical clustering`_ of all conformers belonging to each protomer leads to the final structures, which are Boltzmann weighted for the last step.
5. Finally, the :ref:`collision cross section calculations ` yield the desired results.

As you can see, the CCS calculation incorporates many different parts of the WEASEL infrastructure. We will take a closer look at the `CCS files`_ produced during the calculations and at how to :ref:`change the default behavior ` of the calculations.

.. _CCS files:

The output files
----------------

A lot of files are produced during the :ref:`protomer search ` and the subsequent :ref:`conformer search `; these are described in detail on their corresponding help pages. Additionally, the `CCS workflows`_ will create a folder called **CCS**, in which the detailed output files are collected. The folder structure is built up like the following::

    .
    ├── Ethiprole.sdf
    └── CCS
        ├── CCS_2
        │   ├── Ethiprole_Protomer_2_lowestProtomer_3_CCScalc.ccsc
        │   ├── Ethiprole_Protomer_2_lowestProtomer_3_CCScalc.out
        │   └── Ethiprole_Protomer_2_lowestProtomer_3_CCScalc.xyz
        └── Clusters
            ├── Ethiprole_CCS-Prot-Ensemble_CCScalc.clustering.out
            ├── Ethiprole_CCS-Prot-Ensemble_CCScalc.input.xyz
            ├── Protomer2_cluster1.xyz
            └── Protomer2_cluster2.xyz

The CCS values are calculated inside WEASEL using the sub-program :ref:`CCScalc `, which directs its output into a folder named **CCS_x**. Here, the index **x** is a placeholder for the conformer ID of the structure that was calculated, e.g. conformer **49** can be found in folder **CCS_49**, conformer **2** in **CCS_2**, and so on.
Each **CCS_x** folder contains the input and output files of the :ref:`CCScalc program `:

+----------------------------------------------------+------------------------------------+
| CCS Files                                          | Description                        |
+====================================================+====================================+
| Ethiprole_Protomer_2_lowestProtomer_3_CCScalc.ccsc | input file specific to CCScalc     |
+----------------------------------------------------+------------------------------------+
| Ethiprole_Protomer_2_lowestProtomer_3_CCScalc.out  | output file of CCScalc             |
+----------------------------------------------------+------------------------------------+
| Ethiprole_Protomer_2_lowestProtomer_3_CCScalc.xyz  | structure input file in xyz format |
+----------------------------------------------------+------------------------------------+

Inside the **CCS** main folder, a **Clusters** folder is created during the `geometrical clustering`_ in step 4 of the `CCS workflows`_. Depending on the number of structures that are clustered, the number of clusters can vary.
The files look like the following:

+----------------------------------------------------+---------------------------------------+
| Clusters Files                                     | Description                           |
+====================================================+=======================================+
| Ethiprole_CCS-Prot-Ensemble_CCScalc.clustering.out | information on the clustering process |
+----------------------------------------------------+---------------------------------------+
| Ethiprole_CCS-Prot-Ensemble_CCScalc.input.xyz      | input structures before clustering    |
+----------------------------------------------------+---------------------------------------+
| Protomer2_cluster1.xyz                             | cluster 1 of protomer 2               |
+----------------------------------------------------+---------------------------------------+
| Protomer2_cluster2.xyz                             | cluster 2 of protomer 2               |
+----------------------------------------------------+---------------------------------------+

.. _CCScalc program:

CCScalc
-------

.. figure:: ccs/CCScalc_logo.png
    :align: center
    :width: 50%

.. _CCScalc usage:

Usage
^^^^^^

Inside WEASEL, the *CCScalc* program calculates the CCS of an input molecule by computing the mobility of an ion via molecular dynamics (MD) simulations based on Hamilton's equations of motion. For more details, let's take a look at the *CCScalc* output file inside the CCS folder.

.. figure:: ccs/CCScalc_output.svg
    :align: center
    :width: 70%

    The CCScalc output of version 0.2. The most important parts are marked and commented.

The output file first lists the *CCScalc* settings that were used for the calculation, and then provides the CCS values and errors of each iteration of the calculations. Each CCS calculation is repeated at least a minimum number of times to obtain a more reliable result. Underneath, the summary provides the averaged CCS values, as well as the mobility value and the standard error of the mean.

Settings
^^^^^^^^

The default settings can be changed with the following command-line arguments:

.. warning::

    Changing the default values can drastically alter the results, and if the values are lowered, the accuracy of the method will decrease!

.. _CCScalc settings:

+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| Keyword               | Description                             | Inside CCScalc                                                  | Default |
+=======================+=========================================+=================================================================+=========+
| ``-ccs-velocity``     | number of velocity integration steps    | sets the accuracy for collision parameter b                     | 48      |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| ``-ccs-maxcycles``    | maximum number of cycles                | upper limit for convergence cycles                              | 40      |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| ``-ccs-mincycles``    | minimum number of cycles                | lower limit for convergence cycles                              | 8       |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| ``-ccs-impact``       | number of random starting geometries    | rotate molecule and vary distance Gas--Molecule                 | 768     |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| ``-ccs-collgas``      | the kind of collision gas               | set He or N2                                                    | N2      |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| ``-ccs-sem``          | computation stopping criterion          | relative deviation of the standard error of the mean in percent | 0.35    |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+
| ``-ccs-(no-)cluster`` | switch on/off `geometrical clustering`_ | ---                                                             | on      |
+-----------------------+-----------------------------------------+-----------------------------------------------------------------+---------+

.. warning::

    If the number of cycles is too small for the calculation to converge below the value provided by the ``-ccs-sem`` keyword, the program throws an error but still completes the calculation. **Be advised**: the results might not be statistically converged!

.. note::

    If the maximum number of cycles is lower than the minimum number of cycles, the program issues a warning and sets ``-ccs-mincycles`` to the value provided by the user for ``-ccs-maxcycles``.

.. note::

    CCScalc takes the temperature from the WEASEL input, which can be changed by passing ``-temp`` followed by the temperature in Kelvin on the command line. The number of cores used for the calculation is the same as the number provided to WEASEL.

.. note::

    The DFT charge calculation is important to obtain the correct dipole and quadrupole contributions. The underlying parametrization was conducted at the B3LYP/def2-TZVP level, which is therefore the standard level for the charge calculations. Changing this level could decrease the accuracy of the method!

.. _geometrical clustering:

Geometrical clustering
----------------------

The larger the molecule, the more structures remain after step 3 of the `CCS workflows`_ (the conformer search), which is schematically depicted in the `CCS workflow overview`_. While the number of conformers is small in the *Ethiprole* example, larger molecules easily have hundreds of structures remaining after the conformational search. However, computing the CCS values of all these structures is computationally demanding, and having hundreds of CCS values can cause confusion concerning the "true" result.
Clustering the structures by their geometrical similarities thus reduces the number of structures that need to be calculated by *CCScalc*, which significantly decreases the computation time and provides a good overview of the relevant values. In the ``.results`` file of the calculation, the geometrical clustering step looks like the following:

.. figure:: ccs/CCS_weasel_clustering.svg
    :align: center
    :width: 100%

    The geometrical clustering of the structures in step 4 of the `CCS workflows`_.

Each protomer is clustered separately, and the number of protomers and conformers remaining after clustering is printed at the end. For these structures the CCS values are calculated and printed as seen :ref:`above `.
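The structures that remain after clustering enter the final result with Boltzmann weights. How WEASEL computes these weights internally is not shown on this page, so the following is only a minimal sketch of standard Boltzmann weighting. It assumes relative energies in kcal/mol and a temperature of 298.15 K. The function names ``boltzmann_weights`` and ``weighted_ccs`` are hypothetical, and the numbers in the example are made up for illustration.

```python
import math
from typing import List, Sequence

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def boltzmann_weights(
    energies_kcal: Sequence[float], temperature: float = 298.15
) -> List[float]:
    """Return normalized Boltzmann weights for energies given in kcal/mol."""
    if not energies_kcal:
        raise ValueError("at least one energy is required")
    # Shift by the lowest energy so that the exponentials stay well behaved.
    e_min = min(energies_kcal)
    factors = [
        math.exp(-(energy - e_min) / (R_KCAL_PER_MOL_K * temperature))
        for energy in energies_kcal
    ]
    total = sum(factors)
    return [factor / total for factor in factors]


def weighted_ccs(ccs_values: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of the per-structure CCS values."""
    if len(ccs_values) != len(weights):
        raise ValueError("need exactly one weight per CCS value")
    return sum(ccs * weight for ccs, weight in zip(ccs_values, weights))


if __name__ == "__main__":
    # Made-up relative energies (kcal/mol) and CCS values for three structures.
    energies = [0.0, 0.5, 1.2]
    ccs_values = [180.0, 184.0, 190.0]
    weights = boltzmann_weights(energies)
    print("weights:", [round(w, 3) for w in weights])
    print("weighted CCS:", round(weighted_ccs(ccs_values, weights), 2))
```

Structures that are high in energy contribute little to the average, which is why the 4 kcal/mol energy window of the conformer search removes them before the CCS step.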