.. _detailedinstall:

Detailed installation instructions
==================================

The following information is intended for system administrators, e.g. for cases in which WEASEL is set up to run on a remote cluster within a queuing system.

* For how to identify the resources that need to be allocated, see `Allocating resources`_.
* For how to identify the files that need to be transferred to the remote machine, see `Required files at WEASEL start`_.

Requirements
------------

Software requirements
.....................

* WEASEL is distributed as a self-contained archive including all libraries needed to run. The archive contains a directory (e.g. ``weasel-1.11``) with the main executable (``weasel``), ``README.md``, sample config files, and all libraries and test files in subdirectories::

    weasel-1.11
    ├── orca
    ├── weasel
    ├── README.md
    ├── changelog.md
    ├── LICENSE.txt
    ├── 3rd-party-licenses
    │   └── License files for bundled 3rd party libraries
    ├── examples
    │   └── Test files (molecules)
    ├── references
    │   └── Reference molecules for NMR shift calculations
    ├── settings
    │   ├── defaults.ini
    │   └── minimal-1.11.ini.sample
    ├── workflows
    │   └── workflow settings files
    └── lib
        └── Libraries and data files

Hardware requirements
.....................

WEASEL itself has only low hardware requirements. However, WEASEL calls ORCA during runtime, so the hardware requirements are mainly determined by ORCA's requirements; see the separate instructions.

Running WEASEL
--------------

A simple WEASEL job is run directly from the command line:

.. prompt:: bash $

   weasel structure.mol2 [-KEY1] [-KEY2] ...

In its simplest form it only requires a structure file, here *structure.mol2*. The settings that define what is done in a WEASEL job are described next.

Default settings
................

Default settings
  :file: ``/etc/weasel/settings-.ini``, ``~/.config/weasel/settings-.ini``
  :description: Configuration file in which WEASEL looks for default settings. Usually it contains only the minimal mandatory information for WEASEL to run (see also :ref:`here`), like installation paths to software installations (e.g. ORCA) and hardware settings (CPUs, memory).

    For system-wide installations the system administrator should adjust ``weasel-/settings/minimal-settings-.ini.sample`` and install it as ``/etc/weasel/settings-.ini``. This sample file contains just the minimal set of variables that WEASEL requires to run. The path of the system settings file ``/etc/weasel/settings-.ini`` can be customized via the environment variable ``WEASEL_SYSTEM_SETTINGS``.

    Although normally not required, experienced users can find an exhaustive set of settings, together with WEASEL's built-in defaults and explanatory comments, in ``weasel-/settings/defaults.ini``.

.. important::

   In a cluster environment with shared home directories, the use of ``~/.config/weasel/settings-.ini`` is discouraged since installation paths and hardware configurations might differ from node to node.

Project-specific defaults
  :keyword: ``weasel structure.mol2 -settingsfile /usr/home/configs/settings_cluster1.ini``
  :description: Adding the ``-settingsfile`` keyword together with a filename defines a second configuration file that is read by WEASEL. The file must be available at the specified location.

.. important::

   The project-specific defaults override the defaults from settings.ini.
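For illustration, a project-specific settings file like the one in the example above might do nothing more than override the hardware defaults for one particular cluster. This is only a sketch: the ``[HARDWARE]`` keys are the ones documented under `Allocating resources`_, and the values are placeholders::

    [HARDWARE]
    Memory = 4000   # in MB per core, overrides the system-wide default
    Cores = 16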
Workflow settings
  :keyword: ``weasel structure.mol2 -workflow UVVis``
  :description: Adding the ``-workflow`` keyword together with a workflow type selects a predefined workflow. The argument ``UVVis`` expects the config file ``workflow_UVVis.ini`` in the ``weasel-/workflows`` directory.

.. important::

   The workflow settings override the project-specific settings and the defaults from settings.ini, i.e. the priority is: workflow settings > project-specific settings > default settings.

Interplay WEASEL and Open MPI
.............................

WEASEL calls ORCA during runtime, which relies on Open MPI for parallel computations. Starting from version 1.9, WEASEL comes with Open MPI libraries included, which suffice for single-node calculations and for multi-node calculations on commodity hardware. However, for high-performance clusters, e.g. with InfiniBand networks, we recommend using a customized Open MPI installation, which can be configured either

* **Via default settings**::

    [SOFTWARE]
    MPI_PATH = /your/path/to/openmpi-X.Y.Z

  .. note::

     The ``lib`` and ``bin`` subdirectories of Open MPI are assumed to be subdirectories of ``MPI_PATH``.

* **Via environment variable**: The location of the ORCA executables and the Open MPI libraries can be defined via environment variables, e.g. in a bash environment:

  .. prompt:: bash $

     export MPI_PATH=/your/path/to/openmpi-X.Y.Z

.. important::

   The definitions via environment variables override those from the settings files.

.. note::

   Make sure to use the latest bugfix release of Open MPI 4.1 and include Fortran support. The configure script will enable Fortran support automatically if ``gfortran`` is installed.

Interplay WEASEL and xTB
........................

WEASEL relies on xTB to speed up calculations. However, xTB can be quite resource-hungry in terms of stack size, especially for large molecules. To avoid stack overflows, WEASEL tries to lift any stack size limit by default, but this fails if the preset hard limit on the machine in use is not already set to unlimited. In that case a warning is printed. To raise the limit manually (in case WEASEL is not able to), the following command has to be called just before starting WEASEL:

.. prompt:: bash $

   ulimit -s unlimited

To permanently lift the stack size limit, the command has to be added to the shell configuration file (e.g. :code:`.bashrc`).

.. important::

   Depending on how the computer system is configured, the stack size limit may not be raisable without elevated rights.

.. note::

   The stack size limit should also be raised for calculations that utilize CREST (such as a conformer search), since CREST relies on xTB as well.

Allocating resources
....................

There are multiple options to define the number of cores and the amount of RAM used by WEASEL.

* **Defaults**

  The number of cores and the maximum RAM to be used are defined in settings.ini::

    [HARDWARE]
    Memory = 2000   # in MB per core
    Cores = 8

* **Command line**

  The default number of cores and RAM reserved for the job can be overridden via the command line:

  .. prompt:: bash $

     weasel structure.mol2 -cores 12 -mem 8000

  which requests 8000 MB RAM per core. Alternatively, the RAM available for the entire calculation can be requested as:

  .. prompt:: bash $

     weasel structure.mol2 -cores 12 -mem-total 96000

.. important::

   The definitions via the command line override those from the settings files.
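When WEASEL is run under a queuing system, the resources requested from the scheduler should match those passed to WEASEL. As a minimal sketch, assuming a SLURM-based cluster (the scheduler, the time limit and the file name are placeholders; only ``-cores`` and ``-mem`` are the WEASEL keywords documented above), a batch script might look like::

    #!/bin/bash
    #SBATCH --ntasks=12           # one task per requested core
    #SBATCH --mem-per-cpu=8000    # in MB, matching -mem below
    #SBATCH --time=24:00:00

    weasel structure.mol2 -cores 12 -mem 8000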
Required files at WEASEL start
..............................

If the job is run on a remote machine, all files that are required by the calculation need to be transferred. The file that always needs to be present is the structure file, provided as the only positional argument to the weasel command. Depending on the arguments, other files might need to be transferred. Below, a list of such arguments together with example filenames is provided. Note that the arguments are case sensitive.

+---------------------+----------------+
| WEASEL keyword      | Example        |
+=====================+================+
| positional argument | structure.mol2 |
+---------------------+----------------+
| -product            | HNC.xyz        |
+---------------------+----------------+
| -tsguess            | HNC_TS.xyz     |
+---------------------+----------------+
| -hg-led-addguest    | guest.xyz      |
+---------------------+----------------+

.. note::

   The structure files can also have other extensions, e.g. allxyz, pdb, sdf or mol2.

After WEASEL run
................

The directory structure after a WEASEL run might look as follows::

    .
    ├── aspirine.xyz
    └── aspirine_UVVis
        ├── aspirine_PreOpt.xyz
        ├── aspirine_Opt.xyz
        ├── aspirine_UVVis.avogadro.out
        ├── aspirine_UVVis_peaks.txt
        ├── aspirine_UVVis.report
        ├── aspirine_UVVis.summary
        ├── PreOpt
        │   ├── aspirine_PreOpt.out.bz2
        │   ├── aspirine_PreOpt.xtbopt.xyz
        │   └── aspirine_PreOpt.xyz
        ├── Opt
        │   ├── aspirine_Opt.gbw.bz2
        │   ├── aspirine_Opt.inp
        │   ├── aspirine_Opt.out.bz2
        │   ├── aspirine_Opt_trj.xyz
        │   └── aspirine_Opt.xyz
        └── SP_DFT
            ├── aspirine_SP_DFT.gbw.bz2
            ├── aspirine_SP_DFT.inp
            └── aspirine_SP_DFT.out.bz2

After job completion all files and subdirectories can be copied back (a sketch is given at the end of this section).

* **Mainjob directory**

  The mainjob directory is a direct subdirectory of the CWD. By default, its name corresponds to the stem of the structure input file plus command-line specific labels, in this example *aspirine_UVVis*. All of the important simulation results are stored in this directory: the optimized structure files, a report and a summary file.

  .. note::

     The default mainjob name can be extended by labels using the ``-label`` argument. This can be useful if you want to carry out multiple simulations on a single structure input but using different solvents.

* **Task directories**

  Subdirectories of the mainjob directory contain further results from the individual ORCA runs.

  .. note::

     WEASEL can automatically compress the large ORCA result files when setting the default in settings.ini to::

       [OUTPUT]
       CompressionMethod = bzip2   # alternatively use None
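All results from the example above can then be retrieved in one step; the following is only a sketch using ``rsync``, where the remote host name and the scratch path are placeholders::

    rsync -av cluster:/scratch/jobs/aspirine_UVVis .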
Additional result path
......................

Report and summary files can be written to a second path during runtime:

.. prompt:: bash $

   weasel structure.mol2 -report-dir /SECOND/PATH

.. note::

   This can be useful when the job outcome should be observed on a local machine while the job is running on a remote machine.

Failed calculations
...................

In case of fatal errors, i.e.

* the structure file given by the positional argument is not present, or
* one of the arguments to WEASEL is not correct,

WEASEL stores an error file and a report file in the CWD::

    .
    ├── aspirine.xyz
    ├── aspirine.error
    └── aspirine.report

Pitfalls
........

* **Special network topologies**

  In the case of multiple network backends, Open MPI usually tries to figure out on its own which interface to use for inter-node communication. This choice might differ from the hostnames in the *hostfile* or from those given by the queuing system. However, Open MPI allows pinning devices or IP ranges using environment variables, which WEASEL and ORCA pass through, e.g.::

    OMPI_MCA_btl_tcp_if_include=lo,enp7s0
    OMPI_MCA_btl_tcp_if_include=127.0.0.0/8,10.0.0.0/16
    OMPI_MCA_btl_tcp_if_exclude=eth0

  .. important::

     Also see the subsection 'Missing environment variables' to ensure that these variables are also set on slave machines.

  .. note::

     For details, we refer to the official `Open MPI documentation`_.

  .. _Open MPI documentation: https://www.open-mpi.org/faq/?category=tcp#tcp-selection

* **Missing environment variables**

  WEASEL's parallelization partly relies on a set of global environment variables that need to be set on every machine. Depending on how WEASEL is configured, it can happen that some environment variables are not set correctly on slave nodes. To avoid this issue, WEASEL provides the :code:`ENV_VARS` option in the settings files, which ensures that the listed environment variables are set on every node::

    [SOFTWARE]
    # Optional: list of environment variable names that WEASEL ensures are set on every node.
    # Listed variables have to be set on the head node, otherwise they are ignored.
    # (e.g. ENV_VARS=OMPI_MCA_btl_tcp_if_include,OMPI_MCA_btl_tcp_if_exclude,OMPI_MCA_btl)
    ENV_VARS=

  This option takes a list of environment variable names, and WEASEL copies their values from the head node.

  .. note::

     Listed variables that are not set will be ignored.
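  As a minimal sketch combining this option with the IP-range example from the previous pitfall (the chosen IP range is only an illustration), the variable would be exported on the head node and listed under ``ENV_VARS`` so that WEASEL propagates it to every node::

    # on the head node, before starting WEASEL
    export OMPI_MCA_btl_tcp_if_include=10.0.0.0/16

    # in the settings file
    [SOFTWARE]
    ENV_VARS=OMPI_MCA_btl_tcp_if_include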