Detailed installation instructions
The following information is intended for system administrators, e.g. for cases in which WEASEL is set up to run on a remote cluster within a queuing system.
For how to identify the resources that need to be allocated, see Allocating resources.
For how to identify the files that need to be transferred to the remote machine, see Required files at WEASEL start.
Requirements
Software requirements
WEASEL is distributed as a self-contained archive that includes all libraries needed to run it. The archive contains a directory (e.g. weasel-1.11) with the main executable (weasel), a README file, sample config files, and all libraries and test files in subdirectories:
weasel-1.11
├── orca
├── weasel
├── README.md
├── changelog.md
├── LICENSE.txt
├── 3rd-party-licenses
│   └── License files for bundled 3rd party libraries
├── examples
│   └── Test files (molecules)
├── references
│   └── Reference molecules for NMR shift calculations
├── settings
│   ├── defaults.ini
│   └── minimal-1.11.ini.sample
├── workflows
│   └── workflow settings files
└── lib
    └── Libraries and data files
Hardware requirements
WEASEL itself has only low hardware requirements. However, WEASEL calls ORCA during runtime, so the overall hardware requirements are mainly determined by those of ORCA; see the separate ORCA instructions.
Running WEASEL
A simple WEASEL job is run directly from the command line:
weasel structure.mol2 [-KEY1] [-KEY2] ...
In its simplest form it only requires a structure file, here structure.mol2.
The settings that define what is done in a WEASEL job are described next.
Default settings
- Default settings
- file:
/etc/weasel/settings-<version>.ini, ~/.config/weasel/settings-<version>.ini
- description:
Configuration file in which WEASEL looks for default settings. It usually contains only the minimal mandatory information that WEASEL needs to run, such as paths to software installations (e.g. ORCA) and hardware settings (CPUs, memory).
For system-wide installations, the system administrator should adjust weasel-<version>/settings/minimal-settings-<version>.ini.sample and install it as /etc/weasel/settings-<version>.ini. This sample file contains only the minimal set of variables that WEASEL requires to run. The path of the system settings file, /etc/weasel/settings-<version>.ini, can be customized with the environment variable WEASEL_SYSTEM_SETTINGS.
Although normally not required, experienced users can find an exhaustive set of settings, with WEASEL's built-in defaults and explanatory comments, in weasel-<version>/settings/defaults.ini.
Important
In a cluster environment with shared home directories, the use of
~/.config/weasel/settings-<version>.ini is discouraged since installation
paths and hardware configurations might differ from node to node.
- Project-specific defaults
- keyword:
weasel structure.mol2 -settingsfile /usr/home/configs/settings cluster1.ini
- description:
Adding the -settingsfile keyword together with a filename defines a second configuration file that WEASEL should read. The file must be available at the specified location.
Important
The project-specific defaults override the defaults from settings.ini.
- Workflow settings
- keyword:
weasel structure.mol2 -workflow UVVis
- description:
Adding the -workflow keyword together with a workflow type loads the corresponding workflow settings. The argument UVVis expects the config file workflow_UVVis.ini in the weasel-<version>/workflows directory.
Important
The workflow settings override the project-specific settings and the defaults from settings.ini, i.e. the priority is as follows: workflow settings > project-specific settings > default settings.
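The priority order can be illustrated with a small shell sketch. The function `ini_lookup` below is a hypothetical helper and not part of WEASEL, and it does not reproduce WEASEL's own parser. It only mimics the rule stated above: files are passed in order of increasing priority, and the last file that defines a key wins. The file names in the usage comment are placeholders.

```shell
# Hypothetical helper, not part of WEASEL: print the value of KEY in SECTION,
# taking the last definition found in the given files.
# Usage: ini_lookup SECTION KEY FILE...
ini_lookup() {
    local section="$1" key="$2"
    shift 2
    awk -v section="$section" -v key="$key" '
        FNR == 1 { current = "" }              # new file: forget the previous section
        /^[ \t]*\[/ {                          # section header such as [HARDWARE]
            current = $0
            gsub(/^[ \t]*\[|\].*$/, "", current)
            next
        }
        current == section {
            line = $0
            sub(/#.*$/, "", line)              # drop trailing comments
            eq = index(line, "=")
            if (eq == 0) next
            k = substr(line, 1, eq - 1)
            v = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace around key and value
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { value = v; found = 1 }
        }
        END { if (found) print value }
    ' "$@"
}

# Example (placeholder file names), lowest priority first:
#   ini_lookup HARDWARE Cores settings.ini project.ini workflow_UVVis.ini
```

Passing the files in a different order changes the result, which makes the helper a quick way to reason about which layer a value would come from under the stated priority rule.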
Interplay WEASEL and Open MPI
WEASEL calls ORCA during runtime, and ORCA relies on Open MPI for parallel computations. Starting from version 1.9, WEASEL ships with Open MPI libraries, which suffice for single-node calculations and for multi-node calculations on commodity hardware.
However, for high-performance clusters, e.g. with InfiniBand networks, we recommend using a custom Open MPI installation, which can be configured in either of the following ways:
Via default settings:
[SOFTWARE]
MPI_PATH = /your/path/to/openmpi-X.Y.Z
Note
The lib and bin subdirectories of openmpi are assumed to be subdirectories of MPI_PATH.
Via environment variable:
The locations of the ORCA executables and the Open MPI libraries can be defined via environment variables, e.g. in a bash environment (note that bash does not allow spaces around the equals sign):
export MPI_PATH=/your/path/to/openmpi-X.Y.Z
Important
The definitions via environment variables override those via the settings files.
Note
Make sure to use the latest bugfix release of Open MPI 4.1 and
include Fortran support. The configure script will enable Fortran
support automatically if gfortran is installed.
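Before pointing WEASEL to a custom Open MPI build, it can help to check that the directory has the layout described in the note above. The function `check_mpi_path` is a hypothetical helper, not a WEASEL command; it only tests that `bin` and `lib` exist under the given path and says nothing about whether the build itself is usable.

```shell
# Hypothetical helper, not part of WEASEL: succeed only if DIR contains
# the bin and lib subdirectories that are expected under MPI_PATH.
check_mpi_path() {
    [ -d "$1/bin" ] && [ -d "$1/lib" ]
}

# Example with the placeholder path used above:
#   export MPI_PATH=/your/path/to/openmpi-X.Y.Z
#   check_mpi_path "$MPI_PATH" || echo "MPI_PATH lacks bin/ or lib/" >&2
```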
Interplay WEASEL and xTB
WEASEL relies on xTB to speed up calculations. However, xTB can be quite resource-hungry in terms of stack size, especially for large molecules. To avoid stack overflows, WEASEL tries to lift any stack size limit by default, but this fails when the preset hard limit on the machine is not already set to unlimited. In this case a warning is printed.
To raise the limit manually (in case WEASEL is not able to) the following command has to be called just before starting WEASEL:
ulimit -s unlimited
To permanently unset the stack size limit the command has to be added to the shell
configuration file (e.g. .bashrc).
Important
Depending on how the used computer system is configured, the stack size limit may not be raised without elevated rights.
Note
The stack size limitation should also be raised for calculations that utilize CREST (such as a conformer search), for CREST relies on xTB as well.
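The relationship between the hard limit and the command above can be checked from the shell before a job starts. The following is a minimal sketch; `raise_stack_limit` is a hypothetical helper, not part of WEASEL. It lifts the soft limit only when the hard limit is already unlimited and otherwise reports the hard limit (as printed by `ulimit`, which in bash uses units of 1024 bytes).

```shell
# Hypothetical helper, not part of WEASEL: lift the soft stack size limit
# if the hard limit allows it, otherwise report why this is not possible.
raise_stack_limit() {
    local hard
    hard="$(ulimit -H -s)"
    if [ "$hard" = "unlimited" ]; then
        ulimit -S -s unlimited
    else
        echo "hard stack limit is ${hard}; elevated rights are needed to raise it" >&2
        return 1
    fi
}

# Call it in the same shell that starts WEASEL, just before the weasel command:
#   raise_stack_limit
```

The function has to run in the shell that launches the job, not in a subshell, because limits set with `ulimit` only apply to the current shell and its child processes.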
Allocating resources
There are multiple options to define the number of cores and the RAM used by WEASEL.
Defaults
The number of cores and the maximum RAM to be used are defined in settings.ini:
[HARDWARE]
Memory = 2000 # in MB per core
Cores = 8
Command line
The default number of cores and the RAM reserved for the job can be overridden via the command line:
weasel structure.mol2 -cores 12 -mem 8000
which requests 8000 MB RAM per core. Alternatively the RAM available for the entire calculation can be requested as:
weasel structure.mol2 -cores 12 -mem-total 96000
Important
The definitions via the command line override those via the settings files.
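The two command-line examples above use numbers that match each other if the total memory is shared evenly between the cores, which is an assumption here rather than something stated by WEASEL. Under that assumption, the value for -mem-total is the number of cores times the per-core value for -mem; `total_mem_mb` is a hypothetical helper for this arithmetic.

```shell
# Hypothetical helper: total memory in MB for CORES cores with MB_PER_CORE each.
# Usage: total_mem_mb CORES MB_PER_CORE
total_mem_mb() {
    echo "$(( $1 * $2 ))"
}

total_mem_mb 12 8000   # prints 96000
```

This kind of calculation is mainly useful when a queuing system expects the total memory of a job while the WEASEL settings are given per core.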
Required files at WEASEL start
If the job is run on a remote machine, all files that are required by the calculation need to be transferred.
The file that always needs to be present is the structure file, provided as the only positional argument to the weasel command.
Depending on the arguments, other files might need to be transferred. Below, a list of such arguments together with example filenames (structure files with the extensions xyz, allxyz, pdb, sdf or mol2) is provided. Note that the arguments are case sensitive.
| WEASEL keyword | example |
|---|---|
| positional argument | structure.mol2 |
| -product | HNC.xyz |
| -tsguess | HNC_TS.xyz |
| -hg-led-addguest | guest.xyz |
Note
The structure files can also have other extensions, e.g. allxyz, pdb, sdf or mol2.
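The list of files to transfer can be derived from the intended command line. The sketch below uses a hypothetical helper, `required_files`, which is not part of WEASEL. It assumes that the structure file is the first argument and handles only the keywords from the table above; all other keywords and their values are ignored. The host and target path in the usage comment are placeholders.

```shell
# Hypothetical helper, not part of WEASEL: print one file name per line for
# the structure file and for the file keywords listed in the table above.
# Usage: required_files STRUCTURE [WEASEL ARGUMENTS...]
required_files() {
    [ "$#" -ge 1 ] || return 1
    printf '%s\n' "$1"          # structure file, assumed to be the first argument
    shift
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -product|-tsguess|-hg-led-addguest)
                if [ "$#" -ge 2 ]; then
                    printf '%s\n' "$2"
                    shift       # skip the file name that was just printed
                fi ;;
        esac
        shift
    done
}

# Example with a placeholder host and directory:
#   required_files structure.mol2 -product HNC.xyz -tsguess HNC_TS.xyz \
#       | rsync -a --files-from=- . user@cluster:/path/to/jobdir/
```

Since the arguments are case sensitive, the patterns in the `case` statement match the keywords exactly as written in the table.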
After WEASEL run
The directory structure after a WEASEL run might look as follows:
.
├── aspirine.xyz
└── aspirine_UVVis
    ├── aspirine_PreOpt.xyz
    ├── aspirine_Opt.xyz
    ├── aspirine_UVVis.avogadro.out
    ├── aspirine_UVVis_peaks.txt
    ├── aspirine_UVVis.report
    ├── aspirine_UVVis.summary
    ├── PreOpt
    │   ├── aspirine_PreOpt.out.bz2
    │   ├── aspirine_PreOpt.xtbopt.xyz
    │   └── aspirine_PreOpt.xyz
    ├── Opt
    │   ├── aspirine_Opt.gbw.bz2
    │   ├── aspirine_Opt.inp
    │   ├── aspirine_Opt.out.bz2
    │   ├── aspirine_Opt_trj.xyz
    │   └── aspirine_Opt.xyz
    └── SP_DFT
        ├── aspirine_SP_DFT.gbw.bz2
        ├── aspirine_SP_DFT.inp
        └── aspirine_SP_DFT.out.bz2
After job completion all files and subdirectories can be copied back.
Mainjob directory
The mainjob directory is a direct subdirectory of the current working directory (CWD). By default, the name of the mainjob directory corresponds to the stem of the structure input file plus command-line specific labels, in this example aspirine_UVVis. This directory stores all of the important simulation results: the optimized structure files, a report, and a summary file.
Note
The default mainjob name can be extended by labels using the -label argument. This can be useful if you want to carry out multiple simulations on a single structure input but using different solvents.
Task directories
Subdirectories of the mainjob directory contain further results from the individual ORCA runs.
Note
WEASEL can automatically compress the large ORCA result files if the following default is set in settings.ini:
[OUTPUT]
CompressionMethod = bzip2 # alternatively use None
Additional result path
Report and summary files can be written to a second path during runtime:
weasel structure.mol2 -report-dir /SECOND/PATH
Note
This can be useful when the job outcome should be observed on a local machine, but the job is running on a remote machine.
Failed calculations
In emergency cases, i.e. if
- the structure file given by the positional argument is not present, or
- one of the arguments to WEASEL is not correct,
WEASEL stores an error file and a report file in the CWD:
.
├── aspirine.xyz
├── aspirine.error
└── aspirine.report
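A job script can test for such an error file after WEASEL returns. The following is a minimal sketch; `has_error_file` is a hypothetical helper, not part of WEASEL, and it assumes that the error file is named after the stem of the structure file, as in the example above.

```shell
# Hypothetical helper, not part of WEASEL: succeed if <stem>.error exists in
# the current working directory, where <stem> is the structure file name
# without its directory and extension.
has_error_file() {
    local name stem
    name="$(basename "$1")"
    stem="${name%.*}"
    [ -e "${stem}.error" ]
}

# Example:
#   has_error_file aspirine.xyz && echo "WEASEL stopped early, see aspirine.report" >&2
```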
Pitfalls
Special network topologies
In case of multiple network backends, Open MPI usually tries to figure out on its own which interface to use for inter-node communication. This might be different from the hostnames in the hostfile or from those given by the queuing system.
However, Open MPI allows pinning devices or IP ranges using environment variables, which WEASEL and ORCA pass through, e.g.
OMPI_MCA_btl_tcp_if_include=lo,enp7s0
OMPI_MCA_btl_tcp_if_include=127.0.0.0/8,10.0.0.0/16
OMPI_MCA_btl_tcp_if_exclude=eth0
Important
Also see subsection 'Missing environmental variables' to ensure that these variables are also set on slave machines.
Note
For details, we refer to the official Open MPI documentation.
Missing environmental variables
WEASEL's parallelization partly relies on a set of global environment variables that need to be set on every machine. Depending on how WEASEL is configured, some environment variables may not be set correctly on slave nodes. To avoid this issue, WEASEL has the ENV_VARS option in the settings files, which ensures that the listed environment variables are set on every node:
[SOFTWARE]
# Optional: List of environmental variable names, that are ensured to be set on every node by Weasel. Listed variables
# have to be set on the head node, otherwise they are ignored.
# (e.g. ENV_VARS=OMPI_MCA_btl_tcp_if_include,OMPI_MCA_btl_tcp_if_exclude,OMPI_MCA_btl)
ENV_VARS=
This option takes a list of environment variable names, and WEASEL copies their values from the head node.
Note
Listed variables that are not set will be ignored.
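Because ENV_VARS expects variable names rather than values, the list can be generated from the environment of the head node. The sketch below is one possible way to do this; `ompi_env_var_names` is a hypothetical helper, not part of WEASEL, and it assumes that only variables with the OMPI_MCA_ prefix should be listed.

```shell
# Hypothetical helper, not part of WEASEL: print the names of all currently
# set OMPI_MCA_* variables as one comma-separated, sorted line.
ompi_env_var_names() {
    env | sed -n 's/^\(OMPI_MCA_[A-Za-z0-9_]*\)=.*/\1/p' | sort | paste -s -d, -
}

# Example: print a line that can be copied into the [SOFTWARE] section.
#   echo "ENV_VARS=$(ompi_env_var_names)"
```

Generating the list on the head node fits the rule above that variables have to be set there, since unset variables never appear in the output.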