Input File

Note

This documentation is automatically generated from the YAML template file.

This file defines a set of fields which are then read and used by the workflow. Every field is explained and example values are provided.

Project metadata

The following metadata has no effect on the workflow itself, but it will be written to the output metadata file. These values will be uploaded to the database and then exposed in the project overview. They may be also useful to search this project using the browser.

name

Write a brief description or title for this trajectory for the overview page. This name may be used by the client to search the trajectory in the database. The name is displayed in the overview page.

Default: My project

description

Write additional comments about the project. The description is displayed in the overview page.

description: My project description

authors

Write author names. Authors are displayed in the overview page.

Warning

Check already existing values for this field to avoid making duplicates.

authors:
  - My name
  - My partner's name

groups

Write author group name/s. The group is displayed in the overview page.

Warning

Check already existing values for this field to avoid making duplicates.

groups:
  - My group
  - My partner's group

contact

How to contact the authors (e.g. a mail). The contact is displayed in the overview page.

program

Program (software) name which carried the trajectory and its version. Program and version are both displayed in the overview page.

Warning

Check already existing values for this field to avoid making duplicates.

program: GROMACS

version

version: 2024.2

type

Type of molecular dynamics. At this moment there are only two options in this field: trajectory and ensemble.

Note

This field has an effect on the client: some time-dependent analysis will change the labels of their axes in order to make sense. For instance RMSD X axis will be labeled as frames instead of time.

type: trajectory

method

MD method is displayed in the overview. e.g. Classical MD, Targeted MD, Biased MD (Accelerated Weighted Ensemble), Enhanced sampling (Hamiltonian Replica Exchange) …

method: Classical MD

license

License and link to the license web page. The license is displayed in the overview page. Under the license there is a More information button. The link is used to redirect the user when this button is clicked.

license: This trajectory dataset is released under a Creative Commons Attribution 4.0 International Public License

linkcense

linkcense: https://creativecommons.org/licenses/by/4.0/

citation

Citation for refering this simulation or related paper. The citation is displayed in the overview page. To set a citation use the following instructions: to add a line break type ‘(br)’ inside the citation string, to add superior text type ‘^’ before each character.

thanks

Acknowledgements to be shown in the overview page.

thanks: I would like to thank my funders...

accession

OPTIONAL: Use this field to force a custom accession. Accession is a short code to refer this project in your local database. If no accession is forced then a default formatted accession will be generated when loading the project.

References

References to other databases to enrich our data.

pdb_ids

Set the source pdb ids of the trajectory structure Additional data from the pdb is harvested by the loader while uploading to the database This data is displayed in the overview page

pdb_ids:
  - 6ACS
  - 6M0J

forced_references

Set which reference sequences must be used in order to map residues in the structure of the simulation. UniProt accession ids are accepted. Forced references may be not provided or just cover the structure partially. Then a blast will be run for each orphan chain sequence. In addition, UniProt accession ids may be guessed from the PDB ids, when provided.

Forced references may be provided as a list. In this scenario UniProt sequences are aligned to chain sequences to guess which UniProt belongs to each chain. Forced references may be provided as a dictionary. Then the user specifies which reference belongs to each chain. Use the “noref” flag to mark a chain as “no referable” (e.g. antibodies, synthetic constructs).

forced_references:
  - Q9BYF1
  - P0DTC2
forced_references:
  A: Q9BYF1
  B: P0DTC2
  C: noref

ligands

Set ligands in the simulation. The workflow identifies ligands by their pubchem accession. If a pubchem accession is passed then it is used. Otherwise, the pubchem accession is found using other database accessions. Each ligand must have at least one of the following attributes:

  • pubchem: the PubChem accession

  • drugbank: the DrugBank accession

  • chembl: the ChEMBL accession

Optionally, a list of vmd selections may be provided to force the mapping

  • vmd_selection: a list of vmd selections (chain D)

Ligands are mapped in the standard topology file. In addition, an RMSD analysis is run for every defined ligand.

ligands:
  - pubchem: 1986
  - drugbank: DB00945

Simulation metadata

Simulation parameters.

Tip

Someday this will be automatically mined.

framestep

Time framestep in nanoseconds (ns). Time between each snapshot/frame. May be None if this is not a trajectory, but an ensemble. Framestep is an important value since it is used in many graph axes in the web client.

framestep: 0.01 # ns

timestep

The rest of values are displayed in the web client as trajectory metadata. These values do not affect other outcomes in the workflow. Simulation timestep in femtoseconds (fs). Time step used for the numerical integration of the simulation.

timestep: 2 # fs

temp

Temperature in Kelvin (K).

temp: 310 # K

ensemble

Ensemble e.g. NVT, NPT, etc.

Warning

Check already existing values for this field to avoid making duplicates.

ensemble: NPT

ff

Force fields.

Warning

Check already existing values for this field to avoid making duplicates.

ff:
  - Amber ff14SB
  - GLYCAM-06j

wat

Water force fields.

Warning

Check already existing values for this field to avoid making duplicates.

wat: TIP3P

boxtype

Boxtype e.g. Triclinic, Cubic, Dodecahedron.

Warning

Check already existing values for this field to avoid making duplicates.

Analysis parameters

These fields have an impact in the analysis workflow.

interactions

Set which are the interesting interactions to be analyzed. A bunch of interaction-specific analyses will be run for each interaction and displayed in the web client.

Interactions are defined by the ‘agents’ which are meant to interact pairwise. An ‘agent’ may be anything, even a group of unrelated molecules. Atoms of different agents which are close enought will be considered as interface atoms. These atoms will be the ones considered in interface analyses. If no interface atoms are found then the interaction is considered not valid and the user is warned.

Interactions are uploaded to the database as part of the project metadata and as an independent analysis. Project metadata includes the interaction name, agents name and agent atom selections (VMD syntax). Analysis data includes also every agent atom indices (both the whole agent and the interface only).

Each interaction has the following attributes:

  • name: a string tag used to relate interaction analyses data with their corresponding atoms. In addition, the name is used to label the corresponding analyses in the web client.

  • agent_1: the name of the first agent in the interaction, which is used to label in the client.

  • selection_1: the VMD selection of the first agent in the interaction.

  • agent_2: the name of the second agent in the interaction, which is used to label in the client.

  • selection_2: the VMD selection of the second agent in the interaction.

  • distance_cutoff (optional): the distance used to determine which atoms are in the interface (in Å).

The default value is intended for atomistic simulations. Thus coarse grain interactions may need manual input distance cutoff.

interactions:
  - name: protein-ligand interaction
  agent_1: protein
  selection_1: not resname lig
  agent_2: ligand
  selection_2: resname lig
  - name: domain-domain interaction
  agent_1: domain 1
  selection_1: resid 2 to 291
  agent_2: domain 2
  selection_2: resid 2 to 291
  - name: dna-dna hybridization
  agent_1: strain A
  selection_1: chain A
  agent_2: strain B
  selection_2: chain B
  distance_cutoff: 10

pbc_selection

Set those residues which are under periodic boundary conditions (PBC). These residues are excluded from the imaging centering and fitting. These residues are excluded in the follwoing analyses:

  • RMSD: Sudden jumps in PBC residues result in non-sense high peaks

  • RMSD per residue: Sudden jumps in PBC residues result in non-sense high peaks

  • RMSD pairwise: Sudden jumps in PBC residues result in non-sense high peaks

  • TM score: Sudden jumps in PBC residues result in non-sense high peaks

  • RGYR: Sudden jumps in PBC residues result in non-sense high changes

  • RMSF: Sudden jumps in PBC residues result in non-sense high peaks

  • PCA: Sudden jumps make not sense in PCA and they eclipse non-PBC movements

  • SASA: Residues close to the boundary will be considered exposed to solvent while they may be not

  • Pockets: Residues close to the boundary may be considered to have pockets while they have not.

Tip

This isn’t really possible because fpocket doesn’t allow you to intelligently “discard” atoms. If you remove atoms so it doesn’t find pockets in them then pockets can appear in the sites occupied by those atoms. For now, we discard the entire analysis when there’s something in PBC and that’s it.

  • Clusters: Since Clustering is RMSD-based it has the same limitations

If this field is set to ‘auto’ then PBC residues are set automatically. Solvent, counter ions and membrane lipids are selected in this cases. Note that these are the most tipical residues under periodic boundary conditions.

This field is also useful for those scenarions with several protein or nucleic acid molecules floating around. In this situation you can not image and fit all molecules. You must focus in one molecule and let the others stay in periodic boundary conditions.

These residues are defined using VMD selection language.

pbc_selection: water or ions

Default: auto

cg_selection

Set those atoms which are not actual “atoms” but coarse grained (CG) beads.

Warning

EXPERIMENTAL INPUT

dummy_selection

Set those atoms which are not real atoms but dummy atoms thus representing other entities. These dummy atoms are used in some methodologies such as alchemical bonds

Warning

EXPERIMENTAL INPUT

Default: auto

Representation parameters

These fields have an impact in the display of the simulation once in the web client

chainnames

Set optional custom chain names which may be longer than a single letter. This names are used to label chains in the web client.

chainnames:
  A: Protein
  B: Ligand

customs

The web client sets some default representations (molecular viewer configurations). They highlight important features in the structure according to the topology reference or interactions. In addition, you may set extra customized representations which are interesting for you. These representations will be available in the web client.

Warning

Make sure whatever you want to represent is not already represented by default or it will be duplicated.

customs:
  - name: A custom view focusing on an interesting residue
  representations:
  - name: The very interesing residue
  selection: VIR
  type: ball+stick
  color: element
  - name: The resting boring molecule
  selection: not VIR
  type: cartoon
  color: chainid

orientation

Set a specific starting orientation for the web client viewer. Normally this is done once the simulation has been uploaded since there is no easy way to get the orientation before.

[
72.05997406618104,
21.871748915422142,
47.89720038949639,
0,
34.3234627961572,
42.053333152877315,
-70.84188126104011,
0,
-39.93012781662099,
75.61943426331311,
25.542927052994127,
0,
-63.015499114990234,
-33.07249975204468,
-39.439000606536865,
1
]

Others

Other metadata

multimeric

Set if we have any multimeric form: monomer, dimer, trimer… This field was requested by the referees. Its only use for now is as a parameter in project queries.

Tip

This is temporary, it would be best to automate it.

multimeric:
  - monomer
  - trimer

Collections

Set also additional collection related metadata values.

collections

Set to which collection does this simulation belong to. This allows to freely group simulations together The collection may also provide additional reference data further For instance a description of the whole dataset If you want to include this simulation in an already exsiting collection then use its id If you want to create a new “collection id” please use short ids with no weird signs e.g. cv19, model, abc, …

cv19_unit

BioExcel-CV19 specific metadata fields. Set which family does this trajectory belong to. Supported units:

  • RBD-ACE2

  • RBD

  • ACE2

  • Spike

  • 3CLpro

  • PLpro

  • Polymerase

  • E protein

  • Exoribonuclease

  • Other

cv19_startconf

Set some additional inputs requested by the referees.

  • Starting conformation of the spike (options: close, 1 open, 2 open, 3 open)

  • Are there antibodies? (e.g. true)

  • Are there nanobodies? (e.g. false)

cv19_abs

cv19_nanobs

Input files

Directories for every MD in the project. Input file paths for topology, trajectory and structure. Note that all these values may be specified thorugh command line as well.

Note

These files have been passed to the workflow through command line traditionally.

Note

Now the inputs file also provides this option.

input_topology_filepath

Each projects contain one and only one topology Exceptionally a structure may be used instead of a topology if the topology is missing To do so, set ‘input_topology_filepath: no’ and pass the input_structure_filepath argument

input_structure_filepath

The input structure file is optional (except when the topology is missing)

input_trajectory_filepaths

You can set input trajectory filepath generic for the whole project here. However it makes more sense setting this value separatedly for each MD.

mds

Each project may contain several Molecular Dynamics (MD). Each MD is to be stored in an independent folder when running the workflow. Each MD must have a different name, which will be used also to assign directories in the workflow.

MDs may include additional metadata to overwrite the project metadata for a specific case.

mds:
  - name: replica 310 K
  mdir: replica_1
  input_trajectory_filepaths: my_raw_trajectory_1.nc
  temp: 310
  - name: replica 311 K
  mdir: replica_2
  input_trajectory_filepaths: my_raw_trajectory_2.nc
  temp: 311

mdref

Also the reference MD is to be defined by providing the index of the MDs list. If there is not an MD which is more important than others then simply set the first MD (0) as the reference.

mdref: 0

dataset_path

Path to the dataset storage file for the Dataset object.