Workflow Tasks

This page documents the available tasks in the MDDB Workflow system. These tasks can be specified with the -i (include) or -e (exclude) flags.

Project Tasks

These tasks are executed once per project:

aiidata: produce a provenance file containing AiiDA data adapted for our database.
chains: get the parsed chains from the structure and request the InterProScan service to obtain the data for each chain.
charges: extract charges from a source file.
inchikeys: generate a dictionary mapping InChI keys to residue information for non-standard residues. This function uses MDAnalysis to parse the input structure and topology files and identifies residues that are not classified as ‘ion’, ‘solvent’, ‘nucleic’, or ‘protein’. For each identified residue, it converts the structure to RDKit format to obtain the InChI key and InChI string. The resulting data is stored in dictionaries that map InChI keys to residue details and residue names to InChI keys. PDB coordinates are necessary to distinguish stereoisomers.
inchimap: generate InChI references for the database.
inputs: load the inputs YAML file.
itopology: get the input topology file. If the file is not found, try to download it.
ligmap: generate a map of residues associated with ligands. This function identifies and maps ligands in the molecular structure through a multi-step matching process:
  1. Direct InChIKey matching: extract InChIKeys from structure fragments (excluding lipids and membrane components) and attempt direct matching against the PubChem database.
  2. Chemical similarity matching: if direct matching fails, progressively modify the molecular structure and calculate the Tanimoto coefficient (TC) to assess similarity:
     a. Neutralize charges.
     b. Remove stereochemistry information.
     c. Apply PubChem standardization (tautomer, protonation, etc.).
     d. Match against PDB-derived ligands (TC threshold ≥ 0.9).
     e. Perform a similarity search in PubChem/ChEMBL (TC threshold ≥ 0.9).
  3. Fallback handling: unmatched ligands are saved as-is, with warnings.
  4. User-forced selections: ligand selections specified by the user in inputs.yaml are respected, with a warning if the TC compared to the original fragment is insufficient.
lipmap: add lipid-specific database information to InChIKeyData objects. This function queries the SwissLipids and LIPID MAPS databases for each InChI key and adds the results directly to the InChIKeyData objects. It also performs quality checks on lipid classifications.
memmap: generate a list of residue numbers of membrane components from a given structure and topology file.
pdbs: prepare the PDB references JSON file to be uploaded to the database.
pmeta: prepare a JSON file with all project metadata.
populations: get the MSM equilibrium populations file.
protmap: map the amino acid sequences of the structure against the UniProt reference sequences.
refbonds: find reference safe bonds in the system. First, try to mine bonds from the topology file. If mining fails, search for the most stable bonds. If stable bonds are trusted, simply return the structure bonds.
resmap: build the residue map from both the protein and ligand maps. The map is formatted as both the standard topology and the metadata generators expect it.
screenshot: generate the project screenshot. It is usually made with the reference frame, thus using the main structure; however, the user may pass a custom frame.
stopology: prepare the standard topology file to be uploaded to the database.
topology: get the processed topology from the reference MD.
transitions: get the MSM transition probabilities file.
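The similarity step of ligmap keeps a candidate only when its Tanimoto coefficient reaches 0.9. A minimal sketch of that thresholded comparison, assuming fingerprints are represented as plain Python sets of "on" bit indices (the workflow itself presumably uses RDKit or PubChem fingerprints; `tanimoto` and `best_match` are hypothetical helpers, not the workflow's API):

```python
from __future__ import annotations

TC_THRESHOLD = 0.9  # threshold quoted in the ligmap description

# Assumption: a fingerprint is modelled as a set of "on" bit indices.


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two fingerprint bit sets."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints carry no similarity evidence
    return len(fp_a & fp_b) / len(union)


def best_match(query: set[int], candidates: dict[str, set[int]],
               threshold: float = TC_THRESHOLD) -> str | None:
    """Return the most similar candidate id, or None if it is below the threshold."""
    best_id, best_tc = None, -1.0
    for cand_id, fp in candidates.items():
        tc = tanimoto(query, fp)
        if tc > best_tc:
            best_id, best_tc = cand_id, tc
    return best_id if best_tc >= threshold else None


query = set(range(10))
candidates = {
    "close": set(range(10)) | {42},  # 10 shared bits out of 11 -> TC ~ 0.909
    "far": {0, 1, 2, 50, 51},        # 3 shared bits out of 12 -> TC = 0.25
}
print(best_match(query, candidates))  # close
```

Returning None instead of the closest candidate mirrors the fallback described above: a ligand that matches nothing above the threshold is kept as-is rather than forced onto a weak hit.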

MD Tasks

These tasks are executed for each MD in the project:

Files

inpro: process the input files to generate the processed files. This process corrects and standardizes the topology, the trajectory, and the structure.
istructure: get the input PDB filename from the inputs. If the file is not found, try to download it.
itrajectory: get the input trajectory filename(s) from the inputs. If the file(s) are not found, try to download them.
structure: get the processed structure file.
trajectory: get the processed trajectory file.

Analyses

apl: area per lipid analysis.
average: get an average structure from a trajectory. This process is carried out by pytraj, since the GROMACS average may be displaced.
channels: analyze channels in a membrane protein using MDAnalysis mda_hole.
clusters: run the cluster analysis.
density: membrane density analysis.
dihedrals: calculate torsions and then dihedral energies for every dihedral along the trajectory.
dist: calculate the mean and standard deviation of the distance between each pair of residues from different agents. Note that the distances are calculated for all residues in the agent, not only the interface residues.
energies: perform the electrostatic and van der Waals (VdW) energy analysis for each pair of interacting agents.
firstframe: get the first trajectory frame in PDB format using GROMACS.
frames: get the trajectory frame count.
hbonds: perform a hydrogen bond analysis for each interaction interface. The ‘interactions’ input may be an empty list (i.e. there are no interactions), in which case the analysis stops. Note that this analysis finds hydrogen bonds only in a subset of frames along the trajectory, since storing the results for the whole trajectory is not possible due to storage limits.
helical: helical parameters analysis.
inter: find the residues of each interacting agent. It can automatically detect interactions based on chain names or ligand information, or use a predefined list of interactions.
linter: lipid-protein interactions analysis.
lorder: calculate lipid order parameters for membranes. This function computes the order parameter (S) for lipid acyl chains, defined as S = 0.5*(3*<cos²θ> - 1), where θ is the angle between the C-H bond and the membrane normal (z-axis).
markov: set the data needed to represent a Markov State Model graph in the client. This involves finding the most populated frames and calculating an RMSD matrix between these frames.
mda_univ: create an MDAnalysis universe using data in the workflow.
mdmeta: produce the MD metadata file to be uploaded to the database.
pairwise: perform an analysis for the overall structure and then one more analysis for each interaction.
pca: perform a principal component analysis (PCA) on the trajectory.
perres: perform the RMSD analysis for each residue.
pockets: perform the pockets analysis.
reframe: return a reference frame number where all bonds are exactly as they should be (by VMD standards). This is the frame used when representing the MD.
rgyr: perform the radius of gyration (Rgyr) analysis, using the first trajectory frame in PDB format as the reference.
rmsds: run multiple RMSD analyses: one for each reference (first frame, average structure) and each selection (default: protein, nucleic).
rmsf: perform the fluctuation analysis.
sas: perform the solvent accessible surface (SAS) analysis.
thickness: membrane thickness analysis.
tmscore: compute the TM-score using the tmscoring package.
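The average and rmsf tasks both rest on the per-atom mean position over the trajectory. The following is a plain-Python sketch of that arithmetic, not the workflow's pytraj-based code. It assumes a hypothetical in-memory layout where `frames[t][i]` is the (x, y, z) of atom i at frame t, assumes frames are already superimposed, and assumes "fluctuation analysis" means the usual root mean square fluctuation (RMSF):

```python
from __future__ import annotations

import math

# Hypothetical layout: frames[t][i] is the (x, y, z) position of atom i at frame t.
# Assumption: frames are already fitted to a common reference; no superposition here.


def average_structure(frames: list[list[tuple[float, float, float]]]) -> list[tuple[float, ...]]:
    """Return the mean position of every atom over all frames."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    return [
        tuple(sum(frame[i][k] for frame in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]


def rmsf(frames: list[list[tuple[float, float, float]]]) -> list[float]:
    """Return the root mean square fluctuation of every atom around its mean position."""
    avg = average_structure(frames)
    n_frames = len(frames)
    values = []
    for i, mean_pos in enumerate(avg):
        # Mean squared displacement of atom i from its average position.
        msd = sum(math.dist(frame[i], mean_pos) ** 2 for frame in frames) / n_frames
        values.append(math.sqrt(msd))
    return values


frames = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],  # frame 0
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],  # frame 1
]
print(average_structure(frames))  # [(1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(rmsf(frames))               # [1.0, 0.0]
```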
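The dist task reduces each residue pair from two agents to a mean distance and a standard deviation over the trajectory. A minimal sketch, assuming each residue is represented by ONE hypothetical point per frame (for example a residue centre) and using the population standard deviation; the actual task may define residue distances and the deviation estimator differently:

```python
import math
import statistics

# Hypothetical input: traj_a[t][i] is one representative (x, y, z) point for
# residue i of agent A at frame t; traj_b has the same layout for agent B.


def residue_pair_distance_stats(traj_a, traj_b):
    """Map (i, j) -> (mean, standard deviation) of the A_i-B_j distance over frames."""
    stats = {}
    for i in range(len(traj_a[0])):
        for j in range(len(traj_b[0])):
            distances = [math.dist(fa[i], fb[j]) for fa, fb in zip(traj_a, traj_b)]
            # pstdev: population standard deviation over frames (an assumption).
            stats[(i, j)] = (statistics.mean(distances), statistics.pstdev(distances))
    return stats


agent_a = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]  # 1 residue, 2 frames
agent_b = [[(3.0, 4.0, 0.0)], [(6.0, 8.0, 0.0)]]  # 1 residue, 2 frames
print(residue_pair_distance_stats(agent_a, agent_b))  # {(0, 0): (7.5, 2.5)}
```

As the description notes, every residue of each agent is included, so the output grows with the product of the two agents' residue counts.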
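The lorder formula S = 0.5*(3*<cos²θ> - 1) can be evaluated directly from C-H bond vectors, since cos θ with the z-axis is the z component of the unit vector. This sketch pools all vectors it is given into one average; which bonds and frames the real task groups together (for example per carbon position) is an assumption left to the caller:

```python
import math


def order_parameter(ch_vectors):
    """S = 0.5 * (3 * <cos^2(theta)> - 1), theta = angle between a C-H vector and z."""
    cos2_values = []
    for x, y, z in ch_vectors:
        length = math.sqrt(x * x + y * y + z * z)
        # cos(theta) with the z-axis (membrane normal) is z / |v|.
        cos2_values.append((z / length) ** 2)
    mean_cos2 = sum(cos2_values) / len(cos2_values)
    return 0.5 * (3 * mean_cos2 - 1)


print(order_parameter([(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]))  # 1.0 (bonds along the normal)
print(order_parameter([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))   # -0.5 (bonds in the membrane plane)
```

The two limits bracket the possible values: S = 1 for bonds parallel to the normal and S = -0.5 for bonds lying in the membrane plane.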
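For the rgyr task, the quantity per frame is the radius of gyration, Rg = sqrt(Σ m_i |r_i - r_com|² / Σ m_i). The sketch below computes it for a single frame. Mass weighting is an assumption (GROMACS weights by mass by default, but this page does not say which the workflow uses), so the function falls back to unit masses when none are given:

```python
import math


def radius_of_gyration(coords, masses=None):
    """Rg = sqrt(sum_i m_i |r_i - r_com|^2 / sum_i m_i) for one frame."""
    if masses is None:
        masses = [1.0] * len(coords)  # unweighted fallback (assumption)
    total_mass = sum(masses)
    # Mass-weighted centre of the selection.
    com = tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)
    )
    weighted_sq = sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted_sq / total_mass)


print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```

Applying this to every frame gives the Rgyr time series that such an analysis typically reports.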
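The markov task combines two steps: pick the most populated frames, then compute an RMSD matrix among them. A sketch under stated assumptions: populations are given as a hypothetical per-frame list (standing in for the MSM populations file), and the RMSD here is a plain coordinate RMSD without superposition, whereas a production analysis would normally fit the frames first:

```python
import math


def most_populated_frames(populations, n):
    """Indices of the n frames with the highest equilibrium population."""
    order = sorted(range(len(populations)), key=lambda i: populations[i], reverse=True)
    return order[:n]


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally sized coordinate lists (no fitting)."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def rmsd_matrix(frames, indices):
    """Square matrix of RMSDs between the selected frames."""
    return [[rmsd(frames[i], frames[j]) for j in indices] for i in indices]


populations = [0.1, 0.5, 0.4]  # hypothetical per-frame populations
frames = [
    [(9.0, 9.0, 9.0)],
    [(0.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0)],
]
top = most_populated_frames(populations, 2)
print(top)                       # [1, 2]
print(rmsd_matrix(frames, top))  # [[0.0, 5.0], [5.0, 0.0]]
```

The matrix is symmetric with a zero diagonal, so a client could use it directly as edge weights between the selected frames.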
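The tmscore task delegates to the tmscoring package, which also searches for the optimal superposition. The score itself follows the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L - 15)^(1/3) - 1.8. This sketch assumes the per-residue distances d_i after superposition are already known, and floors d0 at 0.5 for short chains (an assumption mirroring common practice, not taken from this page):

```python
def tm_d0(length):
    """Distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 (assumed floor)."""
    if length <= 15:
        return 0.5  # avoids a negative base under the cube root
    return max(1.24 * (length - 15) ** (1 / 3) - 1.8, 0.5)


def tm_score(distances, target_length=None):
    """TM = (1/L) * sum_i 1 / (1 + (d_i / d0)^2) over aligned residue pairs."""
    length = target_length if target_length is not None else len(distances)
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length


print(tm_score([0.0] * 100))  # 1.0 (identical, perfectly superposed structures)
```

Normalizing by the target length L, rather than by the number of aligned pairs, is what makes unaligned residues lower the score.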

Task Groups

These are predefined groups of tasks that can be specified with a single flag.

download (itopology, inputs, populations, transitions, istructure, itrajectory): download missing input files (already run with analyses).
setup (topology, structure, trajectory): process and test input files (already run with analyses).
meta (pmeta, mdmeta): run project and MD metadata analyses.
network (resmap, ligmap, lipmap, chains, pdbs, memmap): run dependencies that require an internet connection.
minimal (pmeta, mdmeta, stopology): run dependencies required by the web client to work.
interdeps (inter, pairwise, hbonds, energies, perres, clusters, dist): run the interactions analysis and all its dependent analyses.
membs (memmap, density, thickness, apl, lorder, linter): run all membrane-related analyses.
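Because a group name stands for several tasks, the -i and -e flags effectively expand groups before selecting tasks. A hypothetical resolver illustrating that idea: the group table is copied from the list above, but the selection rules (include narrows the run, exclude removes tasks afterwards) and the helper names are assumptions, and the sketch ignores the dependency resolution the real workflow performs:

```python
# Group table copied from the list above; the resolution rules are assumptions.
TASK_GROUPS = {
    "download": ["itopology", "inputs", "populations", "transitions", "istructure", "itrajectory"],
    "setup": ["topology", "structure", "trajectory"],
    "meta": ["pmeta", "mdmeta"],
    "network": ["resmap", "ligmap", "lipmap", "chains", "pdbs", "memmap"],
    "minimal": ["pmeta", "mdmeta", "stopology"],
    "interdeps": ["inter", "pairwise", "hbonds", "energies", "perres", "clusters", "dist"],
    "membs": ["memmap", "density", "thickness", "apl", "lorder", "linter"],
}


def expand(names):
    """Replace group names by their member tasks; plain task names pass through."""
    tasks = set()
    for name in names:
        tasks.update(TASK_GROUPS.get(name, [name]))
    return tasks


def select_tasks(all_tasks, include=None, exclude=None):
    """Hypothetical -i/-e semantics: include narrows the run, exclude removes tasks."""
    selected = expand(include) if include else set(all_tasks)
    return selected - expand(exclude or [])


ALL_TASKS = ["pmeta", "mdmeta", "stopology", "rmsf", "pca"]  # illustrative subset
print(sorted(select_tasks(ALL_TASKS, include=["minimal"])))  # ['mdmeta', 'pmeta', 'stopology']
print(sorted(select_tasks(ALL_TASKS, include=["minimal", "rmsf"], exclude=["meta"])))  # ['rmsf', 'stopology']
```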