Dataset Example
File Structure
[13]:
# dataset_dir = '/home/rchaves/ssh_dirs/mn5/res/others/agus_MoDeL-CNS'
# dataset_dir = '/home/rchaves/ssh_dirs/irbcluster/scratch/model-cns/'
# dataset_dir = '/home/rchaves/repo/MDDB/workflow/test/data/input/dataset/'
dataset_dir = '/home/rchaves/ssh_dirs/mn5/ruben/model/'
# YAML file used to configure the dataset and to generate each project's inputs.yaml automatically.
# project_directories outside the dataset directory are not allowed
dataset_yaml_path = dataset_dir + "dataset.yaml"
print(dataset_yaml_path)
!cat {dataset_yaml_path}
/home/rchaves/ssh_dirs/mn5/ruben/model/dataset.yaml
projects:
- '*' # (matches dirs starting with a digit and all their subfolders)
ignore:
- scripts # (ignore 'scripts' folders, it applies after resolving project directories)
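The `projects` and `ignore` keys suggest a two-step resolution: glob patterns are expanded first, and ignored folders are filtered out afterwards. Below is a minimal sketch of that logic. It assumes the patterns are globs relative to the dataset directory and that each `ignore` entry is matched against every path component. `resolve_projects` is a hypothetical helper, not part of mddb_workflow, and it does not reproduce any subfolder expansion the real `Dataset` class may perform.

```python
import fnmatch
from pathlib import Path


def resolve_projects(dataset_dir, project_patterns, ignore_patterns):
    """Hypothetical sketch: expand glob patterns into directories under
    dataset_dir, then drop any directory with a path component that
    matches an ignore pattern. Not the mddb_workflow implementation."""
    root = Path(dataset_dir)
    found = set()
    for pattern in project_patterns:
        for path in root.glob(pattern):
            if path.is_dir():  # plain files never count as projects
                found.add(path)
    kept = []
    for path in sorted(found):
        parts = path.relative_to(root).parts
        # Ignore rules are applied after the patterns have been resolved
        if any(fnmatch.fnmatch(part, ig) for part in parts for ig in ignore_patterns):
            continue
        kept.append(str(path))
    return kept
```

If those assumptions hold, `resolve_projects(dataset_dir, ["*"], ["scripts"])` should return roughly the same set of directories as `dt.project_directories`, although possibly in a different order.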
[14]:
inputs_template = dataset_dir + "inputs_template.yaml"
print(inputs_template)
!cat {inputs_template}
/home/rchaves/ssh_dirs/mn5/ruben/model/inputs_template.yaml
name: "{{ title }} ({{ DIR }})"
description: "{{ title }} (1 ms)"
authors: Agustín García
groups: IRB Barcelona, Orozco lab
citation: null
thanks: null
contact: agustin.garcia@irbbarcelona.org
type: trajectory
program: GROMACS
version: 2025.2
license: This trajectory dataset is released under a Creative Commons Attribution 4.0 International Public License
linkcense: https://creativecommons.org/licenses/by/4.0/
method: Classical MD
accession: null
links:
- name: Structural data source
  url: https://memprotmd.bioch.ox.ac.uk/_ref/PDB/{{ DIR }}
pdb_ids:
- {{ DIR }}
forced_references: null
framestep: 0.01
timestep: 2
ensemble: NPT
ff: 53A6 GROMOS
wat: TIP3P
boxtype: Cubic
mds:
- mdir: replica_1
  mdref: 0
interactions: null
pbc_selection: auto
collections: mcns
chainnames: null
membranes: null
customs: null
multimeric: null
trjType: large
bucket: 8d3eha
temp: 310
ligands: null
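Both templates use `{{ NAME }}` placeholders. This one uses `title` and `DIR`, and the job template uses `DIR`. The syntax resembles Jinja2, but whether mddb_workflow actually uses Jinja2 is not shown here. The hypothetical `render_template` below is a plain-substitution stand-in that only illustrates what filling the template for one project amounts to; it supports no filters or control flow. An unquoted line such as `- {{ DIR }}` is only meaningful YAML after substitution, so the template has to be rendered as text before it is parsed.

```python
import re

# Matches {{ NAME }} with or without inner spaces, e.g. "{{ DIR }}" or "{{DIR}}"
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(text, **values):
    """Hypothetical stand-in for the template engine: replace every
    {{ NAME }} with values[NAME]. An unknown NAME raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), text)


line = 'name: "{{ title }} ({{ DIR }})"'
print(render_template(line, title="Some title", DIR="6kuy"))
```

For this template, the `title` value would come from the title function passed to `generate_inputs_yaml`, and `DIR` would come from the project folder name.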
[15]:
job_template = dataset_dir + "job_template.sh"
print(job_template)
!cat {job_template}
/home/rchaves/ssh_dirs/mn5/ruben/model/job_template.sh
#!/bin/bash
#SBATCH --account=irb95
#SBATCH --job-name={{DIR}}_mddb
#SBATCH --output=mwf_%j.out
#SBATCH --error=mwf_%j.err
#SBATCH --qos=gp_resc
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mail-type=END,FAIL
#SBATCH --time=24:00:00
#SBATCH --mail-user=ruben.chaves@irbbarcelona.org
set -x
module purge
module load singularity
# 1) Setup in MN5
#singularity run -H $PWD -C ../mddb_wf.sif mwf run -top res.tpr -md replica_1 fitted.xtc -i setup -m stabonds intrajrity
#singularity run -H $PWD -C ../mddb_wf.sif mwf run -i inchikeys reframe -ow
# 2) References via sshfs
#rm ./replica_1/.trajectory.xtc_offsets.lock
#mwf run -i network meta stopology -m stabonds intrajrity
#mwf run -i protmap -m stabonds intrajrity
# 3) Remaining tasks on MN5
singularity run -H $PWD -C ../mddb_wf.sif mwf run -e network protmap clusters -m stabonds intrajrity
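Before launching, it can be useful to check which resources the template requests. The sketch below collects the `#SBATCH` directives of a job script into a dict. `sbatch_directives` is a hypothetical helper. Like sbatch itself, it stops reading directives at the first non-comment line, and it only handles the `--key=value` form used in this template. Options passed on the sbatch command line override the directives in the script, so `--output`/`--error` values given at submission time take precedence over the ones written here.

```python
def sbatch_directives(script_text):
    """Collect '#SBATCH --key=value' lines into a dict.

    Flags without '=' map to True. Parsing stops at the first line that
    is neither blank nor a comment, because sbatch ignores #SBATCH
    lines that appear after the first command."""
    directives = {}
    for raw in script_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            break  # first real command: no more directives are read
        if line.startswith("#SBATCH"):
            option = line[len("#SBATCH"):].strip()
            key, sep, value = option.partition("=")
            directives[key.lstrip("-")] = value if sep else True
    return directives
```

In the notebook, `with open(job_template) as fh: print(sbatch_directives(fh.read()))` would list the account, QOS, CPUs and time limit of the template shown above.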
Python
[16]:
%load_ext autoreload
%autoreload 2
from mddb_workflow.core.dataset import Dataset
dt = Dataset(dataset_yaml_path)
# Print the project directories to verify they are correct
dt.project_directories[:5]
[16]:
['/home/rchaves/ssh_dirs/mn5/ruben/model/6kuy',
'/home/rchaves/ssh_dirs/mn5/ruben/model/7cmu',
'/home/rchaves/ssh_dirs/mn5/ruben/model/6wjc',
'/home/rchaves/ssh_dirs/mn5/ruben/model/7e2y',
'/home/rchaves/ssh_dirs/mn5/ruben/model/7e2z']
[17]:
# Display the status of each project (also available as dt.status), with links to its log files
dt.display_status_with_links()
| rel_path | state | message | last_modified | group | log_file | err_file |
|---|---|---|---|---|---|---|
| 6gdg | error | Running BioBB LiPyphilic ZPositions | 10:20:35 02/12/25 | 2 | mwf_33119397.out | mwf_33119397.err |
| 6j8h | error | Running BioBB LiPyphilic ZPositions | 10:37:14 02/12/25 | 2 | mwf_33119399.out | mwf_33119399.err |
| 6jzh | error | -> Running task protmap (Protein residues mapping) | 09:51:51 02/12/25 | 1 | mwf_33119391.out | mwf_33119391.err |
| 6k42 | error | Running BioBB LiPyphilic ZPositions | 10:20:54 02/12/25 | 2 | mwf_33119398.out | mwf_33119398.err |
| 6kux | error | -> Running task linter (Membrane lipid-protein interactions analysis) | 10:11:12 02/12/25 | 0 | mwf_33119404.out | mwf_33119404.err |
| 6kuy | error | -> Running task linter (Membrane lipid-protein interactions analysis) | 10:11:23 02/12/25 | 0 | mwf_33119386.out | mwf_33119386.err |
| 6ni3 | error | Running BioBB LiPyphilic ZPositions | 10:20:39 02/12/25 | 2 | mwf_33119394.out | mwf_33119394.err |
| 6nt3 | error | -> Running task protmap (Protein residues mapping) | 09:51:53 02/12/25 | 1 | mwf_33119395.out | mwf_33119395.err |
| 6oik | error | -> Running task linter (Membrane lipid-protein interactions analysis) | 10:20:27 02/12/25 | 0 | mwf_33119396.out | mwf_33119396.err |
| 6ps5 | error | -> Running task protmap (Protein residues mapping) | 09:51:52 02/12/25 | 1 | mwf_33119403.out | mwf_33119403.err |
| 6ps7 | error | -> Running task protmap (Protein residues mapping) | 09:51:53 02/12/25 | 1 | mwf_33119405.out | mwf_33119405.err |
| 6qfa | error | -> Running task linter (Membrane lipid-protein interactions analysis) | 10:26:31 02/12/25 | 0 | mwf_33119400.out | mwf_33119400.err |
| 6wjc | error | -> Running task linter (Membrane lipid-protein interactions analysis) | 10:11:48 02/12/25 | 0 | mwf_33119388.out | mwf_33119388.err |
| 7bz2 | error | Running BioBB LiPyphilic ZPositions | 10:21:53 02/12/25 | 2 | mwf_33119393.out | mwf_33119393.err |
| 7cmu | error | Running BioBB LiPyphilic ZPositions | 10:22:54 02/12/25 | 2 | mwf_33119387.out | mwf_33119387.err |
| 7dhr | error | Running BioBB LiPyphilic ZPositions | 10:20:10 02/12/25 | 2 | mwf_33119402.out | mwf_33119402.err |
| 7dtd | error | Running BioBB LiPyphilic ZPositions | 10:40:51 02/12/25 | 2 | mwf_33119392.out | mwf_33119392.err |
| 7e2y | error | Running BioBB LiPyphilic ZPositions | 10:20:33 02/12/25 | 2 | mwf_33119389.out | mwf_33119389.err |
| 7e2z | error | Running BioBB LiPyphilic ZPositions | 10:20:45 02/12/25 | 2 | mwf_33119390.out | mwf_33119390.err |
| 7jvr | error | Running BioBB LiPyphilic ZPositions | 10:20:59 02/12/25 | 2 | mwf_33119401.out | mwf_33119401.err |
[46]:
import requests
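
def obtener_titulo(DIR):
    """Return the title of PDB entry DIR, as reported by the RCSB Data API."""
    url = f"https://data.rcsb.org/rest/v1/core/entry/{DIR}"
    r = requests.get(url, timeout=30)  # avoid hanging forever on a stalled connection
    if r.status_code == 200:
        data = r.json()
        return data.get('struct', {}).get('title', '').strip()
    else:
        raise ValueError(f"Could not retrieve the title for PDB {DIR}")

# Fill {{ title }} for every project; projects that already have an inputs.yaml are skipped
dt.generate_inputs_yaml(inputs_template, obtener_titulo)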
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6kuy/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7cmu/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6wjc/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7e2y/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7e2z/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6jzh/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7dtd/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7bz2/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6ni3/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6nt3/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6ps7/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6oik/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6gdg/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6k42/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6j8h/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6qfa/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7jvr/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/7dhr/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6ps5/inputs.yaml
Skipping existing /home/rchaves/ssh_dirs/mn5/ruben/model/6kux/inputs.yaml
[70]:
dt.show_groups()
[70]:
| group | message | count |
|---|---|---|
| 0 | -> Running task protmap (Protein residues... | 4 |
| 1 | InputError: You must provide a .jpg file name! | 16 |
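`show_groups` appears to collapse projects that share the same status message into numbered groups. Those group IDs are what the `include_groups` argument and the `-ig`/`-eg` CLI options select on. Below is a minimal sketch of that grouping over a `{rel_path: message}` dict. `group_by_message` is hypothetical, and its ID assignment (order of first appearance) is an assumption that need not match the numbering mddb_workflow uses.

```python
from collections import Counter


def group_by_message(status):
    """Assign an integer group ID to each distinct status message, in order
    of first appearance, and count how many projects share each message.

    Returns a list of (group_id, message, count) tuples."""
    ids = {}
    for message in status.values():
        ids.setdefault(message, len(ids))  # first time seen -> next free ID
    counts = Counter(status.values())
    return [(gid, msg, counts[msg]) for msg, gid in ids.items()]
```

Grouping by the exact message text means that two errors differing only in a detail (a file name, a frame number) end up in separate groups. This is worth keeping in mind when choosing which groups to re-run.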
[7]:
# To launch the workflow with SLURM (one job per project)
dt.launch_workflow(
    # include_groups=[0],  # uncomment to run only the projects in these status groups
    slurm=True,
    job_template=job_template,
    debug=True,
)
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6kuy
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7cmu
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6wjc
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7e2y
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7e2z
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6jzh
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7dtd
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7bz2
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6ni3
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6nt3
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6ps7
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6oik
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6gdg
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6k42
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6j8h
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6qfa
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7jvr
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7dhr
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6ps5
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6kux
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
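With `debug=True`, the call prints two commands for each project: a `cd` into the project directory, then an `sbatch` submission of `mwf_slurm_job.sh` (presumably the job template rendered for that project) with logs redirected to `logs/`. The sketch below rebuilds those command pairs and shows one way to submit a project from Python. `submission_commands` and `submit` are hypothetical helpers, not mddb_workflow functions. `submit` creates the `logs/` folder first, because SLURM does not create missing directories for `--output`/`--error` paths.

```python
import shlex
import subprocess
from pathlib import Path


def submission_commands(project_dirs, script="mwf_slurm_job.sh", log_dir="logs"):
    """Return the cd/sbatch command lines printed in debug mode, two per project."""
    commands = []
    for project in project_dirs:
        commands.append(f"cd {shlex.quote(str(project))}")
        commands.append(
            f"sbatch --output={log_dir}/mwf_%j.out --error={log_dir}/mwf_%j.err {script}"
        )
    return commands


def submit(project_dir, script="mwf_slurm_job.sh", log_dir="logs"):
    """Submit one project's job script with sbatch, running from the project directory."""
    project = Path(project_dir)
    (project / log_dir).mkdir(exist_ok=True)  # SLURM will not create it for us
    subprocess.run(
        ["sbatch", f"--output={log_dir}/mwf_%j.out", f"--error={log_dir}/mwf_%j.err", script],
        cwd=project,  # equivalent to the 'cd' printed above
        check=True,   # raise if sbatch rejects the job
    )
```

Running `sbatch` with `cwd` set, instead of a shell `cd`, keeps each submission independent of the notebook's working directory.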
Command Line
[ ]:
!mwf dataset groups {dataset_yaml_path}
Project groups based on status messages:
Group 0:
Message: Done!
Projects:
- 6gdg
- 6j8h
- 6jzh
- 6k42
- 6kux
- 6kuy
- 6ni3
- 6nt3
- 6oik
- 6ps5
- 6qfa
- 6wjc
- 7bz2
- 7cmu
- 7dhr
- 7dtd
- 7e2y
- 7e2z
- 7jvr
Group 1:
Message: No output log available
Projects:
- 6ps7
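The group IDs printed here are the ones accepted by `-ig`/`-eg`. To use them programmatically, the text output can be parsed back into a dict. Below is a minimal sketch that assumes the layout shown above: a `Group N:` header, a `Message:` line, and `- project` entries. `parse_groups` is hypothetical and will break if the CLI changes its output format.

```python
import re


def parse_groups(text):
    """Parse the text printed by `mwf dataset groups` into
    {group_id: {"message": str, "projects": [str, ...]}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # indentation is irrelevant for parsing
        header = re.match(r"Group (\d+):$", line)
        if header:
            current = groups.setdefault(int(header.group(1)), {"message": "", "projects": []})
        elif current is None:
            continue  # preamble before the first group header
        elif line.startswith("Message:"):
            current["message"] = line[len("Message:"):].strip()
        elif line.startswith("- "):
            current["projects"].append(line[2:].strip())
    return groups
```

In a notebook, `out = !mwf dataset groups {dataset_yaml_path}` captures the output lines. For the output above, `parse_groups("\n".join(out))[1]["projects"]` would then give `['6ps7']`.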
[37]:
!mwf dataset run -h
usage: mwf dataset run [-h] [-ns] [-nc] [-ig [INCLUDE_GROUPS ...]]
[-jt JOB_TEMPLATE] [--debug]
dataset_yaml
positional arguments:
dataset_yaml
Path to the dataset YAML file.
options:
-h, --help
show this help message and exit
-ns, --no_symlinks
Do not use symlinks internally
-nc, --no_colors
Do not use colors for logging
-ig, --include-groups [INCLUDE_GROUPS ...]
List of group IDs to be run.
-eg, --exclude-groups [EXCLUDE_GROUPS ...]
List of group IDs to be excluded.
-n, --n_jobs N_JOBS
Number of jobs to run.
--slurm
Submit the workflow to SLURM.
-jt, --job-template JOB_TEMPLATE
Path to the SLURM job template file. Required if --slurm is used.
--debug
Enable debug mode.
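The same run can be launched from Python by assembling the argument list from the options documented above. `dataset_run_argv` is a hypothetical helper. It only emits flags listed in the help text, and it ties `--slurm` to `-jt` because the help states that the job template is required when `--slurm` is used.

```python
def dataset_run_argv(dataset_yaml, job_template=None, include_groups=(),
                     exclude_groups=(), n_jobs=None, debug=False):
    """Build the argument list for `mwf dataset run` using only the
    options shown in its -h output."""
    argv = ["mwf", "dataset", "run", str(dataset_yaml)]
    if job_template is not None:
        # --slurm requires a job template, so the two always go together here
        argv += ["--slurm", "-jt", str(job_template)]
    if include_groups:
        argv += ["-ig", *map(str, include_groups)]
    if exclude_groups:
        argv += ["-eg", *map(str, exclude_groups)]
    if n_jobs is not None:
        argv += ["-n", str(n_jobs)]
    if debug:
        argv.append("--debug")
    return argv
```

Passing the list to `subprocess.run(dataset_run_argv(dataset_yaml_path, job_template, exclude_groups=[1], n_jobs=2, debug=True), check=True)` should be equivalent to the `!mwf dataset run` call in the next cell. It also avoids shell quoting issues with paths.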
[36]:
# From a terminal, e.g.: mwf dataset run dataset.yaml --slurm -jt job_template.sh -eg 3 4 0
!mwf dataset run {dataset_yaml_path} --slurm -jt {job_template} -eg 1 --debug -n 2
cd /home/rchaves/ssh_dirs/mn5/ruben/model/6kuy
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh
cd /home/rchaves/ssh_dirs/mn5/ruben/model/7cmu
sbatch --output=logs/mwf_%j.out --error=logs/mwf_%j.err mwf_slurm_job.sh