API reference
This section details the functions and classes available in MDDB Workflow.
- class mddb_workflow.mwf.Project(directory: str = '.', accession: str | None = None, database_url: str = 'https://irb-dev.mddbr.eu/api/', inputs_filepath: str | None = None, input_topology_filepath: str | None = None, input_structure_filepath: str | None = None, input_trajectory_filepaths: list[str] | None = None, md_directories: list[str] | None = None, input_md_config: list[list[str]] | None = None, reference_md_index: int | None = None, forced_inputs: list[list[str]] | None = None, populations_filepath: str = 'populations.json', transitions_filepath: str = 'transitions.json', aiida_data_filepath: str | None = None, filter_selection: bool | str = False, pbc_selection: str | None = None, cg_selection: str | None = None, dummy_selection: str | None = None, forced_class_selections: dict | None = None, image: bool = False, fit: bool = False, translation: list[float] = [0, 0, 0], mercy: list[str] | bool = [], trust: list[str] | bool = [], faith: bool = False, ssleep: bool = False, pca_analysis_selection: str = "(protein and name N CA C) or (nucleic and name P O5' O3' C5' C4' C3')", pca_fit_selection: str = "(protein and name N CA C) or (nucleic and name P O5' O3' C5' C4' C3')", rmsd_cutoff: float = 9, interaction_cutoff: float = 0.1, interactions_auto: str | None = None, guess_bonds: bool = False, ignore_bonds: bool = False, sample_trajectory: int | None = None, screenshot_frame: int | None = None)[source]
Bases: object
Class for the main project of an MDDB accession. A project is a set of related MDs. These MDs share all or most topology and metadata.
- __init__(directory: str = '.', accession: str | None = None, database_url: str = 'https://irb-dev.mddbr.eu/api/', inputs_filepath: str | None = None, input_topology_filepath: str | None = None, input_structure_filepath: str | None = None, input_trajectory_filepaths: list[str] | None = None, md_directories: list[str] | None = None, input_md_config: list[list[str]] | None = None, reference_md_index: int | None = None, forced_inputs: list[list[str]] | None = None, populations_filepath: str = 'populations.json', transitions_filepath: str = 'transitions.json', aiida_data_filepath: str | None = None, filter_selection: bool | str = False, pbc_selection: str | None = None, cg_selection: str | None = None, dummy_selection: str | None = None, forced_class_selections: dict | None = None, image: bool = False, fit: bool = False, translation: list[float] = [0, 0, 0], mercy: list[str] | bool = [], trust: list[str] | bool = [], faith: bool = False, ssleep: bool = False, pca_analysis_selection: str = "(protein and name N CA C) or (nucleic and name P O5' O3' C5' C4' C3')", pca_fit_selection: str = "(protein and name N CA C) or (nucleic and name P O5' O3' C5' C4' C3')", rmsd_cutoff: float = 9, interaction_cutoff: float = 0.1, interactions_auto: str | None = None, guess_bonds: bool = False, ignore_bonds: bool = False, sample_trajectory: int | None = None, screenshot_frame: int | None = None)[source]
Initialize a Project.
- Parameters:
directory (str) – Project directory where the whole workflow is to be run.
accession (Optional[str]) – Project accession to download missing input files from the database (if already uploaded).
database_url (str) – API URL to download missing data when an accession is provided.
inputs_filepath (str) – Path to a file with inputs for metadata, simulation parameters and analysis config.
input_topology_filepath (Optional[str]) – Path or glob pattern to input topology file relative to the project directory.
input_structure_filepath (Optional[str]) – Path or glob pattern to input structure file. It may be relative to the project or to each MD directory. If this value is not passed then the standard structure file is used as input by default.
input_trajectory_filepaths (Optional[list[str]]) – Paths or glob patterns to input trajectory files relative to each MD directory. If this value is not passed then the standard trajectory file path is used as input by default.
md_directories (Optional[list[str]]) – Path or glob pattern to the different MD directories. Each directory is to contain an independent trajectory and structure. Several output files will be generated in every MD directory.
input_md_config (Optional[list[list[str]]]) – Configuration of a specific MD. You may declare as many as you want. Every MD requires a directory name and at least one trajectory path. The structure is -md <directory> <trajectory_1> <trajectory_2> … Note that all trajectories from the same MD will be merged. For legacy reasons, you may also provide a specific structure for an MD. e.g. -md <directory> <structure> <trajectory_1> <trajectory_2> …
reference_md_index (Optional[int]) – Index of the reference MD (used by project-level functions; defaults to first MD).
forced_inputs (Optional[list]) – Force a specific input through the command line. Inputs passed through the command line have priority over the ones from the inputs file. In fact, these values will be overwritten or appended in the inputs file. Every forced input requires an input name and a value. The structure is -fi <input name> <new input value>
populations_filepath (str) – Path to equilibrium populations file (Markov State Model only).
transitions_filepath (str) – Path to transition probabilities file (Markov State Model only).
aiida_data_filepath (Optional[str]) – Path to the AiiDA data file. This file may be generated by the aiida-gromacs plugin and contains provenance data.
filter_selection (bool|str) – Atoms selection to be filtered in VMD format. If the argument is passed alone (i.e. with no selection) then water and counter ions are filtered.
pbc_selection (Optional[str]) – Selection of atoms which stay in Periodic Boundary Conditions even after imaging the trajectory. e.g. remaining solvent, ions, membrane lipids, etc. Selection passed through console overrides the one in inputs file.
cg_selection (Optional[str]) – Selection of atoms which are not actual atoms but Coarse Grained beads. Selection passed through console overrides the one in inputs file.
dummy_selection (Optional[str]) – Selection of atoms which are not real atoms but dummy atoms.
forced_class_selections – Custom forced selections for molecular classification.
image (bool) – Set if the trajectory is to be imaged so atoms stay in the PBC box. See -pbc for more information.
fit (bool) – Set if the trajectory is to be fitted (both rotation and translation) to minimize the RMSD to PROTEIN_AND_NUCLEIC_BACKBONE selection.
translation (list[float]) – Set the x y z translation for the imaging process. e.g. -trans 0.5 -1 0
mercy (list[str]|bool) – Failures to be tolerated (or boolean to set all/none).
trust (list[str]|bool) – Tests to skip/trust (or boolean to set all/none).
faith (bool) – If True, require input files to match expected output files and skip processing.
ssleep (bool) – If True, SSL certificate authentication is skipped when downloading data from an API.
pca_analysis_selection (str) – Atom selection for PCA analysis in VMD syntax.
pca_fit_selection (str) – Atom selection for the PCA fitting in VMD syntax.
rmsd_cutoff (float) – Set the cutoff for the RMSD sudden jumps analysis to fail. This cutoff stands for the number of standard deviations away from the mean an RMSD value is to be.
interaction_cutoff (float) – Set the cutoff for the interactions analysis to fail. This cutoff stands for percent of the trajectory where the interaction happens (from 0 to 1).
interactions_auto (Optional[str]) – Guess input interactions automatically. A VMD selection may be passed to limit guessed interactions to a specific subset of atoms.
guess_bonds (bool) – Force the workflow to guess atom bonds based on distance and atom radii in different frames along the trajectory instead of mining topology bonds.
ignore_bonds (bool) – Force the workflow to ignore atom bonds. This will result in many check-ins being skipped.
sample_trajectory (Optional[int]) – If passed, download the first 10 (by default) frames from the trajectory. You can specify a different number by providing an integer value.
screenshot_frame (Optional[int]) – If passed, the project screenshot is made using the specified frame (0-based) from the reference MD. Negative numbers may be passed to select frames counting from the end, e.g. -1 is the last frame. By default the screenshot is made using the reference frame from the reference MD.
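Two of the conventions above can be illustrated in plain Python. This is a sketch only, not the workflow's own code: `split_md_config` and `resolve_screenshot_frame` are hypothetical helpers that mirror the documented rules, namely that every `input_md_config` entry needs a directory plus at least one trajectory, and that `screenshot_frame` is 0-based with negative values counting from the end. The legacy form with an explicit structure file is not handled here.

```python
def split_md_config(entry: list[str]) -> tuple[str, list[str]]:
    """Split one MD config entry into (directory, trajectories)."""
    if len(entry) < 2:
        raise ValueError("An MD needs a directory and at least one trajectory")
    directory, *trajectories = entry
    return directory, trajectories


def resolve_screenshot_frame(frame: int, snapshots: int) -> int:
    """Convert a possibly negative frame number into a 0-based index."""
    index = frame + snapshots if frame < 0 else frame
    if not 0 <= index < snapshots:
        raise IndexError(f"Frame {frame} is out of range for {snapshots} snapshots")
    return index


# Hypothetical file names, used only for illustration
md_directory, md_trajectories = split_md_config(["replica_1", "part1.xtc", "part2.xtc"])
last_frame = resolve_screenshot_frame(-1, snapshots=100)
```

Both trajectories in the example belong to the same MD, so according to the parameter description they would be merged by the workflow.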
- property aiida_data_file: File | None
AiiDA data file (read only)
- property cg_residues: list[int]
Indices of residues in coarse grain (read only)
- property cg_selection: Selection
Coarse grain atom selection (read only)
- property charges
Atom charges (read only)
- static create_cache(directory: str = '.') Cache[source]
Create or load the project cache. Cannot fail unless the directory doesn’t exist, which is a pre-condition anyway.
- property dihedrals: list[dict]
Topology dihedrals (read only)
- get_cg_residues() list[int][source]
Get indices of coarse grain residues. Make sure they are coherent among all MDs.
- get_cg_selection() Selection[source]
Get the coarse grain atom selection. Make sure it is coherent among all MDs.
- get_chain_references = <Task (chains)>
- get_file(target_file: File) bool[source]
Check if a file exists. If not, try to download it from the database. If the file is not found in the database it is fine, we do not even warn the user. Note that nowadays this function is used to get populations and transitions files, which are not common.
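The lookup order described above (local file first, then a silent download attempt) can be sketched as follows. This is an illustration under stated assumptions: `ensure_file` and the `download` callable are hypothetical stand-ins and are not part of MDDB Workflow.

```python
import os
import tempfile
from typing import Callable


def ensure_file(path: str, download: Callable[[str], bool]) -> bool:
    """Return True if the file exists locally or could be downloaded."""
    if os.path.exists(path):
        return True
    # A file missing from the database is tolerated silently: no warning
    return download(path)


def fake_download(path: str) -> bool:
    """Hypothetical downloader that never finds the file in the database."""
    return False


with tempfile.TemporaryDirectory() as workdir:
    existing = os.path.join(workdir, "populations.json")
    with open(existing, "w") as handle:
        handle.write("[]")
    found_local = ensure_file(existing, fake_download)
    found_missing = ensure_file(os.path.join(workdir, "transitions.json"), fake_download)
```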
- get_inchi_references = <Task (inchimap)>
- get_inchikeys = <Task (inchikeys)>
- get_input_topology_file() File | None[source]
Get the input topology file. If the file is not found try to download it.
- get_input_topology_filepath() str | None[source]
Get the input topology filepath from the inputs or try to guess it.
If the input topology filepath is a ‘no’ flag then we consider there is no topology at all. So far we extract atom charges and atom bonds from the topology file. In this scenario we can keep working but there are some consequences:
Analysis using atom charges such as ‘energies’ will be skipped
The standard topology file will not include atom charges
Bonds will be guessed from atom radii and distance along multiple frames
- get_input_trajectory_files() list[File][source]
Get the input trajectory file(s) from the reference MD. If file(s) are not found then try to download them.
- get_ligand_references = <Task (ligmap)>
- get_lipid_references = <Task (lipmap)>
- get_membrane_map = <Task (memmap)>
- get_pdb_references = <Task (pdbs)>
- get_processed_interactions() dict[source]
Get the processed interactions from the reference replica, which are the same for all replicas.
- static get_project_directory(directory: str) str[source]
Get the project directory from the input directory.
- get_protein_map = <Task (protmap)>
- get_residue_map = <Task (resmap)>
- get_screenshot_filename = <Task (screenshot)>
- get_structure() Structure[source]
Get a reference structure. Use the reference MD structure but make sure there are no inconsistencies with other MDs.
- get_warnings() list[source]
Get the warnings.
The warnings list should not be reassigned, but it was back in the day. To avoid silent bugs, we read it directly from the register every time.
- property inchikey_map
InChI references (read only)
- property inchikeys
InChI keys (read only)
- inherit_topology_filename() str | None[source]
Set the expected output topology filename given the input topology filename. Note that topology formats are conserved.
- property input_authors
Input authors (read only)
- property input_boxtype
Input boxtype (read only)
- property input_cg_selection
Selection of atoms which are not actual atoms but Coarse Grained beads (read only)
- property input_chain_names
Input chain names (read only)
- property input_citation
Input citation (read only)
- property input_collections
Input collections (read only)
- property input_contact
Input contact (read only)
- property input_customs
Input custom representations (read only)
- property input_cv19_abs
Input Covid-19 antibodies (read only)
- property input_cv19_nanobs
Input Covid-19 nanobodies (read only)
- property input_cv19_startconf
Input Covid-19 starting conformation (read only)
- property input_cv19_unit
Input Covid-19 Unit (read only)
- property input_dataset
Dataset storage file (read only)
- property input_description
Input description (read only)
- property input_dummy_selection
The original user input dummy atoms selection (read only)
- property input_ensemble
Input ensemble (read only)
- property input_force_fields
Input force fields (read only)
- property input_forced_class_selections
The original input custom forced selections for molecular classification (read only)
- property input_framestep
Input framestep (read only)
- property input_groups
Input groups (read only)
- property input_interactions
Interactions to be analyzed (read only)
- property input_license
Input license (read only)
- property input_ligands
Input ligand references (read only)
- property input_linkcense
Input license link (read only)
- property input_links
Input links (read only)
- property input_metadditions
Author-customizable metadata additional fields (read only)
- property input_method
Input method (read only)
- property input_multimeric
Input multimeric labels (read only)
- property input_name
Input name (read only)
- property input_orientation
Input orientation (read only)
- property input_pbc_selection
Selection of atoms which are still in periodic boundary conditions (read only)
- property input_pdb_ids
Protein Data Bank IDs used for the setup of the system (read only)
- property input_program
Input program (read only)
- property input_protein_references
Uniprot IDs to be used first when aligning protein sequences (read only)
- property input_structure_file: File
Input structure file for each MD (read only)
- property input_temperature
Input temperature (read only)
- property input_thanks
Input acknowledgements (read only)
- property input_timestep
Input timestep (read only)
- property input_topology_file: File | None
Input topology file (read only)
- property input_trajectory_files: list[File]
Input trajectory files for each MD (read only)
- property input_type
Set if it is a trajectory or an ensemble (read only)
- property input_version
Input version (read only)
- property input_water
Input water force field (read only)
- property inputs: dict
Inputs from the inputs file (read only)
- property inputs_file: File
Inputs filename (read only)
- inputs_property(doc: str = '')[source]
Set a function to get a specific ‘input’ value by its key/name. Note that we return the property without calling the getter.
- property interactions: dict
Processed interactions (read only)
- is_inputs_file_available() bool[source]
Check if the inputs file is available. Note that asking for it when it is not available will lead to raising an input error.
- property is_time_dependent: bool
Check if trajectory frames are time dependent (read only)
- property ligand_references
Ligand references (read only)
- property lipid_references
Lipid references (read only)
- property md_charges
Atom charges from each MD (read only)
- property membrane_map
Membrane mapping (read only)
- pathify(filename_or_relative_path: str) str[source]
Given a filename or relative path, add the project directory path at the beginning.
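A minimal sketch of this behaviour, assuming a plain path join. `pathify_sketch` is a hypothetical standalone function that takes the base directory explicitly; whether the real method also normalizes the resulting path is not stated here.

```python
import os


def pathify_sketch(base_directory: str, filename_or_relative_path: str) -> str:
    """Prefix a filename or relative path with the base directory."""
    return os.path.join(base_directory, filename_or_relative_path)


# "my_project" is a made-up project directory
example_path = pathify_sketch("my_project", "populations.json")
```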
- property pbc_residues: list[int]
Indices of residues in periodic boundary conditions (read only)
- property pdb_ids: list[str]
Tested and standardized PDB ids (read only)
- property pdb_references
PDB references (read only)
- property populations: list[float] | None
Equilibrium populations from a MSM (read only)
- property populations_file: File | None
MSM equilibrium populations file (read only)
- prepare_metadata = <Task (pmeta)>
- prepare_standard_topology = <Task (stopology)>
- produce_provenance = <Task (aiidata)>
- property protein_map
Protein residues mapping (read only)
- property protein_references_file: File
File including protein references data mined from UniProt (read only)
- property reference_md_index: int
Reference MD index (read only)
- property residue_map
Residue map (read only)
- property screenshot_filename
Screenshot filename (read only)
- property snapshots: int
Reference MD snapshots (read only)
- property standard_topology_file: File
Standard topology filename (read only)
- property structure_file: File
Structure filename from the reference MD (read only)
- property topology_file: File
Topology file (read only)
- property topology_reader: Topology
Topology reader (read only)
- property trajectory_file: File
Trajectory filename from the reference MD (read only)
- property transitions: list[list[float]] | None
Transition probabilities from a MSM (read only)
- property transitions_file: File | None
MSM transition probabilities file (read only)
- property universe: int
MDAnalysis Universe object (read only)
- update_inputs(nested_key: str, new_value)[source]
Permanently update the inputs file. This may be done when command line inputs do not match file inputs.
- property warnings: list
Project warnings to be written in metadata
- class mddb_workflow.mwf.MD(project: Project, name: str, number: int, directory: str, input_topology_filepath: str, input_structure_filepath: str, input_trajectory_filepaths: list[str])[source]
Bases: object
A Molecular Dynamics (MD) is the union of a structure and a trajectory. Having this data several analyses are possible. Note that an MD is always defined inside of a Project and thus it has additional topology and metadata.
- property average_structure_file: File
Average structure filename (read only)
- property cg_residues: list[int]
Indices of residues in coarse grain (read only)
- property cg_selection: Selection
Coarse grain atom selection (read only)
- property charges
Atom charges (read only)
- property dihedrals: list[dict]
Topology dihedrals (read only)
- property dummy_selection: Selection
Dummy atoms selection (read only)
- property first_frame_file: File
First frame (read only)
- property forced_class_selections: dict[str, Selection] | None
Custom forced selections for molecular classification (read only)
- get_MDAnalysis_Universe = <Task (mda_univ)>
- get_average_structure = <Task (average)>
- get_charges = <Task (charges)>
- get_file(target_file: File) bool[source]
Check if a file exists. If not, try to download it from the database. If the file is not found in the database it is fine, we do not even warn the user. Note that this function is used to get populations and transitions files, which are not common.
- get_first_frame = <Task (firstframe)>
- get_forced_class_selections() dict[str, Selection] | None[source]
Get the forced class atoms selection.
- get_input_structure_file() File[source]
Get the input pdb filename from the inputs. If the file is not found try to download it.
- get_input_topology_file() File | None[source]
Get the input topology file. If the file is not found try to download it.
- get_input_topology_filepath() str | None[source]
Get the input topology filepath from the inputs or try to guess it.
If the input topology filepath is a ‘no’ flag then we consider there is no topology at all. So far we extract atom charges and atom bonds from the topology file. In this scenario we can keep working but there are some consequences:
Analysis using atom charges such as ‘energies’ will be skipped
The standard topology file will not include atom charges
Bonds will be guessed from atom radii and distance along multiple frames
- get_input_trajectory_files() list[File][source]
Get the input trajectory filename(s) from the inputs. If file(s) are not found, try to download them.
- get_processed_interactions = <Task (inter)>
- get_reference_bonds = <Task (refbonds)>
- get_reference_frame = <Task (reframe)>
- get_snapshots = <Task (frames)>
- get_thickness_analysis = <Task (thickness)>
- get_transitions() list[list[float]][source]
Get transition probabilities from a MSM from the project.
- get_warnings() list[source]
Get the warnings.
The warnings list should not be reassigned, but it was back in the day. To avoid silent bugs, we read it directly from the register every time.
- inherit_topology_filename() str | None[source]
Set the expected output topology filename given the input topology filename. Note that topology formats are conserved.
- property input_cg_selection
Selection of atoms which are not actual atoms but coarse grain beads (read only)
- property input_dummy_selection
Selection of atoms which are not real atoms but dummy atoms (read only)
- input_files_processing = <Task (inpro)>
- property input_forced_class_selections
Custom forced selections for molecular classification (read only)
- input_getter()[source]
Get input values which may be MD specific. If the MD input is missing then we use the project input value.
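The fallback rule can be sketched with two dictionaries. This is an illustration only: `get_md_input` is a hypothetical helper, the selections are made-up values, and treating an absent key as "missing" is an assumption.

```python
from typing import Any


def get_md_input(md_inputs: dict[str, Any], project_inputs: dict[str, Any], name: str) -> Any:
    """Return the MD-specific value if present, else the project value."""
    if name in md_inputs:
        return md_inputs[name]
    return project_inputs.get(name)


project_inputs = {"pbc_selection": "resname POPC", "interactions": []}
md_inputs = {"pbc_selection": "resname POPC or resname TIP3"}

# The MD overrides pbc_selection but inherits interactions from the project
pbc = get_md_input(md_inputs, project_inputs, "pbc_selection")
interactions = get_md_input(md_inputs, project_inputs, "interactions")
```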
- property input_interactions
Interactions to be analyzed (read only)
- property input_pbc_selection
Selection of atoms which are still in periodic boundary conditions (read only)
- property input_structure_file: File
Input structure filename (read only)
- property input_topology_file: File | None
Input topology file (read only)
- property input_trajectory_files: list[File]
Input trajectory filenames (read only)
- property interactions
Processed interactions (read only)
- is_inputs_file_available() bool[source]
Check if inputs file is available. Note that asking for it when it is not available will lead to raising an input error. This function is inherited from the project.
- property must_check_stable_bonds: bool
Check if we must check stable bonds (read only)
- pathify(filename_or_relative_path: str) str[source]
Given a filename or relative path, add the MD directory path at the beginning.
- property pbc_residues: list[int]
Indices of residues in periodic boundary conditions (read only)
- property pbc_selection: Selection
Periodic boundary conditions atom selection (read only)
- property populations: list[float]
Equilibrium populations from a MSM (read only)
- prepare_metadata = <Task (mdmeta)>
- property protein_map: dict
Residues mapping (read only)
- property reference_bonds
Atom bonds to be trusted (read only)
- property reference_frame
Reference frame to be used to represent the MD (read only)
- run_apl_analysis = <Task (apl)>
- run_channels_analysis = <Task (channels)>
- run_clusters_analysis = <Task (clusters)>
- run_density_analysis = <Task (density)>
- run_dihedral_energies = <Task (dihedrals)>
- run_dist_perres_analysis = <Task (dist)>
- run_energies_analysis = <Task (energies)>
- run_hbonds_analysis = <Task (hbonds)>
- run_helical_analysis = <Task (helical)>
- run_lipid_interactions_analysis = <Task (linter)>
- run_lipid_order_analysis = <Task (lorder)>
- run_markov_analysis = <Task (markov)>
- run_pca_analysis = <Task (pca)>
- run_pockets_analysis = <Task (pockets)>
- run_rgyr_analysis = <Task (rgyr)>
- run_rmsd_pairwise_analysis = <Task (pairwise)>
- run_rmsd_perres_analysis = <Task (perres)>
- run_rmsds_analysis = <Task (rmsds)>
- run_rmsf_analysis = <Task (rmsf)>
- run_sas_analysis = <Task (sas)>
- run_tmscores_analysis = <Task (tmscore)>
- property snapshots
Trajectory snapshots (read only)
- property structure_file: File
Structure file (read only)
- property thickness_analysis
Membrane thickness analysis
- property topology_file: File
Topology file (read only)
- property topology_filepath: str
Topology file path (read only)
- property topology_reader: Topology
Topology reader (read only)
- property trajectory_file: File
Trajectory file (read only)
- property transitions: list[list[float]]
Transition probabilities from a MSM (read only)
- property universe
MDAnalysis Universe object (read only)
- update_inputs(key: str, new_value)[source]
Permanently update current MD inputs in the inputs file. Do it only if the project value is not already the same.
- property warnings: list
MD warnings to be written in metadata
- class mddb_workflow.utils.structures.Structure(atoms: list[Atom] = [], residues: list[Residue] = [], chains: list[Chain] = [], residue_numeration_base: int = 10)[source]
Bases: object
A structure is a group of atoms organized in chains and residues.
- SUPPORTED_SELECTION_SYNTAXES = {'gmx', 'pytraj', 'vmd'}
- property atom_count: int
The number of atoms in the structure (read only)
- auto_chainer(verbose: bool = False)[source]
Smart function to set chains automatically. Original chains will be overwritten.
- property bonds: list[list[int]]
The structure bonds
- property chain_count: int
Number of chains in the structure (read only)
- chainer(selection: Selection | None = None, letter: str | None = None, whole_fragments: bool = False)[source]
Set chains on demand. If no selection is passed then the whole structure will be affected. If no chain is passed then a “chain by fragment” logic will be applied.
- check_incoherent_bonds() bool[source]
Check for incoherent bonds, i.e. check that no atom has more or fewer bonds than expected according to its element. Return True if any incoherent bond is found.
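The idea of comparing bond counts against element expectations can be sketched as below. This is a rough illustration, not the library's logic: `EXPECTED_BONDS` is a hypothetical table with simplified ranges, and `find_incoherent_atoms` is a made-up helper. Bonds follow the per-atom list-of-indices layout used elsewhere in this class.

```python
# Hypothetical (minimum, maximum) bond counts per element, for illustration only
EXPECTED_BONDS = {"H": (1, 1), "O": (1, 2), "N": (1, 4), "C": (2, 4)}


def find_incoherent_atoms(elements: list[str], bonds: list[list[int]]) -> list[int]:
    """Return indices of atoms whose bond count is outside the expected range."""
    incoherent = []
    for index, (element, bonded) in enumerate(zip(elements, bonds)):
        limits = EXPECTED_BONDS.get(element)
        if limits is None:
            continue  # elements missing from the table are not judged here
        minimum, maximum = limits
        if not minimum <= len(bonded) <= maximum:
            incoherent.append(index)
    return incoherent


# A carbon bonded to five hydrogens exceeds the expected maximum of four
elements = ["C", "H", "H", "H", "H", "H"]
bonds = [[1, 2, 3, 4, 5], [0], [0], [0], [0], [0]]
incoherent = find_incoherent_atoms(elements, bonds)
has_incoherent_bonds = bool(incoherent)
```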
- check_merged_residues(fix_residues: bool = False, display_summary: bool = False) bool[source]
There may be residues which contain unconnected (unbonded) atoms. They are not allowed. They may result from incorrect parsing and actually be duplicated residues.
Search for merged residues. Create new residues for every group of connected atoms if the fix_residues argument is True. Note that the new residues will be repeated, so you will need to run check_repeated_residues after. Return True if there were any merged residues.
- check_repeated_atoms(fix_atoms: bool = False, display_summary: bool = False) bool[source]
Check atoms to search for repeated atoms. Atoms with identical chain, residue and name are considered repeated atoms.
- Parameters:
fix_atoms (bool) – If True, rename repeated atoms.
display_summary (bool) – If True, display a summary of repeated atoms.
- Returns:
True if there were any repeated atoms, False otherwise.
- Return type:
bool
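The repetition criterion (identical chain, residue and name) can be sketched with a counter. The tuples below are a simplified stand-in for atoms, and `find_repeated_atoms` is a hypothetical helper that only detects repeats; it does not rename anything.

```python
from collections import Counter

# Each atom is described as (chain name, residue number, atom name)
atoms = [
    ("A", 1, "CA"),
    ("A", 1, "CB"),
    ("A", 1, "CA"),  # same chain, residue and name as the first atom
    ("B", 1, "CA"),
]


def find_repeated_atoms(atoms: list[tuple[str, int, str]]) -> list[tuple[str, int, str]]:
    """Return every (chain, residue, name) key that appears more than once."""
    counts = Counter(atoms)
    return [key for key, count in counts.items() if count > 1]


repeated = find_repeated_atoms(atoms)
has_repeats = bool(repeated)
```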
- check_repeated_chains(fix_chains: bool = False, display_summary: bool = False) bool[source]
There may be chains which are equal in the structure (i.e. same chain name). This means we have a duplicated/split chain. Repeated chains are common and usually supported, but with some problems. Also, repeated chains usually come with repeated residues, which means more problems (see explanation below).
- In the context of this structure class we may have 2 different problems with a different solution each:
There is more than one chain with the same letter (repeated chain) -> rename the duplicated chains
There is a chain with atom indices which are not consecutive (split chain) -> create new chains
Rename repeated chains or create new chains if the fix_chains argument is True.
WARNING: These fixes are possible only if there are fewer chains than the number of letters in the alphabet. Although there is no limitation in this code for chain names, setting long chain names is not compatible with the PDB format.
Check split chains (chains with non-consecutive residues) and try to fix them if requested. Check repeated chains (two chains with the same name) and return True if there were any repeats.
- check_repeated_residues(fix_residues: bool = False, display_summary: bool = False) bool[source]
There may be residues which are equal in the structure (i.e. same chain, number and icode). If 2 residues in the structure are equal we must check the distance between their atoms. If the atoms are far apart, they are different residues with the same notation (duplicated residues). If the atoms are close, they are indeed the same residue (split residue).
Split residues are found in some PDB files and they are supported by some tools. These tools consider all atoms with the same ‘record’ as the same residue. However, other tools would consider the split residue as two different residues. This causes inconsistency across tools, besides a long list of problems. The only possible fix is changing the order of atoms in the topology. Note that this is a breaking change for associated trajectories, whose coordinates must be reordered accordingly. However, here we provide tools to fix associated trajectories as well.
Duplicated residues are common and usually supported, but with some problems. For example, pytraj analysis outputs usually sort results by residue and each residue is tagged. If there are duplicated residues with the same tag it may not be possible to know which result belongs to each residue. Another example is NGL selections in the web client. If you select residue ‘:A and 1’ and there are multiple residues 1 in chain A, all of them will be displayed.
Check residues to search for duplicated and split residues. Renumber repeated residues if the fix_residues argument is True. Return True if there were any repeats.
- check_splitted_chains(fix_chains: bool = False, display_summary: bool = False) bool[source]
Check if non-consecutive atoms belong to the same chain. If so, separate pieces of non-consecutive atoms into different chains. Note that the new chains will be duplicated, so you will need to run check_repeated_chains after.
- Parameters:
fix_chains (bool) – If True then the split chains will be fixed.
display_summary (bool) – If True then a summary of the split chains will be displayed.
- Returns:
True if we encountered split chains and False otherwise.
- Return type:
bool
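Detecting a chain whose atoms are not consecutive can be sketched by scanning the chain name of each atom in order. `find_split_chains` is a hypothetical helper working on a plain list of chain names; it only reports the affected chains and does not create new ones.

```python
def find_split_chains(atom_chains: list[str]) -> set[str]:
    """Return the names of chains whose atoms are not consecutive."""
    seen: set[str] = set()
    split: set[str] = set()
    previous = None
    for chain in atom_chains:
        if chain != previous:
            # A chain that reappears after a different chain is split
            if chain in seen:
                split.add(chain)
            seen.add(chain)
            previous = chain
    return split


# Chain A appears in two separate runs of atoms, so it is reported
split_chains = find_split_chains(["A", "A", "B", "B", "A", "C"])
```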
- property dummy_atom_indices: set
Atom indices for what we consider dummy atoms
- filter(selection: Selection | str, selection_syntax: str = 'vmd') Structure[source]
Create a new structure from the current using a selection to filter the atoms we want to keep.
- filter_away(selection: Selection | str, selection_syntax: str = 'vmd') Structure[source]
Create a new structure from the current using a selection to filter the atoms we want to remove.
- find_covalent_bonds(selection: Selection | None = None, safe_elements: bool = True) list[list[int]][source]
Get all atomic covalent (strong) bonds. Bonds are defined as a list of atom indices for each atom in the structure. Rely on VMD logic to do so.
- find_fragments(selection: Selection | None = None, coherent: bool = True, exclude_dummy_fragments: bool = False, atom_bonds: list[list[int]] | None = None) Generator[Selection, None, None][source]
Find fragments in a selection of atoms. A fragment is a selection of covalently bonded atoms. All atoms are searched if no selection is provided.
WARNING: Note that fragments generated from a specific selection may not match the structure fragments. A selection including 2 separated regions of a structure fragment will yield 2 fragments.
For convenience, bonds between non-consecutive residues are excluded from this logic. This is useful to ignore disulfide bonds. May also help to properly find chains in CG simulations where chains may be bonded.
There is also a flag to exclude fragments which are made of dummy atoms only
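Finding fragments amounts to finding connected components in the bond graph. The sketch below assumes bonds are given as one list of bonded atom indices per atom, as described for find_covalent_bonds. `connected_fragments` is a hypothetical helper: it searches all atoms and does not implement the selection argument, the exclusion of bonds between non-consecutive residues, or the dummy-fragment flag.

```python
from collections import deque


def connected_fragments(bonds: list[list[int]]) -> list[list[int]]:
    """Group atom indices into fragments of covalently bonded atoms."""
    visited: set[int] = set()
    fragments = []
    for start in range(len(bonds)):
        if start in visited:
            continue
        # Breadth-first search over the atoms reachable from this one
        visited.add(start)
        queue = deque([start])
        fragment = []
        while queue:
            atom = queue.popleft()
            fragment.append(atom)
            for neighbor in bonds[atom]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        fragments.append(sorted(fragment))
    return fragments


# Atoms 0-1-2 form one fragment, 3-4 another, and atom 5 is isolated
example_bonds = [[1], [0, 2], [1], [4], [3], []]
fragments = connected_fragments(example_bonds)
```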
- find_ptms() Generator[dict, None, None][source]
Find Post Translational Modifications (PTM) in the structure.
- find_residue(chain_name: str, number: int, icode: str = '') Residue | None[source]
Find a residue by its chain, number and insertion code.
- find_rings(max_ring_size: int, selection: Selection | None = None) list[list[Atom]][source]
Find rings with a maximum specific size or less in the structure and yield them as they are found.
- find_whole_fragments(selection: Selection) Generator[Selection, None, None][source]
Given a selection of atoms, find all whole structure fragments on them.
- fix_atom_elements(trust: bool = True, show_warnings: bool = True) bool[source]
Fix atom elements by guessing them when missing. Set all elements with the first letter upper and the second (if any) lower. Also check if atom elements are coherent with atom names.
- Parameters:
trust (bool) – If ‘trust’ is set as False then we impose elements according to what we can guess from the atom name.
- Returns:
Return True if any element was modified or False if not.
- Return type:
bool
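The capitalization rule and the boolean return value can be sketched as follows. This covers only the "first letter upper, second lower" normalization; guessing missing elements and checking them against atom names is not shown. `normalize_elements_sketch` is a hypothetical helper, not part of the library.

```python
def normalize_elements_sketch(elements):
    """Return (fixed_elements, changed) for a list of element symbols.

    Each symbol gets its first letter in upper case and the rest in lower case.
    changed is True if any symbol was modified, mirroring the documented return.
    """
    fixed = []
    changed = False
    for element in elements:
        normalized = element[:1].upper() + element[1:].lower()
        if normalized != element:
            changed = True
        fixed.append(normalized)
    return fixed, changed


fixed, changed = normalize_elements_sketch(["C", "CL", "fe", "Na"])
```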
- force_classifications(classifications: dict[str, Selection])[source]
Apply a set of forced residue classifications.
- property fragments: list[Selection]
The structure fragments (read only)
- classmethod from_file(mysterious_filepath: str)[source]
Set the structure from a file if the file format is supported.
- classmethod from_mdanalysis(mdanalysis_universe)[source]
Set the structure from an MDAnalysis universe object.
- classmethod from_mmcif(mmcif_content: str, model: int = 1, author_notation: bool = False)[source]
Set the structure from mmCIF content. You may filter the content for a specific model. You may ask for the author notation instead of the standardized notation for legacy reasons. This may affect atom names, residue names, residue numbers and chain names. The content is read line by line to set the parsed atoms, residues and chains.
- classmethod from_mmcif_file(mmcif_filepath: str, model: int = 1, author_notation: bool = False)[source]
Set the structure from a mmcif file.
- classmethod from_pdb(pdb_content: str, model: int | None = None, flexible_numeration: bool = True)[source]
Set the structure from PDB content. You may filter the PDB content for a specific model. Some unusual numbering systems are not supported and, when encountered, they are ignored. In these cases we set our own numbering system. Set the flexible numeration argument to False to avoid this behaviour, thus crashing instead.
- classmethod from_pdb_file(pdb_filepath: str, model: int | None = None, flexible_numeration: bool = True)[source]
Set the structure from a PDB file. You may filter the input PDB file for a specific model. Some unusual numbering systems are not supported and, when encountered, they are ignored. In these cases we set our own numbering system. Set the flexible numeration argument to False to avoid this behaviour, thus crashing instead.
- classmethod from_pdb_id(pdb_id: str, model: int = 1, author_notation: bool = False)[source]
Download and parse the structure from a PDB entry.
- generate_pdb(show_warnings: bool = True)[source]
Generate a pdb file content with the current structure.
- generate_pdb_file(pdb_filepath: str, show_warnings: bool = True)[source]
Generate a pdb file with current structure.
- get_available_chain_name() str | None[source]
Get an available chain name. Find alphabetically the first letter which is not yet used as a chain name. If all letters in the alphabet are used already then raise an error.
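The "first unused letter" rule can be sketched in a few lines. This is an illustration only: `first_available_chain_name` is a hypothetical name, and it assumes the candidate alphabet is the 26 upper-case ASCII letters, which the description above does not state explicitly.

```python
from string import ascii_uppercase


def first_available_chain_name(used_names):
    """Return the alphabetically first letter not present in used_names."""
    used = set(used_names)
    for letter in ascii_uppercase:  # assumption: upper-case letters only
        if letter not in used:
            return letter
    raise ValueError("All letters are already used as chain names")


next_name = first_available_chain_name(["A", "B", "D"])
```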
- get_bonds(safe: bool = True) list[list[int]][source]
Get the bonds between atoms. The safe argument makes sure elements are corrected before the calculation. Note that elements are important since atom radii are taken into account to calculate bonds.
- get_next_available_chain_name(anterior: str) str[source]
Get the next available chain name.
- Parameters:
anterior (str) – The last chain name used, which is expected to be a single letter
- Raises:
ValueError – If the anterior is not a letter or if there are more chains than available
- get_parsed_chains(only_protein: bool = False) list[source]
Get each chain name and amino acid sequence in a topology.
- get_rechained_structure(atom_chain_map: list) Structure[source]
Given a chain map, return a copy of this structure with the new chain map applied.
- get_selection_chain_indices(selection: Selection) list[int][source]
Given an atom selection, get a list of chain indices for chains implicated. Note that if a single atom from the chain is in the selection then the chain index is returned.
- get_selection_chains(selection: Selection) list[Chain][source]
Given an atom selection, get a list of chains implicated. Note that if a single atom from the chain is in the selection then the chain is returned.
- get_selection_outer_bonds(selection: Selection) list[int][source]
Given an atom selection, get all bonds between these atoms and any other atom in the structure. Note that inner bonds between atoms in the selection are discarded.
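A rough sketch of the inner/outer distinction, under assumptions: `selection_outer_bonds_sketch` is a hypothetical name, bonds use the per-atom index list layout, and the result is taken to be the indices of the atoms outside the selection that are bonded to it (the signature returns list[int], but the exact contents are not spelled out above).

```python
def selection_outer_bonds_sketch(atom_bonds, selected):
    """Return sorted indices of non-selected atoms bonded to selected atoms."""
    inner = set(selected)
    outer = set()
    for atom_index in inner:
        for bonded in atom_bonds[atom_index]:
            # Bonds between two selected atoms are inner bonds and are skipped
            if bonded not in inner:
                outer.add(bonded)
    return sorted(outer)


# Linear chain 0-1-2-3: selecting the two middle atoms leaves both ends outside
bonds = [[1], [0, 2], [1, 3], [2]]
outer = selection_outer_bonds_sketch(bonds, [1, 2])
```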
- get_selection_residue_indices(selection: Selection) list[int][source]
Given an atom selection, get a list of residue indices for residues implicated. Note that if a single atom from the residue is in the selection then the residue index is returned.
- get_selection_residues(selection: Selection) list[Residue][source]
Given an atom selection, get a list of residues implicated. Note that if a single atom from the residue is in the selection then the residue is returned.
- get_sequences(polymer_type: str | None = None) list[str][source]
Get list of protein sequences in the structure.
- property ion_atom_indices: set
Atom indices for what we consider supported ions
- name_selection(selection: Selection) str[source]
Name an atom selection depending on the chains it contains. This is used for debugging purposes.
- ptm_options = {'acetyl': 'Acetylation', 'amide': 'Amidation', 'carbohydrate': 'Glycosilation', 'dna': 'DNA linkage', 'fatty': 'Lipidation', 'ion': Warning('Ion is covalently bonded to protein'), 'other': Warning('Unknow type of PTM'), 'protein': ValueError('A PTM residue must never be protein'), 'rna': 'RNA linkage', 'solvent': Warning('Solvent is covalently bonded to protein'), 'steroid': 'Steroid linkage'}
- purge_chain(chain: Chain)[source]
Purge chain from the structure. This can be done only when the chain has no residues left in the structure. Renumber all chain indices which have been offset as a result of the purge.
- purge_residue(residue: Residue)[source]
Purge residue from the structure and its chain. This can be done only when the residue has no atoms left in the structure. Renumber all residue indices which have been offset as a result of the purge.
- raw_protein_chainer()[source]
This is an alternative system to find protein chains (anything else is chained as ‘X’). This system does not depend on VMD. It totally overrides previous chains since it is expected to be used only when chains are missing.
- property residue_count: int
Number of residues in the structure (read only)
- select(selection_string: str, syntax: str = 'vmd') Selection | None[source]
Select atoms from the structure, thus generating an atom indices list. Different tools may be used to make the selection: vmd (default) or pytraj.
- select_atom_indices(atom_indices: list[int]) Selection[source]
Set a function to make selections using atom indices.
- select_by_classification(classification: str) Selection[source]
Select atoms according to the classification of its residue.
- select_cartoon(include_terminals: bool = False) Selection[source]
Select cartoon representable regions for VMD.
- Rules are:
Residues must be protein (i.e. must contain C, CA, N and O atoms) or nucleic (P, OP1, OP2, O3’, C3’, C4’, C5’, O5’).
There must be at least 3 covalently bonded residues.
Their chain, numbering and even index order do not matter as long as they are bonded.
Note that a cartoon can be represented while a single residue is displayed alone, but that residue must be connected anyway. There is also an option to include terminals in the cartoon selection although they are not representable. This is helpful for the screenshot: terminals are better hidden than represented as ligands.
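The rules above can be approximated with a small sketch: keep residues whose atom names include the full protein or nucleic set, then keep only bonded groups of at least 3 such residues. This is a hedged illustration, not the library's code. `cartoon_residues_sketch`, its inputs and the reading that all 3 bonded residues must themselves be cartoon candidates are assumptions, and terminal handling is left out.

```python
PROTEIN_ATOMS = {"C", "CA", "N", "O"}
NUCLEIC_ATOMS = {"P", "OP1", "OP2", "O3'", "C3'", "C4'", "C5'", "O5'"}


def cartoon_residues_sketch(residue_atom_names, residue_bonds, min_size=3):
    """Return sorted indices of residues that could be drawn as cartoon.

    residue_atom_names[i] lists the atom names of residue i.
    residue_bonds[i] lists the indices of residues covalently bonded to residue i.
    """
    pending = {
        index for index, names in enumerate(residue_atom_names)
        if PROTEIN_ATOMS <= set(names) or NUCLEIC_ATOMS <= set(names)
    }
    selected = []
    while pending:
        start = pending.pop()
        group, stack = [start], [start]
        while stack:
            current = stack.pop()
            for neighbor in residue_bonds[current]:
                # Grow the group through bonds to other candidate residues
                if neighbor in pending:
                    pending.discard(neighbor)
                    group.append(neighbor)
                    stack.append(neighbor)
        if len(group) >= min_size:
            selected.extend(group)
    return sorted(selected)


backbone = ["N", "CA", "C", "O"]
names = [backbone, backbone, backbone, ["O", "H1", "H2"], backbone, backbone]
# Residues 0-1-2 form a tripeptide, 3 is a water, 4-5 form a dipeptide
residue_bonds = [[1], [0, 2], [1], [], [5], [4]]
cartoon = cartoon_residues_sketch(names, residue_bonds)
```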
- select_counter_ions(charge: str | None = None) Selection[source]
Select counter ion atoms. WARNING: This logic is heuristic and may fail for structures with non-standard atom names.
- select_ligands(inchikey_map: list[dict]) Selection[source]
Get a selection of all the ligand residues in the system based on the inchikey map.
- select_pbc_guess() Selection[source]
Return a selection of the typical PBC atoms: solvent, counter ions and lipids. WARNING: This is just a guess.
- select_protein() Selection[source]
Select protein atoms. WARNING: Note that there is a small difference between VMD protein and our protein. This function is improved to consider terminal residues as protein. VMD considers as protein any residue that includes N, C, CA and O atoms, while terminals may have OC1 and OC2 instead of O.
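The difference described above can be expressed as a check on residue atom names. The function below is a hypothetical sketch of that criterion only (`looks_like_protein` is not a library function), and it assumes a terminal residue carries both OC1 and OC2.

```python
def looks_like_protein(atom_names):
    """Return True if the atom names match a protein residue, terminals included."""
    names = set(atom_names)
    has_backbone = {"N", "C", "CA"} <= names
    # VMD-style rule requires O; terminals may carry OC1 and OC2 instead
    has_oxygen = "O" in names or {"OC1", "OC2"} <= names
    return has_backbone and has_oxygen


regular = looks_like_protein(["N", "CA", "C", "O", "CB"])
terminal = looks_like_protein(["N", "CA", "C", "OC1", "OC2"])
water = looks_like_protein(["O", "H1", "H2"])
```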
- select_residue_indices(residue_indices: list[int]) Selection[source]
Set a function to make selections using residue indices.
- select_water() Selection[source]
Select water atoms. WARNING: This logic is heuristic and may fail for structures with non-standard residue names.
- set_new_chain(chain: Chain)[source]
Set a new chain in the structure. WARNING: Residues and atoms must be set already before setting chains.
- set_new_coordinates(new_coordinates: list[tuple[float, float, float]])[source]
Set new coordinates.
- set_new_residue(residue: Residue)[source]
Set a new residue in the structure. WARNING: Atoms must be set already before setting residues.
- class mddb_workflow.utils.structures.Chain(name: str | None = None, classification: str | None = None)[source]
Bases: object
A chain of residues.
- property atom_count: int
Number of atoms in the chain (read only)
- property atom_indices: list[int]
Atom indices for all atoms in the chain (read only)
- property atoms: list[int]
Atoms in the chain (read only)
- property classification: str
Classification of the chain (manual or automatic)
- find_or_create_residue(name: str, number: int, icode: str = '') Residue[source]
Find a residue by its number and insertion code or create it if it does not exist.
- find_residue(number: int, icode: str = '', name: str = None) Residue | None[source]
Find a residue by its number and insertion code. Name is optional.
- get_atom_indices() list[int][source]
Get atom indices for all atoms in the chain (read only). In order to change atom indices they must be changed in their corresponding residues.
- get_atoms() list[int][source]
Get the atoms in the chain (read only). In order to change atoms they must be changed in their corresponding residues.
- get_residues() list[Residue][source]
The residues in this chain. If residues are set then make changes in all the structure to make this change coherent.
- property index: int | None
The chain index according to parent structure chains (read only)
- remove_residue(residue: Residue)[source]
Remove a residue from the chain. WARNING: Note that this function does not trigger the set_residue_indices.
- property residue_count: int
Number of residues in the chain (read only)
- property residue_indices: list[int]
The residue indices according to parent structure residues for residues in this chain
- class mddb_workflow.utils.structures.Residue(name: str | None = None, number: int | None = None, icode: str | None = None)[source]
Bases: object
A residue class.
- property atom_count: int
The number of atoms in the residue (read only)
- property atom_indices: list[int]
The atom indices according to parent structure atoms for atoms in this residue
- property chain_index: int
The residue chain index according to parent structure chains
- property classification: str
Get the residue biochemistry classification.
WARNING: Note that this logic will not work in a structure without hydrogens.
Available classifications: protein, dna, rna, carbohydrate, fatty, steroid, ion, solvent, acetyl, amide, other.
- get_atom_indices() list[int][source]
Get the atom indices according to parent structure atoms for atoms in this residue. If atom indices are set then make changes in all the structure to make this change coherent.
- get_atoms() list[Atom][source]
Get the atoms in this residue. If atoms are set then make changes in all the structure to make this change coherent.
- get_bonded_residue_indices() list[int][source]
Get residue indices from residues bonded to this residue.
- get_classification() str[source]
Get the residue biochemistry classification.
WARNING: Note that this logic will not work in a structure without hydrogens.
Available classifications: protein, dna, rna, carbohydrate, fatty, steroid, ion, solvent, acetyl, amide, other.
- get_classification_by_name() str[source]
Set an alternative function to “try” to classify the residue according only to its name. This is useful for coarse-grained residues whose atoms may not reflect the real atoms. WARNING: This logic is very limited and will return “unknown” most of the time. WARNING: This logic relies mostly on atom names, which may not be standard.
- get_index() int | None[source]
Get the residue index according to parent structure residues (read only). This value is set by the structure itself.
- get_letter() str[source]
Get the single-letter code equivalent to the residue. Note that this makes sense for amino acids and nucleotides only. Unrecognized residue names return ‘X’.
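The lookup-with-fallback behaviour can be sketched with a plain dictionary. The table below holds only the 20 standard amino acid codes. The library's own table is not shown here and presumably also covers nucleotides and other residue name variants. `residue_letter_sketch` is a hypothetical name.

```python
AMINO_ACID_LETTERS = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residue_letter_sketch(residue_name):
    """Return the single-letter code for a residue name, or 'X' if unknown."""
    return AMINO_ACID_LETTERS.get(residue_name.upper(), "X")


sequence = "".join(residue_letter_sketch(name) for name in ["MET", "GLY", "HOH"])
```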
- get_structure() Structure | None[source]
Get the parent structure (read only). This value is set by the structure itself.
- property index: int | None
The residue index according to parent structure residues (read only)
- is_bonded_with_residue(other: Residue) bool[source]
Given another residue, check if it is bonded with this residue.
- is_cg() bool[source]
Ask if the current residue is coarse-grained. Note that we assume there are no hybrid aa/cg residues.
- property label: str
The residue standard label (read only)
- split(first_residue_atom_indices: list[int], second_residue_atom_indices: list[int], first_residue_name: str | None = None, second_residue_name: str | None = None, first_residue_number: int | None = None, second_residue_number: int | None = None, first_residue_icode: str | None = None, second_residue_icode: str | None = None) tuple[Residue, Residue][source]
Split this residue into 2 residues and return them in a tuple. Keep things coherent in the structure (renumber all residues below this one). Note that all residue atoms must be covered by the splits.
- split_by_atom_names(first_residue_atom_names: list[str], second_residue_atom_names: list[str], first_residue_name: str | None = None, second_residue_name: str | None = None, first_residue_number: int | None = None, second_residue_number: int | None = None, first_residue_icode: str | None = None, second_residue_icode: str | None = None) tuple[Residue, Residue][source]
Parse atom names to atom indices and then call the split function.
- class mddb_workflow.utils.structures.Atom(name: str | None = None, element: str | None = None, coords: tuple[float, float, float] | None = None)[source]
Bases: object
An atom class.
- property bonds: list[int] | None
Indices of the atoms in the structure which are covalently bonded to this atom
- property chain_index: int | None
The atom chain index according to parent structure chains
- get_bonds(skip_ions: bool = False, skip_dummies: bool = False, only_residue: bool = False, safe: bool = True) list[int] | None[source]
Get indices of other atoms in the structure which are covalently bonded to this atom.
- get_chain() Chain | None[source]
The atom chain (read only). In order to change the chain it must be changed in the atom residue.
- get_chain_index() int | None[source]
Get the atom chain index according to parent structure chains.
- get_index() int | None[source]
The atom index according to parent structure atoms (read only). This value is set by the structure itself.
- get_residue() Residue | None[source]
The atom residue. If residue is set then make changes in all the structure to make this change coherent.
- get_residue_index() int[source]
The atom residue index according to parent structure residues. If residue index is set then make changes in all the structure to make this change coherent.
- get_structure() Structure | None[source]
The parent structure (read only). This value is set by the structure itself.
- property index: int | None
The atom index according to parent structure atoms (read only)
- is_carbohydrate_candidate() bool[source]
Check if this atom meets specific criteria:
1. It is a carbon
2. It is connected only to other carbons, hydrogens or oxygens
3. It is connected to 1 or 2 carbons
4. It is connected to 1 oxygen
- is_fatty_candidate() bool[source]
Check if this atom meets specific criteria:
1. It is a carbon
2. It is connected only to other carbons and hydrogens
3. It is connected to 1 or 2 carbons
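The criteria listed for the carbohydrate and fatty candidate checks translate directly into tests on the atom's element and the elements of its bonded neighbours. The functions below are hypothetical stand-alone sketches that take those elements as plain strings. They are not the Atom methods themselves, which presumably read the bonds from the parent structure.

```python
def fatty_candidate_sketch(element, bonded_elements):
    """Carbon bonded only to C/H, with 1 or 2 carbon neighbours."""
    if element != "C":
        return False
    if any(bonded not in ("C", "H") for bonded in bonded_elements):
        return False
    return bonded_elements.count("C") in (1, 2)


def carbohydrate_candidate_sketch(element, bonded_elements):
    """Carbon bonded only to C/H/O, with 1 or 2 carbons and exactly 1 oxygen."""
    if element != "C":
        return False
    if any(bonded not in ("C", "H", "O") for bonded in bonded_elements):
        return False
    carbon_count = bonded_elements.count("C")
    oxygen_count = bonded_elements.count("O")
    return carbon_count in (1, 2) and oxygen_count == 1


# A methylene carbon in an aliphatic tail, and a ring carbon carrying a hydroxyl
methylene = ["C", "C", "H", "H"]
hydroxylated = ["C", "C", "O", "H"]
fatty_result = fatty_candidate_sketch("C", methylene)
sugar_result = carbohydrate_candidate_sketch("C", hydroxylated)
```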
- property label: str
Get a standard label.
- property residue_index: int
The atom residue index according to parent structure residues