PDB File Header
This module defines functions for parsing header data from PDB files.
- class prody.proteins.header.Chemical(resname)
A data structure for storing information on chemical components (or heterogens) in PDB structures.
A Chemical instance has the following attributes:
Attribute    Type  Description (RECORD TYPE)
-----------  ----  -------------------------
resname      str   residue name (or chemical component identifier) (HET)
name         str   chemical name (HETNAM)
chain        str   chain identifier (HET)
resnum       int   residue (or sequence) number (HET)
icode        str   insertion code (HET)
natoms       int   number of atoms present in the structure (HET)
description  str   description of the chemical component (HET)
synonyms     list  synonyms (HETSYN)
formula      str   chemical formula (FORMUL)
pdbentry     str   PDB entry that chemical data is extracted from
Chemical class instances can be obtained as follows:
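A minimal sketch of how such instances might be retrieved and inspected. It assumes that passing the single key 'chemicals' to parsePDBHeader() returns the list stored under that key. The helper names fetch_chemicals and describe_chemical are hypothetical, and the stand-in object carries made-up values that only mimic the documented attributes, so the formatting helper can be tried without ProDy or network access.

```python
from types import SimpleNamespace


def fetch_chemicals(pdb):
    """Return Chemical instances parsed from the header of *pdb*.

    Requires ProDy; the import is deferred so that the rest of this
    sketch runs without it.
    """
    from prody import parsePDBHeader
    # Assumption: a single key yields the value stored under that key.
    return parsePDBHeader(pdb, 'chemicals')


def describe_chemical(chem):
    """Format the documented attributes of a Chemical-like object."""
    location = f"{chem.chain}{chem.resnum}{chem.icode or ''}"
    return f"{chem.resname} at {location}: {chem.name} ({chem.natoms} atoms)"


# Stand-in with made-up values, mimicking the attributes listed in this section.
demo = SimpleNamespace(resname='LIG', name='EXAMPLE LIGAND', chain='A',
                       resnum=362, icode='', natoms=33)
print(describe_chemical(demo))
```

With ProDy installed, the same helper could be applied to each item returned by fetch_chemicals(pdb), where pdb is an identifier or a filename.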
- chain
chain identifier
- description
description of the chemical component
- formula
chemical formula
- icode
insertion code
- name
chemical name
- natoms
number of atoms present in the structure
- pdbentry
PDB entry that chemical data is extracted from
- resname
residue name (or chemical component identifier)
- resnum
residue (or sequence) number
- synonyms
list of synonyms
- class prody.proteins.header.DBRef
A data structure for storing references to sequence databases for polymer components in PDB structures. Information is parsed from DBREF[1|2] and SEQADV records in the PDB header.
- accession
database accession code
- database
sequence database, one of UniProt, GenBank, Norine, UNIMES, or PDB
- dbabbr
database abbreviation, one of UNP, GB, NORINE, UNIMES, or PDB
- diff
list of differences between PDB and database sequences,
(resname, resnum, icode, dbResname, dbResnum, comment)
- first
initial residue numbers,
(resnum, icode, dbnum)
- idcode
database identification code, i.e. entry name in UniProt
- last
ending residue numbers,
(resnum, icode, dbnum)
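The tuple layouts above can be unpacked directly. The following is a small sketch under stated assumptions: describe_dbref and mutated_positions are hypothetical helpers, and the stand-in object holds made-up values arranged exactly as the attribute descriptions specify.

```python
from types import SimpleNamespace


def describe_dbref(dbref):
    """Summarize a DBRef-like object using its documented attributes."""
    first_resnum, _, first_dbnum = dbref.first   # (resnum, icode, dbnum)
    last_resnum, _, last_dbnum = dbref.last      # (resnum, icode, dbnum)
    return (f"{dbref.dbabbr}:{dbref.accession} ({dbref.idcode}) "
            f"PDB {first_resnum}-{last_resnum} maps to "
            f"{dbref.database} {first_dbnum}-{last_dbnum}, "
            f"{len(dbref.diff)} difference(s)")


def mutated_positions(dbref):
    """Return (resnum, dbResname, resname) for each recorded difference."""
    return [(resnum, db_resname, resname)
            for resname, resnum, icode, db_resname, db_resnum, comment
            in dbref.diff]


# Stand-in with made-up values; not taken from a real PDB entry.
demo = SimpleNamespace(
    database='UniProt', dbabbr='UNP', accession='X00000',
    idcode='EXAMPLE_HUMAN', first=(5, '', 25), last=(104, '', 124),
    diff=[('ALA', 42, '', 'GLY', 62, 'ENGINEERED MUTATION')])
print(describe_dbref(demo))
print(mutated_positions(demo))
```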
- class prody.proteins.header.Polymer(chid)
A data structure for storing information on polymer components (protein or nucleic) of PDB structures.
A Polymer instance has the following attributes:
Attribute   Type  Description (RECORD TYPE)
----------  ----  -------------------------
chid        str   chain identifier
name        str   name of the polymer (macromolecule) (COMPND)
fragment    str   specifies a domain or region of the molecule (COMPND)
synonyms    list  synonyms for the polymer (COMPND)
ec          list  associated Enzyme Commission numbers (COMPND)
engineered  bool  indicates that the polymer was produced using recombinant
                  technology or by purely chemical synthesis (COMPND)
mutation    bool  indicates presence of a mutation (COMPND)
comments    str   additional comments
sequence    str   polymer chain sequence (SEQRES)
dbrefs      list  sequence database records (DBREF[1|2] and SEQADV), see DBRef
modified    list  modified residues (MODRES); when modified residues are
                  present, each is represented as
                  (resname, chid, resnum, icode, stdname, comment)
pdbentry    str   PDB entry that polymer data is extracted from
Polymer class instances can be obtained as follows:
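A minimal sketch of retrieving and summarizing polymers, assuming that passing the single key 'polymers' to parsePDBHeader() returns the list stored under that key. The helper names fetch_polymers, describe_polymer, and modified_residue_map are hypothetical, and the stand-in object uses made-up values that follow the documented attribute layout.

```python
from types import SimpleNamespace


def fetch_polymers(pdb):
    """Return Polymer instances parsed from the header of *pdb*.

    Requires ProDy; the import is deferred so that the rest of this
    sketch runs without it.
    """
    from prody import parsePDBHeader
    # Assumption: a single key yields the value stored under that key.
    return parsePDBHeader(pdb, 'polymers')


def describe_polymer(poly):
    """Format chain identifier, name, sequence length, and boolean flags."""
    flags = [label for label, value in
             (('engineered', poly.engineered), ('mutation', poly.mutation))
             if value]
    summary = f"chain {poly.chid}: {poly.name}, sequence length {len(poly.sequence)}"
    if flags:
        summary += " [" + ", ".join(flags) + "]"
    return summary


def modified_residue_map(poly):
    """Map (resnum, icode) to (resname, stdname) for modified residues."""
    return {(resnum, icode): (resname, stdname)
            for resname, chid, resnum, icode, stdname, comment in poly.modified}


# Stand-in with made-up values; not taken from a real PDB entry.
demo = SimpleNamespace(
    chid='A', name='EXAMPLE KINASE', sequence='MSQERPTFYR',
    engineered=True, mutation=False,
    modified=[('MSE', 'A', 1, '', 'MET', 'SELENOMETHIONINE')])
print(describe_polymer(demo))
print(modified_residue_map(demo))
```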
- chid
chain identifier
- comments
additional comments
- dbrefs
sequence database reference records
- ec
list of associated Enzyme Commission numbers
- engineered
indicates that the molecule was produced using recombinant technology or by purely chemical synthesis
- fragment
specifies a domain or region of the molecule
- modified
modified residues
- mutation
indicates presence of a mutation
- name
name of the polymer (macromolecule)
- pdbentry
PDB entry that polymer data is extracted from
- sequence
polymer chain sequence
- synonyms
list of synonyms for the molecule
- prody.proteins.header.assignSecstr(header, atoms, coil=True)
Assign secondary structure from header dictionary to atoms. header must be a dictionary parsed using parsePDB(). atoms may be an instance of AtomGroup, Selection, Chain, or Residue. ProDy can be configured to automatically parse and assign secondary structure information using the confProDy(auto_secondary=True) command. See also the confProDy() function.
The single-letter code assignments of the Dictionary of Protein Secondary Structure (DSSP) are used:
G = 3-turn helix (3-10 helix). Min length 3 residues.
H = 4-turn helix (alpha helix). Min length 4 residues.
I = 5-turn helix (pi helix). Min length 5 residues.
T = hydrogen bonded turn (3, 4 or 5 turn)
E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.
B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
S = bend (the only non-hydrogen-bond based assignment).
C = residues not in one of above conformations.
See http://en.wikipedia.org/wiki/Protein_secondary_structure#The_DSSP_code for more details.
The following PDB helix classes are omitted (class number in parentheses):
Right-handed omega (2)
Right-handed gamma (4)
Left-handed alpha (6)
Left-handed omega (7)
Left-handed gamma (8)
2 - 7 ribbon/helix (9)
Polyproline (10)
Secondary structures are assigned to all atoms in a residue. Amino acid residues without any secondary structure assignments in the header section will be assigned coil (C) conformation. This can be prevented by passing the coil=False argument.
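The assignment rules above can be illustrated with a stand-alone sketch. This is not ProDy's implementation: the function assign_codes, its input layout (plain residue numbers with helix and strand ranges), and the use of an empty string when coil=False are assumptions made for illustration. Only PDB helix classes 1 (right-handed alpha), 3 (right-handed pi), and 5 (right-handed 3-10) are kept, which matches the list of omitted classes.

```python
# PDB helix classes that are kept, mapped to DSSP letters.
HELIX_CLASS_TO_DSSP = {
    1: 'H',  # right-handed alpha
    3: 'I',  # right-handed pi
    5: 'G',  # right-handed 3-10
}


def assign_codes(residues, helices, strands, coil=True):
    """Return {residue number: code} for the residue numbers in *residues*.

    helices: iterable of (start, end, helix_class)
    strands: iterable of (start, end)
    """
    default = 'C' if coil else ''  # empty placeholder is an assumption
    codes = {res: default for res in residues}
    for start, end, helix_class in helices:
        letter = HELIX_CLASS_TO_DSSP.get(helix_class)
        if letter is None:
            continue  # omitted helix class, leave residues unchanged
        for res in range(start, end + 1):
            if res in codes:
                codes[res] = letter
    for start, end in strands:
        for res in range(start, end + 1):
            if res in codes:
                codes[res] = 'E'
    return codes


residues = range(1, 11)
helices = [(2, 4, 1), (6, 7, 2)]  # the second helix has an omitted class
strands = [(9, 10)]
codes = assign_codes(residues, helices, strands)
print(''.join(codes[r] for r in residues))
```

Residues 6 and 7 stay as coil in this example because helix class 2 (right-handed omega) is not mapped to any code.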
- prody.proteins.header.buildBiomolecules(header, atoms, biomol=None)
Returns atoms after applying biomolecular transformations from header dictionary. Biomolecular transformations are applied to all coordinate sets in the molecule.
Some PDB files contain transformations for more than one biomolecule. A specific set of transformations can be chosen using the biomol argument. Transformation sets are identified by numbers, e.g. "1", "2", …
If multiple biomolecular transformations are provided in the header dictionary, biomolecules will be returned as AtomGroup instances in a list().
If the resulting biomolecule has more than 26 chains, the molecular assembly will be split into multiple AtomGroup instances, each containing at most 26 chains. These AtomGroup instances will be returned in a tuple.
Note that atoms in biomolecules are ordered according to chain identifiers. When multiple chains in a biomolecule have the same chain identifier, they are given different segment names to distinguish them.
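Two ideas from the description can be sketched without ProDy: applying one biomolecular transformation to coordinates, and splitting an assembly into groups of at most 26 chains. The functions apply_transformation and split_chains are hypothetical, and representing a transformation as three rows of (rotation row, translation) is an assumption about layout. This is an illustration, not ProDy's implementation.

```python
def apply_transformation(coords, matrix):
    """Apply a 3x4 transformation to a list of (x, y, z) triples.

    Each row of *matrix* holds three rotation terms followed by one
    translation term, so a new coordinate is R . xyz + t.
    """
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix))
    return transformed


def split_chains(chain_copies, max_chains=26):
    """Split a list of chain copies into groups of at most *max_chains*."""
    return [chain_copies[i:i + max_chains]
            for i in range(0, len(chain_copies), max_chains)]


# 180-degree rotation about the z axis, followed by a shift along x.
matrix = [[-1.0, 0.0, 0.0, 10.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
print(apply_transformation([(1.0, 2.0, 3.0)], matrix))

# An assembly with 60 chain copies is split into groups of 26, 26, and 8.
groups = split_chains(list(range(60)))
print([len(group) for group in groups])
```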
- prody.proteins.header.parsePDBHeader(pdb, *keys, **kwargs)
Returns the header data dictionary for pdb. This function is equivalent to parsePDB(pdb, header=True, model=0, meta=False); likewise, pdb may be an identifier or a filename.
The following header records are parsed:
Record type  Dictionary key(s)  Description
-----------  -----------------  -----------
HEADER       classification     molecule classification
             deposition_date    deposition date
             identifier         PDB identifier
TITLE        title              title for the experiment or analysis
SPLIT        split              list of PDB entries that make up the whole
                                structure when combined with this one
COMPND       polymers           see Polymer
EXPDTA       experiment         information about the experiment
NUMMDL       n_models           number of models
MDLTYP       model_type         additional structural annotation
AUTHOR       authors            list of contributors
JRNL         reference          reference information dictionary:
                                - authors: list of authors
                                - title: title of the article
                                - editors: list of editors
                                - issn:
                                - reference: journal, vol, issue, etc.
                                - publisher: publisher information
                                - pmid: pubmed identifier
                                - doi: digital object identifier
DBREF[1|2]   polymers           see Polymer
SEQADV       polymers           see Polymer
SEQRES       polymers           see Polymer
MODRES       polymers           see Polymer
HELIX        polymers           see Polymer
SHEET        polymers           see Polymer
HET          chemicals          see Chemical
HETNAM       chemicals          see Chemical
HETSYN       chemicals          see Chemical
FORMUL       chemicals          see Chemical
REMARK 2     resolution         resolution of structures, when applicable
REMARK 4     version            PDB file version
REMARK 350   biomoltrans        biomolecular transformation lines (unprocessed)
REMARK 900   related_entries    related entries in the PDB or EMDB
Header records that are not parsed are: OBSLTE, CAVEAT, SOURCE, KEYWDS, REVDAT, SPRSDE, SSBOND, LINK, CISPEP, CRYST1, ORIGX1, ORIGX2, ORIGX3, MTRIX1, MTRIX2, MTRIX3, and REMARK X not mentioned above.
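A short sketch of working with the resulting dictionary. The helper names fetch_header and summarize_header are hypothetical, the stand-in dictionary holds made-up values keyed as in the list of parsed records, and the formatting assumes that the value under 'resolution' is numeric.

```python
def fetch_header(pdb, *keys):
    """Return header data for *pdb*, optionally limited to *keys*.

    Requires ProDy; the import is deferred so that the rest of this
    sketch runs without it.
    """
    from prody import parsePDBHeader
    return parsePDBHeader(pdb, *keys)


def summarize_header(header):
    """Build a one-line summary from a header dictionary."""
    parts = [header.get('identifier', '?'), header.get('title', 'untitled')]
    if 'experiment' in header:
        parts.append(header['experiment'])
    if 'resolution' in header:
        # Assumption: resolution is stored as a number (in angstroms).
        parts.append(f"{header['resolution']:.2f} A")
    return " | ".join(parts)


# Stand-in dictionary with made-up values; not a real PDB entry.
demo = {'identifier': '0XXX', 'title': 'EXAMPLE STRUCTURE',
        'experiment': 'X-RAY DIFFRACTION', 'resolution': 2.0}
print(summarize_header(demo))
```

Records without a resolution, or headers parsed with only a subset of keys, fall back to the fields that are present.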