PDB File Header

This module defines functions for parsing header data from PDB files.

class prody.proteins.header.Chemical(resname)

A data structure for storing information on chemical components (or heterogens) in PDB structures.

A Chemical instance has the following attributes:

===========  =====  =====================================================
Attribute    Type   Description (RECORD TYPE)
===========  =====  =====================================================
resname      str    residue name (or chemical component identifier) (HET)
name         str    chemical name (HETNAM)
chain        str    chain identifier (HET)
resnum       int    residue (or sequence) number (HET)
icode        str    insertion code (HET)
natoms       int    number of atoms present in the structure (HET)
description  str    description of the chemical component (HET)
synonyms     list   synonyms (HETSYN)
formula      str    chemical formula (FORMUL)
pdbentry     str    PDB entry that chemical data is extracted from
===========  =====  =====================================================

Chemical instances are stored under the chemicals key of the header dictionary returned by parsePDBHeader(). Individual attributes are described below:

chain

chain identifier

description

description of the chemical component

formula

chemical formula

icode

insertion code

name

chemical name

natoms

number of atoms present in the structure

pdbentry

PDB entry that chemical data is extracted from

resname

residue name (or chemical component identifier)

resnum

residue (or sequence) number

synonyms

list of synonyms
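As a rough illustration of where the HET-derived attributes come from, the sketch below slices a single HET record into a Chemical-like container. The HetRecord dataclass and the parse_het_line helper are hypothetical names and not part of ProDy, the column positions are assumed from the fixed-width PDB format, and the sample record is made up.

```python
from dataclasses import dataclass


@dataclass
class HetRecord:
    """Hypothetical stand-in holding the HET-derived attributes of Chemical."""
    resname: str
    chain: str
    resnum: int
    icode: str
    natoms: int
    description: str


def parse_het_line(line: str) -> HetRecord:
    """Slice one fixed-width HET record (columns assumed from the PDB format)."""
    line = line.ljust(80)  # pad so that slicing short lines is safe
    return HetRecord(
        resname=line[7:10].strip(),            # columns 8-10
        chain=line[12].strip(),                # column 13
        resnum=int(line[13:17]),               # columns 14-17
        icode=line[17].strip(),                # column 18
        natoms=int(line[20:25].strip() or 0),  # columns 21-25
        description=line[30:70].strip(),       # columns 31-70
    )


# Made-up record: ligand LIG in chain A, residue 401, 27 atoms.
record = parse_het_line("HET    LIG  A 401      27")
print(record)
```

The name, synonyms, and formula attributes would come from separate HETNAM, HETSYN, and FORMUL records, which this sketch does not cover.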

class prody.proteins.header.DBRef

A data structure for storing references to sequence databases for polymer components in PDB structures. Information is parsed from DBREF[1|2] and SEQADV records in the PDB header.

accession

database accession code

database

sequence database, one of UniProt, GenBank, Norine, UNIMES, or PDB

dbabbr

database abbreviation, one of UNP, GB, NORINE, UNIMES, or PDB

diff

list of differences between PDB and database sequences, (resname, resnum, icode, dbResname, dbResnum, comment)

first

initial residue numbers, (resnum, icode, dbnum)

idcode

database identification code, i.e. entry name in UniProt

last

ending residue numbers, (resnum, icode, dbnum)
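The database and dbabbr descriptions list the same five databases, so the two attributes can be related with a small lookup. The sketch below assumes the two lists correspond item by item. DB_NAMES and database_name are hypothetical names, not ProDy API.

```python
# Abbreviation-to-name pairs, taken from the two lists in the attribute
# descriptions above (assumed to correspond item by item).
DB_NAMES = {
    "UNP": "UniProt",
    "GB": "GenBank",
    "NORINE": "Norine",
    "UNIMES": "UNIMES",
    "PDB": "PDB",
}


def database_name(dbabbr: str) -> str:
    """Return the full database name for an abbreviation, or raise KeyError."""
    return DB_NAMES[dbabbr.strip().upper()]


print(database_name("UNP"))  # UniProt
```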

class prody.proteins.header.Polymer(chid)

A data structure for storing information on polymer components (protein or nucleic) of PDB structures.

A Polymer instance has the following attributes:

==========  =====  ==========================================================
Attribute   Type   Description (RECORD TYPE)
==========  =====  ==========================================================
chid        str    chain identifier
name        str    name of the polymer (macromolecule) (COMPND)
fragment    str    specifies a domain or region of the molecule (COMPND)
synonyms    list   synonyms for the polymer (COMPND)
ec          list   associated Enzyme Commission numbers (COMPND)
engineered  bool   indicates that the polymer was produced using recombinant
                   technology or by purely chemical synthesis (COMPND)
mutation    bool   indicates presence of a mutation (COMPND)
comments    str    additional comments
sequence    str    polymer chain sequence (SEQRES)
dbrefs      list   sequence database records (DBREF[1|2] and SEQADV),
                   see DBRef
modified    list   modified residues (MODRES); when modified residues are
                   present, each is represented as
                   (resname, chid, resnum, icode, stdname, comment)
pdbentry    str    PDB entry that polymer data is extracted from
==========  =====  ==========================================================

Polymer instances are stored under the polymers key of the header dictionary returned by parsePDBHeader(). Individual attributes are described below:

chid

chain identifier

comments

additional comments

dbrefs

sequence database reference records

ec

list of associated Enzyme Commission numbers

engineered

indicates that the molecule was produced using recombinant technology or by purely chemical synthesis

fragment

specifies a domain or region of the molecule

modified

modified residues

mutation

indicates presence of a mutation

name

name of the polymer (macromolecule)

pdbentry

PDB entry that polymer data is extracted from

sequence

polymer chain sequence

synonyms

list of synonyms for the molecule
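Entries of the modified attribute are documented as plain 6-tuples, which are easy to misread by position. A minimal sketch of wrapping them so that fields can be read by name follows. ModifiedResidue and label_modified are hypothetical helpers, and the sample entry is made up rather than taken from a real PDB file.

```python
from collections import namedtuple

# Hypothetical wrapper; field order follows the tuple layout documented for
# the modified attribute of Polymer.
ModifiedResidue = namedtuple(
    "ModifiedResidue", "resname chid resnum icode stdname comment")


def label_modified(entries):
    """Wrap plain 6-tuples so that fields can be read by name."""
    return [ModifiedResidue(*entry) for entry in entries]


# Made-up entry standing in for the contents of Polymer.modified.
entries = [("MSE", "A", "52", "", "MET", "SELENOMETHIONINE")]
for mod in label_modified(entries):
    print(f"{mod.resname} {mod.chid}{mod.resnum} replaces {mod.stdname}")
```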

prody.proteins.header.assignSecstr(header, atoms, coil=True)

Assign secondary structure from header dictionary to atoms. header must be a dictionary parsed using parsePDB(). atoms may be an instance of AtomGroup, Selection, Chain or Residue. ProDy can be configured to parse and assign secondary structure information automatically with the confProDy(auto_secondary=True) command. See also the confProDy() function.

Single-letter codes of the type used by the Dictionary of Protein Secondary Structure (DSSP) are assigned:

  • G = 3-turn helix (3₁₀ helix). Min length 3 residues.

  • H = 4-turn helix (alpha helix). Min length 4 residues.

  • I = 5-turn helix (pi helix). Min length 5 residues.

  • T = hydrogen bonded turn (3, 4 or 5 turn)

  • E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.

  • B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)

  • S = bend (the only non-hydrogen-bond based assignment).

  • C = residues not in one of above conformations.

See http://en.wikipedia.org/wiki/Protein_secondary_structure#The_DSSP_code for more details.

The following PDB helix classes are omitted (class number in parentheses):

  • Right-handed omega (2)

  • Right-handed gamma (4)

  • Left-handed alpha (6)

  • Left-handed omega (7)

  • Left-handed gamma (8)

  • 2₇ ribbon/helix (9)

  • Polyproline (10)

Secondary structures are assigned to all atoms in a residue. Amino acid residues without any secondary structure assignments in the header section will be assigned coil (C) conformation. This can be prevented by passing coil=False argument.
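The assignment rules above can be sketched in plain Python. This is not ProDy's implementation. It assumes that the helix classes not listed as omitted (1, 3 and 5, which are right-handed alpha, pi and 3₁₀ in the PDB format) map to H, I and G, that SHEET strands map to E, and that coil=False leaves unassigned residues with an empty string. assign_codes and its argument layout are hypothetical.

```python
# PDB helix class number -> DSSP-style code. Classes 1, 3 and 5 are the ones
# not listed as omitted; their codes are assumed from the PDB class names
# (right-handed alpha, pi and 3-10).
HELIX_CLASS_TO_CODE = {1: "H", 3: "I", 5: "G"}


def assign_codes(resnums, helices, strands, coil=True):
    """Return {resnum: code} for the residues of one chain.

    helices: iterable of (helix_class, first_resnum, last_resnum)
    strands: iterable of (first_resnum, last_resnum)
    """
    codes = {r: ("C" if coil else "") for r in resnums}
    for helix_class, first, last in helices:
        code = HELIX_CLASS_TO_CODE.get(helix_class)
        if code is None:  # omitted helix class, leave residues untouched
            continue
        for r in range(first, last + 1):
            if r in codes:
                codes[r] = code
    for first, last in strands:
        for r in range(first, last + 1):
            if r in codes:
                codes[r] = "E"
    return codes


codes = assign_codes(
    resnums=range(1, 11),
    helices=[(1, 2, 5), (10, 6, 7)],  # class 10 (polyproline) is skipped
    strands=[(8, 9)],
)
print("".join(codes[r] for r in range(1, 11)))  # CHHHHCCEEC
```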

prody.proteins.header.buildBiomolecules(header, atoms, biomol=None)

Returns atoms after applying biomolecular transformations from header dictionary. Biomolecular transformations are applied to all coordinate sets in the molecule.

Some PDB files contain transformations for more than one biomolecule. A specific set of transformations can be chosen using the biomol argument. Transformation sets are identified by numbers, e.g. "1", "2", …

If multiple biomolecular transformations are provided in the header dictionary, biomolecules will be returned as AtomGroup instances in a list.

If the resulting biomolecule has more than 26 chains, the molecular assembly will be split into multiple AtomGroup instances each containing at most 26 chains. These AtomGroup instances will be returned in a tuple.

Note that atoms in biomolecules are ordered according to chain identifiers. When multiple chains in a biomolecule have the same chain identifier, they are given different segment names to distinguish them.
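To make the idea of a biomolecular transformation concrete, the sketch below applies one operator to a set of coordinates. It assumes that each operator consists of a 3x3 rotation matrix and a translation vector, as in the BIOMT rows of REMARK 350. apply_transformation is a hypothetical helper, not the routine used by buildBiomolecules, and the operator shown is made up.

```python
def apply_transformation(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) coordinate."""
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed


# Made-up operator: 180 degree rotation about z, then a shift of 10 along x.
rotation = [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0)]
translation = (10.0, 0.0, 0.0)
copy = apply_transformation([(1.0, 2.0, 3.0)], rotation, translation)
print(copy)  # [(9.0, -2.0, 3.0)]
```

Applying every operator of a transformation set to the selected chains and collecting the copies would yield the full assembly.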

prody.proteins.header.parsePDBHeader(pdb, *keys, **kwargs)

Returns header data dictionary for pdb. This function is equivalent to parsePDB(pdb, header=True, model=0, meta=False). As with parsePDB(), pdb may be an identifier or a filename.

List of header records that are parsed.

============  =================  ============================================
Record type   Dictionary key(s)  Description
============  =================  ============================================
HEADER        classification     molecule classification
              deposition_date    deposition date
              identifier         PDB identifier
TITLE         title              title for the experiment or analysis
SPLIT         split              list of PDB entries that make up the whole
                                 structure when combined with this one
COMPND        polymers           see Polymer
EXPDTA        experiment         information about the experiment
NUMMDL        n_models           number of models
MDLTYP        model_type         additional structural annotation
AUTHOR        authors            list of contributors
JRNL          reference          reference information dictionary:
                                   • authors: list of authors
                                   • title: title of the article
                                   • editors: list of editors
                                   • issn:
                                   • reference: journal, vol, issue, etc.
                                   • publisher: publisher information
                                   • pmid: pubmed identifier
                                   • doi: digital object identifier
DBREF[1|2]    polymers           see Polymer and DBRef
SEQADV        polymers           see Polymer
SEQRES        polymers           see Polymer
MODRES        polymers           see Polymer
HELIX         polymers           see Polymer
SHEET         polymers           see Polymer
HET           chemicals          see Chemical
HETNAM        chemicals          see Chemical
HETSYN        chemicals          see Chemical
FORMUL        chemicals          see Chemical
REMARK 2      resolution         resolution of structures, when applicable
REMARK 4      version            PDB file version
REMARK 350    biomoltrans        biomolecular transformation lines
                                 (unprocessed)
REMARK 900    related_entries    related entries in the PDB or EMDB
============  =================  ============================================

Header records that are not parsed are: OBSLTE, CAVEAT, SOURCE, KEYWDS, REVDAT, SPRSDE, SSBOND, LINK, CISPEP, CRYST1, ORIGX1, ORIGX2, ORIGX3, MTRIX1, MTRIX2, MTRIX3, and REMARK X not mentioned above.
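Because the parsed header is an ordinary dictionary, the keys listed above can be read with standard dictionary access. The sketch below uses a hand-built stand-in dictionary instead of calling parsePDBHeader(), so it runs without ProDy or network access. summarize_header is a hypothetical helper, and it assumes that a key may be missing when the corresponding record is absent from the file.

```python
def summarize_header(header):
    """Collect a few entries of a header dictionary into one line of text."""
    parts = [header.get("identifier", "????")]
    if "resolution" in header:
        parts.append(f"{header['resolution']} A")
    if "n_models" in header:
        parts.append(f"{header['n_models']} models")
    parts.append(header.get("title", "(no title)"))
    return " | ".join(parts)


# Made-up stand-in for the dictionary returned by parsePDBHeader().
header = {"identifier": "XXXX", "resolution": 1.8, "title": "EXAMPLE ENTRY"}
print(summarize_header(header))  # XXXX | 1.8 A | EXAMPLE ENTRY
```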