Atom Selections
This module defines a class for selecting subsets of atoms. You can read
this page in interactive sessions using help(select).
ProDy offers a fast and powerful atom selection class, Select.
Selection features, grammar, and keywords are similar to those of VMD.
Small differences, which are described below, should not affect most practical
uses of atom selections. With the added flexibility of Python, the ProDy selection
engine can also be used to identify intermolecular contacts. You may see
this and other usage examples in contacts and
selection-operations.
First, we import everything from ProDy and parse a protein-DNA-ligand complex structure:
parsePDB() returns an AtomGroup instance, p in this case,
that stores all atomic data in the file. We can count different types of
atoms using Atom Flags and the numAtoms() method as follows:
The last two counts suggest that the ligand has 26 atoms, i.e. the number of hetero atoms less the number of water atoms.
Atom flags
We select subsets of atoms using the AtomGroup.select() method.
All Atom Flags can be input arguments to this method as follows:
This operation returns Selection instances, which can be passed
to functions that accept an atoms argument.
Logical operators
Flags can be combined using the 'and' and 'or' operators:
'protein and water' did not result in the selection of protein and
water atoms. This is because no atom is flagged as both a protein and a
water atom at the same time.
Note
Interpreting selection strings
You may think of a selection string, such as 'protein and water', as being
evaluated on a per-atom basis: an atom is selected if it satisfies the
given criterion. To select both water and protein atoms, the 'or' logical
operator should be used instead. A protein or a water atom would satisfy
the 'protein or water' criterion.
We can also use the 'not' operator to negate an atom flag. For example,
the following selection will only select ligand atoms:
If you omit the 'and' operator, you will get the same result:
Note
The default operator between two flags, or other selection tokens that will
be discussed later, is 'and'. For example, 'not water hetero'
is equivalent to 'not water and hetero'.
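The per-atom interpretation described in the notes above can be sketched in plain Python. This is only an illustration of the logic, not ProDy's implementation; the three toy atoms and their flag sets are made up for the example.

```python
# Toy atoms: each carries a set of flags (illustrative data, not from a PDB file).
atoms = [
    {"name": "CA", "flags": {"protein"}},
    {"name": "O", "flags": {"water", "hetero"}},
    {"name": "C1", "flags": {"hetero"}},
]

# 'protein and water': the criterion is tested on each atom separately,
# and no single atom carries both flags.
both = [a["name"] for a in atoms
        if "protein" in a["flags"] and "water" in a["flags"]]

# 'protein or water': an atom passes if it carries either flag.
either = [a["name"] for a in atoms
          if "protein" in a["flags"] or "water" in a["flags"]]

# 'not water and hetero': hetero atoms that are not water, i.e. the ligand here.
ligand = [a["name"] for a in atoms
          if "water" not in a["flags"] and "hetero" in a["flags"]]

print(both)    # []
print(either)  # ['CA', 'O']
print(ligand)  # ['C1']
```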
We can select Cα atoms of acidic residues by omitting the default logical operator as follows:
Quick selections
For simple selections, such as those shown above, the following may be preferable to
the select() method:
The result is the same as using p.select('acidic calpha'). An underscore,
_, is treated as whitespace. The limitation of this approach is that
special characters cannot be used.
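One way such attribute shortcuts can be built in Python is with `__getattr__`. The sketch below uses a hypothetical `ToyAtoms` class whose `select()` only records the string it receives; it is not how ProDy is implemented, but it shows why an underscore can stand in for whitespace and why special characters are ruled out (attribute names must be valid Python identifiers).

```python
class ToyAtoms:
    """Hypothetical stand-in for an atom container; not a ProDy class."""

    def __init__(self):
        self.last_selstr = None

    def select(self, selstr):
        # A real implementation would evaluate the string; here we only record it.
        self.last_selstr = selstr
        return selstr

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Treat underscores as whitespace and forward to select().
        return self.select(name.replace("_", " "))


toy = ToyAtoms()
print(toy.acidic_calpha)  # acidic calpha
```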
Atom data fields
In addition to Atom Flags, Atom Data Fields can be used in atom selections when combined with some values. For example, we can select Cα and Cβ atoms of alanine residues as follows:
Note that we omitted the default 'and' operator.
Note
Whitespace or an empty string can be specified using an '_'.
Atoms with empty string data fields, such as those with no chain
identifiers or alternate location identifiers, can be selected using
an underscore.
Numeric data fields can also be used to make selections:
Residues with insertion codes are a special case. Residue numbers and insertion codes can be specified together as follows:
'resnum 5' selects residue 5 (all insertion codes)
'resnum 5A' selects residue 5 with insertion code A
'resnum 5_' selects residue 5 with no insertion code
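The three cases above can be summarized with a small matching function. This is a plain-Python sketch of the rule as stated, with a hypothetical helper name `matches_resnum`; it does not reflect how ProDy parses the token internally and it ignores negative residue numbers.

```python
def matches_resnum(token, resnum, icode):
    """Return True if a residue (number, insertion code) matches a token
    such as '5', '5A', or '5_'. An empty icode means no insertion code."""
    digits = token.rstrip("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")
    suffix = token[len(digits):]
    if int(digits) != resnum:
        return False
    if suffix == "":        # '5': any insertion code
        return True
    if suffix == "_":       # '5_': no insertion code
        return icode == ""
    return icode == suffix  # '5A': that insertion code only


# Illustrative residues as (resnum, insertion code) pairs.
residues = [(5, ""), (5, "A"), (5, "B"), (6, "")]
for token in ("5", "5A", "5_"):
    print(token, [r for r in residues if matches_resnum(token, *r)])
```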
Number ranges
A range of numbers can be specified using 'to' or Python-style slicing with ':':
Note
Number ranges specify continuous intervals:
'to' is all inclusive, e.g. 'resnum 1 to 4' means '1 <= resnum <= 4'
':' is left inclusive, e.g. 'resnum 1:4' means '1 <= resnum < 4'
Consecutive use of ':', however, specifies a discrete range of numbers,
e.g. 'resnum 1:4:2' means 'resnum 1 3'
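The difference between the three forms is easy to get wrong, so here is a minimal sketch of the rules in the note. The function name `in_number_range` is hypothetical and the parsing is deliberately naive (non-negative numbers only); it illustrates the semantics rather than ProDy's parser.

```python
def in_number_range(spec, value):
    """Test a value against 'a to b', 'a:b', or 'a:b:step'."""
    if " to " in spec:
        low, high = (float(s) for s in spec.split(" to "))
        return low <= value <= high          # both ends included
    parts = [int(s) for s in spec.split(":")]
    if len(parts) == 2:
        low, high = parts
        return low <= value < high           # right end excluded
    low, high, step = parts
    return value in range(low, high, step)   # discrete values only


print(in_number_range("1 to 4", 4))  # True
print(in_number_range("1:4", 4))     # False
print(in_number_range("1:4:2", 3))   # True
print(in_number_range("1:4:2", 2))   # False
```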
Special characters
The following characters can be specified when using Atom Data Fields for atom selections:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
~@#$.:;_',
For example, "name C' N` O~ C$ C#" is a valid selection string.
Note
Special characters (~!@#$%^&*()-_=+[{}]\|;:,<>./?()'") must be
escaped using grave accent characters (``).
Negative numbers
Negative numbers and number ranges must also be escaped using grave accent
characters, since the negative sign '-' is considered a special character
unless it indicates a subtraction operation (see below).
Omitting the grave accent characters will cause a SelectionError.
Regular expressions
Finally, you can specify regular expressions to select atoms based on data fields of string type. The following will select residues whose names start with the capital letter A:
Note
Regular expressions can be specified using double quotes, "...".
For more information on regular expressions see re.
Numerical comparisons
Atom Data Fields with numeric types can be used as operands in numerical comparisons:
| Comparison | Description |
|---|---|
| < | less than |
| > | greater than |
| <= | less than or equal |
| >= | greater than or equal |
| == | equal |
| = | equal |
| != | not equal |
It is also possible to chain comparison statements as follows:
This would be the same as the following selection:
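Chained comparisons follow the same logic as in Python itself. The sketch below uses made-up x coordinates and the bounds 10 and 20 purely for illustration; a chained form such as '10 <= x < 20' and its expanded form '10 <= x and x < 20' are assumed here as example strings and are not taken from this page.

```python
xs = [5.0, 10.0, 15.0, 20.0, 25.0]  # illustrative x coordinates

# Chained form: both bounds are tested in one expression.
chained = [x for x in xs if 10 <= x < 20]

# Expanded form: two comparisons joined with 'and'.
explicit = [x for x in xs if 10 <= x and x < 20]

print(chained)              # [10.0, 15.0]
print(chained == explicit)  # True
```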
Furthermore, numerical comparisons may involve the following operations:
| Operation | Description |
|---|---|
| x ** y | x to the power y |
| x ^ y | x to the power y |
| x * y | x times y |
| x / y | x divided by y |
| x // y | x divided by y (floor division) |
| x % y | x modulo y |
| x + y | x plus y |
| x - y | x minus y |
These operations must be used with a numerical comparison, e.g.
Finally, the following functions can be used in numerical comparisons:
| Function | Description |
|---|---|
| abs(x) | absolute value of x |
| acos(x) | arccos of x |
| asin(x) | arcsin of x |
| atan(x) | arctan of x |
| ceil(x) | smallest integer not less than x |
| cos(x) | cosine of x |
| cosh(x) | hyperbolic cosine of x |
| floor(x) | largest integer not greater than x |
| exp(x) | e to the power x |
| log(x) | natural logarithm of x |
| log10(x) | base 10 logarithm of x |
| sin(x) | sine of x |
| sinh(x) | hyperbolic sine of x |
| sq(x) | square of x |
| sqrt(x) | square-root of x |
| tan(x) | tangent of x |
| tanh(x) | hyperbolic tangent of x |
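To see how these functions combine with a comparison, consider an expression of the form 'sqrt(sq(x - cx) + sq(y - cy) + sq(z - cz)) < 5', which keeps atoms closer than 5 Å to a point. The plain-Python sketch below evaluates that expression atom by atom; the coordinates are made up, `sq` is defined locally to mirror the table entry, and none of this is ProDy's own evaluation code.

```python
import math


def sq(v):
    """Square of v, mirroring the sq(x) entry in the table."""
    return v * v


cx, cy, cz = 21.17, 35.86, 79.97  # reference point (the center used later on this page)
coords = [                        # illustrative coordinates, not real atoms
    (21.0, 36.0, 80.0),
    (24.0, 37.0, 81.0),
    (40.0, 35.0, 80.0),
]

# Keep the indices of atoms for which the comparison is true.
selected = [
    i for i, (x, y, z) in enumerate(coords)
    if math.sqrt(sq(x - cx) + sq(y - cy) + sq(z - cz)) < 5
]
print(selected)  # [0, 1]
```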
Distance based selections
Atoms within a user-specified distance (Å) from a set of user-specified atoms
can be selected using the 'within . of .' keyword, e.g. 'within 5 of water'
selects atoms that are within 5 Å of water molecules. This setting
selects the water atoms as well.
The specified atoms themselves can be excluded using the 'exwithin . of .' setting,
e.g. 'exwithin 5 of water' will not select water molecules and is
equivalent to 'within 5 of water and not water'.
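The relationship between the two keywords can be sketched with a brute-force distance check. The atoms below are invented, and whether the cutoff itself is inclusive is an assumption of this sketch (no atom sits exactly on the boundary), so treat it as an illustration of the set logic only.

```python
import math

# Illustrative atoms: (label, is_water, coordinates in Å).
atoms = [
    ("W1", True, (0.0, 0.0, 0.0)),
    ("P1", False, (3.0, 0.0, 0.0)),
    ("P2", False, (0.0, 4.0, 0.0)),
    ("P3", False, (9.0, 0.0, 0.0)),
]

water_xyz = [xyz for _, is_water, xyz in atoms if is_water]


def near_water(xyz, cutoff):
    """True if xyz lies within cutoff of any water atom."""
    return any(math.dist(xyz, w) <= cutoff for w in water_xyz)


# 'within 5 of water': the water atoms are trivially within range of themselves.
within = [label for label, _, xyz in atoms if near_water(xyz, 5.0)]

# 'exwithin 5 of water', i.e. 'within 5 of water and not water'.
exwithin = [label for label, is_water, xyz in atoms
            if near_water(xyz, 5.0) and not is_water]

print(within)    # ['W1', 'P1', 'P2']
print(exwithin)  # ['P1', 'P2']
```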
Sequence selections
One-letter amino acid sequences can be used to make atom selections.
'sequence SAR' will select SER-ALA-ARG residues in a chain. Note
that the selection does not consider connectivity within a chain. Regular
expressions can also be used to make selections: 'sequence "MI.*DKQ"' will
select the MET-ILE-(XXX)n-ASP-LYS-GLN pattern, if present.
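Sequence matching of this kind can be tried out with Python's re module on a one-letter string. The sequence below is invented for illustration, and the sketch only shows how a literal motif and a regular expression for MET-ILE-(XXX)n-ASP-LYS-GLN locate residue positions; it says nothing about how ProDy maps matches back to atoms.

```python
import re

# Illustrative one-letter sequence of a single chain (not from a real structure).
sequence = "GSARMIAAGDKQLSAR"

# Literal motif SER-ALA-ARG: every occurrence, as (start, end) residue offsets.
sar_hits = [m.span() for m in re.finditer("SAR", sequence)]

# MET-ILE-(XXX)n-ASP-LYS-GLN written as a regular expression.
pattern_hit = re.search("MI.*DKQ", sequence)

print(sar_hits)             # [(1, 4), (13, 16)]
print(pattern_hit.group())  # MIAAGDKQ
print(pattern_hit.span())   # (4, 12)
```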
Expanding selections
A selection can be expanded to include the atoms in the same residue,
chain, or segment using the 'same .. as ..' setting, e.g.
'same residue as exwithin 4 of water' will select residues that have
at least one atom within 4 Å of any water molecule.
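Expanding to whole residues amounts to collecting the residues that the seed atoms belong to and then taking every atom in them. The sketch below identifies a residue by chain and residue number only, which is a simplification assumed for this example; the atom table is invented.

```python
# Illustrative atoms: (atom index, chain, residue number).
atoms = [
    (0, "A", 10), (1, "A", 10), (2, "A", 10),
    (3, "A", 11), (4, "A", 11),
    (5, "B", 10), (6, "B", 10),
]


def same_residue_as(seed_indices):
    """Expand seed atoms to all atoms sharing their (chain, resnum)."""
    seeds = set(seed_indices)
    residues = {(ch, rn) for i, ch, rn in atoms if i in seeds}
    return [i for i, ch, rn in atoms if (ch, rn) in residues]


print(same_residue_as([1]))     # [0, 1, 2]
print(same_residue_as([4, 5]))  # [3, 4, 5, 6]
```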
Additionally, a selection may be expanded to the immediately bonded atoms using
the 'bonded [n] to ...' setting, e.g. 'bonded 1 to calpha' will select atoms
bonded to Cα atoms. For this setting to work, bonds must be set by the user
using the AtomGroup.setBonds() or AtomGroup.inferBonds() method.
It is also possible to select bonded atoms while excluding the originating atoms
using the 'exbonded [n] to ...' setting. The number '[n]' indicates the number of
bonds to consider from the originating selection.
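The role of '[n]' can be illustrated with a breadth-first walk over a bond list. The function `bonded` below is a hypothetical toy, not a ProDy call: it takes seed atom labels and a list of bonded pairs, walks n bonds outward, and optionally drops the seeds to mimic the 'exbonded' behaviour.

```python
def bonded(seeds, bonds, n=1, exclude_seeds=False):
    """Atoms reachable from seeds within n bonds."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(n):
        # Step one bond outward from the current frontier.
        frontier = {nb for atom in frontier
                    for nb in neighbors.get(atom, ())} - reached
        reached |= frontier
    if exclude_seeds:
        reached -= set(seeds)
    return sorted(reached)


# Illustrative backbone fragment: N-CA-C-O with CB attached to CA.
bonds = [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB")]
print(bonded(["CA"], bonds, n=1))                      # ['C', 'CA', 'CB', 'N']
print(bonded(["CA"], bonds, n=1, exclude_seeds=True))  # ['C', 'CB', 'N']
print(bonded(["CA"], bonds, n=2))                      # ['C', 'CA', 'CB', 'N', 'O']
```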
Selection macros
ProDy allows you to define a macro for any valid selection string. The functions below are for manipulating selection macros:
You can also use this macro as follows:
Macros are stored permanently in the ProDy configuration file. You can delete them if you wish as follows:
Keyword arguments
The select() method also accepts keyword arguments that can simplify some selections. Consider the following case where you want to select some protein atoms that are close to its center:
    protein = p.protein
    calcCenter(protein).round(2)
    sel1 = protein.select('sqrt(sq(x-21.17) + sq(y-35.86) + sq(z-79.97)) < 5')
    sel1
Instead, you could pass a keyword argument and use the keyword in the selection string:
    sel2 = protein.select('within 5 of center', center=calcCenter(protein))
    sel2
    sel1 == sel2
Note that the selection string for sel2 lists indices of atoms. This substitution is performed automatically to ensure reproducibility of the selection without the keyword center.
Keywords cannot be reserved words (see listReservedWords()) and must be all alphanumeric characters.
- class prody.atomic.select.Select
Select subsets of atoms based on a selection string. See the select module documentation for selection grammar and examples. This class makes use of the pyparsing module.
- getBoolArray(atoms, selstr, **kwargs)
Returns a boolean array with True values for atoms matching selstr. The length of the boolean numpy.ndarray will be equal to the length of the atoms argument.
- getIndices(atoms, selstr, **kwargs)
Returns indices of atoms matching selstr. Indices correspond to the order in the atoms argument. If atoms is a subset of atoms, the indices should not be used for indexing the corresponding AtomGroup instance.
- exception prody.atomic.select.SelectionError(sel, loc=0, msg='', tkns=None)
Exception raised when there are errors in the selection string.
- exception prody.atomic.select.SelectionWarning(sel='', loc=0, msg='', tkns=None)
A class used for issuing warning messages when potential typos are detected in a selection string. Warnings are issued to sys.stderr via the ProDy package logger. Use confProDy() to turn selection warnings on or off, e.g. confProDy(selection_warning=False).
- prody.atomic.select.defSelectionMacro(name, selstr)
Define selection macro selstr with name name. Both name and selstr must be strings. An existing keyword cannot be used as a macro name. If a macro with the given name exists, it will be overwritten.