Implementation of 3D Database Searching

In common with the substructure searching methods discussed in Chapter 1, a two-stage procedure is most commonly used to perform 3D searches of databases such as the CSD where the objective is to identify conformations that match the query. The first stage employs some form of rapid screen to eliminate molecules that could not match the query. The second stage uses a graph-matching procedure to identify those structures that do truly match the query.

One way to achieve the initial screen is to encode information about the distances between relevant groups in the molecular conformation. This can be achieved using bitstrings in which each bit position corresponds to a distance range between a specific pair of atoms or groups of atoms (e.g. functional groups or features such as rings). For example, the first position may correspond to the distance range 2-3 Å between a carbonyl oxygen and an amine, the second position to the range 3-4 Å between these two features, and so on. Initially the bitstring (or distance key) contains all zeros. For each molecular conformation, the distances between all pairs of atoms or groups are calculated and the appropriate bits are set to "1" in the distance key. It is common to use smaller distance ranges for the more common distances found in 3D structures and larger distance ranges for the less common distances; this gives a more even population distribution across the distance bins than a uniform binning scheme would [Cringean et al. 1990]. In addition to pairwise distances, it is also possible to encode angular information involving sets of three atoms and torsional information from four atoms to further improve the search efficiency [Poirrette et al. 1991, 1993].
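The construction of a distance key can be illustrated with a minimal Python sketch. The feature names, the bin edges, and the function `distance_key` are all hypothetical choices made for illustration; they are not taken from the CSD software or from the cited papers. The key is held as a Python integer used as a bitstring, with narrower bins at short distances and wider bins at long distances.

```python
from bisect import bisect_right
from itertools import combinations
from math import dist

# Hypothetical bin edges in angstroms: narrow bins at short range, wider at long range.
BIN_EDGES = [2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 12.0]
N_BINS = len(BIN_EDGES) - 1

# Hypothetical feature pairs that the key encodes; each pair owns N_BINS bits.
FEATURE_PAIRS = [("carbonyl_O", "amine_N"), ("carbonyl_O", "ring_centroid")]
PAIR_INDEX = {tuple(sorted(p)): i for i, p in enumerate(FEATURE_PAIRS)}


def distance_key(features):
    """Return the distance key (as an int) for one conformation.

    `features` is a list of (feature_type, (x, y, z)) tuples.
    """
    key = 0  # start with all bits zero
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(features, 2):
        pair = tuple(sorted((type_a, type_b)))
        if pair not in PAIR_INDEX:
            continue  # this pair of feature types is not encoded in the key
        bin_index = bisect_right(BIN_EDGES, dist(xyz_a, xyz_b)) - 1
        if 0 <= bin_index < N_BINS:  # ignore distances outside the binned range
            key |= 1 << (PAIR_INDEX[pair] * N_BINS + bin_index)
    return key


conformation = [
    ("carbonyl_O", (0.0, 0.0, 0.0)),
    ("amine_N", (2.5, 0.0, 0.0)),
]
print(bin(distance_key(conformation)))
```

In this sketch a carbonyl oxygen and an amine 2.5 Å apart set the first bit of the key, matching the example given in the text; moving them to 3.5 Å apart would set the second bit instead.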

To perform a search, the distance key for the query is generated and used to screen the database. Only those structures that remain after the screening process are subjected to the more time-consuming second stage. This typically involves a subgraph isomorphism method such as the Ullmann algorithm. The key difference between the 2D and 3D searches is that in the former the only edges in the molecular graph are those between atoms that are formally bonded together. By contrast, in the corresponding 3D graph there is an edge between all pairs of atoms in the molecule; the value associated with each edge is the appropriate interatomic distance. Thus a 3D graph represents the topography of a molecule. The query is often specified in terms of distance ranges; if each distance in the database conformation falls within the bounds of the corresponding range, the conformation is counted as a potential match. Potential matches are then confirmed by fitting the relevant conformation to the query in Cartesian space. This final fitting step is needed, for example, because stereoisomers cannot be distinguished using distance information alone.
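The two stages can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used by any particular system: `passes_screen` assumes that every bit set in the query key must also be set in the structure key, and `find_matches` uses a plain brute-force enumeration in place of the Ullmann algorithm. All function and feature names are hypothetical.

```python
from itertools import permutations
from math import dist


def passes_screen(query_key, structure_key):
    """Stage 1: keep a structure only if it has every bit set in the query key."""
    return (query_key & structure_key) == query_key


def find_matches(query_types, query_ranges, features):
    """Stage 2 (brute force, not Ullmann): map query points onto features.

    query_types  -- list of feature types, one per query point
    query_ranges -- {(i, j): (low, high)} distance bounds between query points
    features     -- list of (feature_type, (x, y, z)) for one conformation
    Returns the tuples of feature indices that satisfy every distance range.
    """
    matches = []
    for candidate in permutations(range(len(features)), len(query_types)):
        # Each query point must map onto a feature of the same type.
        if any(features[c][0] != t for c, t in zip(candidate, query_types)):
            continue
        # Every edge of the query graph must fall inside its distance range.
        if all(
            low <= dist(features[candidate[i]][1], features[candidate[j]][1]) <= high
            for (i, j), (low, high) in query_ranges.items()
        ):
            matches.append(candidate)
    return matches


query_types = ["carbonyl_O", "amine_N", "ring_centroid"]
query_ranges = {(0, 1): (2.0, 3.0), (0, 2): (3.5, 4.5), (1, 2): (4.0, 5.0)}
conformation = [
    ("carbonyl_O", (0.0, 0.0, 0.0)),
    ("amine_N", (2.5, 0.0, 0.0)),
    ("ring_centroid", (0.0, 4.0, 0.0)),
]
print(find_matches(query_types, query_ranges, conformation))
```

Reflecting the conformation through a plane (for instance, placing the ring centroid at (0.0, -4.0, 0.0)) leaves every interatomic distance unchanged, so the mirror image matches equally well. This is why a final fitting step in Cartesian space is needed to distinguish stereoisomers.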
