Fingerprinting and Dictionary Generation
This page gives an introduction to fingerprints, fingerprinting and dictionary generation and includes:
  • An Introduction to Fingerprints, Fingerprinting and Dictionaries
  • Digital Chemistry Fingerprinting and Dictionary Generation Tools
  • How to get more Information and Evaluation Software
An Introduction to Fingerprints, Fingerprinting and Dictionaries

Elsewhere on the website, introductions have been given to clustering and diversity analysis of chemical structures, particularly with regard to their importance in drug discovery. Clustering molecules by computational methods implies that chemical structures can be represented in a form that a computer can interpret and compare.

Such representations of chemical structure are called 'fingerprints'. Generated by fingerprinting, they are essential for searching chemical libraries for compounds containing particular substructures or pharmacophores.

Fingerprinting is the process of converting a chemical structure (in the form of a connection table) into a binary form (i.e. a string of on/off values). This binary form represents a kind of chemical shorthand which identifies the presence or absence of some structural feature in the original chemistry. An example fingerprint is shown in the diagram below.
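As a rough illustration of why a string of on/off values is useful for substructure searching, the following Python sketch screens a small library of fingerprints against a query fingerprint. The molecule names, the fingerprints and the helper functions `to_bits` and `may_contain` are all hypothetical and are not part of any Digital Chemistry product.

```python
def to_bits(fingerprint: str) -> int:
    """Convert a string of '0'/'1' characters into an integer bit mask."""
    return int(fingerprint, 2)


def may_contain(query_fp: str, target_fp: str) -> bool:
    """Screening test: every bit set in the query must also be set in the target."""
    query, target = to_bits(query_fp), to_bits(target_fp)
    return (query & target) == query


# Hypothetical library of pre-computed fingerprints (names are illustrative only).
library = {"mol_a": "11100", "mol_b": "10100", "mol_c": "01011"}
query = "10100"

# Keep only the structures that pass the bit screen.
candidates = [name for name, fp in library.items() if may_contain(query, fp)]
print(candidates)
```

Passing the screen is a necessary condition only: a structure whose fingerprint covers the query bits still has to be confirmed by a full atom-by-atom match, but structures that fail the screen can be discarded cheaply.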

There are a number of ways of generating fingerprints from chemical structures, but these methods generally fall into one of two categories: hash-based and dictionary-based. Digital Chemistry uses the dictionary-based approach.

Generating dictionary-based fingerprints requires a dictionary: a set of structural fragments used to determine whether each bit in the binary string is 'on' or 'off'. Each bit of the fingerprint represents one or more fragments, at least one of which must be present in the main structure for that bit to be set.

The example below shows a simple dictionary of just five fragments, four specific and one generalised (typical dictionaries contain between 500 and 5000 fragments). The last, generalised fragment actually represents two substructures, one containing sulphur and the other oxygen. When fingerprinted, the structure shown generates the fingerprint 11100, meaning that the first three fragments are present but the last two are not.
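The same idea can be sketched in a few lines of Python. This is a simplified illustration, not the Digital Chemistry implementation: the fragment labels are hypothetical stand-ins for the fragments in the example, and a structure is assumed to have already been reduced to the set of fragment labels it contains (real software would perform substructure matching on the connection table).

```python
from typing import Iterable, List, Set

# Hypothetical five-entry dictionary. Each entry is the set of fragment
# labels that can switch the corresponding bit on.
DICTIONARY: List[Set[str]] = [
    {"benzene ring"},
    {"carboxylic acid"},
    {"primary amine"},
    {"nitro group"},
    {"C-S-C", "C-O-C"},  # generalised entry: sulphur or oxygen variant
]


def fingerprint(fragments_present: Iterable[str],
                dictionary: List[Set[str]] = DICTIONARY) -> str:
    """Set a bit when at least one fragment of the dictionary entry is present."""
    present = set(fragments_present)
    return "".join("1" if entry & present else "0" for entry in dictionary)


# A structure containing the first three fragments but not the last two.
structure = {"benzene ring", "carboxylic acid", "primary amine"}
print(fingerprint(structure))  # 11100

# Either variant of the generalised fragment sets the same (fifth) bit.
print(fingerprint({"benzene ring", "C-S-C"}))  # 10001
print(fingerprint({"benzene ring", "C-O-C"}))  # 10001
```

Because each bit position is tied to a known dictionary entry, a set bit can always be traced back to the fragment or fragments it stands for.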

Digital Chemistry Fingerprinting and Dictionary Generation Tools

Digital Chemistry offers Fingerprinting and Dictionary Generating software as a combined package, including some unique features that are not available with other fingerprinting applications.

As mentioned above, there are two main types of fingerprint generation. In hash-based methods, the fragments present in a molecule are "hash-coded" to fingerprint bit positions. This has the advantage that any fragment present in the molecule will be encoded in the fingerprint, but the disadvantage that several different fragments may set the same bit, leading to ambiguity.
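The ambiguity of hash-coding can be seen in a small Python sketch. The fragment labels, the 8-bit fingerprint length and the choice of `zlib.crc32` as the hash function are assumptions made purely for illustration; actual hash-based fingerprinting schemes use their own fragment enumeration and hashing.

```python
import zlib
from collections import defaultdict


def bit_position(fragment: str, n_bits: int) -> int:
    """Map a fragment label to a bit position with a deterministic hash."""
    return zlib.crc32(fragment.encode("utf-8")) % n_bits


def hashed_fingerprint(fragments, n_bits: int) -> str:
    """Every fragment sets a bit, whether or not it was known in advance."""
    bits = ["0"] * n_bits
    for fragment in fragments:
        bits[bit_position(fragment, n_bits)] = "1"
    return "".join(bits)


def collisions(fragments, n_bits: int) -> dict:
    """Return the bit positions that are shared by more than one fragment."""
    by_bit = defaultdict(list)
    for fragment in sorted(set(fragments)):
        by_bit[bit_position(fragment, n_bits)].append(fragment)
    return {bit: frags for bit, frags in by_bit.items() if len(frags) > 1}


# Twelve hypothetical fragments hashed into only eight bits: at least two
# fragments must share a bit, so a set bit no longer identifies one fragment.
fragments = [f"fragment-{i}" for i in range(12)]
fp = hashed_fingerprint(fragments, n_bits=8)
shared = collisions(fragments, n_bits=8)
print(fp)
print(shared)
```

With more distinct fragments than bit positions, collisions are unavoidable, which is the ambiguity described above.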

With Digital Chemistry Fingerprint Generation software, dictionaries of user-defined fragments determine exactly which fragment(s) each bit represents. Our unique fingerprinting approach allows the creation in the dictionary of "generalised" fragments, in which some fuzziness is allowed in the atom and bond types involved. For example, an atom position may represent any number of atom types. It is also possible to create customised dictionaries, based on a statistical analysis of the fragments actually present in the structures in a particular dataset.
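One plausible form of such a statistical analysis is sketched below in Python. It is an assumption for illustration only, since the actual statistics used by the Digital Chemistry tools are not described here: fragments are kept when the fraction of structures containing them falls between two thresholds, on the reasoning that a fragment found in nearly every structure, or in almost none, does little to distinguish one structure from another. The function `build_dictionary`, its parameters and the fragment labels are hypothetical.

```python
from collections import Counter


def build_dictionary(dataset, min_fraction=0.1, max_fraction=0.9):
    """Select fragments whose occurrence frequency lies within the given range.

    `dataset` is a list of fragment-label sets, one per structure.
    """
    n_structures = len(dataset)
    if n_structures == 0:
        return []
    counts = Counter()
    for fragments in dataset:
        counts.update(set(fragments))  # count each fragment once per structure
    return sorted(
        fragment
        for fragment, count in counts.items()
        if min_fraction <= count / n_structures <= max_fraction
    )


# Hypothetical dataset of four structures.
dataset = [
    {"ring", "amine", "acid"},
    {"ring", "amine"},
    {"ring", "nitro"},
    {"ring", "acid"},
]

# "ring" occurs in every structure and "nitro" in only one, so both are dropped.
custom = build_dictionary(dataset, min_fraction=0.5, max_fraction=0.9)
print(custom)  # ['acid', 'amine']
```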

Additional key features of Digital Chemistry Fingerprinting and Dictionary Generation tools include:

  • Standard fragment dictionaries supplied with the package
  • Tools to create customised dictionaries
  • Fingerprint generation using either standard or custom dictionaries
  • Input and output of chemical structures in a variety of formats (SDfile, SMILES and Sybyl Line Notation), enabling integration with other packages
Digital Chemistry Fingerprinting and Dictionary Generation is available in 3 formats, as listed below. If you would like more detailed information about these, please click on the links:
A full list of hardware and software requirements for Digital Chemistry products is available on request. Digital Chemistry supports the following operating systems:
  • Windows
  • SUN Solaris
  • Linux
How to get more Information and Evaluation Software
If you would like any more information about Digital Chemistry's software, or if you would like to request an evaluation copy, please contact us. Our details are given below.
For further information, please e-mail info@digitalchemistry.co.uk or visit our Contact Us page for more options.