TopSearch is a powerful new gateway to protein structures and their structural relationships. Given a query structure, the search engine instantly returns a list of known protein structures sorted by structural similarity. TopSearch operates on three structural levels: chains, biological assemblies/quaternary structures (yes, including ribosomes!), and asymmetric units. All at the same speed! In working with structures, numbers are not enough - with TopSearch you see what you get. Immediately!

System Requirements

TopSearch requires Java to run the Jmol plugin. Thanks to this platform-independent setup, TopSearch should run on most operating systems and in all common browsers. If you experience any trouble using TopSearch, please feel free to contact us.

Overview

To get more information about the various functions of TopSearch, please click on the respective area in the screenshot:

Overview of the TopSearch interface. Click on any area in the screenshot to get more information on the features of TopSearch.

Input

Usually, the input for TopSearch will be a four-letter PDB code. Alternatively, you can upload a structure in PDB file format or perform a full text search of the database, e.g. for a protein function or the name of an enzyme.

The input can be a 4-letter PDB code, a user-supplied file in PDB format, or any string for a keyword search.
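
To illustrate how the two kinds of typed input differ, here is a minimal sketch that tells a PDB code from a keyword. It assumes the classic PDB identifier format (one digit from 1 to 9 followed by three letters or digits); the function name classify_input is hypothetical and is not part of TopSearch.

```python
import re

# Assumption: classic PDB identifiers are one digit (1-9) followed by
# three alphanumeric characters, e.g. "1abc" or "2BAT".
PDB_CODE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def classify_input(text: str) -> str:
    """Return 'pdb_code' or 'keyword' for a typed query (hypothetical helper)."""
    query = text.strip()
    if PDB_CODE.fullmatch(query):
        return "pdb_code"
    # Anything else is treated as a free text search term.
    return "keyword"


if __name__ == "__main__":
    for example in ("1abc", "sialidase"):
        print(example, "->", classify_input(example))
```

With this rule, an entry such as "sialidase" is treated as a keyword because it does not have the four-character shape of a PDB code.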

Result of a free text search for "sialidase". Protein chains that match the search term can be accessed by clicking on the respective line.

After a full text search you can pick any PDB chain in the result table and start to explore its nearest neighbors. For any PDB code, TopSearch provides several selectable structural entities (see next section).

Selection

After hitting the Enter key or clicking on the Go button, TopSearch will present you with several options to generate a nearest neighbor list. Essentially, you can search on three different layers of structural organisation:

Biological Unit(s): All biological units (biological assemblies) that are defined in the PDB file. Each biological assembly is denoted with an @i (i being an integer) following the PDB code.
Protein Chain(s): All protein chains defined in the PDB file.
Asymmetric Unit (or PDB file): Complete PDB file. In case of X-ray structures, this represents the unique part of a crystal structure. In case of NMR structures, the first model is used.
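
The @i notation for biological assemblies can be read mechanically. The following is a minimal sketch, assuming labels of the form "code@i" as described above; the names Entity and parse_entity are hypothetical, and labels for chains are not covered because their format is not specified here.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    pdb_code: str
    assembly: Optional[int]  # None when the label has no "@i" suffix


def parse_entity(label: str) -> Entity:
    """Split a label such as '1abc@2' into PDB code and assembly number."""
    code, separator, index = label.strip().partition("@")
    if not separator:
        # No "@" present: the label is a plain PDB code.
        return Entity(code, None)
    # int() raises ValueError if the part after "@" is not an integer.
    return Entity(code, int(index))


if __name__ == "__main__":
    print(parse_entity("1abc@2"))
    print(parse_entity("1abc"))
```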

The nearest neighbor search is launched by clicking on any of the protein structural entities.

For a given PDB code, the selection window provides access to the structure on three different layers of structural organisation.

The Nearest Neighbor List

After you click on a structural entity, the TopSearch webpage displays a list of the most similar proteins, sorted by structural similarity to the query protein. Additionally, the Jmol widget shows the structure of the selected entity.

The heart of TopSearch: The nearest neighbor list.

The list is ordered by a similarity score, S, which combines the number of aligned residues in the respective structure alignment with the distances between the paired residues (see below).

Table Columns

The hitlist table has the following columns:

Rank: The rank of the entry in the result list.
Target: The access code of the entry; depends on the selected protein structure unit.
Size: Number of residues in the target structure.
Release: Date on which the protein structure was released by the PDB.
M: One-letter code for the experimental method that was used to solve the protein structure.
Resol.: Resolution of the protein structure in Ångström.
S: Structural similarity between query and target. Similarity is quantified by a score of structural similarity based on Gaussian functions (see Sippl & Wiederstein (2012)). If the structurally equivalent parts in query and target match perfectly, S is equal to the length of the structure alignment. With increasing spatial deviation of the aligned residues, S approaches 0.
*: A '*' symbol in this column indicates that the value of S is estimated.
S_q: Query cover based on similarity score S, expressed in percent.
S_t: Target cover based on similarity score S, expressed in percent.
Species: The scientific name of the organism which the protein was derived from.
Kingdom: Bacteria, Archaea, Eukaryota, or Viruses.
Compound: Compound information of the PDB file. Usually a short description of the function/molecule name for each chain. For multiple chains, the compound terms are separated by "//".
Ligand: List of ligands (abbreviated) in the structure. Multiple ligands are separated by "//".
Chains: Number of chains the structure consists of.
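
The behaviour of S described above (equal to the alignment length for a perfect match, approaching 0 with growing deviation) can be reproduced with a sum of Gaussian terms over the aligned residue pairs. The sketch below is an illustration only: the exact functional form, the width parameter sigma and its value of 3.0 Å, and the definition of cover as S divided by the number of residues are assumptions, not taken from TopSearch. See Sippl & Wiederstein (2012) for the actual definition.

```python
import math
from typing import Sequence


def similarity_score(distances: Sequence[float], sigma: float = 3.0) -> float:
    """Sum one Gaussian term per aligned residue pair.

    distances: spatial deviation of each aligned pair, in Angstrom.
    sigma: assumed width of the Gaussian (hypothetical value).
    """
    return sum(math.exp(-((d / sigma) ** 2)) for d in distances)


def cover_percent(score: float, n_residues: int) -> float:
    """Assumed cover: S relative to the structure size, in percent."""
    return 100.0 * score / n_residues


if __name__ == "__main__":
    perfect = similarity_score([0.0] * 50)   # 50 pairs, no deviation
    distorted = similarity_score([2.0] * 50) # same pairs, 2 Angstrom apart
    print(perfect, distorted)
    print(cover_percent(perfect, 200))
```

In this sketch each perfectly matching pair contributes 1, so 50 perfectly aligned residues give S = 50, and a query of 200 residues would then have a query cover of 25%.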

Additionally, you can add, remove, or reorder columns using the Configure Hitlist option.

Customizing the hit list to your liking. Here we removed the GS column and switched the positions of the Size and M columns.

Sorting by Table Columns

To examine the hit list more efficiently, it is possible to sort the list by clicking on the list headers. By clicking on the header of any column again, the current mode of sorting is reversed.
Note that only the results visible in the current hit list will be sorted. Therefore, sorting by reverse similarity will put the 200th entry at the top of the list.
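
Because clicking a column header acts only on the page currently shown, the effect is easy to picture with a small sketch. It assumes a visible page of 200 hits held as dictionaries; the function name sort_page and the example values are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def sort_page(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Sort only the rows of the visible page by one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


if __name__ == "__main__":
    # Hypothetical page: ranks 1..200 with S decreasing as rank increases.
    page = [{"Rank": rank, "S": 300.0 - rank} for rank in range(1, 201)]
    reversed_page = sort_page(page, "S")  # lowest similarity first
    print(reversed_page[0]["Rank"])
```

Sorting this page by ascending S brings the last visible hit to the top, not the least similar structure in the whole database.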

Complete List / Group List

TopSearch offers a simple filtering option with the Complete List / Group List radio buttons at the top of the nearest neighbor list. By default, the filter is switched off, i.e. the complete result list is displayed. The Group List option keeps only one entry (the "representative") from each group of structurally similar targets. Several ways of filtering are available: the Reduced List option removes targets that have at least 90% sequence similarity AND at least 90% structure similarity to any other target. Several other options (L30,...,L99) allow you to remove redundancy at different levels of structural similarity.
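
The idea of keeping one representative per group can be sketched as a greedy pass over the hit list. This is an illustration under stated assumptions, not the procedure TopSearch uses: the greedy strategy, the function name group_hits, and the two similarity callbacks (returning percentages) are hypothetical. Only the 90% AND 90% criterion comes from the description above.

```python
from typing import Callable, Dict, List

Similarity = Callable[[str, str], float]


def group_hits(
    hits: List[str],
    seq_sim: Similarity,
    struct_sim: Similarity,
    threshold: float = 90.0,
) -> Dict[str, List[str]]:
    """Map each representative to the hits it absorbs.

    hits are assumed to be ordered by decreasing S, so the first member
    of each group becomes its representative.
    """
    groups: Dict[str, List[str]] = {}
    for hit in hits:
        for representative in groups:
            # Both criteria must hold (sequence AND structure similarity).
            if (seq_sim(hit, representative) >= threshold
                    and struct_sim(hit, representative) >= threshold):
                groups[representative].append(hit)
                break
        else:
            # No existing representative is similar enough: start a new group.
            groups[hit] = []
    return groups
```

The keys of the returned dictionary correspond to what a grouped list would display; the values are the group members hidden behind each representative.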

If the Group List radio button is selected, only representative targets are shown.

By clicking on the tree icon, the members of a group can be analyzed in detail in TopTree.

Searching in TopSearch Results

Sometimes it is convenient to search for specific PDB codes, organisms, kingdoms, etc. in the hit list. TopSearch offers a simple search tool that lets you find entries in the current hit list page. Entering a search term into the white input box located under a column header restricts the results to entries that contain the specified term. By entering search terms for multiple columns, you can retrieve results matching all given criteria.
Note that only the results visible in the current hit list will be filtered.
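
Combining search terms across columns amounts to a logical AND over per-column matches. The sketch below shows the idea on a few made-up rows; the function name filter_rows, the case-insensitive substring matching, and the example values (including the date format) are assumptions.

```python
from typing import Dict, List

Row = Dict[str, str]


def filter_rows(rows: List[Row], terms: Dict[str, str]) -> List[Row]:
    """Keep rows whose columns contain every non-empty search term."""

    def matches(row: Row) -> bool:
        return all(
            term.lower() in row[column].lower()
            for column, term in terms.items()
            if term  # an empty input box places no restriction
        )

    return [row for row in rows if matches(row)]


if __name__ == "__main__":
    # Hypothetical rows of the visible hit list page.
    page = [
        {"Target": "1abc", "Kingdom": "Bacteria", "Release": "2013-05-01"},
        {"Target": "2xyz", "Kingdom": "Eukaryota", "Release": "2013-07-12"},
        {"Target": "3def", "Kingdom": "Bacteria", "Release": "2011-01-20"},
    ]
    for row in filter_rows(page, {"Kingdom": "Bacteria", "Release": "2013"}):
        print(row["Target"])
```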

With the search function it is possible to filter the current hit list by various criteria. In the example, all bacterial hits from 2013 are displayed.

Links to External Resources

More Information on proteins.

Superimpose two structures

To visualize the query's superposition with any entry in the result list, just click on the respective row in the list. The Jmol widget will then show the superposition and additional data that characterize the alignment as calculated by the TopMatch program. You can align any two entries in the list by entering or dragging their codes into the TopMatch widget.

The TopMatch widget offers two buttons:

Match, which shows the superposition in the Jmol widget
Details, which opens a new TopMatch-web window each time you click on it

The Jmol widget

The Jmol widget is used to visualize protein structures and their superpositions. When a structural entity is selected by clicking on it, the respective nearest neighbor list is generated and the entity is shown in the Jmol widget. If a chain is selected, the whole asymmetric unit is displayed with the selected chain colored blue and the rest of the structure shown in gray.

The structural superposition shows query and target superimposed. The query is colored blue, the target green. Aligned residues are colored orange (query) or red (target). To examine the structure and its superposition in more detail, enlarge the widget by clicking on the magnifying glass, or switch to stereographic mode by clicking on the 3D icon. Query and target can be hidden selectively to facilitate the analysis. Furthermore, the Jmol widget can display ligands and highlight sequence identities in the structure alignment.

The Jmol widget instantly visualizes alignments. The magnifying glass triggers a larger view of the structure(s) displayed in the widget. The 3D icon enables the stereographic display of the proteins.

References

Wiederstein M., Gruber M., Frank K., Melo F. and Sippl M. J. (2014). Structure-based characterization of multiprotein complexes. Structure 22, 1063-1070.

Berman H. M. et al. (2000). The Protein Data Bank. Nucleic Acids Research 28, 235-242.

Sippl M. J. (2009). Fold space unlimited. Current Opinion in Structural Biology 19, 312-320.

Sippl M. J. (2008). On distance and similarity in fold space. Bioinformatics 24, 872-873.

Sippl M. J. and Wiederstein M. (2012). Detection of correlations in protein structures and molecular complexes. Structure 20, 718-728.

Suhrer S., Wiederstein M., Gruber M. and Sippl M. J. (2009). COPS - a novel workbench for explorations in fold space. Nucleic Acids Research 37, W539-W544.