Where does the interaction data available for search and download on this web portal come from?

The protein-protein interaction (PPI) data comes from two different sources. The majority was generated in a series of systematic screens of ever larger ORF collections performed at the Center for Cancer Systems Biology (CCSB). These screens used a systematic binary protein interaction mapping pipeline: primary screens with a high-throughput yeast two-hybrid (Y2H) assay, followed by validation of the dataset using two or more orthogonal assays (Dreze et al Methods in Enzymology 2010).

The remainder of the PPIs, Lit-BM, is a curated set of high-quality PPIs reported in the literature, each supported by at least two pieces of experimental evidence, at least one of which comes from a binary PPI detection method (Rolland et al Cell 2014). When tested in PPI detection assays orthogonal to Y2H, PPIs from Lit-BM have been shown to be recovered at rates similar to those of PPIs identified in systematic screens at CCSB (Rolland et al Cell 2014).
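
The Lit-BM inclusion rule described above can be written as a simple predicate. This is a hypothetical sketch: the `Evidence` record and its fields are illustrative, and it assumes (our assumption, not stated here) that one piece of evidence corresponds to a unique combination of publication and detection method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One literature report of a PPI (hypothetical record layout)."""
    pubmed_id: str
    method: str      # name of the detection method
    is_binary: bool  # True if the method detects direct binary interactions


def passes_lit_bm_rule(evidence: list[Evidence]) -> bool:
    """Return True if a PPI has >= 2 pieces of evidence, >= 1 of them binary.

    Assumption: duplicate (publication, method) combinations count once.
    """
    unique = {(e.pubmed_id, e.method): e for e in evidence}
    has_binary = any(e.is_binary for e in unique.values())
    return len(unique) >= 2 and has_binary


# Hypothetical example: one binary and one non-binary report.
reports = [
    Evidence("111", "two hybrid", True),
    Evidence("222", "anti tag coimmunoprecipitation", False),
]
print(passes_lit_bm_rule(reports))  # True
```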


Does the interaction data originate from experiments and/or predictions?

All PPIs provided for search and download via this web portal have direct experimental evidence.


My query genes use a different identifier format than those currently supported for search on this web portal. How can I still use them as queries?

Currently, our portal can only be searched using gene symbols or UniProt, Entrez Gene, or Ensembl identifiers. You can convert your list of query genes into any of these identifiers using UniProt's ID mapping tool.
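
Before converting, it can help to see which of your identifiers already look like one of the accepted formats. The sketch below is a hypothetical helper built on regular-expression heuristics: the UniProt pattern follows UniProt's published accession format, the Ensembl pattern assumes human stable IDs (ENSG/ENST/ENSP plus 11 digits), Entrez Gene IDs are assumed to be plain integers, and anything else made of letters, digits, and hyphens is treated as a gene symbol. It checks only the form of an identifier, not whether it exists.

```python
import re

# Heuristic patterns for this sketch, not the portal's own validation.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?$"
)
ENSEMBL_RE = re.compile(r"^ENS[GTP]\d{11}(?:\.\d+)?$")  # human gene/transcript/protein
ENTREZ_RE = re.compile(r"^\d+$")
SYMBOL_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*$")


def guess_id_type(identifier: str) -> str:
    """Return 'ensembl', 'uniprot', 'entrez', 'symbol' or 'other'."""
    s = identifier.strip()
    if ENSEMBL_RE.match(s):
        return "ensembl"
    if UNIPROT_RE.match(s):
        return "uniprot"
    if ENTREZ_RE.match(s):
        return "entrez"
    if SYMBOL_RE.match(s):
        return "symbol"
    return "other"


def needs_conversion(identifiers: list[str]) -> list[str]:
    """Identifiers that match none of the accepted formats."""
    return [i for i in identifiers if guess_id_type(i) == "other"]


print(needs_conversion(["TP53", "P04637", "7157", "NM_000546"]))
# ['NM_000546']
```

Only the identifiers flagged as `other` (here a RefSeq accession) would need to go through the ID mapping tool.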


Why does my search not return any PPIs?

We have currently screened ORFs corresponding to over 17,500 human protein-coding genes using our binary interaction mapping pipeline (a full list of the genes we have screened is available here). However, we may not yet have screened your gene of interest because we do not currently have an ORF clone available for it. If you have an ORF for a gene that we do not currently cover and are willing to share it with us for screening, please contact us.
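
If you have downloaded the list of screened genes, a quick check like the following can separate your query genes into screened and not-yet-screened ones. This is a minimal sketch: the function name and the file name `screened_genes.txt` are hypothetical, and it assumes the list contains one gene symbol per line.

```python
def split_by_coverage(query_genes, screened_genes):
    """Split query gene symbols into screened and not-yet-screened lists.

    `screened_genes` can be any iterable of symbols, e.g. the lines of a
    downloaded gene list (hypothetical file, one symbol per line).
    Matching is case-insensitive.
    """
    screened = {g.strip().upper() for g in screened_genes if g.strip()}
    covered, not_screened = [], []
    for gene in query_genes:
        target = covered if gene.strip().upper() in screened else not_screened
        target.append(gene)
    return covered, not_screened


# Hypothetical usage with a local copy of the list:
# with open("screened_genes.txt") as fh:
#     covered, missing = split_by_coverage(["TP53", "MYGENE1"], fh)
```

Genes in the second list are candidates for the ORF-sharing request above; genes in the first list were screened, so an empty result points to the reasons discussed next.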

Another possible reason no PPIs are returned for your query is that, even though we may have screened an ORF of your gene of interest, this ORF did not yield any PPIs. While our binary interaction mapping pipeline is designed to be systematic and unbiased, some proteins may prove refractory to the assays used. For example, (i) proteins that are secreted or require significant post-translational modification may not form stable interactions under our assay conditions, (ii) some human proteins may be unstable or fold incorrectly when expressed in yeast, and (iii) some proteins may only interact as part of large complexes and not as binary pairs.


What would be a good confidence score cutoff to filter the interactions?

The confidence score is intended to rank human binary PPIs identified in systematic screens at CCSB by their biophysical quality, not to serve as an absolute cutoff for filtering interactions. Because the score captures only a small amount of variation in biophysical quality within the dataset, it should not be used to discard PPIs over quality concerns; instead, use it to prioritize a list of PPIs of interest for experimental follow-up.
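
In practice, "rank, don't filter" means sorting your PPIs of interest by score while keeping every pair. A minimal sketch, assuming each PPI is a hypothetical `(gene_a, gene_b, score)` tuple with `score = None` for pairs that have no confidence score (only pairs from the HI-III-18 screens are scored):

```python
from typing import Optional

PPI = tuple[str, str, Optional[float]]  # (gene_a, gene_b, confidence score)


def rank_for_follow_up(ppis: list[PPI]) -> list[PPI]:
    """Order PPIs by confidence score, highest first, without dropping any.

    Pairs without a score (None) are kept and placed after all scored pairs,
    in their original order (Python's sort is stable).
    """
    return sorted(ppis, key=lambda p: (p[2] is None, -(p[2] or 0.0)))


ppis = [("A", "B", 0.2), ("C", "D", None), ("E", "F", 0.9)]
print(rank_for_follow_up(ppis))
# [('E', 'F', 0.9), ('A', 'B', 0.2), ('C', 'D', None)]
```

The output is as long as the input, so the list is prioritized rather than trimmed.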


Why do some PPIs not have a confidence score?

The confidence score of a pair is calculated from several features describing how the interaction was detected during screening. These features are only available for pairs detected in the most recent screens (HI-III-18), so we can only calculate confidence scores for these pairs.


In which format do I need to save the search results for upload into Cytoscape?

To load your search results into Cytoscape, either export the interactions and interactors as .csv files, or open Cytoscape and then, on the results page of your search, click the small network icon in the bottom-left corner of the network visualization window. This opens your search results directly in Cytoscape.
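
If you prepare the .csv yourself (for example after filtering or ranking your results), an edge list with one interaction per row is a format Cytoscape's network-from-file import can read, letting you pick the source and target node columns. A minimal sketch; the column names and the `(gene_a, gene_b, score)` input layout are assumptions for illustration, not the portal's export format:

```python
import csv
import io
from typing import Optional


def ppis_to_edge_csv(ppis: list[tuple[str, str, Optional[float]]]) -> str:
    """Render PPIs as CSV text with one edge per row.

    Column names are illustrative; in Cytoscape's import dialog you map
    'source' and 'target' to the source and target node columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source", "target", "score"])
    for gene_a, gene_b, score in ppis:
        # Leave the score cell empty when no confidence score is available.
        writer.writerow([gene_a, gene_b, "" if score is None else score])
    return buf.getvalue()


# Hypothetical usage: write the CSV to disk, then import it in Cytoscape.
# with open("ppis.csv", "w", newline="") as fh:
#     fh.write(ppis_to_edge_csv([("GENE1", "GENE2", 0.8)]))
```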


Can I use your unpublished interaction data in my publication?

Yes, there are no restrictions on using small numbers of our unpublished interactions in your publications. However, a publication moratorium applies to global analyses of the full unpublished dataset. For more details, please see the Guidelines on use of preliminary data.


How should I cite the web portal?

A manuscript describing the web portal is in preparation. Please check back for updates. To cite the interaction data, please see below.


How should I cite the published interaction data?

If you use published interaction data, please cite the relevant publication indicated in the About section.


How should I cite the unpublished interaction data?

Users are expected to acknowledge the following in all oral or written presentations, disclosures, or publications of the analyses:

The Center for Cancer Systems Biology (CCSB) at the Dana-Farber Cancer Institute

The funding organization(s) that supported the work:

(1) The National Human Genome Research Institute (NHGRI) of NIH

(2) The Ellison Foundation, Boston, MA

(3) The Dana-Farber Cancer Institute Strategic Initiative


How can I get information on the clone used to identify an interaction returned from my search?

The clones used in our screens come from the ORFeome clone collection assembled at CCSB and through the ORFeome Collaboration. For any PPI shown on the results page, clicking on the PPI displays its experimental information. A PPI may have been found in multiple screening efforts, with different Y2H assay versions and in different orientations (each interaction partner fused either to the DNA-binding domain or to the activation domain). For every experimental instance, both interaction partners are displayed with their internal ORF IDs. Clicking on an ORF ID redirects you to the corresponding results page on the ORFeome web portal, with details on the cloning strategy, the source material, and the nucleotide sequence of the clone.
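
Because one PPI can be backed by several experimental instances, it can be useful to collapse them per gene pair, for example to see whether a pair was found in both orientations. The sketch below assumes a hypothetical `Instance` record (field names, ORF IDs, and assay labels are illustrative, not the portal's data model) and records an orientation as the pair (DNA-binding-domain-fused gene, activation-domain-fused gene).

```python
from collections import defaultdict
from typing import NamedTuple


class Instance(NamedTuple):
    """One experimental observation of a PPI (hypothetical record layout)."""
    db_gene: str        # partner fused to the DNA-binding domain
    ad_gene: str        # partner fused to the activation domain
    db_orf_id: str
    ad_orf_id: str
    assay_version: str


def summarize_instances(instances):
    """Group observations by unordered gene pair."""
    summary = defaultdict(lambda: {"assay_versions": set(),
                                   "orientations": set(),
                                   "orf_ids": set()})
    for inst in instances:
        pair = tuple(sorted((inst.db_gene, inst.ad_gene)))
        entry = summary[pair]
        entry["assay_versions"].add(inst.assay_version)
        # Orientation recorded as (DB-fused gene, AD-fused gene).
        entry["orientations"].add((inst.db_gene, inst.ad_gene))
        entry["orf_ids"].update((inst.db_orf_id, inst.ad_orf_id))
    return dict(summary)
```

A pair whose `orientations` set has two entries was detected with each partner on each side of the assay.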


Why are there interactions involving non-coding genes?


We use a sophisticated pipeline to map our ORFs, based on their nucleotide and protein sequences, to gene and transcript models from GENCODE. GENCODE provides the most up-to-date annotation of the human genome and all genes identified therein. Based on current knowledge, GENCODE assigns genes to different categories, such as protein-coding and non-coding. Most of the ORFs used in our screening efforts were originally obtained from a cDNA library generated from the transcriptome of a given sample; they represent transcripts that contain a start and a stop codon and therefore have the potential to be translated into a protein sequence. PPIs identified in the Y2H assay for an ORF from a non-coding gene might not be functional, because the ORF might not be translated into a functional protein in vivo. Equally likely, however, our current genome annotation is incomplete, and the fact that an ORF in our collection mediates detectable PPIs may be experimental evidence that the corresponding gene is actually protein-coding.
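
To spot such interactions in a downloaded result set yourself, you can look up each partner's `gene_type` in a GENCODE GTF file. A minimal sketch, assuming your PPIs are pairs of versioned GENCODE `gene_id`s; the function names are hypothetical, and treating every `gene_type` other than `protein_coding` as non-coding is a simplification (it also flags pseudogenes, for example).

```python
import re

# Matches key "value" pairs in column 9 of a GENCODE GTF line.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_types_from_gtf(lines):
    """Map gene_id -> gene_type from the 'gene' records of a GENCODE GTF."""
    types = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip transcript, exon, and malformed records
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            types[gene_id] = attrs.get("gene_type", "unknown")
    return types


def ppis_with_non_coding_partner(ppis, gene_types):
    """PPIs (gene_id pairs) where a partner is annotated as non-coding.

    Genes absent from the annotation are not flagged.
    """
    def non_coding(gene_id):
        gene_type = gene_types.get(gene_id)
        return gene_type is not None and gene_type != "protein_coding"

    return [(a, b) for a, b in ppis if non_coding(a) or non_coding(b)]
```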


Isn't yeast two-hybrid data full of false positives?

No! As with any other experimental approach, the quality of the data depends on careful experimental design and rigorous attention to detail when performing the experiments. We have been developing our binary interaction mapping pipeline for over 15 years and have established numerous quality control measures. All of our primary yeast two-hybrid datasets are validated by testing a subset of interactions in at least two orthogonal assays, to ensure that the quality of the dataset is equal to, or greater than, that of a representative sample of interactions selected from the literature (Venkatesan et al Nature Methods 2009, Rolland et al Cell 2014).
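
One common way to compare two recovery rates, such as a sample of screen PPIs against a literature reference sample, is a pooled two-proportion z-test. The sketch below is illustrative only: it is not necessarily the statistical procedure used in the cited studies, the function names are hypothetical, and the counts in the example are made up.

```python
import math


def recovery_rate(n_positive: int, n_tested: int) -> float:
    """Fraction of tested pairs scored positive in an orthogonal assay."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_positive / n_tested


def two_proportion_z(pos_a: int, n_a: int, pos_b: int, n_b: int) -> float:
    """Pooled z statistic for H0: sets A and B share one recovery rate.

    A positive value means set A (e.g. a screen sample) was recovered at a
    higher rate than set B (e.g. a literature reference sample).
    """
    p_a = recovery_rate(pos_a, n_a)
    p_b = recovery_rate(pos_b, n_b)
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("z is undefined when the pooled rate is 0 or 1")
    return (p_a - p_b) / se


# Hypothetical counts: 40/100 screen pairs vs 30/100 reference pairs.
z = two_proportion_z(40, 100, 30, 100)
```

A z value near zero, or a positive one, is consistent with the screen sample being recovered at least as well as the reference sample.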