Grant: $510,300 - National Institutes of Health - Sep. 28, 2009
Award Description: Information on subcellular location is critical to determining the function of unknown proteins, and high-resolution knowledge of locations for entire proteomes will be essential for systems biology approaches to comprehensive modeling of cell behavior. In addition, many diseases are associated with changes in the subcellular location of proteins. We have coined the term Location Proteomics to describe the subfield of proteomics concerned with automatically, objectively and quantitatively determining the subcellular locations of expressed proteins. The heart of this approach is the use of the machine learning methods for analyzing subcellular patterns that we have described over the past thirteen years. Under grant R01 GM075205, we are currently using a combination of CD-tagging (a powerful approach to producing cell lines expressing GFP-tagged proteins), automated microscopy, and machine learning to determine the subcellular location in NIH 3T3 cells of around 100 proteins per week. This competitive revision proposal addresses a higher-level question: characterizing the dependency of protein subcellular location on other factors, such as the presence of drugs or expression of inhibitory RNAs. Identifying changes in the subcellular locations of specific proteins in response to various compounds is the basis of large-scale cell-based assays used in basic research and drug development. Such assays typically take a single protein and examine its response to tens or hundreds of thousands of compounds. If we consider a two-dimensional matrix in which rows correspond to proteins and columns correspond to compounds, current cell-based assay approaches can fill in a single row in the matrix with (typically) binary values indicating whether a given protein responds to a given compound.
Filling in the full matrix with current approaches would require running on the order of ten thousand different screens, which is well beyond the capabilities of all current screening efforts. If we wish to determine the effect of pairs of compounds, the matrix becomes three-dimensional and it becomes completely unrealistic to consider filling it in by brute force. The key to our proposal is the use of active learning (a powerful machine learning technique) to iteratively propose, execute, and learn from experiments in order to avoid exhaustive testing of every protein-compound combination. This project has two aims: to use active learning and robotics to determine how the subcellular location patterns of 94 GFP-tagged proteins expressed in NIH 3T3 cell cultures change upon application of 94 compounds, and to extend these methods to determine the effects of pairs of the same compounds on the subcellular location of the same proteins. The deliverables from this project will be the modeling and active learning software developed, the experimental protocols produced, the image data collected, the results generated on the feasibility and efficiency of our active learning approach, and the model of dependency produced for the specific proteins and compounds studied. The work described here also has the potential to dramatically change the way in which cell-based assays are carried out for drug discovery and repurposing.
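The propose, execute, and learn cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's software: the simulated response matrix, the row/column response-rate predictor, the uncertainty-sampling selection rule, the batch size, and every function name are hypothetical and chosen only for the example. The first function also works out the size of the search space for the stated aims: 94 x 94 = 8,836 protein-compound experiments, and 94 x C(94, 2) = 410,874 experiments if every unordered pair of compounds were tested on every protein.

```python
import random
from math import comb


def search_space_sizes(n_proteins, n_compounds):
    """Return (single-compound, compound-pair) exhaustive experiment counts."""
    single = n_proteins * n_compounds
    pairs = n_proteins * comb(n_compounds, 2)  # unordered compound pairs
    return single, pairs


def make_hidden_truth(n_proteins, n_compounds, rng):
    """Simulated ground truth (an assumption for this sketch, not real data):
    a protein responds only if it is 'responsive' and the compound is 'active'."""
    responsive = [rng.random() < 0.3 for _ in range(n_proteins)]
    active = [rng.random() < 0.3 for _ in range(n_compounds)]
    return [[int(r and a) for a in active] for r in responsive]


def active_learning(truth, batch_size, n_rounds, rng):
    """Repeatedly pick the entries the toy model is least sure about,
    'measure' them, and update the model. Returns (observed, predict)."""
    n_p, n_c = len(truth), len(truth[0])
    observed = {}  # (protein, compound) -> 0/1 response
    row_sum, row_cnt = [0] * n_p, [0] * n_p
    col_sum, col_cnt = [0] * n_c, [0] * n_c

    def predict(p, c):
        # Toy model: average of Laplace-smoothed row and column response rates.
        row_rate = (row_sum[p] + 1) / (row_cnt[p] + 2)
        col_rate = (col_sum[c] + 1) / (col_cnt[c] + 2)
        return (row_rate + col_rate) / 2

    for _ in range(n_rounds):
        # Propose: rank untested entries by uncertainty (closeness to 0.5).
        candidates = [(p, c) for p in range(n_p) for c in range(n_c)
                      if (p, c) not in observed]
        rng.shuffle(candidates)  # random tie-breaking among equal scores
        candidates.sort(key=lambda pc: abs(predict(*pc) - 0.5))
        # Execute and learn: look up the simulated result, update the counts.
        for p, c in candidates[:batch_size]:
            y = truth[p][c]  # stand-in for an imaging experiment
            observed[(p, c)] = y
            row_sum[p] += y
            row_cnt[p] += 1
            col_sum[c] += y
            col_cnt[c] += 1
    return observed, predict


if __name__ == "__main__":
    rng = random.Random(0)
    print(search_space_sizes(94, 94))  # (8836, 410874)
    truth = make_hidden_truth(94, 94, rng)
    observed, predict = active_learning(truth, batch_size=94, n_rounds=10, rng=rng)
    print(len(observed), "of", 94 * 94, "entries measured")
```

In this sketch, ten rounds of 94 experiments measure 940 of the 8,836 entries, and `predict` supplies estimates for the rest. Whether such estimates are accurate enough depends entirely on the model and the data, which is the kind of feasibility and efficiency question the award description lists among its deliverables.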
Project Description: Award received at end of the current reporting quarter. Work on the project will begin immediately.
Jobs Summary: N/A (Total jobs reported: 0)
Project Status: Not Started
This award's data was last updated on Sep. 28, 2009.