Bioinformatics - Services
Research Project Design and Development: Bioinformatics and computational specialists are available to Cancer Institute of New Jersey members to help map out the nature and scope of cyber-infrastructure needs, as well as the data to be collected and analyzed in support of the study objectives and specific aims. This includes detailed discussion of the underlying science, preliminary data mining of public data repositories and the published literature, and consideration of the appropriate tools and techniques (and the specific data needed), in order to anticipate possible barriers and sources of error that may limit the detection of signal from noise.
Development of Methodology: In some instances, consultation on project design identifies the need to develop unique analytical methods to fully capitalize on the data, to conserve resources, or to optimize experiments. Examples of this need are the large quantities of gene expression data generated using DNA microarray technology and of next-generation sequencing (NGS) data. In this respect, close collaboration with colleagues in our Biometrics and Functional Genomics Shared Resources is essential.
Grant Proposal Review: Members of the Core review the data analysis/bioinformatics section in grant proposals from basic, clinical, and population science researchers. This includes, but is not limited to, writing the relevant section of the grant where our expertise and resources are to be deployed.
Computation and Bioinformatics Analyses: The appropriate data analysis methods, computational resources, tools and techniques depend on the objectives and specific aims of the study under consideration. The bioinformatics specialists, working in partnership with IT staff, integrate the tools, compute resources and data visualization to facilitate interpretation of the results generated to test the hypotheses of a study.
Publications: The Bioinformatics Shared Resource offers assistance in the production of data reports (tables, graphs, descriptions of the tools and techniques used, etc.) to support grant applications and publication in peer-reviewed journals.
Education and Training Activities: Bioinformatics specialists, faculty and staff support education and training activities. These include, but are not limited to, short day-long training sessions on the use of specific tools; workshops on specific topics related to the analysis of data generated from experiments such as microarrays; guest lectures to graduate students; and planned course modules in NGS, pathway analysis, systems biology, etc. We feel that these education and training activities are important for communication between the Core and CINJ members, as they serve as an avenue to advertise our capabilities to support members' research and to improve the overall quality of the science this shared resource supports.
Bioinformatics Support for Genome Scale Analyses: Next-generation sequencing (NGS) promises to revolutionize biomedical research. Through technological advances based on massive parallelization, NGS provides an enormous number of reads and permits sequencing of entire genomes (and their transcriptomes). The Bioinformatics Shared Resource has made a considerable investment in computational hardware and bioinformatics software development to manage the very large data sets generated and to extract biomedical insights from them, especially given the imminent expansion of the Functional Genomics Shared Resource (see Shared Resource section 09.1.02) to include NGS technologies and the leap in computational biology capability this will require. We have pipelines to analyze data from RNA-Seq, ChIP-Seq, whole genome sequencing, etc., running on in-house shared-memory computational platforms, and we have deployed massively parallel sequence alignment and assembly codes on our University's Newton MPP supercomputer. The Bioinformatics Shared Resource has deployed and is using sequence assembly and alignment codes (SOAP, Bowtie, ABySS) and data visualization tools (Galaxy; IGV from the Broad Institute) on shared-memory and massively parallel supercomputers to analyze and visualize the vast data sets generated by the Illumina and ABI/SOLiD next-generation sequencers.
Microarray Database and Analysis: The Bioinformatics Shared Resource supports experimental design and analysis of microarray data, often generated by the Functional Genomics Shared Resource, including pathway analysis and molecular modeling. We expect this effort to lead to co-authorships in publications in the near future and requests for bioinformatics support in future grant applications. In addition, we have deployed and manage data backup/storage (via a dedicated server and disk/tape devices) to ensure that the data collected in a PI laboratory is automatically copied to dedicated server storage at the Cancer Institute Data Center every
Cancer Institute of New Jersey Warehouse Services/Integrative Cancer Biology and Data Mining: Biomedical research has yet to fully harness the transformative power of information technology to enhance research productivity and efficiency and accelerate research discoveries that transform clinical practice. Effective aggregation and management of knowledge and data resources is critical to advancing clinical and translational science. It is a priority of the Bioinformatics Shared Resource to organize and implement informatics initiatives and their associated cyber-infrastructure and support capabilities to meet these needs. As in most medical centers, the information management needs of the combined clinical and translational research community at Cancer Institute of New Jersey have historically been met by disparate resources available within individual research programs. In the past two years, Cancer Institute of New Jersey leadership and individual investigators have been strong advocates and supporters of the Bioinformatics Shared Resource in its efforts to develop and deploy a model (based on the NCI-funded caBIG initiative) of information integration and data sharing to catalyze the translation of research discoveries and to advance research into quantifiable outcomes across traditional institutional and geographic boundaries. The Bioinformatics Shared Resource is working with Cancer Institute investigators to facilitate mining of existing data based on each research project's scientific priorities. Therefore, a significant component of the resource includes detailed outreach activities that are designed to maximize the power of the Bioinformatics Shared Resource's expertise and resources to advance research. A major component of the efforts in support of integrative cancer biology is focused on developing, deploying and supporting data repositories using commercial and/or open source software that meet the specific requirements of Cancer Institute members. 
The data repositories provide federated access to clinical data, archive research datasets from completed studies, link research data sources for multidisciplinary collaboration, and serve as a platform for translational research. Clinical data sources include the ARIA-EMR (containing encounter, laboratory, and other EMR data), radiology PACS, computerized physician order entry, radiology reports, pathology reports, surgical notes, clinical history, and nursing notes. Examples of research data sources that will be integrated into the data repository include genomics/proteomics and epidemiologic data from Cancer Institute Shared Resources, the NJ State Cancer Registry/SEER, and databases linking the Network hospitals. As part of data repository development, wherever possible, the Bioinformatics Shared Resource is implementing a standard format into which data from heterogeneous sources will be transformed for further use. The Bioinformatics Core is also working on deploying a secure web portal interface that will let researchers collaborating on a specific project (for example, members of a SPORE) identify available data sources and will guide them through the workflow of requesting and receiving data from the data repository.
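As a rough illustration of transforming heterogeneous sources into one standard format, the following Python sketch maps records from two differently shaped sources onto a single common schema. The source names, field names, and the schema itself are hypothetical and chosen only for illustration; they are not the format the Bioinformatics Shared Resource actually uses.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class StandardRecord:
    """Hypothetical common schema for records from heterogeneous sources."""
    patient_id: str
    source: str
    event_date: date
    diagnosis: str


def from_emr(row: dict) -> StandardRecord:
    # Assumed EMR export layout: MRN, ISO date string, free-text diagnosis.
    return StandardRecord(
        patient_id=str(row["mrn"]).strip(),
        source="emr",
        event_date=date.fromisoformat(row["encounter_date"]),
        diagnosis=row["dx"].strip().lower(),
    )


def from_registry(row: dict) -> StandardRecord:
    # Assumed registry layout: MM/DD/YYYY date and upper-case site name.
    return StandardRecord(
        patient_id=str(row["PatientID"]).strip(),
        source="registry",
        event_date=datetime.strptime(row["DxDate"], "%m/%d/%Y").date(),
        diagnosis=row["PrimarySite"].strip().lower(),
    )


# One converter per source; adding a source means adding one function here.
CONVERTERS = {"emr": from_emr, "registry": from_registry}


def normalize(source: str, row: dict) -> StandardRecord:
    """Dispatch a raw row to the converter registered for its source."""
    if source not in CONVERTERS:
        raise ValueError(f"unknown source: {source!r}")
    return CONVERTERS[source](row)
```

Keeping one small converter per source means downstream queries only ever see the common schema, whatever the original layout was.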
Clinical Data Repository and Services: Population scientists, epidemiologists and clinicians in the various disease-specific groups at Cancer Institute of New Jersey need a data repository for viewing populations and disease trends among patients seen at the Cancer Institute. To support phase I studies and NCI-investigator driven clinical trials at the Cancer Institute, the Bioinformatics Shared Resource works with the Cancer Institute's Office of Human Research Services shared resource in providing hardware, software and disaster recovery support for their OnCore Clinical Trial Management System. They are working with the commercial vendor to expand the capabilities for real-time electronic data capture from our ARIA-EMR, as well as to expand the OnCore Biospecimen Module, which will facilitate integration with the data capture efforts in our Biospecimen Repository Shared Resource. The Bioinformatics Shared Resource ensures that the right data in the right format flows into the appropriate data repositories in an efficient and secure manner. It will work with CINJ members to ensure that appropriate data repositories, such as Caisis, are developed, deployed and managed. Caisis is an open source, web-based cancer data management system that integrates research with patient care. It was developed by Memorial Sloan-Kettering Cancer Center and is now used by many of the Comprehensive Cancer Centers for their own research as well as for the exchange of data to create larger populations and cohorts for study. Collaboration with multiple centers has allowed Caisis to develop and evolve in an environment of constant feedback and scrutiny. The Bioinformatics Shared Resource is implementing Caisis version 4.5 to unify all of our disparate disease and tumor databases under a common platform.
Chemical Informatics Analysis: Informaticians use small molecule/peptide databases for 'in silico' screening studies of key enzymes and receptors to identify lead targets for design and development of novel therapeutics. Molecular dynamics techniques are applied to develop a broader understanding of the biophysical significance of mutation. This service is a key component of the translational science efforts at the Cancer Institute of New Jersey.
Support of Other CINJ Shared Resources: The Bioinformatics Shared Resource develops the web portals for access to the services of most CINJ Shared Resources. In addition, more specific bioinformatics needs of various Shared Resources are met, as follows:
Biometrics Shared Resource - The Bioinformatics Shared Resource provides (1) software engineering support and web portal development and maintenance; for example, the Biometrics Shared Resource required conversion of a complex Fortran program based on a bioinformatics statistical algorithm to Java to ensure greater scalability and portability; (2) system administration support to maintain system and application software and to troubleshoot the desktop workstations used for statistical analysis by members of the Biometrics Shared Resource; and (3) high performance computing resources and scientific computing expertise on multi-core massively parallel supercomputers.
Functional Genomics Shared Resource - We are in the midst of a major expansion of the Functional Genomics Shared Resource to accommodate the rapidly growing needs of our membership for next-generation sequencing technologies. The Bioinformatics Shared Resource will deploy scalable, efficient and secure storage and archiving capabilities to ensure a seamless flow from data acquisition to storage and processing. This involves expanding the existing computing resources within the Bioinformatics Shared Resource to meet immediate needs (storage is currently 30TB and is projected to grow to 100TB within a year). Given the anticipated growth in storage needs and the costs involved in deploying and maintaining an in-house solution, we are also exploring new avenues (e.g., cloud computing) as a cost-efficient and operationally viable option. The Bioinformatics Shared Resource also provides data analysis for users of the Functional Genomics Shared Resource.
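To make the storage projection concrete, the short Python sketch below works out the month-over-month growth implied by going from 30TB to 100TB in twelve months. It assumes steady compound growth, which is a simplification made here for illustration and not a stated planning model; the function names are hypothetical.

```python
def monthly_growth_rate(start_tb: float, end_tb: float, months: int) -> float:
    """Constant month-over-month rate that takes start_tb to end_tb in `months`."""
    if start_tb <= 0 or end_tb <= 0 or months <= 0:
        raise ValueError("capacities and months must be positive")
    return (end_tb / start_tb) ** (1 / months) - 1


def projected_capacity(start_tb: float, rate: float, months: int) -> list[float]:
    """Capacity at the end of each month, from month 0 through `months`."""
    return [start_tb * (1 + rate) ** m for m in range(months + 1)]


# Figures from the text: 30TB today, 100TB projected within a year.
rate = monthly_growth_rate(30, 100, 12)
plan = projected_capacity(30, rate, 12)
```

Under this assumption the requirement grows by roughly 10.6 percent per month, which is the kind of figure that helps when comparing an in-house expansion against a pay-as-you-grow alternative.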
Biorepository Shared Resource - The availability of annotated specimens is enhanced by leveraging data available from our new EMR as well as from the NJ State Tumor Registry (NJSTR).
Histopathology and Imaging Shared Resource - The Bioinformatics Shared Resource provides cyber-infrastructure advice and support for the Telepathology Project, linking instrumentation for remote viewing of organs for transplant within a facility located at the Robert Wood Johnson University Hospital. In addition, informatics and software engineering experts at Dr. Foran’s Bioimaging Center are using the Bioinformatics Shared Resource HPC resources located at the CINJ data center for code development and large-scale simulations. The Bioinformatics Shared Resource is working in partnership with Dr. Foran’s team to develop a large-scale storage facility to support research needs in pathology informatics in general and, in particular, the data generated by the tissue microarray service. The Bioinformatics Shared Resource will maintain the hardware, provide system administration support, and be responsible for data backup and disaster recovery of all pathology imaging data.
Office of Human Research Services - The Bioinformatics Shared Resource provides cyber-infrastructure advice and support for the OnCore Clinical Trials Software, a caBIG-compatible commercial CTMS that supports data collection, data management and study protocols for CINJ’s clinical trials. In addition, OnCore includes a software module that supports data management for CINJ’s Biorepository. Our Database Architect worked with the company (Forte Research Systems) and biorepository colleagues to migrate the data stored in an in-house Perl-based application into the Oracle-based OnCore module. Our HPC supervisor worked with the company to migrate the entire OnCore application and legacy data to new hardware, in the process implementing a failover mechanism by configuring the hardware so that two of the servers are located in our data center and two identical servers are located at the Robert Wood Johnson University Hospital Data Center (with appropriate tape backup for disaster recovery). This arrangement is intended to ensure 24/7 availability of data to OHRS and to minimize loss of data in case of hardware or network failure.
Epidemiology Services Shared Resource - The Bioinformatics Shared Resource provides the newly developing Epidemiology Services Shared Resource with infrastructure and design support for research tracking and analytical databases. Specifically, in support of all ESC-run population-based investigations of cancer prevention, etiology, treatment and outcomes, the Bioinformatics Shared Resource houses and maintains an application server (SunFire x4140) equipped with two 6-core Opteron CPUs, 32GB of RAM and 4x146GB disks, linked to a back-up application server; and a storage server with 18 terabytes of secure, encrypted storage, linked to a duplicate back-up server. The application server can be used for statistical analysis of data sets using R or SAS. In addition, the developing Epidemiology Services Shared Resource has access to the Bioinformatics Shared Resource secure, encrypted web server for hosting various research projects and initiatives. As new study-specific needs arise for database design and access controls, the Bioinformatics Shared Resource works with Epidemiology Support Services staff to meet these needs and support ongoing work.
Web-based Application Development - For many projects it is difficult to separate web development (the web site) from medical/clinical informatics and database design/installation, since all of these applications are delivered through a web portal requiring a “website/interface” with underlying database design, programming and linkage to informatics. While a majority of the web portals are used to support ongoing outreach and educational initiatives, a significant number are targeted to the collection and integration of data in support of population science and translational research, where integration of data from the lab, clinic and population-based studies will advance our research agenda in cancer prevention, control and survivorship. At a practical level, the resources and staff time devoted to developing and deploying web portals benefit our research agenda at CINJ, as they help to increase recruitment for clinical trials carried out at CINJ by making patients and clinicians aware of the ongoing trials. Examples of projects implemented and maintained by the Bioinformatics Shared Resource include the following:
Network Hospital Portal - The Cancer Institute of New Jersey Network includes 16 hospitals across the State that provide cancer care to over one third of New Jerseyans. Each Network hospital offers its patients access to the latest cancer therapies and state-of-the-art cancer care available only at NCI-designated Cancer Centers and their Networks. This database connects the affiliated hospitals via the web (http://cinjweb.umdnj.edu/network/content/about).
Investigator Toolkit - Provides easy access to various tools, documents, and processes for Clinical Trial submission. This site can be accessed internally at CINJ: http://intranet.cinj.org/OHRS/investigatorToolBox.asp.