Type: Full Time
Job Description: Biomedical Research Data Management Engineer
The Getz Lab in the Broad Institute's Cancer Program has an open position for a Biomedical Research Data Management (RDM) Engineer. The holder of this position will be responsible for organizing all data assembled and produced in the course of our lab's projects. Because the Getz Lab is primarily a computational lab emphasizing data-driven science, the RDM Engineer will be involved in, and contribute to, all of the lab's research.
The RDM Engineer will define and implement a data life cycle model and apply that model to all of the lab's research projects. Our lab's data life cycle model will be drawn from that of the Harvard Medical School (HMS) Data Management Working Group, but tailored to meet the specific needs of our lab and the Broad Institute. The HMS life cycle model covers the following stages:
Initial production of raw data and initial short-term storage
Analysis of raw data, generating analyzed data
Publication and distribution of data, including providing access to datasets, e.g., through public data repositories and lab portals
Evaluation of which data should be retained long-term
Long-term storage of data to meet human-subject data privacy guidelines and the requirements of granting agencies and the Broad Institute, as well as deletion of unneeded data
Archiving data as part of an historical record
Development and management of data storage policies, and monitoring of expenses (e.g., cloud and on-prem data storage tiers)
Discovery, citation, and reuse of data through well-organized, well-documented datasets
The RDM Engineer will work with project leads to create a data management plan for each of the lab's projects. Each plan will reflect the evolving requirements of its project and will account for security, accessibility, sharing, storage, maintenance, and reuse. The RDM Engineer will then be responsible for implementing these plans.
The RDM Engineer will also make significant contributions to each project's data analysis, including harmonizing data drawn from multiple public repositories, conducting ETL (extract, transform, load) operations, and ensuring that 'omics datasets are compatible with our lab's data analysis pipelines and needs.
Requirements
A bachelor's or master's degree in Computer Science, Bioinformatics, or a related field
Fluency in Python and Unix Shell programming
Experience with cloud infrastructure
Familiarity with genomic data types and data sets
Ability to perform effectively in a fast-paced environment
Must be able to handle a variety of tasks; effectively solve problems with numerous and complex variables; and be able to shift priorities rapidly
Excellent oral and written communication skills (the position will work closely with small project teams of 2-4 researchers; interfacing with project teams in person and online is a key aspect of this position)
Data organization experience, with a preference for genomic and biomedical data.
Understanding of the security and cost considerations of data storage and of the data life cycle
A medical, genomics, or scientific background is preferred, but we are primarily looking for enthusiasm to contribute to advances in the scientific understanding and treatment of cancer.
Lab Overview
The Getz Lab has established itself as a world leader in the development and application of computational tools for the analysis of cancer genomes. The lab also has an experimental arm that conducts wet-lab in vitro studies that complement its computational activities. The lab specializes in cancer genome analyses, which include: (i) characterizing the cancer genome, (ii) identifying cancer-associated genes and pathways, (iii) characterizing the heterogeneity and clonal evolution of cancer, and (iv) increasing our understanding of how cancers become resistant to therapies.
The Getz Lab brings together approximately 45 research scientists, engineers and academic trainees, focusing their expertise and talents on increasing our understanding of cancer biology and identifying avenues for improved treatments.
Characterizing the cancer genome
Cancer is a disease of the genome that is driven by a combination of possible germline risk alleles together with a set of "driver" somatic mutations that are acquired during the clonal expansion of increasingly fitter clones. In order to generate a comprehensive list of all germline and somatic events that occurred during a patient's lifetime and the development of the cancer, the lab develops and applies highly sensitive and specific tools for detecting different types of mutations in massively parallel sequencing data. The volume, noise, and complexity of these data require computational tools built on state-of-the-art statistical and machine learning approaches to extract the signal from the noise.
Identifying cancer-associated genes and pathways
Oncogenic events detected across a cohort of samples are analyzed to find genes and pathways, as well as non-coding variants, that show significant signals of positive selection. To that end, we construct a statistical model of the background mutational processes and then detect genes that deviate from it. As part of constructing these models, we study and infer the mutational processes that affected the samples (e.g., carcinogens or defects in repair mechanisms) and their timing.
We have developed tools for detecting genes that are significantly gained or lost in cancer, as well as genes with an increased density or irregular patterns of mutations. Our work demonstrated the importance of modeling the heterogeneity of the background mutational processes across patients, sequence contexts, and the genome when searching for cancer genes.
Heterogeneity and clonal evolution of cancer
Cancer samples are heterogeneous, containing a mixture of normal cells and cancer cells that often represent multiple subclones. We have developed, and continue to develop, tools for characterizing the heterogeneity of cancer samples using copy-number and mutation data measured in bulk samples and, more recently, in the genomic material of individual cells. Using these tools, we can infer which mutations are clonal or subclonal, as well as estimate the number of subclones and their distribution over space and time. By correlating these analyses with clinical data, we can gain insight into the development of resistance during the course of treatment. We are now working to introduce these concepts into clinical trials and, eventually, clinical care.
Members of the Getz Lab work closely with clinical researchers and in large collaborative projects sponsored by the NIH (e.g., TCGA, CPTAC), charitable initiatives (e.g., Stand Up to Cancer, Chan Zuckerberg Initiative), and industry partners. These collaborative efforts are significantly enhanced by the effective sharing of experimental data, analysis results, and tools.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.