=Paper=
{{Paper
|id=Vol-1149/bd2014_constantinescu
|storemode=property
|title=Implementing a clinical genomics infrastructure to sequence 18,000 human genomes per year
|pdfUrl=https://ceur-ws.org/Vol-1149/bd2014_constantinescu.pdf
|volume=Vol-1149
}}
==Implementing a clinical genomics infrastructure to sequence 18,000 human genomes per year==
ABSTRACTS: Scientific

Implementing a clinical genomics infrastructure to sequence 18,000 human genomes per year

Liviu Constantinescu<sup>a</sup>, Mark Cowley<sup>a,b</sup>, Kevin Ying<sup>a</sup>, Peter Budd<sup>a</sup>, Derrick Lin<sup>a</sup>, Warren Kaplan<sup>a,b</sup>, Marcel Dinger<sup>a,b</sup>

<sup>a</sup> Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW 2010, Australia

<sup>b</sup> St Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, NSW 2010, Australia

Dr Liviu Constantinescu, Information Architect, Garvan Institute of Medical Research (l.constantinescu@garvan.org.au). Liviu Constantinescu completed his PhD in computer science at the University of Sydney as part of the Biomedical and Multimedia Information Technology Research Group, specialising in software development and multimedia technologies. His research focuses on improving the practice of healthcare through state-of-the-art networking and software development methods.

===Summary===
Clinical genomics is a rapidly evolving field focused on the use of genome sequencing information to guide patient diagnosis and treatment. Whole genome sequencing has been dubbed “the test to replace all genetic tests”, since one sequencing run can identify all genetic variants present in a patient’s genome. Implementing clinical-grade whole genome sequencing across large patient cohorts represents a substantial big data challenge. We will present our “Sabretooth” plan for scaling operations in our centre from an estimated 800 to 18,000 genomes per year.

===Introduction===
Sequencing of patient genomes is anticipated to have a large impact on healthcare and on the delivery of personalised medicine in three key areas: stratifying patients for appropriate cancer treatment; diagnosing inherited genetic disease; and tailoring prescriptions by anticipating adverse drug reactions. Recently, the Kinghorn Centre for Clinical Genomics (KCCG) purchased the Illumina HiSeq X™ Ten sequencing system, which has the capacity to sequence 18,000 whole human genomes per year at an average coverage of 30x. This will generate 150 genomes every 3 days, or 1.4 PB of data per year.

Although we anticipate that the storage issue can be addressed with currently available computing architectures, the new challenge lies in delivering this architecture in a manner that is both versatile enough to keep pace with the rapidly changing bioinformatics landscape and rigorous enough to fulfil the stringent regulatory requirements for clinical data. This presentation will focus on the modern software development processes and infrastructure, adopted from thought leaders in IT<sup>5</sup>, that we have implemented to meet NATA (National Association of Testing Authorities, Australia) quality standards while retaining the flexibility to continuously improve our processes and analytics.

===Description===
Our bioinformatics workflow includes phenotype capture, read alignment, mutation calling, variant annotation, and filtering by inheritance pattern, rarity, predicted functional impact and known disease association. Each stage uses one or more software components, most of which are developed externally. These are supported by information systems that manage clinical data, laboratory processes and logistics. Every study traverses this “Sabretooth” pipeline, from accession to result.

Systems and modules in this pipeline undergo continuous, research-driven change, resulting in increased accuracy and diagnostic sensitivity. As the state of the art advances, outdated components must be adapted or replaced. This continuous change has a flow-on effect on downstream components and on the middleware interconnecting them. It poses four major challenges: managing software change; adapting and modularising workflows; generating auditable records; and allowing re-runs of legacy pipelines.
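The filtering steps named in the workflow above (inheritance pattern, rarity, predicted functional impact and known disease association) lend themselves to a chain of independent stages, which also speaks to the modularisation challenge: one stage can be swapped without touching the others. The sketch below is illustrative only and is not the KCCG implementation. The `Variant` fields, the de novo rule used as the single inheritance-pattern example, the 1% allele-frequency cut-off and the `"HIGH"`/`"MODERATE"` impact labels are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record; field names are illustrative only."""
    gene: str
    proband_gt: str     # genotype string, e.g. "0/0", "0/1" (het), "1/1" (hom alt)
    mother_gt: str
    father_gt: str
    allele_freq: float  # population allele frequency from the annotation step
    impact: str         # assumed impact labels: "HIGH", "MODERATE", "LOW"
    disease_gene: bool  # True if the gene has a known disease association


VariantFilter = Callable[[Variant], bool]


def de_novo(v: Variant) -> bool:
    # One example inheritance pattern: carried by the proband, absent in both parents.
    return v.proband_gt in ("0/1", "1/1") and v.mother_gt == "0/0" and v.father_gt == "0/0"


def rare(max_af: float = 0.01) -> VariantFilter:
    # Rarity: keep variants below an assumed population-frequency threshold.
    return lambda v: v.allele_freq < max_af


def damaging(v: Variant) -> bool:
    # Predicted functional impact: keep only the higher-impact categories.
    return v.impact in ("HIGH", "MODERATE")


def known_disease(v: Variant) -> bool:
    # Known disease association, as flagged during annotation.
    return v.disease_gene


def apply_filters(variants: Iterable[Variant], stages: List[VariantFilter]) -> List[Variant]:
    """Apply each filter stage in order, keeping only variants that pass all of them."""
    kept = list(variants)
    for stage in stages:
        kept = [v for v in kept if stage(v)]
    return kept


if __name__ == "__main__":
    calls = [
        Variant("GENE_A", "0/1", "0/0", "0/0", 0.0001, "HIGH", True),
        Variant("GENE_B", "0/1", "0/1", "0/0", 0.0001, "HIGH", True),      # inherited from mother
        Variant("GENE_C", "0/1", "0/0", "0/0", 0.2500, "MODERATE", True),  # too common
        Variant("GENE_D", "0/1", "0/0", "0/0", 0.0002, "LOW", False),      # low impact
    ]
    shortlist = apply_filters(calls, [de_novo, rare(0.01), damaging, known_disease])
    print([v.gene for v in shortlist])  # ['GENE_A']
```

Because each stage is a plain predicate, replacing, for example, the rarity threshold or the impact predictor only changes one entry in the list passed to `apply_filters`.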
Of the four challenges above, the first two (managing software change and modularising workflows) apply equally to clinical and research genomics, whereas the last two (generating auditable records and re-running legacy pipelines) are specific to a clinical context.

To manage changes in software and requirements, the KCCG has put in place an agile software development process to continuously improve the modules, applications and information systems that make up our pipeline. By implementing daily stand-ups, feature backlogs, test-driven development, automated test suites, continuous integration and continuous deployment, we gain confidence not only in the quality of the software we produce, but also in our ability to manage the rapid release and deployment cycle of our systems, recover from hardware failures and roll out new features to the clinical and research arms of our group. Our implementation of the agile process closely addresses the requirements and recommendations identified in the scientific literature as critical to the development of high-quality bioinformatics software<sup>1,2,5,6</sup>.

For high-level management of repeatable, modular workflows, the KCCG has entered into a collaboration with the SeqWare working group at the Ontario Institute for Cancer Research (OICR). We have developed an in-house adaptation of their SeqWare framework, a set of infrastructure tools designed to guarantee the correctness of sequence analysis pipelines and to deploy new versions on the fly. This framework supports a full hierarchy of functional, scientific and regression tests; retains the history and metrics of every run; and incorporates a powerful query engine for interrogating our growing corpus of genome datasets<sup>8</sup>.
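The combination of per-run history, regression checking and legacy re-runs described above can be illustrated with a small provenance store. This is a minimal sketch under stated assumptions, not SeqWare’s API or data model: the `runs` table, the `PIPELINES` version registry and the stand-in lambdas that take the place of real analysis steps are all hypothetical. Each run records the pipeline version used and a SHA-256 checksum of its output, so a past analysis can be re-run with its original version and compared against the recorded result.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Dict

# Hypothetical registry of released pipeline versions. Each stand-in function
# maps a sample ID to its "result"; a real entry would pin exact tool versions.
PIPELINES: Dict[str, Callable[[str], str]] = {
    "1.0": lambda sample: f"{sample}:variants-from-v1.0",
    "1.1": lambda sample: f"{sample}:variants-from-v1.1",
}


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a run-history table recording every pipeline execution."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sample TEXT NOT NULL,
               pipeline_version TEXT NOT NULL,
               output_sha256 TEXT NOT NULL,
               metrics TEXT NOT NULL,
               finished_at TEXT NOT NULL
           )"""
    )
    return db


def _checksum(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_and_record(db: sqlite3.Connection, sample: str, version: str) -> str:
    """Run one pipeline version on a sample and append an audit record."""
    output = PIPELINES[version](sample)
    digest = _checksum(output)
    metrics = json.dumps({"output_chars": len(output)})
    db.execute(
        "INSERT INTO runs (sample, pipeline_version, output_sha256, metrics, finished_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (sample, version, digest, metrics, datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
    return digest


def rerun_matches_record(db: sqlite3.Connection, run_id: int) -> bool:
    """Re-run a past analysis with the version it originally used and compare outputs."""
    row = db.execute(
        "SELECT sample, pipeline_version, output_sha256 FROM runs WHERE id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no run with id {run_id}")
    sample, version, recorded = row
    return _checksum(PIPELINES[version](sample)) == recorded


if __name__ == "__main__":
    history = open_history()
    run_and_record(history, "S001", "1.0")
    run_and_record(history, "S001", "1.1")   # newer release; the 1.0 run stays on record
    print(rerun_matches_record(history, 1))  # True: legacy result reproduced
    print(history.execute(
        "SELECT pipeline_version, COUNT(*) FROM runs GROUP BY pipeline_version ORDER BY pipeline_version"
    ).fetchall())  # [('1.0', 1), ('1.1', 1)]
```

The design choice that makes legacy re-runs possible is keeping every released version addressable in the registry rather than only the latest one; the recorded checksum then doubles as a simple regression test whenever a result is reproduced.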
Finally, a suite of agile process management and documentation tools centred on Atlassian’s JIRA<sup>3</sup> augments our pipeline by automatically collecting business intelligence data on every stage of the process. This guarantees end-to-end auditability and gives clinical, analytical and management teams access to continuously updated information that traditional paper-based reporting cannot capture<sup>4</sup>. This information integrates release management, continuous integration and issue tracking, so the scope of every software and analytics change can be continuously monitored in terms of its impact on business and clinical outcomes.

===Conclusion===
KCCG is leading the charge toward the implementation of large-scale clinical genomics in Australia. We present Sabretooth as a case study in balancing the demands of clinical-grade informatics against the need to manage continuous change, so as to deliver the benefits of the most recent genomic research to all Australian patients in a cost-effective and reliable way.

===References===
1. K. Rother, et al., A toolbox for developing bioinformatics software. Briefings in Bioinformatics 2011, 13(2), 244–257.
2. K. Beck, Test Driven Development: By Example. Addison-Wesley Professional, Boston, 2002.
3. Jira: Bug tracking, issue tracking, and project management. Available: http://www.atlassian.com/software/jira. Accessed 15/1/2013.
4. D. Larson, Agile Methodologies for Business Intelligence. In: Business Intelligence and Agile Methodologies for Knowledge-Based Organizations: Cross-Disciplinary Applications. IGI Global, 2012, 101–119. Accessed 15 Jan. 2014.
5. S. Baxter, S. Day, et al., Scientific Software Development Is Not an Oxymoron. PLOS Computational Biology 2006, 2(9), e87.
6. D. Kane, M. Hohman, et al., Agile methods in biomedical software development: a multi-site experience report. BMC Bioinformatics 2006, 7:273.
7. T. Nyrönen, J. Laitinen, et al., Delivering ICT infrastructure for biomedical research. In: WICSA/ECSA ’12: Proceedings of the WICSA/ECSA 2012 Companion Volume, ACM, 2012.
8. B. O’Connor, B. Merriman, et al., SeqWare Query Engine: storing and searching sequence data in the cloud. BMC Bioinformatics 2010, 11(Suppl 12), S2.

#bd14 big data conference, 3–4 April 2014, Melbourne