In this OPP series, we will be interviewing scientists in plant pathology or related areas who have embraced open science and contributed knowledge and tools to advance the field.
Our first OPP interview features Dr. Niklaus Grünwald (@PhytophthoraLab on Twitter), a plant pathologist with USDA ARS in Corvallis, OR, USA. Nik is well recognized for his research on the population genetics and genomics of plant pathogens, mainly oomycetes of major importance to global agriculture. The Grünwald Lab has made significant scientific contributions to the understanding of the population biology and evolution of Phytophthora infestans worldwide.
More recently, his students have developed R packages that have considerably advanced and facilitated the analysis of genetic and genome sequence data, with applications in taxonomy and evolution beyond the field of plant pathology. Members of the Grünwald Lab have offered workshops on reproducible research and the analysis of genomic data at APS meetings using those packages. It is our great pleasure to have Nik answer six questions that we prepared for him.
I have used programming as a tool on and off since my undergraduate days. In fact, I used to earn pocket money as a programmer. As an econ undergrad major I learned Pascal as my first language. During my PhD I programmed extensively in SAS. As a postdoc I wanted to calculate genotypic diversity indices and could not do it in SAS, so I programmed some routines in C, both to enable calculation of rarefaction and for speed.
About 8 years ago my lab switched from SAS to R. I am absolutely loving R because of its open source (and free) nature, the ability to produce publication-ready graphs, tools for reproducibility, and the fact that many members of the R community contribute packages. The fact that R is open is important because you can actually see the raw code, find mistakes, and contribute new functions. SAS procedures are closed. We switched once we could do everything in R that I used to do in SAS. I also love R because it is free. When you have several postdocs and graduate students in the lab, subscriptions to software become very expensive. Another reason I love R is that we can create electronic lab notebooks using R Markdown that allow for better, albeit not perfect, reproducibility.
My way of approaching this is that graduate students should be trained in hypothesis-driven biological research. Success in interviewing for a faculty position, or in promotion and tenure, relies on the perceived impact of their fundamental biological research. However, while conducting this research we often find ourselves developing custom scripts in ecology, genetics or genomics. If these scripts can be used by other colleagues, we release them as an R package (Fig. 1). We have done this so far with the R packages poppr, vcfR, metacoder, taxa, effectR, and popprxl. This sometimes results in an extra publication (e.g., metacoder and vcfR) with a modest amount of additional effort. Methods papers releasing R packages are highly cited; thus, computational biology resources contribute significantly to the citations and the impact of a scientist. But I caution students and postdocs not to rely solely on computational papers, since biology papers will remain the bread and butter for recognition and impact of our work.
Large data sets, like genome sequences, microbiome or imaging data, are wonderful resources for all of us if they are available with the proper metadata and copyright. The Resource Announcement we created as a new category for APS journals provides a venue for getting citations and credit for these types of efforts. We want to incentivize the release of data, code and related resources so that they can be reused. Several journals are doing this now, including mBio and Molecular Ecology Resources. With the new category of Resource Announcement (Fig. 2), we are keeping plant pathology resources within the ecosystem of APS journals.
I think every plant pathologist benefits from following the #OPP philosophy of sharing data, code and resources openly and following principles of reproducible research. In the US, the federal government requires open access to publications, data, code and related resources from federally funded grants. Major donors like the Wellcome Trust or the Bill and Melinda Gates Foundation similarly require open approaches. #OPP provides a network and resources for training that benefit all of us in doing open research. I think the fundamental role that #OPP is playing is to incentivize open, transparent and reproducible research. Not everybody needs to be a hardcore coder, but basic coding, including reproducible lab notebooks, helps reproducibility and benefits all of us. Given organizations like Retraction Watch, PubPeer, and others that monitor reproducibility and rigor in science, it is incumbent upon all of us to keep good records so that we can correct mistakes in the literature when they are identified. #OPP facilitates this process and helps all of us learn from each other to become better open scientists.
You do not need to be a programmer if you want to work in my lab. However, you should not be afraid of using computational tools to ask biological questions. Today’s biology is highly quantitative and I do not see this changing anytime soon. Molecular biology was not quantitative when I went to grad school. In the era of high-throughput sequencing, the fields of genetics and genomics have adopted highly sophisticated computational and statistical approaches.
The tweet below highlights members of the Grünwald Lab teaching a workshop during ICPP 2018, Boston, USA.
.@zacharyfoster19 @knaus_brian @ncarleson teaching microbiome analysis workshop at #icpp2018 pic.twitter.com/9pC4IlZoxZ
— Niklaus Grunwald (@PhytophthoraLab) July 29, 2018
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/openplantpathology/OpenPlantPathology, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".