Introducing Julia to Plant Pathologists

Julia4PlantPath R4PlantPath Reproducible Research

Do you feel the need for speed?

Adam Sparks https://adamhsparks.netlify.app
2022-02-06

Wikipedia describes Julia by saying, it “is a is a high-level, high-performance, dynamic programming language. While it is a general-purpose language and can be used to write any application, many of its features are well suited for numerical analysis and computational science.”1 In short, it can be scripted like R, which many of us are likely to be more familiar with. But provides efficiency gains, being much faster than R to execute code2. While this may not matter much in day-to-day work for you, if you have a large set of data to work with, e.g., large geographic or temporal sets of data covering country or global areas for many years, Julia may be useful to you in performing the analysis. Alternatively, maybe you’re just curious and want to try something new. Even though it’s not as mature, Julia does have some features that R does not like multiple dispatch, which may be of interest to more advanced users.

To introduce you to Julia, I will demonstrate how to calculate area under the disease progress curve (AUDPC) (Shaner and Finney 1977) in R and then in Julia.

Installation

If you’ve ever installed R then you’ll be able to install and use Julia as well. Simply head over to https://julialang.org/downloads/ and download the proper version for your computer’s OS and processor. Then follow the directions from https://julialang.org/downloads/platform/ to install it for your platform.

Calculating AUDPC

Using R

In the “Disease Progress Over Time” module of the “Epidemiology and Ecology in R,” Sparks et al. (2008) demonstrate how to write an R function to calculate AUDPC as follows.

# Build a function for AUDPC calculation
# the left curly bracket indicates the beginning
# of the function
audpc <- function(disease.severity, time.period) {
  #n is the length of time.period, or
  #  the total number of sample dates
  n <- length(time.period)
  
  # meanvec is the vector (matrix with one dimension)
  # that will contain the mean percent infection
  # it is initialized containing -1 for all entries
  # this sort of initialization is sometimes useful
  # for debugging
  meanvec <- matrix(-1, (n - 1))
  
  # intvec is the vector that will contain the length of
  # time between sampling dates
  intvec <- matrix(-1, (n - 1))
  
  # the loop goes from the first to the penultimate entry
  # the left curly bracket indicates the beginning of
  # commands in the loop
  for (i in 1:(n - 1)) {
    #the ith entry in meanvec is replaced with the
    #   mean percent infection
    #between sample time i and sample time i+1
    meanvec[i] <- mean(c(disease.severity[i],
                         disease.severity[i + 1]))
    
    #the ith entry in intvec is replaced with the length
    # of the time interval between time i and time i+1
    intvec[i] <- time.period[i + 1] - time.period[i]
    
    #the right curly bracket ends the loop
  }
  
  # the two vectors are multiplied together
  # one entry at a time
  infprod <- meanvec * intvec
  
  # the sum of the entries in the resulting vector
  # gives the AUDPC
  sum(infprod)
  
  #the right curly bracket ends the function
}

This function in R can be improved for efficiency as follows.

r_audpc <- function(disease.severity, time.period) {
  # only calculate n-1 once
  n <- length(time.period) - 1
  
  # set up vectors, not matrices, as double precision to contain the mean
  # percent infection (meanvec) and difference between obs times (intvec)
  meanvec <- vector(mode = "double", length = n)
  intvec <- vector(mode = "double", length = n)
  
  # using `seq_len()` is safer than 1:n, if n = 0
  for (i in seq_len(n)) {
    
    # double brackets for assigning values in a vector are preferred
    # using `sum()` in place of `+` for `i + 1` is faster
    meanvec[[i]] <-
      mean(c(disease.severity[[i]], disease.severity[[sum(i, 1)]]))
    intvec[[i]] <- time.period[sum(i, 1)] - time.period[[i]]
  }
  
  # save a step and put everything on one line, returning the AUDPC value
  return(sum(meanvec * intvec))
}

Or, if you prefer, you can install libraries from CRAN that provide functions for calculating AUDPC rather than writing your own. The R packages agricolae (de Mendiburu 2021) and epifitter (Alves and Del Ponte 2021) provide easy to use functions to calculate AUDPC, audpc() and AUDPC(), respectively. The following code an example from agricolae’s help showing how to calculate AUDPC in R using our own function, r_audpc().

> dates <- c(14, 21, 28) # days
> evaluation <- c(40, 80, 90)
> r_audpc(evaluation, dates)
[1] 14 21 28
[1] 40 80 90
[1] 1015

Using Julia

Currently I’m unaware of any packages for Julia offering this functionality. However, we can easily duplicate r_audpc() from R to Julia but need to make a few minor changes for differences in the languages. Most of the following function should look familiar. The first thing is that we cannot have a . in the function argument names, so we’ll just use a single word here for clarity. The second is that we cannot use <- for assigning values as in R. We must use = to assign in Julia, however you can use = in R as well if you wish. Functions are written in roughly the same manner as in R and they are called in the same way, fn_name(arg1, arg2, ...).

function j_audpc(evaluation, dates)
    
    n = length(evaluation) - 1
    
    # again we preallocate vectors to hold the values but also create a third
    # object, out, since it will be inside our for loop unlike in R
    meanvec = Base.zeros(n)
    intvec = Base.zeros(n)
    out = 0.0

    # the for loop looks roughly the same but here we just use 1:n
    for i in 1:n
        # rather than using a mean() we just divide by 2 here
        meanvec[i] = (evaluation[i] + evaluation[i + 1]) / 2
        
        # this is the same
        intvec[i] = dates[i + 1] - dates[i]
        
        # here we sum the values in the loop as Julia will compile this and it
        # will run more quickly. In R this is undesireable behaviour so we
        # performed this outside of the loop.
        
        # also note here that before the `*` we have a `.`, this means to
        # "broadcast" across the vector. In R this automatically happens, in
        # Julia to perform operations to all values in a vector you must
        # broadcast
        out = sum(meanvec .* intvec)
    end
  
    # return the object, `out` from the for loop
    return out

# end the function (no curly brackets!)
end
j_audpc (generic function with 1 method)
julia> dates = [14, 21, 28] # days
julia> evaulation = [40, 80, 90]
julia> j_audpc(evaulation, dates)
3-element Vector{Int64}:
 14
 21
 28
3-element Vector{Int64}:
 40
 80
 90
1015.0

The AUDPC values match!

Conclusion

This is just a quick example of how you can use Julia in plant pathology to show new users how it compares with R with a commonly used function. If you’re curious to know more, the Julia docs are a great place to start. In particular, the noteworthy differences is a useful bit to refer to if you’re familiar with R.

For a more detailed comparison of complete Julia and R packages that offer an existing plant disease model, EPIRICE (Savary et al. 2012), see Epicrop.jl (Sparks 2022), a port of epicrop (Sparks et al. 2021) to Julia, which has demonstrated faster speeds in benchmarking tests for the same rice disease predictions.

Colophon

This post was constructed using R Version 4.1.2 (R Core Team 2021) and Julia Version 1.7.1 (Bezanson et al. 2017) using JuliaCall Pull Request #174.

Alves, K. dos S., and Del Ponte, E. M. 2021. epifitter: Analysis and Simulation of Plant Disease Progress Curves. Available at: https://CRAN.R-project.org/package=epifitter.
Bezanson, J., Edelman, A., Karpinski, S., and Shah, V. B. 2017. Julia: A fresh approach to numerical computing. SIAM review. 59:65–98.
de Mendiburu, F. 2021. agricolae: Statistical Procedures for Agricultural Research. Available at: https://CRAN.R-project.org/package=agricolae.
R Core Team. 2021. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.
Savary, S., Nelson, A., Willocquet, L., Pangga, I., and Aunario, J. 2012. Modeling and mapping potential epidemics of rice diseases globally. Crop Protection. 34:6–17.
Shaner, G., and Finney, R. E. 1977. The effect of nitrogen fertilization on the expression of slow-mildewing resistance in Knox wheat. Phytopathology. 67:1051–1056.
Sparks, A. 2022. Simulation modelling of crop diseases using a healthy-latent-infectious-postinfectious (HLIP) model in Julia. Available at: https://github.com/adamhsparks/Epicrop.jl.
Sparks, A. H., Hijmans, R., Savary, S., Pangga, I., and Aunario, J. 2021. epicrop: Simulation modelling of crop diseases using a susceptible-exposed-infectious-removed (SEIR) model. Available at: https://github.com/adamhsparks/epicrop.
Sparks, A., Esker, P. D., Bates, M., Dall’Acqua, W., Guo, Z., Segovia, V., et al. 2008. Ecology and epidemiology in R: Disease progress over time. The Plant Health Instructor.

  1. https://en.wikipedia.org/wiki/Julia_(programming_language)↩︎

  2. https://julialang.org/benchmarks/↩︎

References

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/openplantpathology/OpenPlantPathology, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".