11-11-2024
Link to presentation:
Link to GitLab package template:
Imagine taking over a project with 10,000 lines of dense code in a single file, with no documentation and no examples of how to run it.
Reusability:
Efficiency:
Reproducibility:
\(\rightarrow\) Software sustainability
\(\rightarrow\) Trade-off between costs and benefits
Estimating income from age and sex with linear regression:
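As a script, this analysis might look like the minimal sketch below. The data are simulated; the sample size, coefficients, and variable names are illustrative assumptions, not real data.

```r
# Simulate a small data set (all values are illustrative assumptions)
set.seed(1)
n <- 100
age <- sample(18:99, n, replace = TRUE)
sex <- sample(0:1, n, replace = TRUE)
income <- 2 + 0.1 * age + 0.2 * sex + rnorm(n)
df <- data.frame(age, sex, income)

# Estimate income from age and sex with a linear regression
model <- lm(income ~ age + sex, data = df)
summary(model)
```

A script like this works once, but it is hard to reuse or test, which is the motivation for moving the logic into a package function.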
Two ways to easily create an R package:
RStudio: File \(\rightarrow\) New Project \(\rightarrow\) New Directory \(\rightarrow\) R Package (with name testR)
usethis package: usethis::create_package("testR")
Best practice: The usethis package contains useful functions to automate package development
Creates new folder with minimal R package skeleton:
DESCRIPTION: Metadata (e.g., package name, version, author, dependencies)
NAMESPACE: Which functions to export and which other packages to import
R/: R functions (with hello.R example file)
.Rbuildignore: Files to ignore when building the package (e.g., old R scripts)
Build package by clicking on Build \(\rightarrow\) Install or devtools::install()
Advantages of functions:
Two ways to create a new function:
Create a create_model.R file in the R/ folder
Run usethis::use_r("create_model"), which automatically creates R/create_model.R
Best practice: Give functions clear and consistent names (e.g., create_model instead of model or create_mod)
Example:
Best practice: Use clear and consistent argument names (e.g., dependent instead of dep; df is a common abbreviation)
To try out the function, run devtools::load_all() (or Ctrl+Shift+L) and then create_model(df, "income", c("age", "sex"))
Code in R/ is executed when the binary package is built (e.g., with devtools::install() or by CRAN) and the results are saved
It is not executed again when the package is loaded (e.g., with library(testR))
Example:
Important when defining aliases:
Best practice: Don’t use library, require, or source in a package
Best practice: Don’t use functions that change the global state in a package (e.g., setwd, options, par); use the withr package instead
Example:
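A minimal sketch of changing an option without touching the global state, assuming the withr package is installed. print_rounded() is a hypothetical helper, not part of testR. withr::with_options() sets the option only while the code runs and restores the previous value afterwards.

```r
# Hypothetical helper: prints a number with a temporary `digits` setting
print_rounded <- function(x, digits = 3) {
  withr::with_options(
    list(digits = digits),
    print(x)
  )
  invisible(x)
}

digits_before <- getOption("digits")
print_rounded(pi)

# The global option is unchanged after the call
identical(getOption("digits"), digits_before)
```

Calling options(digits = 3) directly inside the function would instead change the setting for the rest of the user's session.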
Code should be robust to avoid silent failures
Workflow for writing robust R functions:
Check that arguments have the expected type (e.g., that df is a data frame)
Example:
R/create_model.R
create_model <- function(df,
                         dependent = character(),
                         predictors = character()) {
  # Checks if arguments have expected type
  stopifnot(
    is.data.frame(df),
    is.character(dependent),
    is.character(predictors)
  )
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  model <- lm(model_formula, data = df)
  return(model)
}
R/create_model.R
create_model <- function(df,
                         dependent = character(),
                         predictors = character()) {
  stopifnot(
    is.data.frame(df),
    is.character(dependent),
    is.character(predictors)
  )
  # Checks if another assumption is met and handles exception
  if (nrow(df) == 0) {
    stop("Data frame not valid")
  }
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  model <- lm(model_formula, data = df)
  return(model)
}
R/create_model.R
create_model <- function(df,
                         dependent = character(),
                         predictors = character()) {
  stopifnot(
    is.data.frame(df),
    is.character(dependent),
    is.character(predictors)
  )
  # Returns an informative error message
  if (nrow(df) == 0) {
    stop("Data frame contains zero rows")
  }
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  model <- lm(model_formula, data = df)
  return(model)
}
Example:
R/create_model.R
create_model <- function(df,
                         dependent = character(),
                         predictors = character()) {
  stopifnot(
    is.data.frame(df),
    is.character(dependent),
    is.character(predictors)
  )
  if (nrow(df) == 0) {
    stop("Data frame contains zero rows")
  }
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  tryCatch(
    {
      model <- lm(model_formula, data = df)
    },
    error = function(error) {
      # Does not know whether dependent variable is numeric
      stop("Dependent variable must be numeric")
    }
  )
  return(model)
}
R/create_model.R
create_model <- function(df,
                         dependent = character(),
                         predictors = character()) {
  stopifnot(
    is.data.frame(df),
    is.character(dependent),
    is.character(predictors)
  )
  if (nrow(df) == 0) {
    stop("Data frame contains zero rows")
  }
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  tryCatch(
    {
      model <- lm(model_formula, data = df)
    },
    error = function(error) {
      # Returns what it knows
      stop(paste("Model could not be created:", error$message))
    }
  )
  return(model)
}
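To see how these error messages reach the user, the sketch below uses a condensed copy of the last version of create_model() and calls it with two invalid inputs. The inputs (an empty data frame and a predictor named missing_column) and the get_error() helper are made up for illustration.

```r
# Condensed copy of the function above, so the snippet runs on its own
create_model <- function(df, dependent = character(), predictors = character()) {
  stopifnot(is.data.frame(df), is.character(dependent), is.character(predictors))
  if (nrow(df) == 0) {
    stop("Data frame contains zero rows")
  }
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  tryCatch(
    lm(model_formula, data = df),
    error = function(error) {
      stop(paste("Model could not be created:", error$message))
    }
  )
}

# Hypothetical helper: returns the error message instead of stopping the script
get_error <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

df <- data.frame(income = c(1, 2, 3), age = c(20, 30, 40))

msg_empty <- get_error(create_model(data.frame(), "income", "age"))
msg_missing <- get_error(create_model(df, "income", "missing_column"))

msg_empty
msg_missing
```

The first message comes from the explicit zero-row check. The second passes on whatever lm() reported, prefixed with the context the function actually knows.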
Best practice: Isolate side-effects (e.g. writing files, plotting) from core functions
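One way to read this: keep the computation in a function that only returns a value, and put the file writing in a thin wrapper. The function names below (summarise_income, write_income_summary) and the CSV output are hypothetical, chosen only to illustrate the split.

```r
# Core function: pure computation, returns a data frame, no side effects
summarise_income <- function(df) {
  stopifnot(is.data.frame(df), "income" %in% names(df))
  data.frame(
    n = nrow(df),
    mean_income = mean(df$income)
  )
}

# Wrapper: the only place where a file is written
write_income_summary <- function(df, path) {
  result <- summarise_income(df)
  write.csv(result, path, row.names = FALSE)
  invisible(result)
}

df <- data.frame(income = c(10, 20, 30))
summarise_income(df)

path <- tempfile(fileext = ".csv")
write_income_summary(df, path)
file.exists(path)
```

The core function can then be tested without touching the file system, and the wrapper stays small enough to check by inspection.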
Two (bad) extremes:
Best practice: Large functions with lots of documentation should have their own files. Small functions can be grouped together in one file
Function definitions can be found with Code \(\rightarrow\) Go to File/Function (Ctrl+.) or by moving the cursor into the function name and pressing F2
Most R users test their code implicitly.
Typical development workflow:
Common problems with ad-hoc testing:
Advantages of automated testing with testthat:
\(\rightarrow\) Tests as documentation and starting point for new developers
Set up automated testing with usethis::use_testthat() \(\rightarrow\) creates folder tests/testthat/ for test files
Add a new test file with usethis::use_test("create_model"), which creates tests/testthat/test-create_model.R with a dummy passing test
tests/testthat/test-create_model.R
Test passed 🎊
tests/testthat/test-create_model.R
Test passed 🌈
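A test file for create_model() could look roughly like the sketch below, assuming testthat is installed. The function is redefined here only to keep the snippet self-contained; inside the package, the test file would contain just the test_that() calls. The test data are made up.

```r
# Stand-in for the package function so the snippet runs on its own
create_model <- function(df, dependent, predictors) {
  stopifnot(is.data.frame(df), is.character(dependent), is.character(predictors))
  if (nrow(df) == 0) {
    stop("Data frame contains zero rows")
  }
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  lm(model_formula, data = df)
}

testthat::test_that("create_model() returns an lm object", {
  df <- data.frame(income = c(1, 2, 4, 3), age = c(20, 30, 40, 50))
  model <- create_model(df, "income", "age")
  testthat::expect_s3_class(model, "lm")
  testthat::expect_named(coef(model), c("(Intercept)", "age"))
})

testthat::test_that("create_model() fails for an empty data frame", {
  testthat::expect_error(
    create_model(data.frame(), "income", "age"),
    "zero rows"
  )
})
```

The first test checks the expected behavior, the second checks that invalid input fails with the intended message.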
Best practice: Helper functions and the withr package can be used to create self-sufficient and self-contained tests
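A sketch of what this can look like, assuming testthat and withr are installed. make_test_data() is a hypothetical helper (in a package it might live in a helper file under tests/testthat/), and withr::local_tempfile() provides a file path that is cleaned up automatically when the test finishes.

```r
# Hypothetical helper: builds a small, deterministic data frame for tests
make_test_data <- function(n = 20) {
  # The seed is set only inside this function and restored on exit
  withr::local_seed(123)
  age <- sample(18:99, n, replace = TRUE)
  data.frame(age = age, income = 2 + 0.1 * age + rnorm(n))
}

testthat::test_that("model results can be written to a temporary file", {
  df <- make_test_data()
  model <- lm(income ~ age, data = df)

  # The temporary file is deleted when the test block exits
  path <- withr::local_tempfile(fileext = ".rds")
  saveRDS(model, path)

  testthat::expect_true(file.exists(path))
  testthat::expect_s3_class(readRDS(path), "lm")
})
```

Because the test creates its own data and cleans up its own file, it can run in any order and leaves nothing behind.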
\(\rightarrow\) Only proceed to next layer if previous layer succeeds
Regression tests: Check whether the output of a function is still the same (e.g., tables, plots)
\(\rightarrow\) Does not check if output is correct
R packages can be easily documented with roxygen2:
NAMESPACE
Set up documentation with usethis::use_roxygen_md()
Add documentation to R/create_model.R by clicking into the function definition and selecting Code \(\rightarrow\) Insert Roxygen Skeleton
To update documentation, run devtools::document() (or Ctrl+Shift+D)
Example:
#' Create a linear regression model
#'
#' Creates a linear regression model from a data frame and dependent and independent variables.
#'
#' @param df A data frame containing the variables included in the model.
#' @param dependent A single character string with the name of the dependent variable.
#' @param predictors A character vector with the names of the independent variables.
#'
#' @return A linear regression model of class `"lm"`.
#'
#' @details The function uses the \link{lm} function to estimate a linear regression model.
#'
#' @export
#'
#' @examples
#' N <- 100
#'
#' age <- sample(18:99, N, replace = TRUE)
#' sex <- sample(0:1, N, replace = TRUE)
#' income <- 2 + 0.1 * age + 0.2 * sex + rnorm(N)
#'
#' df <- data.frame(age, sex, income)
#'
#' model <- create_model(df, "income", c("age", "sex"))
#'
create_model <- function(df, dependent, predictors) {
  model_formula <- formula(
    paste(dependent, "~", paste(predictors, collapse = " + "))
  )
  model <- lm(model_formula, data = df)
  return(model)
}
Complex examples, background information (e.g., theories, model equations, simulation studies), and tutorials should not live in the function documentation but in vignettes.
Create a new vignette with usethis::use_vignette("create_model")
This creates a new vignettes/ folder with a create_model.Rmd file.
Add content to vignettes/create_model.Rmd
Documentation for developers/users who see the package on GitHub/GitLab/CRAN
Answers three questions about a package:
Create a new R Markdown README file with usethis::use_readme_rmd() and add content to README.Rmd
Combine README, vignettes, and function documentation in a website with pkgdown
Set up website with usethis::use_pkgdown()
pkgdown automatically collects all function documentation, vignettes, and README files and creates a website in the docs/ folder
Update website with pkgdown::build_site() or devtools::build_site()
Edit code in R/ and vignettes/
Update documentation with devtools::document()
Load the package with devtools::load_all()
Run tests with devtools::test() or devtools::test_active_file()
If tests pass:
Check the package with devtools::check()
Set up version control with usethis::use_git() and connect to GitHub with usethis::use_github(); for GitLab, set up continuous integration with usethis::use_gitlab_ci()
Add automated checks on GitHub (R CMD check, which also runs the testthat tests) with usethis::use_github_action("check-standard")
Running usethis::use_gitlab_ci() creates a .gitlab-ci.yml file in the root directory of the package:
.gitlab-ci.yml
image: rocker/tidyverse
stages:
  - build
  - test
  - deploy
building:
  stage: build
  script:
    - R -e "remotes::install_deps(dependencies = TRUE)"
    - R -e 'devtools::check()'
# To have the coverage percentage appear as a gitlab badge follow these
# instructions:
# https://docs.gitlab.com/ee/user/project/pipelines/settings.html#test-coverage-parsing
# The coverage parsing string is
# Coverage: \d+\.\d+
testing:
  stage: test
  allow_failure: true
  when: on_success
  only:
    - master
  script:
    - Rscript -e 'install.packages("DT")'
    - Rscript -e 'covr::gitlab(quiet = FALSE)'
  artifacts:
    paths:
      - public
# To produce a code coverage report as a GitLab page see
# https://about.gitlab.com/2016/11/03/publish-code-coverage-report-with-gitlab-pages/
pages:
  stage: deploy
  dependencies:
    - testing
  script:
    - ls
  artifacts:
    paths:
      - public
    expire_in: 30 days
  only:
    - master