Customizing a publication list with R Markdown

Rstats

publication list

Notes on creating a customizable list of publications for an academic website using R Markdown, .bib files, and other fun stuff.

Author

Matt Crump

Published

May 17, 2022

NOTE: Some of the code here stopped working with an upgrade in R.

I’ve been using R Markdown to generate my lab website for years. I recently switched from the generic R Markdown website to a website generated by pkgdown. I’m happy with the result. As part of the migration, I’m revisiting individual pages like my publications page.

Over the years I’ve tried different ways to list publications. I like any process that takes a .bib file containing my publications, and then auto-generates everything I want to have.

bibbase

I was previously using bibbase, which takes a .bib file as input and embeds a list of publications into a webpage. For example, I used to generate a publication list by inserting a script into the .Rmd for my publications page.

 <script src="https://bibbase.org/show?bib=https://crumplab.github.io/Crump.bib&jsonp=1&nocache=1&theme=side&authorFirst=1"></script>

It was quick, easy, and pretty good overall.

bibbase issues

But, there were nuisances. I couldn’t get the formatting exactly right. I don’t think bibbase supports different .csl formats, so it doesn’t display citations in APA format.

Bibbase recognizes extra tags in the .bib file that define arbitrary links, and then prints those links with each citation. For example, a citation might have a pdf, a website, and data associated with it. That was nice.

However, the links double-clicked themselves. I’m not sure why this happened to me, but clicking a link to download a .pdf would cause the file to be downloaded twice. That was annoying.

What I wanted

Here is the workflow that I wanted to achieve:

Maintain my list of publications in a Zotero folder. Then, export the folder as a biblatex repository (with .pdfs).
Have an .Rmd file that reads in the .bib file, and then outputs the list of publications.
The list ideally could be formatted by any .csl file, which would make it easy to output in APA format.
The list should automatically add any extra links and stuff that I want (provided those things can be extracted from the .bib file).

R Markdown issues and solutions

R Markdown is generally great for citing things. For example, I could cite a paper (Vuorre and Crump 2021), the citation would appear in the text, and a full citation would be printed in a reference section at the end of the document.

However, it’s not so easy to print a full citation in the middle of an R Markdown document, in a style that you want defined by .csl, and with additional stuff you might want like extra links.

At least, I couldn’t find a way to do that until this morning, when I came across a life-saver function from stevemisc called print_refs().

There’s at least a handful of ways to input a .bib file into R, and then print out a single entry. For example, RefManageR can do something like this, but it doesn’t support .csl, so the output may not be in the style you want (and it doesn’t output to APA).

Here’s a quick example of print_refs() in action.

library(bib2df)
library(stevemisc)
library(stringi)

# load a bib file to data frame
bib_df <- bib2df(file="Crump.bib")

# clean entries
bib_df$TITLE <- stri_replace_all_regex(bib_df$TITLE, "[\\{\\}]", "")
bib_df$JOURNAL <- stri_replace_all_regex(bib_df$JOURNAL, "[\\{\\}]", "")
bib_df$BOOKTITLE <- stri_replace_all_regex(bib_df$BOOKTITLE, "[\\{\\}]", "")

# convert a single row back to .bib entry
bib_entry <- paste0(capture.output(df2bib(bib_df[1,])), collapse="")
bib_entry

# print out the citation
stevemisc::print_refs(bib_entry,
                      csl = "apa.csl",
                      spit_out = TRUE,
                      delete_after = FALSE)
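
As a side note on the cleaning step above, here is a minimal sketch of what the brace-stripping regex does to a title. It uses base R's gsub() instead of stringi so that it runs without extra packages, and the title string is made up for illustration (it is not copied from Crump.bib).

```r
# Hypothetical title as it might appear in a .bib export,
# with braces protecting capitalization.
raw_title <- "{{Sharing}} and {{Organizing}} {{Research Products}} as {{R}} {{Packages}}"

# Remove every literal { and } character; everything else is left alone.
clean_title <- gsub("[{}]", "", raw_title)

print(clean_title)
```

Stripping all braces is blunt: it would also remove braces that are part of LaTeX commands in a title, which is probably why the later version of this code carries a "to do: improve this part" comment.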

I’m so glad print_refs() exists. It turns a .bib entry into markdown that can be printed directly inside an .Rmd. And, this can be done programmatically using knitr chunks. For example, setting results="asis" in the knitr chunk options allows the citation to be printed straight into the .Rmd document.


```{r, results="asis", echo=FALSE}
  print_me <- paste0(stevemisc::print_refs(bib_entry,csl = "apa.csl",
                                        spit_out = FALSE,
                                        delete_after = FALSE), collapse=" ")
  cat(print_me)
```

And, this means the citation should show up nicely on the webpage.

Adding links to the citation

The next step was to add any other links to a given citation. I add extra tags to the .bib file using the extra field for citations in Zotero. For example, this line is in the extra field for the Vuorre and Crump (2021) paper.

tex.url_website: https://crumplab.github.io/vertical/

As a result, when the .bib file is loaded into R as a data.frame, it will contain a column called URL_WEBSITE. I can then retrieve that info and write some custom code to smash together the markdown for a citation, along with any html I want to add to it. The script below auto-generates a list of the first five publications in the .bib file (after sorting by date, so the most recent are first).

# sort bib_df by date, most recent first
bib_df <- bib_df[order(bib_df$DATE, decreasing=TRUE),]

# print individual entries to page

for (i in 1:5 ){
  t_bib_entry <- paste0(capture.output(df2bib(bib_df[i,])), collapse="")
  t_md_citation<- paste0(stevemisc::print_refs(t_bib_entry,csl = "apa.csl",
                                        spit_out = FALSE,
                                        delete_after = FALSE), collapse=" ")
  cat(t_md_citation)



  cat("<span class = 'publinks'>")

  if(any(names(bib_df)=="FILE")){
    if( !is.na(bib_df[i,"FILE"]) ){
      pdf_url <- paste0("../Crump/",bib_df[i,"FILE"], collapse = "")
      cat(c("  ",'<a href="',pdf_url,'"> <i class="fas fa-file-pdf"> pdf </i></a>'),
        sep="")
    }
  }

  if(any(names(bib_df)=="URL_WEBSITE")){
    if( !is.na(bib_df[i,"URL_WEBSITE"]) ){
      website_url <- as.character(bib_df[i,"URL_WEBSITE"])
      cat(c("  ",'<a href="',website_url,'"> <i class="fas fa-globe"> website </i></a>'),
        sep="")
    }
  }

  if(any(names(bib_df)=="URL_DATA")){
    if( !is.na(bib_df[i,"URL_DATA"]) ){
      data_url <- as.character(bib_df[i,"URL_DATA"])
      cat(c("  ",'<a href="',data_url,'"> <i class="fas fa-database"> data </i></a>'),
        sep="")
    }
  }

  cat("</span>")
  cat("\n\n")
}

NOTE: the pdf links weren’t working…oops, will fix that below.
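
The website and data blocks in the loop above repeat the same pattern, so here is a minimal sketch of a table-driven version. Everything named here (link_specs, make_links(), toy_df, the example.org URL) is hypothetical and not part of the original script. It also returns the html as a string instead of calling cat(), which makes it easier to check, and it only handles the URL_ columns since the pdf links need separate path handling.

```r
# Hypothetical lookup table: which column feeds which link label and icon.
link_specs <- data.frame(
  column = c("URL_WEBSITE", "URL_DATA"),
  label  = c("website", "data"),
  icon   = c("fas fa-globe", "fas fa-database"),
  stringsAsFactors = FALSE
)

# Return the html for all links available in a one-row data frame,
# or an empty string when the row has none.
make_links <- function(row, specs = link_specs) {
  html <- character(0)
  for (k in seq_len(nrow(specs))) {
    col <- specs$column[k]
    # skip columns that are absent from the data frame or NA for this row
    if (!(col %in% names(row)) || is.na(row[[col]][1])) next
    html <- c(html, sprintf('  <a href="%s"> <i class="%s"> %s </i></a>',
                            as.character(row[[col]][1]),
                            specs$icon[k], specs$label[k]))
  }
  if (length(html) == 0) return("")
  paste0("<span class = 'publinks'>", paste(html, collapse = ""), "</span>")
}

# Toy stand-in for bib_df (made-up keys and URLs).
toy_df <- data.frame(
  BIBTEXKEY   = c("exampleA2021", "exampleB2017"),
  URL_WEBSITE = c("https://example.org/site", NA),
  stringsAsFactors = FALSE
)

links_a <- make_links(toy_df[1, ])  # has a website link
links_b <- make_links(toy_df[2, ])  # has no links
cat(links_a, "\n")
```

Adding a new kind of link then only means adding a row to the lookup table, rather than copying another if-block.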

That’s all

I now have a working pipeline that inputs a .bib file, and outputs a list of publications in APA format, with a few customizable bells and whistles.

I would feel like this excursion was wrapped up if I refactored the script into a set of functions. But, I’ll leave that for another day.


Functionalizing

Ideally, I would like to run a single function like this, and have a whole publication list generated, complete with extra links and icons added to each entry.

bib_2_pub_list("mybib.bib")

I don’t have that solution yet, but may update this post when I have time to make progress in that direction.

For the above to work, it would be necessary to include any metadata for the links in the .bib file. This could be done using the extra field in Zotero. I’m already using this approach to export urls. However, I ran into a few roadblocks attempting to generalize this approach.

Alternatively, two inputs might be better. For example, a .yml file could be used to define metadata for links.

bib_2_pub_list("mybib.bib","mybib.yml")

Hmmm, need to brainstorm a .yml structure. This should work: a citation key, followed by numbered links, each containing a name, url, and Font Awesome icon class.

vuorreSharingOrganizingResearch2021:
  link1:
    name: 'website'
    url: 'https://www.crumplab.com/vertical'
    icon: 'fas fa-globe'
  link2:
    name: 'github'
    url: 'https://github.com/CrumpLab/vertical'
    icon: 'fab fa-github'

behmerCrunchingBigData2017:
  link1:
    name: 'data'
    url: 'https://github.com/CrumpLab/BehmerCrump2017_BigData'
    icon: 'fas fa-database'

I can read in the .yml like this, which turns everything into a list.

yml_links <- yaml::read_yaml("Crump.yml")
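
To make that list structure concrete, here is a minimal sketch that builds a nested list by hand, mirroring the .yml layout above, and then looks up the links for one citation key. The hand-written list is an assumption about what yaml::read_yaml() returns for a file like this (named lists nested inside named lists), and the key "notInTheFile2020" is made up.

```r
# Hand-written nested list mirroring the .yml structure (an assumption
# about the shape that yaml::read_yaml() would return).
yml_links <- list(
  vuorreSharingOrganizingResearch2021 = list(
    link1 = list(name = "website",
                 url  = "https://www.crumplab.com/vertical",
                 icon = "fas fa-globe"),
    link2 = list(name = "github",
                 url  = "https://github.com/CrumpLab/vertical",
                 icon = "fab fa-github")
  ),
  behmerCrunchingBigData2017 = list(
    link1 = list(name = "data",
                 url  = "https://github.com/CrumpLab/BehmerCrump2017_BigData",
                 icon = "fas fa-database")
  )
)

# Look up one citation key and collect the names of its links.
key <- "vuorreSharingOrganizingResearch2021"
link_names <- character(0)
if (key %in% names(yml_links)) {
  for (l in yml_links[[key]]) {
    link_names <- c(link_names, l$name)
  }
}
print(link_names)

# A key with no entry in the list is simply skipped.
has_links <- "notInTheFile2020" %in% names(yml_links)
print(has_links)
```

Because each link is itself a small list, the fields can be pulled out with l$name, l$url, and l$icon inside the loop.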

Then, need to write some functions…

add_link_icon <- function(url_path,url_text, icon_class){
  html <- glue::glue('<a href = "{url_path}"> <i class="{icon_class}"> {url_text} </i></a>')
  cat("  ",html, sep="")
}

bib_2_pub_list <- function(bib,yml,pdf_dir,base_url_to_pdfs){

  # load bib file to df
  bib_df <- bib2df::bib2df(bib)

  # clean {{}} from entries
  # to do: improve this part
  bib_df$TITLE <- stringi::stri_replace_all_regex(bib_df$TITLE, "[\\{\\}]", "")
  bib_df$JOURNAL <- stringi::stri_replace_all_regex(bib_df$JOURNAL, "[\\{\\}]", "")
  bib_df$BOOKTITLE <- stringi::stri_replace_all_regex(bib_df$BOOKTITLE, "[\\{\\}]", "")

  # sort bib_df by date, most recent first
  # to do: add sort options
  bib_df <- bib_df[order(bib_df$DATE, decreasing=TRUE),]

  # read yml with links for bib entries
  yml_links <- yaml::read_yaml(yml)

  # print entries

  for (i in seq_len(nrow(bib_df)) ){

    # convert row to .bib entry
    # to do: make row to bib entry a function
    t_bib_entry <- paste0(capture.output(bib2df::df2bib(bib_df[i,])), collapse="")
    # generate markdown text for citation
    t_md_citation<- paste0(stevemisc::print_refs(t_bib_entry,csl = "apa.csl",
                                                 spit_out = FALSE,
                                                 delete_after = FALSE), collapse=" ")
    cat(t_md_citation)

    cat("<span class = 'publinks'>")

    ### add pdf links
    if( !is.na(bib_df$FILE[i]) ) { # check the entry has an attached file

      pdf_name <- basename(bib_df$FILE[i])
      rel_path_to_pdf <- list.files(here::here(pdf_dir),
                                    pdf_name,
                                    recursive=TRUE)
      build_url <- paste0(base_url_to_pdfs,"/",rel_path_to_pdf,collapse="")
      crumplab::add_link_icon(build_url,"pdf","fas fa-file-pdf")

    }

    ## add all other links
    if( bib_df$BIBTEXKEY[i] %in% names(yml_links) ) { # check yml bib entry exists

      link_list <- yml_links[[bib_df$BIBTEXKEY[i]]]

      for(l in link_list){
        crumplab::add_link_icon(l$url,l$name,l$icon)
      }

    }
    cat("</span>")
    cat("\n\n")
  }

}
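
One thing to watch in the pdf step: the second argument of list.files() is treated as a regular expression, so a file name containing characters like parentheses may fail to match itself. Here is a minimal sketch of an alternative that compares base names exactly. The helper name find_pdf(), the file name, and the example.org URL are all made up for illustration, and the sketch uses a temporary folder so it can run anywhere.

```r
# Set up a throwaway folder that mimics an export with nested subfolders.
pdf_dir <- file.path(tempdir(), "pub_list_demo")
dir.create(file.path(pdf_dir, "123"), recursive = TRUE, showWarnings = FALSE)
pdf_name <- "Vuorre and Crump (2021).pdf"   # hypothetical file name
file.create(file.path(pdf_dir, "123", pdf_name))

# Hypothetical helper: return paths (relative to pdf_dir) whose
# base name equals pdf_name exactly, with no regex interpretation.
find_pdf <- function(pdf_dir, pdf_name) {
  all_files <- list.files(pdf_dir, recursive = TRUE)
  all_files[basename(all_files) == pdf_name]
}

rel_path  <- find_pdf(pdf_dir, pdf_name)
# Same lookup with the file name passed as the regex pattern:
regex_hit <- list.files(pdf_dir, pdf_name, recursive = TRUE)

build_url <- paste0("https://example.org/files", "/", rel_path)
print(rel_path)
print(length(regex_hit))
```

In the regex version the parentheses are read as a capture group, so the pattern never matches the literal "(2021)" in the file name. File names with spaces would also likely need URL encoding before being used in a link.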

Does it blend?

crumplab::bib_2_pub_list("Crump.bib",
                         "Crump.yml",
                         "pkgdown/assets/Crump/files",
                         "https://www.crumplab.com/Crump/files")


That works pretty well.

Next step is to include this function in my crumplab package that is part of this webpage, and make it work for real.

References

Vuorre, Matti, and Matthew J. C. Crump. 2021. “Sharing and Organizing Research Products as R Packages.” Behavior Research Methods 53: 792–802. https://doi.org/10.3758/s13428-020-01436-x.