library(bib2df)
library(stevemisc)
library(stringi)
# load a bib file to data frame
bib_df <- bib2df(file="Crump.bib")
# clean entries
bib_df$TITLE <- stri_replace_all_regex(bib_df$TITLE, "[\\{\\}]", "")
bib_df$JOURNAL <- stri_replace_all_regex(bib_df$JOURNAL, "[\\{\\}]", "")
bib_df$BOOKTITLE <- stri_replace_all_regex(bib_df$BOOKTITLE, "[\\{\\}]", "")
# convert a single row back to .bib entry
bib_entry <- paste0(capture.output(df2bib(bib_df[1,])), collapse="")
bib_entry
# print out the citation
stevemisc::print_refs(bib_entry,
                      csl = "apa.csl",
                      spit_out = TRUE,
                      delete_after = FALSE)
Customizing a publication list with R Markdown
NOTE: Some of the code here stopped working with an upgrade in R.
I’ve been using R Markdown to generate my lab website for years. I recently switched from the generic R Markdown website to a website generated by pkgdown. I’m happy with the result. As part of the migration I’m revisiting individual pages like my publications page.
Over the years I’ve tried different ways to list publications. I like any process that takes a .bib file containing my publications, and then auto-generates everything I want to have.
bibbase
I was previously using bibbase, which takes a .bib file as input and embeds a list of publications into a webpage. For example, I used to generate a publication list by inserting a script into the .Rmd for my publications page.
<script src="https://bibbase.org/show?bib=https://crumplab.github.io/Crump.bib&jsonp=1&nocache=1&theme=side&authorFirst=1"></script>
It was quick, easy, and pretty good overall.
bibbase issues
But, there were nuisances. I couldn’t get the formatting exactly right. I don’t think bibbase supports different .csl formats, so it doesn’t display citations in APA format.
Bibbase recognizes extra tags in the .bib file that define arbitrary links, and prints those links with each citation. For example, a citation might have a pdf, a website, and data associated with it. That was nice.
However, the links double-clicked themselves. I’m not sure why this happened to me, but clicking a link to download a .pdf would cause the file to be downloaded twice. That was annoying.
What I wanted
Here is the workflow that I wanted to achieve:
- Maintain my list of publications in a zotero folder. Then, export the folder as a biblatex repository (with .pdfs).
- Have an .Rmd file that reads in the .bib file, and then outputs the list of publications.
- The list ideally could be formatted by any .csl file, which would make it easy to output in APA format.
- The list should automatically add any extra links and stuff that I want (provided those things can be extracted from the .bib file).
R Markdown issues and solutions
R Markdown is generally great for citing things. For example, I could cite a paper (Vuorre and Crump 2021), the citation would appear in the text, and a full citation would be printed in a reference section at the end of the document.
However, it’s not so easy to print a full citation in the middle of an R Markdown document, in a style that you want defined by .csl, and with additional stuff you might want like extra links.
At least, I couldn’t find a way to do that until this morning, when I came across a life-saver function from stevemisc called print_refs().
There’s at least a handful of ways to input a .bib file into R, and then print out a single entry. For example, RefManageR can do something like this, but it doesn’t support .csl, so the output may not be in the style you want (and it doesn’t output to APA).
Here’s a quick example of print_refs() in action.
I’m so glad this function exists. It turns the .bib file into markdown that can be printed directly inside an .Rmd. And, this can be done programmatically using knitr chunks. For example, using results="asis" in the knitr chunk options allows the citation to be printed to the .Rmd document.
```{r, results="asis", echo=FALSE}
print_me <- paste0(stevemisc::print_refs(bib_entry,
                                         csl = "apa.csl",
                                         spit_out = FALSE,
                                         delete_after = FALSE),
                   collapse = " ")
cat(print_me)
```
And, this means the citation should show up nicely on the webpage, like this:
Adding links to the citation
A next step was to add any other links to a given citation. I add extra tags to a .bib file in the extra field for citations in zotero. For example, this line is in the extra field for the Vuorre and Crump (2021) paper.
tex.url_website: https://crumplab.github.io/vertical/
As a result, when the .bib file is loaded into R as a data.frame, it will contain a column called URL_WEBSITE. I can then retrieve that info and write some custom code to smash together the markdown for a citation, along with any html I want to add to it. The script below auto-generates a list of the first five publications in the .bib file (after sorting by year, so the most recent are first).
# sort bib_df by year
bib_df <- bib_df[order(bib_df$DATE, decreasing=TRUE),]
# print individual entries to page
for (i in 1:5){
  # convert row i back to a .bib entry, then to markdown
  t_bib_entry <- paste0(capture.output(df2bib(bib_df[i,])), collapse="")
  t_md_citation <- paste0(stevemisc::print_refs(t_bib_entry,
                                                csl = "apa.csl",
                                                spit_out = FALSE,
                                                delete_after = FALSE),
                          collapse=" ")
  cat(t_md_citation)
  cat("<span class = 'publinks'>")
  # pdf link
  if(any(names(bib_df)=="FILE")){
    if(!is.na(bib_df[i,"FILE"])){
      pdf_url <- paste0("../Crump/", bib_df[i,"FILE"], collapse = "")
      cat(c(" ",'<a href="',pdf_url,'"> <i class="fas fa-file-pdf"> pdf </i></a>'),
          sep="")
    }
  }
  # website link
  if(any(names(bib_df)=="URL_WEBSITE")){
    if(!is.na(bib_df[i,"URL_WEBSITE"])){
      link_url <- as.character(bib_df[i,"URL_WEBSITE"])
      cat(c(" ",'<a href="',link_url,'"> <i class="fas fa-globe"> website </i></a>'),
          sep="")
    }
  }
  # data link
  if(any(names(bib_df)=="URL_DATA")){
    if(!is.na(bib_df[i,"URL_DATA"])){
      link_url <- as.character(bib_df[i,"URL_DATA"])
      cat(c(" ",'<a href="',link_url,'"> <i class="fas fa-database"> data </i></a>'),
          sep="")
    }
  }
  cat("</span>")
  cat("\n\n")
}
NOTE: the pdf links weren’t working…oops, will fix that below.
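The website and data blocks in the loop above do the same thing with different column names, labels, and icons, so they could be driven by a small lookup table instead. Here is a minimal sketch of that idea in base R. The toy data frame, the `link_specs` table, and the `build_links()` helper are all made up for illustration (they are not part of bib2df or stevemisc), and the pdf link is left out because it needs its own path handling.

```r
# Toy stand-in for the data frame returned by bib2df(); the column names
# mirror the ones used above, but the rows are invented for illustration.
toy_df <- data.frame(
  BIBTEXKEY   = c("exampleA2021", "exampleB2017"),
  URL_WEBSITE = c("https://example.org/site", NA),
  URL_DATA    = c(NA, "https://example.org/data"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per kind of link.
link_specs <- data.frame(
  column = c("URL_WEBSITE", "URL_DATA"),
  label  = c("website", "data"),
  icon   = c("fas fa-globe", "fas fa-database"),
  stringsAsFactors = FALSE
)

# Build the html for every non-missing link in row i and return one string.
build_links <- function(df, i, specs) {
  html <- character(0)
  for (s in seq_len(nrow(specs))) {
    col <- specs$column[s]
    # skip columns that are absent from the .bib file or empty for this entry
    if (!col %in% names(df) || is.na(df[[col]][i])) next
    html <- c(html, sprintf('<a href="%s"> <i class="%s"> %s </i></a>',
                            df[[col]][i], specs$icon[s], specs$label[s]))
  }
  paste0("<span class='publinks'>", paste(html, collapse = " "), "</span>")
}

cat(build_links(toy_df, 1, link_specs), "\n")
```

Returning a string rather than calling `cat()` inside the helper makes it easier to check the output before printing it in a `results="asis"` chunk.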
That’s all
I now have a working pipeline that inputs a .bib file, and outputs a list of publications in APA format, with a few customizable bells and whistles.
I would feel like this excursion was wrapped up if I refactored the script into a set of functions. But, I’ll leave that for another day.
Functionalizing
Ideally I would like to run a single function like this, and have a whole publication list generated, complete with extra links and icons added to each entry.
bib_2_pub_list("mybib.bib")
I don’t have that solution yet, but may update this post when I have time to make progress in that direction.
In order for the above to work it would be necessary to include any metadata for the links in the .bib file. This could be done using the extra field in zotero. I’m already using this approach to export urls. I ran into a few roadblocks attempting to generalize this approach.
Alternatively, two inputs might be better. For example, a .yml file could be used to define metadata for links.
bib_2_pub_list("mybib.bib","mybib.yml")
Hmmm, need to brainstorm a .yml structure. This should work. A citation key, followed by numbered links, each containing a name, url, and font awesome icon.
vuorreSharingOrganizingResearch2021:
  link1:
    name: 'website'
    url: 'https://www.crumplab.com/vertical'
    icon: 'fas fa-globe'
  link2:
    name: 'github'
    url: 'https://github.com/CrumpLab/vertical'
    icon: 'fas fa-github'
behmerCrunchingBigData2017:
  link1:
    name: 'data'
    url: 'https://github.com/CrumpLab/BehmerCrump2017_BigData'
    icon: 'fas fa-database'
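Once parsed, a structure like this should end up as a nested named list in R: citation key at the top, then one sub-list per link. Here is a rough sketch of that shape and how to walk it. The list is built by hand with `list()` rather than parsed, so it runs without the yaml package. I am assuming the parser maps each YAML mapping to a named list.

```r
# Hand-built equivalent of the parsed .yml for one entry (assumed shape).
yml_links <- list(
  vuorreSharingOrganizingResearch2021 = list(
    link1 = list(name = "website",
                 url  = "https://www.crumplab.com/vertical",
                 icon = "fas fa-globe"),
    link2 = list(name = "github",
                 url  = "https://github.com/CrumpLab/vertical",
                 icon = "fas fa-github")
  )
)

key <- "vuorreSharingOrganizingResearch2021"

# A citation key with no links simply is not among the list names,
# so a membership check is enough before looping over its links.
if (key %in% names(yml_links)) {
  for (l in yml_links[[key]]) {
    cat(l$name, "->", l$url, "\n")
  }
}
```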
I can read in the .yml like this, which turns everything into a list.
yml_links <- yaml::read_yaml("Crump.yml")
Then, need to write some functions…
add_link_icon <- function(url_path, url_text, icon_class){
  # print an html link with a font awesome icon
  html <- glue::glue('<a href = "{url_path}"> <i class="{icon_class}"> {url_text} </i></a>')
  cat(" ", html, sep="")
}

bib_2_pub_list <- function(bib, yml, pdf_dir, base_url_to_pdfs){
  # load bib file to df
  bib_df <- bib2df::bib2df(bib)
  # clean {{}} from entries
  # to do: improve this part
  bib_df$TITLE <- stringi::stri_replace_all_regex(bib_df$TITLE, "[\\{\\}]", "")
  bib_df$JOURNAL <- stringi::stri_replace_all_regex(bib_df$JOURNAL, "[\\{\\}]", "")
  bib_df$BOOKTITLE <- stringi::stri_replace_all_regex(bib_df$BOOKTITLE, "[\\{\\}]", "")
  # sort bib_df by year
  # to do: add sort options
  bib_df <- bib_df[order(bib_df$DATE, decreasing=TRUE),]
  # read yml with links for bib entries
  yml_links <- yaml::read_yaml(yml)
  # print entries (seq_len() is safe when the data frame has zero rows)
  for (i in seq_len(nrow(bib_df))){
    # convert row to .bib entry
    # to do: make row to bib entry a function
    t_bib_entry <- paste0(capture.output(bib2df::df2bib(bib_df[i,])), collapse="")
    # generate markdown text for citation
    t_md_citation <- paste0(stevemisc::print_refs(t_bib_entry,
                                                  csl = "apa.csl",
                                                  spit_out = FALSE,
                                                  delete_after = FALSE),
                            collapse=" ")
    cat(t_md_citation)
    cat("<span class = 'publinks'>")
    ### add pdf links
    if(!is.na(bib_df$FILE[i])){ # check pdf exists
      pdf_name <- basename(bib_df$FILE[i])
      rel_path_to_pdf <- list.files(here::here(pdf_dir),
                                    pdf_name,
                                    recursive=TRUE)
      build_url <- paste0(base_url_to_pdfs, "/", rel_path_to_pdf, collapse="")
      crumplab::add_link_icon(build_url, "pdf", "fas fa-file-pdf")
    }
    ## add all other links
    if(bib_df$BIBTEXKEY[i] %in% names(yml_links)){ # check yml bib entry exists
      link_list <- yml_links[[bib_df$BIBTEXKEY[i]]]
      for(l in link_list){
        crumplab::add_link_icon(l$url, l$name, l$icon)
      }
    }
    cat("</span>")
    cat("\n\n")
  }
}
Does it blend?
crumplabr::bib_2_pub_list("Crump.bib",
                          "Crump.yml",
                          "pkgdown/assets/Crump/files",
                          "https://www.crumplab.com/Crump/files")
That works pretty well.
Next step is to include this function in my crumplab package that is part of this webpage, and make it work for real.
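One thing worth checking before making it work for real is the pdf lookup inside bib_2_pub_list(). list.files() treats its pattern argument as a regular expression, so a file name containing characters like parentheses may fail to match itself. Here is a small sketch of an exact-name lookup in base R. The `find_pdf()` helper, the temporary folder, the file name, and the example.org base url are all made up for illustration.

```r
# Find a file by exact name anywhere under root_dir.
# Returns paths relative to root_dir (possibly more than one, or none).
find_pdf <- function(root_dir, pdf_name) {
  all_files <- list.files(root_dir, recursive = TRUE)
  all_files[basename(all_files) == pdf_name]
}

# Throwaway directory tree standing in for the exported pdf folder.
root <- file.path(tempdir(), "pdf_lookup_demo")
dir.create(file.path(root, "files", "12"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "files", "12", "Paper (2021).pdf"))

rel_path <- find_pdf(root, "Paper (2021).pdf")

# Placeholder base url; the real one would be passed in as base_url_to_pdfs.
pdf_url <- paste0("https://example.org/pubs", "/", rel_path)
print(pdf_url)
```

If two pdfs share a file name, this returns both paths, so a real version would need a rule for choosing one.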