From epub to print copy, which of the traditional ‘pain’ journals is the quickest?
Peter Kamerman
5 February 2017
Background
Traditional journals (those that publish hardcopy volumes) typically publish an electronic version of an article before the print copy is produced, presumably because the practice reduces the time between an article being accepted for publication and the information being disseminated. These epubs have a DOI, which makes them readily citable.
In my neck of the woods (South Africa), the number of original research outputs by a university is factored into the annual government subsidy an institution receives. Only articles with page numbers are included in the calculation, which for traditional journals means that the articles must have been published in hardcopy format. At my institution, the University of the Witwatersrand, a small fraction of that government subsidy for publications trickles down to the originating labs as a research incentive. It’s not much money, but every bit helps in these tight funding times, and so ‘time to print’ is something we have to consider when selecting which journal(s) to submit our work to.
To help us decide which of the traditional pain-focused journals has the quickest electronic-to-hardcopy turnaround, I have performed a very crude analysis of the ‘time to print’ of the four top-ranked traditional pain journals (based on impact factor) that we typically consider submitting manuscripts to (Table 1).
# Make a dataframe to populate the table
tab_df <- tibble(Journal = c('<a href="http://journals.lww.com/clinicalpain/pages/default.aspx" target="_blank">Clinical Journal of Pain</a>', '<a href="http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1532-2149" target="_blank">European Journal of Pain</a>', '<a href="http://www.jpain.org/home" target="_blank">Journal of Pain</a>', '<a href="http://journals.lww.com/pain/pages/default.aspx" target="_blank">PAIN</a>'),
`Impact factor` = c(2.7, 2.9, 4.5, 5.6),
`Year started` = c(1985, 1997, 2000, 1975),
`Frequency (issues per year)` = c(12, 10, 12, 12))
# Print table
kable(x = tab_df,
align = 'lrrr',
caption = '<b>Table 1.</b> Journals included in assessment')
| Journal | Impact factor | Year started | Frequency (issues per year) |
|---|---|---|---|
| Clinical Journal of Pain | 2.7 | 1985 | 12 |
| European Journal of Pain | 2.9 | 1997 | 10 |
| Journal of Pain | 4.5 | 2000 | 12 |
| PAIN | 5.6 | 1975 | 12 |
Getting the data
I obtained the electronic and print publication dates of articles for the past four years from PubMed. Beyond the usual web-browser method of searching PubMed, you can access the full database remotely through the user-friendly and well-documented Entrez Programming Utilities API (E-utilities). In R you can make these queries to the PubMed database directly using packages such as xml2, or, if you are not familiar with using web APIs, the folks at rOpenSci have given us the excellent rentrez package (a minimal sketch of which is shown immediately below). I have used the direct approach here (see the full code that follows).
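For readers who prefer the rentrez route, the chunk below is a rough sketch of how the same search could be issued. It is an illustration only, not part of the analysis; the search term mirrors the one used in my query, and the record-count and chunking choices are assumptions for the example.
# Sketch only: the same PubMed search via rentrez (not used in the analysis below)
library(rentrez)
search <- entrez_search(db = 'pubmed',
term = '(journal article[Publication Type] AND hasabstract[All Fields]) AND ("2013/01/01"[EDAT] : "2016/12/31"[EDAT]) AND ("Pain"[Journal] OR "J Pain"[Journal] OR "Clin J Pain"[Journal] OR "Eur J Pain"[Journal])',
retmax = 10000)
# Fetch the first 200 of the returned PMIDs as xml
records_xml <- entrez_fetch(db = 'pubmed',
id = search$ids[1:200],
rettype = 'xml')
The xml2-based code that I actually used starts here.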
# Set eval = FALSE after first run so as to speed-up knit on future run
# First run used to save data outputs from this chunk to file, which can
# be read into memory in future runs.
############################################################
# #
# Query PubMed for records from the #
# top four journals from the past 4 years #
# #
############################################################
# Set E-Utilities base query string
base_url <- 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'
# Set database query to search and fetch from PubMed
database_query <- 'esearch.fcgi?db=pubmed'
database_query2 <- 'efetch.fcgi?db=pubmed'
# Set search criteria
## Restricted to:
### 1. journal articles
### 2. articles with abstracts
### 3. Clin J Pain, Eur J Pain, J Pain, PAIN
### 4. Entrez entry date range of 2013/01/01 to 2016/12/31
### 5. First 10,000 articles
### 6. xml format
terms <- '&term=((journal+article[Publication+Type]+AND+hasabstract[All+Fields])+AND+("2013/01/01"[EDAT]+:+"2016/12/31"[EDAT]))+AND+((("Pain"[Journal]+OR+"J+Pain"[Journal])+OR+"Clin+J+Pain"[Journal])+OR+"Eur+J+Pain"[Journal])&rettype=xml&retmax=10000'
# Piece together the search query string
search_query <- paste0(base_url,
database_query,
terms)
# Execute search
search_get <- read_xml(search_query)
# Find xpath for PMIDs from 'query'
pmid_path <- xml_find_all(search_get, xpath = './/Id')
# Use xpath to extract PMIDs
pmids <- xml_text(pmid_path)
############################################################
# #
# Fetch the records using the returned PMIDs #
# #
############################################################
# Get the number of records returned by the search
record_count <- length(pmids)
# Split the 'pmids' vector into n = 200 sized chunks
# (the max number of ids the API can handle)
splitter <- seq(from = 1,
to = record_count,
by = 200)
# Create an empty list of length 'splitter'
splitter_list <- vector(mode = 'list',
length = length(splitter))
# Split the list of PMIDs, and paste each into a single string
for(i in seq_along(splitter)) {
# Cap the upper index so the final chunk does not run past the end of 'pmids'
splitter_list[[i]] <- pmids[splitter[[i]]:min(splitter[[i]] + 199, record_count)]
splitter_list[[i]] <- paste(splitter_list[[i]],
collapse = ',')
}
# Create empty list of length 'splitter_list'
pubmed_query <- vector(mode = 'list',
length = length(splitter_list))
# Populate empty list with repeated PubMed query calls
for(i in seq_along(splitter_list)) {
pubmed_query[[i]] <- paste0(base_url,
database_query2,
'&id=',
splitter_list[[i]],
'&retmode=xml&retmax=200')
}
# Fetch pubmed xml records
record <- map(pubmed_query,
read_xml)
############################################################
# #
# Make a user-defined function ('parse_record') #
# to extract date information #
# #
############################################################
parse_record <- function(record) {
# Packages to load when the function is used outside this .Rmd script #
################################################################
# library(dplyr)
# library(xml2)
# library(stringr)
# Set XPaths to xml nodes #
###########################
#-- Publisher -----------------------------------------------------------#
publisher_path <- xml_path(
xml_find_all(record,
'.//ISSNLinking'))
#-- Journal -------------------------------------------------------------#
journal_path <- xml_path(
xml_find_all(record,
'.//ISOAbbreviation'))
#-- Volume --------------------------------------------------------------#
volume_path <- xml2::xml_path(
xml2::xml_find_all(record,
'.//Volume'))
#-- Issue ---------------------------------------------------------------#
issue_path <- xml2::xml_path(
xml2::xml_find_all(record,
'.//Issue'))
#-- PMID ----------------------------------------------------------------#
pmid_path <- xml_path(
xml_find_all(record,
".//ArticleId[@IdType = 'pubmed']"))
#-- Publication status --------------------------------------------------#
status_path <- xml_path(
xml_find_all(record,
'.//PublicationStatus'))
#-- Year / month published ----------------------------------------------#
year_published_path <- xml_path(
xml_find_all(record,
'.//PubDate/Year'))
month_published_path <- xml_path(
xml_find_all(record,
'.//PubDate/Month'))
#-- Year / month / day online -------------------------------------------#
year_online_path <- xml_path(
xml_find_all(record,
".//ArticleDate[@DateType = 'Electronic']/Year"))
month_online_path <- xml_path(
xml_find_all(record,
".//ArticleDate[@DateType = 'Electronic']/Month"))
day_online_path <- xml_path(
xml_find_all(record,
".//ArticleDate[@DateType = 'Electronic']/Day"))
#-- Year / month / day entrez -------------------------------------------#
# PAIN stopped giving the 'ArticleDate' info in 2015, so also get
# 'PubMedPubDate[@PubStatus = 'entrez']', which is a close match.
year_entrez_path <- xml_path(
xml_find_all(record,
".//PubMedPubDate[@PubStatus = 'entrez']/Year"))
month_entrez_path <- xml_path(
xml_find_all(record,
".//PubMedPubDate[@PubStatus = 'entrez']/Month"))
day_entrez_path <- xml_path(
xml_find_all(record,
".//PubMedPubDate[@PubStatus = 'entrez']/Day"))
# Extract information using XPaths #
####################################
#-- Publisher -----------------------------------------------------------#
# Define vector for publisher name
publisher <- vector(mode = 'character',
length = length(publisher_path))
for(i in 1:length(publisher_path)) {
publisher[[i]] <- str_to_lower(
xml_text(
xml_find_first(record,
publisher_path[[i]])))
}
# Make article marker for joins
## Define vector for 'trimmed' publisher path
publisher_path2 <- vector(mode = 'character',
length = length(publisher_path))
for(i in 1:length(publisher_path)) {
publisher_path2[[i]] <-
str_extract(publisher_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
publisher2 <- data.frame(article_node = publisher_path2,
publisher = publisher)
#-- Journal -------------------------------------------------------------#
# Define vector for journal name
journal <- vector(mode = 'character',
length = length(journal_path))
for(i in 1:length(journal_path)) {
journal[[i]] <- str_to_lower(
xml_text(
xml_find_first(record,
journal_path[[i]])))
}
# Make article marker for joins
## Define vector for 'trimmed' journal path
journal_path2 <- vector(mode = 'character',
length = length(journal_path))
for(i in 1:length(journal_path)) {
journal_path2[[i]] <-
str_extract(journal_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
journal2 <- data.frame(article_node = journal_path2,
journal = journal) %>%
mutate(journal = str_replace_all(journal,
pattern = '[.]',
replacement = ''))
#-- Volume ----------------------------------------------------------------#
# Define vector for journal volume
volume <- vector(mode = 'numeric',
length = length(volume_path))
for(i in 1:length(volume_path)) {
volume[[i]] <- xml2::xml_text(
xml2::xml_find_first(record,
volume_path[[i]]))
}
# Make article marker for joins
## Define vector for 'trimmed' volume path
volume_path2 <- vector(mode = 'character',
length = length(volume_path))
for(i in 1:length(volume_path)) {
volume_path2[[i]] <-
stringr::str_extract(
volume_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
volume2 <- data.frame(article_node = volume_path2,
volume = volume) %>%
separate(volume,
into = c('volume', 'other'),
extra = 'merge') %>%
mutate(volume = as.numeric(volume)) %>%
select(article_node, volume)
#-- Issue -------------------------------------------------------------------#
# Define vector for journal issue
issue <- vector(mode = 'numeric',
length = length(issue_path))
for(i in 1:length(issue_path)) {
issue[[i]] <- xml2::xml_text(
xml2::xml_find_first(record,
issue_path[[i]]))
}
# Make article marker for joins
## Define vector for 'trimmed' issue path
issue_path2 <- vector(mode = 'character',
length = length(issue_path))
for(i in 1:length(issue_path)) {
issue_path2[[i]] <-
stringr::str_extract(
issue_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
issue2 <- data.frame(article_node = issue_path2,
issue = issue) %>%
separate(issue,
into = c('issue', 'other'),
extra = 'merge') %>%
mutate(issue = as.numeric(issue)) %>%
select(article_node, issue)
#-- PMID ----------------------------------------------------------------#
# Define vector for pmid
pmid <- vector(mode = 'character',
length = length(pmid_path))
for(i in 1:length(pmid_path)) {
pmid[[i]] <- xml_text(
xml_find_first(record,
pmid_path[[i]]))
}
# Make article marker for joins
## Define vector for 'trimmed' pmid path
pmid_path2 <- vector(mode = 'character',
length = length(pmid_path))
for(i in 1:length(pmid_path)) {
pmid_path2[[i]] <-
str_extract(pmid_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
pmid2 <- data.frame(article_node = pmid_path2,
pmid = pmid)
#-- Publication status --------------------------------------------------#
# Define vector for publication status
status <- vector(mode = 'character',
length = length(status_path))
for(i in 1:length(status_path)) {
status[[i]] <- xml_text(
xml_find_first(record,
status_path[[i]]))
}
# Make article marker for joins
## Define vector for 'trimmed' year path
status_path2 <- vector(mode = 'character',
length = length(status_path))
for(i in 1:length(status_path)) {
status_path2[[i]] <-
str_extract(status_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
status2 <- data.frame(article_node = status_path2,
publication_status = status) %>%
# Edit text
mutate(publication_status = ifelse(
is.na(publication_status),
yes = NA,
no = ifelse(
publication_status == 'ppublish',
yes = 'print copy',
no = 'ahead of print')))
#-- Year / month / day published ----------------------------------------#
# Define vector for publication year
year_published <- vector(mode = 'character',
length = length(year_published_path))
for(i in 1:length(year_published_path)) {
year_published[[i]] <- xml_text(
xml_find_first(record,
year_published_path[[i]]))
}
# Define vector for publication month
month_published <- vector(mode = 'character',
length = length(month_published_path))
for(i in 1:length(month_published_path)) {
month_published[[i]] <- xml_text(
xml_find_first(record,
month_published_path[[i]]))
}
# Define vector for publication day (default = 1st of the month)
day_published <- rep('01', length(year_published_path))
# Make article marker for joins
## Define vector for 'trimmed' year path
year_published_path2 <- vector(mode = 'character',
length = length(year_published_path))
for(i in 1:length(year_published_path)) {
year_published_path2[[i]] <-
str_extract(year_published_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
year_published2 <- data.frame(article_node = year_published_path2,
year_published = year_published,
month_published = month_published,
day_published = day_published) %>%
# Convert to date
mutate(date_published = paste(year_published,
month_published,
day_published,
sep = '-'),
date_published = ymd(date_published)) %>%
# Select required columns
select(article_node, date_published)
#-- Year / month / day online -------------------------------------------#
# Define vector for online publication year
year_online <- vector(mode = 'character',
length = length(year_online_path))
for(i in 1:length(year_online_path)) {
year_online[[i]] <- xml_text(
xml_find_first(record,
year_online_path[[i]]))
}
# Define vector for online publication year
month_online <- vector(mode = 'character',
length = length(month_online_path))
for(i in 1:length(month_online_path)) {
month_online[[i]] <- xml_text(
xml_find_first(record,
month_online_path[[i]]))
}
# Define vector for online publication year
day_online <- vector(mode = 'character',
length = length(day_online_path))
for(i in 1:length(day_online_path)) {
day_online[[i]] <- xml_text(
xml_find_first(record,
day_online_path[[i]]))
}
# Make article marker for joins
## Define vector for 'trimmed' year path
year_online_path2 <- vector(mode = 'character',
length = length(year_online_path))
for(i in 1:length(year_online_path)) {
year_online_path2[[i]] <-
str_extract(year_online_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
year_online2 <- data.frame(article_node = year_online_path2,
year_online = year_online,
month_online = month_online,
day_online = day_online) %>%
# Convert to date
mutate(date_online = paste(year_online,
month_online,
day_online,
sep = '-'),
date_online = ymd(date_online)) %>%
# Select required columns
select(article_node, date_online)
#-- Year / month / day entrez -------------------------------------------#
# Define vector for entrez publication year
year_entrez <- vector(mode = 'character',
length = length(year_entrez_path))
for(i in 1:length(year_entrez_path)) {
year_entrez[[i]] <- xml_text(
xml_find_first(record,
year_entrez_path[[i]]))
}
# Define vector for entrez publication year
month_entrez <- vector(mode = 'character',
length = length(month_entrez_path))
for(i in 1:length(month_entrez_path)) {
month_entrez[[i]] <- xml_text(
xml_find_first(record,
month_entrez_path[[i]]))
}
# Define vector for entrez publication year
day_entrez <- vector(mode = 'character',
length = length(day_entrez_path))
for(i in 1:length(day_entrez_path)) {
day_entrez[[i]] <- xml_text(
xml_find_first(record,
day_entrez_path[[i]]))
}
# Make article marker for joins
## Define vector for 'trimmed' year path
year_entrez_path2 <- vector(mode = 'character',
length = length(year_entrez_path))
for(i in 1:length(year_entrez_path)) {
year_entrez_path2[[i]] <-
str_extract(year_entrez_path[[i]],
'/PubmedArticleSet/PubmedArticle\\[[0-9][0-9]?[0-9]?\\]')
}
# Make dataframe
year_entrez2 <- data.frame(article_node = year_entrez_path2,
year_entrez = year_entrez,
month_entrez = month_entrez,
day_entrez = day_entrez) %>%
# Convert to date
mutate(date_entrez = paste(year_entrez,
month_entrez,
day_entrez,
sep = '-'),
date_entrez = ymd(date_entrez)) %>%
# Select required columns
select(article_node, date_entrez)
# Put it all together #
#######################
#-- Make into dataframe ----------------------------------------------#
# Join 'short' dataframes (<=100 entries)
record <- pmid2 %>%
left_join(publisher2,
by = 'article_node') %>%
left_join(journal2,
by = 'article_node') %>%
left_join(volume2,
by = 'article_node') %>%
left_join(issue2,
by = 'article_node') %>%
left_join(status2,
by = 'article_node') %>%
left_join(year_online2,
by = 'article_node') %>%
left_join(year_entrez2,
by = 'article_node') %>%
left_join(year_published2,
by = 'article_node') %>%
select(pmid,
publisher,
journal,
volume,
issue,
publication_status,
date_online,
date_entrez,
date_published)
#-- Output -----------------------------------------------------------#
return(record)
}
############################################################
# #
# Generate dataframe from downloaded xml record #
# #
############################################################
df <- map_df(record,
parse_record)
############################################################
# #
# Clean-up dataframe #
# #
############################################################
df <- df %>%
# Remove 'date_online' column (use complete 'date_entrez' data instead)
select(-date_online) %>%
# Make a 'year_entrez' and 'year_published' column
mutate(year_entrez = as.numeric(str_extract(date_entrez,
pattern = '[0-9]{4}')),
year_published = as.numeric(str_extract(date_published,
pattern = '[0-9]{4}'))) %>%
# Fix journal names
mutate(journal = fct_recode(as.factor(journal),
`Clin J Pain` = 'clin j pain',
`Eur J Pain` = 'eur j pain',
`J Pain` = 'j pain',
PAIN = 'pain'))
# Generate 'print copy data'
df_print <- df %>%
# Only want papers that have completed the publication cycle
filter(publication_status != 'ahead of print') %>%
# Remove 'date_published' = NA
filter(!is.na(date_published)) %>%
# Remove 'year_published' > 2016
filter(year_published < 2017) %>%
# Make an interval column ('time to print' in days)
mutate(interval = as.numeric(date_published - date_entrez)) %>%
# Remove interval values < 1
filter(interval >= 1)
# Generate 'ahead of print' data
# df_ahead <- df %>%
# filter(publication_status != 'print copy')
# Clean-up environment
rm(list = c('base_url',
'database_query',
'database_query2',
'i',
'parse_record',
'pmid_path',
'pmids',
'pubmed_query',
'record',
'record_count',
'search_get',
'search_query',
'splitter',
'splitter_list',
'terms'))
# readr::write_rds(df, './_data/2017-02-05-publication-time/df.rds')
# readr::write_rds(df_print, './_data/2017-02-05-publication-time/df_print.rds')
Caveats
I mentioned at the start that this was a very crude analysis, and the primary reasons for this statement are as follows:
The PubMed database has errors, and I made no attempt to verify the data retrieved from PubMed against data available through the publishers.
PubMed xml records follow a template, but the template is not applied consistently across all records. These inconsistencies make programmatically extracting the data susceptible to errors and missing data. For example, the XPath for the print publication year and month is typically //PubDate/Year and //PubDate/Month, respectively. But, in some records these individual year and month nodes are missing and instead a single date string of the form ‘YEAR Month-Month’ is provided at the path //PubDate/MedlineDate. Similarly, all records include a //PubMedPubDate[@PubStatus = 'entrez'] node from which the date an article was added to the Entrez database can be extracted, but only some records provide information on the date the publisher first released the e-publication (//ArticleDate[@DateType = 'Electronic']).
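To make the second point concrete, the snippet below is a rough sketch (not part of the parse_record function above; the function name is my own illustration) of how the print year could be recovered for a single article node when the //PubDate/Year node is missing, by falling back to //PubDate/MedlineDate.
# Sketch only: fall back to //PubDate/MedlineDate when //PubDate/Year is absent
# ('article' is assumed to be the xml node for a single <PubmedArticle>)
library(xml2)
library(stringr)
get_print_year <- function(article) {
year <- xml_text(xml_find_first(article, './/PubDate/Year'))
if (is.na(year)) {
# Some records only supply a string such as '2015 Jan-Feb'
medline_date <- xml_text(xml_find_first(article, './/PubDate/MedlineDate'))
year <- str_extract(medline_date, '[0-9]{4}')
}
year
}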
Median ‘time to print’
The figure below shows the median time in days between an article being recorded as an ‘epub ahead of print’ and then as a ‘print copy’ on the PubMed database. The data are shown as a heatmap, with the colour getting darker (more purple) as the time between epub and print increases. A quick scan of the plot reveals that the Clinical Journal of Pain (Clin J Pain) and the European Journal of Pain (Eur J Pain) take the longest, while the time taken by PAIN and the Journal of Pain (J Pain) is relatively short.
# Read in saved outputs from get_data chunk
df <- read_rds('./_data/2017-02-05-publication-time/df.rds')
df_print <- read_rds('./_data/2017-02-05-publication-time/df_print.rds')
############################################################
# #
# Plot heatmap #
# #
############################################################
# Summarise data for plotting
## Median 'time to print' by journal and year
df_heat <- df_print %>%
group_by(journal, year_entrez) %>%
rename(year = year_entrez) %>%
summarise(median = round(median(interval))) %>%
# Add tooltip
mutate(tooltip = paste0('<b>', journal, '</b> <br>',
'<em>Time to print:</em> ', median, ' days')) %>%
ungroup() %>%
mutate(journal = fct_relevel(journal,
'Clin J Pain',
'Eur J Pain',
'PAIN',
'J Pain'))
# ggplot
gg_heat <- ggplot(data = df_heat) +
aes(x = year,
y = journal,
fill = median,
tooltip = tooltip,
data_id = tooltip) +
geom_tile_interactive() +
scale_fill_viridis_c(direction = -1,
name = 'Days\n') +
labs(caption = "(Interactive figure, 'hover' over plot elements for more detailed information)",
x = '\nYear') +
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_text(size = 14),
axis.title.y = element_blank(),
axis.text.y = element_text(size = 12),
axis.text.x = element_text(size = 12),
plot.caption = element_text(size = 8),
panel.grid = element_blank())
gi_heat <- girafe(ggobj = gg_heat,
height_svg = 5,
width_svg = 6)
girafe_options(x = gi_heat,
opts_tooltip(css = 'font-family:arial;background-color:#eaeaea;
padding:10px;border-radius:10px 20px 10px 20px;',
opacity = 1,
offx = 10, offy = -10),
opts_hover(css = 'color:#FFFFFF;opacity:0.4;'))
Variability in the ‘time to print’
The box-and-whisker plot below gives some idea of the spread of the ‘time to print’ for each of the journals over the past few years. Clearly there are errors in the PubMed database. I cannot believe that the Clinical Journal of Pain took 529 days to transition one article from epub to print in 2014. Nor can I believe that only one day was needed for the Journal of Pain to transition six epubs to print in 2013. But pruning the data for ‘outliers’ didn’t shift the median time to publication substantially, so I decided to present the data warts and all.
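For what it is worth, the sort of pruning I refer to could look something like the chunk below. This is a sketch that assumes the conventional 1.5 * IQR boxplot rule, not the exact filter I experimented with, and the object name df_pruned is mine.
# Sketch only: prune 'time to print' outliers using the 1.5 * IQR rule
df_pruned <- df_print %>%
group_by(journal, year_entrez) %>%
filter(interval >= quantile(interval, 0.25) - 1.5 * IQR(interval),
interval <= quantile(interval, 0.75) + 1.5 * IQR(interval)) %>%
ungroup()
# Check whether the medians shift appreciably after pruning
df_pruned %>%
group_by(journal, year_entrez) %>%
summarise(median = median(interval))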
############################################################
# #
# Summary stats #
# #
############################################################
# Generate boxplot summary stats for 'time to print' (days) interval
summary_stats <- df_print %>%
select(journal,
year_entrez,
interval,
pmid) %>%
rename(year = year_entrez) %>%
group_by(journal, year) %>%
summarise(median = round(median(interval)),
Q25 = round(quantile(interval, 0.25)),
Q75 = round(quantile(interval, 0.75)),
lower_whisker = round(boxplot.stats(interval)$stats[1]),
upper_whisker = round(boxplot.stats(interval)$stats[5]),
min = min(interval),
max = max(interval)) %>%
mutate(tooltip = paste0(paste0('<b>Time to print (days): ',
journal, '</b> <br>',
'<em>Median:</em> ',
median, '<br>',
'<em>Minimum / Maximum:</em> ',
min, ' / ', max, '<br>',
'<em>Inter-quartile range:</em> ',
Q25, ' to ', Q75, '<br>',
'<em>Whisker range:</em> ',
lower_whisker, ' to ',
upper_whisker, '<br>'))) %>%
ungroup() %>%
select(journal, year, tooltip)
df_print <- df_print %>%
rename(year = year_entrez) %>%
left_join(summary_stats)
############################################################
# #
# Plot #
# #
############################################################
gg_box <- df_print %>%
mutate(journal = fct_relevel(journal,
'Clin J Pain',
'Eur J Pain',
'PAIN',
'J Pain')) %>%
ungroup() %>%
ggplot(.) +
aes(x = factor(year),
y = interval,
fill = journal,
colour = journal,
tooltip = tooltip,
data_id = tooltip) +
geom_boxplot_interactive() +
labs(caption = "(Interactive figure, 'hover' over plot elements for more detailed information)\n",
y = 'Time to print (days)\n',
x = '\nYear') +
scale_y_continuous(limits = c(-5, 605),
breaks = c(0, 100, 200, 300, 400, 500, 600),
labels = c(0, 100, 200, 300, 400, 500, 600),
expand = c(0,0)) +
scale_colour_manual(values = c('#000000', '#E69F00', '#0072B2', '#009E73')) +
scale_fill_manual(values = c('#4c4c4c', '#edbb4c', '#4c9cc9', '#4cbb9d')) +
facet_wrap(~ journal, ncol = 4) +
theme(legend.position = 'none',
panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_text(size = 20),
axis.text.y = element_text(size = 18),
axis.text.x = element_text(size = 18,
angle = 60,
hjust = 1),
strip.text = element_text(size = 18),
plot.caption = element_text(size = 12),
panel.grid.major = element_line(colour = '#999999',
size = 0.1))
gi_box <- girafe(ggobj = gg_box,
height_svg = 7,
width_svg = 9)
girafe_options(x = gi_box,
opts_tooltip(css = 'font-family:arial;background-color:#eaeaea;
padding:10px;border-radius:10px 20px 10px 20px;',
opacity = 1,
offx = 10, offy = -10),
opts_hover(css = 'color:#FFFFFF;opacity:0.4;'))
You could be mischievous with these data and say that the two top-ranked journals (by impact factor) hold their positions partly because they are streets ahead of the other two journals in getting articles from electronic to print format. But another possibility is that the ‘lesser’ two journals simply have more papers to print than the Journal of Pain and PAIN. That is, does the elitism inherent in the impact factor system afford the Journal of Pain and PAIN greater scope to reject submissions (something I am a little too familiar with for my liking), giving them fewer articles to process?
# Calculate the median number of articles per issue
issue_no <- df_print %>%
group_by(journal, year, volume, issue) %>%
# Number of articles by journal/year/volume/issue
summarise(article_no = n()) %>%
# Average number of articles per issue per journal
group_by(journal) %>%
summarise(median = round(median(article_no)))
# Add to table 1
tab_df2 <- tab_df %>%
select(Journal, `Frequency (issues per year)`) %>%
rename(`Issues per year (n)` = `Frequency (issues per year)`) %>%
bind_cols(issue_no[2]) %>% # bind_cols(issue_no[2], ahead_no[2]) %>%
mutate(`Articles per year (n; median)` =
median * `Issues per year (n)`) %>%
rename(`Articles per issue (n; median)` = median) %>%
select(Journal,
`Articles per year (n; median)`,
`Issues per year (n)`,
`Articles per issue (n; median)`) # `'ahead of print' articles (n; 31 Dec 2016)`)
# Print table
kable(x = tab_df2,
align = 'lrrr',
caption = '<b>Table 2.</b> Journal outputs')
| Journal | Articles per year (n; median) | Issues per year (n) | Articles per issue (n; median) |
|---|---|---|---|
| Clinical Journal of Pain | 96 | 12 | 8 |
| European Journal of Pain | 140 | 10 | 14 |
| Journal of Pain | 108 | 12 | 9 |
| PAIN | 228 | 12 | 19 |
The European Journal of Pain publishes 10 issues per year; the other three journals publish 12 (Table 2). Yet despite having the lowest issue frequency of the four journals, the European Journal of Pain publishes the second-greatest number of articles per issue (median = 14 articles), and hence it is competitive with regard to the total number of printed articles per year (median = 140 articles). The reduced number of issues per year does mean that if you miss being published in an issue of the European Journal of Pain, there is a longer wait until the next issue compared with the other three journals. However, this delay does not account for the magnitude of the difference in time to print between the European Journal of Pain and the Journal of Pain and PAIN.
The long time to print for the Clinical Journal of Pain isn’t easy to explain either. Comparing the Clinical Journal of Pain with the Journal of Pain (Table 2), the two journals have a comparable issue frequency (12 per year), median number of articles per issue, and hence total number of articles published per year, but vastly different times to print. Whatever the reason for the Clinical Journal of Pain’s slow time to print, the journal needs to find a solution, and increasing the number of articles per issue is the obvious one.
Closing remarks
The unrefined nature of this analysis, combined with the fallibility of the PubMed database when it comes to dates, means that there may be some inaccuracies in the data presented here. Nevertheless, I think the data are strong enough to conclude that the two top-ranked journals (PAIN and the Journal of Pain) are quicker at converting articles from ‘epub ahead of print’ to ‘print copy’ than the two lower-ranked journals. The reasons for the differences are not obvious, and I do not believe it is a publisher issue1. Whatever the reason, I don’t think it is acceptable for journals such as the European Journal of Pain and the Clinical Journal of Pain to take, on average, half to three-quarters of a year to bring an article out in ‘print’.
Comment
I have had excellent experiences with all four journals, and these data are in no way a reflection on the excellent work done by the editorial and copy-editing staff of these journals. Indeed, in my experience the time to print for these four journals is not an indicator of the time it takes for an article to go from acceptance to being available online with a DOI, and it is only an issue for those of us with weird funding mechanisms.
Session information
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gdtools_0.2.3 ggiraph_0.7.8 knitr_1.31 lubridate_1.7.10
## [5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4
## [9] readr_1.4.0 tidyr_1.1.3 tibble_3.1.0 ggplot2_3.3.3
## [13] tidyverse_1.3.0 xml2_1.3.2
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.22 bslib_0.2.4 haven_2.3.1
## [5] colorspace_2.0-0 vctrs_0.3.6 generics_0.1.0 viridisLite_0.3.0
## [9] htmltools_0.5.1.1 yaml_2.2.1 utf8_1.2.1 rlang_0.4.10
## [13] jquerylib_0.1.3 pillar_1.5.1 withr_2.4.1 glue_1.4.2
## [17] DBI_1.1.1 dbplyr_2.1.0 uuid_0.1-4 modelr_0.1.8
## [21] readxl_1.3.1 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0
## [25] cellranger_1.1.0 rvest_1.0.0 htmlwidgets_1.5.3 evaluate_0.14
## [29] labeling_0.4.2 fansi_0.4.2 highr_0.8 broom_0.7.5
## [33] Rcpp_1.0.6 backports_1.2.1 scales_1.1.1 jsonlite_1.7.2
## [37] farver_2.1.0 systemfonts_1.0.1 fs_1.5.0 hms_1.0.0
## [41] digest_0.6.27 stringi_1.5.3 grid_4.0.4 cli_2.3.1
## [45] tools_4.0.4 magrittr_2.0.1 sass_0.3.1 crayon_1.4.1
## [49] pkgconfig_2.0.3 ellipsis_0.3.1 reprex_1.0.0 assertthat_0.2.1
## [53] rmarkdown_2.7 httr_1.4.2 rstudioapi_0.13 R6_2.5.0
## [57] compiler_4.0.4
Clinical Journal of Pain (slowest ‘time to print’) and PAIN (second fastest ‘time to print’) are both published by Lippincott Williams & Wilkins.↩︎