Tools to calculate SII and its extensions

This vignette shows how to use the siie package to calculate the SII and its extensions, which were introduced in the paper “Superior identification index: Quantifying the capability of academic journals to recognize good research” (https://doi.org/10.1007/s11192-022-04372-z). First, we construct a data set manually, supposing that there are 10,000 papers from 26 journals, each with a citation count.

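set.seed(19960822)  # fix the seed so the results below are reproducible
nr_of_rows = 1e4    # number of simulated papers
data.frame(
  Id = seq_len(nr_of_rows),
  # assign each paper at random to one of 26 journals, labelled A to Z
  Journal = sample(LETTERS,nr_of_rows,replace = TRUE),
  # draw a citation count between 1 and 100 for each paper
  CiteCount = sample(1:100,nr_of_rows,replace = TRUE)
) -> journal_table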

To get the SII (Superior Identification Index) and SIE (Superior Identification Efficiency) for the 26 journals (represented by letters), we can:

library(siie)
library(tidyfst)
#> Thank you for using tidyfst!
#> To acknowledge our work, please cite the package:
#> Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388

journal_table %>% siie(group = "Journal",index = "CiteCount")
#> Key: <Journal>
#>     Journal superior_no total_no        sii        sie
#>      <char>       <int>    <int>      <num>      <num>
#>  1:       A          44      393 0.04251208 0.11195929
#>  2:       B          44      380 0.04251208 0.11578947
#>  3:       C          39      381 0.03768116 0.10236220
#>  4:       D          46      385 0.04444444 0.11948052
#>  5:       E          43      358 0.04154589 0.12011173
#>  6:       F          38      372 0.03671498 0.10215054
#>  7:       G          43      415 0.04154589 0.10361446
#>  8:       H          42      386 0.04057971 0.10880829
#>  9:       I          42      376 0.04057971 0.11170213
#> 10:       J          41      368 0.03961353 0.11141304
#> 11:       K          37      390 0.03574879 0.09487179
#> 12:       L          37      392 0.03574879 0.09438776
#> 13:       M          38      372 0.03671498 0.10215054
#> 14:       N          28      397 0.02705314 0.07052897
#> 15:       O          42      384 0.04057971 0.10937500
#> 16:       P          51      415 0.04927536 0.12289157
#> 17:       Q          36      364 0.03478261 0.09890110
#> 18:       R          39      408 0.03768116 0.09558824
#> 19:       S          45      399 0.04347826 0.11278195
#> 20:       T          40      387 0.03864734 0.10335917
#> 21:       U          31      384 0.02995169 0.08072917
#> 22:       V          47      392 0.04541063 0.11989796
#> 23:       W          30      344 0.02898551 0.08720930
#> 24:       X          28      383 0.02705314 0.07310705
#> 25:       Y          40      401 0.03864734 0.09975062
#> 26:       Z          44      374 0.04251208 0.11764706
#>     Journal superior_no total_no        sii        sie
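
The relationship between the columns can be checked by hand. The sketch below uses only base R and the numbers printed above; it is an inference from this output, not a description of how siie is implemented. It assumes that sie is the share of a journal's own papers that are superior (superior_no / total_no) and that sii is the journal's share of all superior papers (superior_no / sum(superior_no)).

```r
# Counts copied from the printed output above (journals A to Z, in order).
superior_no <- c(44, 44, 39, 46, 43, 38, 43, 42, 42, 41, 37, 37, 38,
                 28, 42, 51, 36, 39, 45, 40, 31, 47, 30, 28, 40, 44)
total_no <- c(393, 380, 381, 385, 358, 372, 415, 386, 376, 368, 390, 392, 372,
              397, 384, 415, 364, 408, 399, 387, 384, 392, 344, 383, 401, 374)

# Assumed definitions, inferred from the output rather than from the package source.
sie <- superior_no / total_no          # superior share within each journal
sii <- superior_no / sum(superior_no)  # each journal's share of all superior papers

check <- data.frame(Journal = LETTERS, superior_no, total_no, sii, sie)
head(check, 3)
```

Under these assumptions the recomputed values for Journal A agree with the printed sii and sie, and the sii column sums to 1 across the 26 journals.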

Note that the default superior cutoff (parameter p) is 10, meaning that the top 10% of papers are regarded as superior. To use a different p, say 1, we can:

journal_table %>% siie(group = "Journal",index = "CiteCount",p = 1)
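
To see how the cutoff changes the result, here is a minimal base-R sketch on a toy data set of ten papers. The helper toy_siie() is hypothetical and is not part of siie. It assumes that a paper is superior when its descending citation rank falls within the top p% of all papers (ties share the best rank), that sie is superior_no / total_no, and that sii is superior_no / sum(superior_no). The package may handle ties and cutoffs differently.

```r
# Toy data: two hypothetical journals with five papers each.
toy <- data.frame(
  Journal = rep(c("X", "Y"), each = 5),
  CiteCount = c(50, 45, 30, 20, 10, 40, 35, 25, 15, 5)
)

# Hypothetical helper, NOT the siie implementation: flags the top p% of papers
# by citation rank and summarises them per journal.
toy_siie <- function(df, p = 10) {
  n_top <- ceiling(nrow(df) * p / 100)  # number of rank slots that count as "top p%"
  is_superior <- rank(-df$CiteCount, ties.method = "min") <= n_top
  superior_no <- tapply(is_superior, df$Journal, sum)
  total_no <- tapply(is_superior, df$Journal, length)
  data.frame(
    Journal = names(superior_no),
    superior_no = as.vector(superior_no),
    total_no = as.vector(total_no),
    sii = as.vector(superior_no) / sum(superior_no),
    sie = as.vector(superior_no) / as.vector(total_no)
  )
}

res20 <- toy_siie(toy, p = 20)  # top 2 papers: both belong to journal X
res40 <- toy_siie(toy, p = 40)  # top 4 papers: two from X, two from Y
res20
res40
```

With p = 20 both superior papers come from journal X, so X holds the entire sii; raising p to 40 admits two papers from Y and splits the sii evenly between the two journals.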

To get the PRP (Paper Rank Percentile) for the 26 journals, we can:

prp(journal_table,group = "Journal",index = "CiteCount")
#>     Journal total_no      prp
#>      <char>    <int>    <num>
#>  1:       X      383 53.53256
#>  2:       M      372 52.88790
#>  3:       U      384 51.88940
#>  4:       R      408 51.10132
#>  5:       H      386 51.09964
#>  6:       W      344 51.05587
#>  7:       G      415 50.99173
#>  8:       O      384 50.49888
#>  9:       N      397 50.40763
#> 10:       Q      364 50.40338
#> 11:       Y      401 49.54594
#> 12:       F      372 49.45449
#> 13:       K      390 49.19364
#> 14:       L      392 48.90227
#> 15:       V      392 48.76166
#> 16:       J      368 48.68158
#> 17:       S      399 48.64158
#> 18:       B      380 48.47558
#> 19:       C      381 48.46646
#> 20:       A      393 48.43221
#> 21:       D      385 48.41839
#> 22:       T      387 48.31010
#> 23:       Z      374 47.36270
#> 24:       E      358 47.31212
#> 25:       P      415 46.86055
#> 26:       I      376 46.53165
#>     Journal total_no      prp

Finally, to draw the p-SIE curves for Journals A, B and C, we can:

library(ggplot2)

p_sie(journal_table,group = "Journal",
      index = "CiteCount",to_compare = c("A","B","C")) -> p_sie_df

p_sie_df
#>      Journal     p         sie
#>       <char> <int>       <num>
#>   1:       A     1 0.005089059
#>   2:       B     1 0.010526316
#>   3:       C     1 0.007874016
#>   4:       A     2 0.030534351
#>   5:       B     2 0.026315789
#>  ---                          
#> 296:       B    99 1.000000000
#> 297:       C    99 1.000000000
#> 298:       A   100 1.000000000
#> 299:       B   100 1.000000000
#> 300:       C   100 1.000000000

p_sie_df %>%
  ggplot(aes(p/100,sie,color = Journal)) +
  geom_point() +
  geom_line() +
  geom_abline(slope = 1,linetype = "dashed") +
  scale_x_continuous(labels = tidyfst::percent) +
  scale_y_continuous(labels = tidyfst::percent) +
  labs(x = "p",y = "SIE") +
  theme_bw() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.8, 0.3),
        legend.background = element_rect(linewidth=0.5,
                                         color = "black",linetype="solid"))


Note that we use tidyfst::percent to format the x- and y-axis labels as percentages.