This vignette shows how to use the siie package to calculate the SII and its extensions, which were introduced in the paper “Superior identification index: Quantifying the capability of academic journals to recognize good research” (https://doi.org/10.1007/s11192-022-04372-z). First, we construct a data set manually, supposing that there are 10,000 papers from 26 journals, each with a citation count drawn at random from 1 to 100.
set.seed(19960822)
nr_of_rows = 1e4
data.frame(
Id = seq_len(nr_of_rows),
Journal = sample(LETTERS,nr_of_rows,replace = TRUE),
CiteCount = sample(1:100,nr_of_rows,replace = TRUE)
) -> journal_table
To get the SII (Superior Identification Index) and SIE (Superior Identification Efficiency) for the 26 journals (represented by letters), we can:
library(siie)
library(tidyfst)
#> Thank you for using tidyfst!
#> To acknowledge our work, please cite the package:
#> Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388
journal_table %>% siie(group = "Journal",index = "CiteCount")
#> Key: <Journal>
#> Journal superior_no total_no sii sie
#> <char> <int> <int> <num> <num>
#> 1: A 44 393 0.04251208 0.11195929
#> 2: B 44 380 0.04251208 0.11578947
#> 3: C 39 381 0.03768116 0.10236220
#> 4: D 46 385 0.04444444 0.11948052
#> 5: E 43 358 0.04154589 0.12011173
#> 6: F 38 372 0.03671498 0.10215054
#> 7: G 43 415 0.04154589 0.10361446
#> 8: H 42 386 0.04057971 0.10880829
#> 9: I 42 376 0.04057971 0.11170213
#> 10: J 41 368 0.03961353 0.11141304
#> 11: K 37 390 0.03574879 0.09487179
#> 12: L 37 392 0.03574879 0.09438776
#> 13: M 38 372 0.03671498 0.10215054
#> 14: N 28 397 0.02705314 0.07052897
#> 15: O 42 384 0.04057971 0.10937500
#> 16: P 51 415 0.04927536 0.12289157
#> 17: Q 36 364 0.03478261 0.09890110
#> 18: R 39 408 0.03768116 0.09558824
#> 19: S 45 399 0.04347826 0.11278195
#> 20: T 40 387 0.03864734 0.10335917
#> 21: U 31 384 0.02995169 0.08072917
#> 22: V 47 392 0.04541063 0.11989796
#> 23: W 30 344 0.02898551 0.08720930
#> 24: X 28 383 0.02705314 0.07310705
#> 25: Y 40 401 0.03864734 0.09975062
#> 26: Z 44 374 0.04251208 0.11764706
#> Journal superior_no total_no sii sie
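The sii and sie columns can be reproduced by hand from the two count columns. The base R sketch below assumes, judging from the printed values rather than from the package source, that sii is a journal's superior papers divided by all superior papers in the data set (the superior_no column sums to 1,035 here) and that sie is a journal's superior papers divided by its own total. Only the first three journals are shown.

```r
# Counts copied from the table above (journals A, B and C).
superior_no <- c(A = 44, B = 44, C = 39)
total_no    <- c(A = 393, B = 380, C = 381)

# Sum of superior_no over all 26 journals in the table above.
total_superior <- 1035

# Assumed definitions, inferred from the printed output:
sii_by_hand <- superior_no / total_superior  # share of all superior papers
sie_by_hand <- superior_no / total_no        # share of the journal's own papers

round(sii_by_hand, 8)
round(sie_by_hand, 8)
```

The rounded results agree with the sii and sie values printed for A, B and C.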
Note that the default superior cutoff (parameter p) is 10, meaning that the top 10% of papers are regarded as superior. To use a different cutoff, say the top 1%, set p accordingly (for example, p = 1) in the call above.
To get the PRP (Paper Rank Percentile) for the 26 journals, we can:
prp(journal_table,group = "Journal",index = "CiteCount")
#> Journal total_no prp
#> <char> <int> <num>
#> 1: X 383 53.53256
#> 2: M 372 52.88790
#> 3: U 384 51.88940
#> 4: R 408 51.10132
#> 5: H 386 51.09964
#> 6: W 344 51.05587
#> 7: G 415 50.99173
#> 8: O 384 50.49888
#> 9: N 397 50.40763
#> 10: Q 364 50.40338
#> 11: Y 401 49.54594
#> 12: F 372 49.45449
#> 13: K 390 49.19364
#> 14: L 392 48.90227
#> 15: V 392 48.76166
#> 16: J 368 48.68158
#> 17: S 399 48.64158
#> 18: B 380 48.47558
#> 19: C 381 48.46646
#> 20: A 393 48.43221
#> 21: D 385 48.41839
#> 22: T 387 48.31010
#> 23: Z 374 47.36270
#> 24: E 358 47.31212
#> 25: P 415 46.86055
#> 26: I 376 46.53165
#> Journal total_no prp
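To give an intuition for what a rank percentile average looks like, here is a toy base R sketch. It rests on an assumption that is not taken from the package source: each paper is ranked by citation count over the whole data set with rank 1 as the most cited paper, the rank is expressed as a percentage of all papers, and the PRP of a journal is the mean of those percentages. The journals J1 and J2 and their counts are hypothetical, and the way prp() handles ties may differ.

```r
# Toy data: six papers from two hypothetical journals, no tied counts.
toy <- data.frame(
  Journal   = c("J1", "J1", "J1", "J2", "J2", "J2"),
  CiteCount = c(60, 50, 10, 40, 30, 20)
)

# Assumed rank percentile: rank 1 is the most cited paper,
# scaled to a percentage of all papers in the data set.
toy$rank_pct <- rank(-toy$CiteCount) / nrow(toy) * 100

# Assumed PRP: mean rank percentile within each journal.
toy_prp <- tapply(toy$rank_pct, toy$Journal, mean)
toy_prp
```

Under this assumed convention a lower value means that a journal's papers sit closer to the top of the citation ranking, so J1 (50) would be ahead of J2 (about 66.7).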
Finally, to draw the p-SIE curve for journals A, B and C, we can:
library(ggplot2)
p_sie(journal_table,group = "Journal",
index = "CiteCount",to_compare = c("A","B","C")) -> p_sie_df
p_sie_df
#> Journal p sie
#> <char> <int> <num>
#> 1: A 1 0.005089059
#> 2: B 1 0.010526316
#> 3: C 1 0.007874016
#> 4: A 2 0.030534351
#> 5: B 2 0.026315789
#> ---
#> 296: B 99 1.000000000
#> 297: C 99 1.000000000
#> 298: A 100 1.000000000
#> 299: B 100 1.000000000
#> 300: C 100 1.000000000
p_sie_df %>%
ggplot(aes(p/100,sie,color = Journal)) +
geom_point() +
geom_line() +
geom_abline(slope = 1,linetype = "dashed") +
scale_x_continuous(labels = tidyfst::percent) +
scale_y_continuous(labels = tidyfst::percent) +
labs(x = "p",y = "SIE") +
theme_bw() +
# ggplot2 >= 3.5.0: place the legend inside the panel without a deprecation warning
theme(legend.position = "inside",
legend.position.inside = c(0.8, 0.3),
legend.background = element_rect(linewidth=0.5,
color = "black",linetype="solid"))
Note that we use tidyfst::percent to format the x- and y-axis labels as percentages.
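The dashed line in the plot is the diagonal y = x, so a point lies above it exactly when a journal's SIE at cutoff p is larger than p/100. As a small check, the base R sketch below applies that comparison to the three p = 1 values printed in p_sie_df above. Reading "above the diagonal" as "more than a proportional share of top papers" is our interpretation, not a statement from the package documentation.

```r
# SIE values at p = 1, copied from the first rows of p_sie_df above.
p_cutoff <- 1
sie_at_p <- c(A = 0.005089059, B = 0.010526316, C = 0.007874016)

# A point is above the dashed diagonal when sie > p / 100.
above_diagonal <- sie_at_p > p_cutoff / 100
above_diagonal
```

At this cutoff only journal B lies above the diagonal, while A and C fall below it.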