The function er_group condenses an embedding by grouping identical or highly similar objects (rows).
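With method = "identity", grouping amounts to collapsing exactly duplicated rows and counting how often each occurs. The base R sketch below illustrates that idea only. group_identical_rows() is a hypothetical helper, not part of embeddeR, and the package may store its frequency attribute in a different form (for example as a named table).

```r
# Hypothetical helper illustrating identity grouping; not embeddeR's code.
group_identical_rows <- function(embedding) {
  stopifnot(is.matrix(embedding), is.numeric(embedding))
  # One string key per row so identical rows get identical keys
  # (note: paste() formats doubles with 15 significant digits)
  keys <- unname(apply(embedding, 1, paste, collapse = "\r"))
  # Keep the first occurrence of each distinct row
  first <- !duplicated(keys)
  grouped <- embedding[first, , drop = FALSE]
  # Count how often each distinct row occurs, in the order rows were kept
  counts <- table(factor(keys, levels = keys[first]))
  attr(grouped, "frequency") <- as.integer(counts)
  grouped
}

emb <- rbind(a = c(1, 0), b = c(0, 1), c = c(1, 0))
g <- group_identical_rows(emb)
dim(g)                # 2 2
attr(g, "frequency")  # 2 1
```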
er_group(embedding, method = "identity", threshold = 0.95, verbose = FALSE)

embedding: a numeric matrix containing a text embedding.
method: a character string specifying the grouping method, one of c("identity", "fuzzy"). Default is "identity".
threshold: a numeric value used only when method = "fuzzy". It defines the quantile of the arccos similarity distribution that serves as the cutoff for grouping embedding objects.
verbose: a logical specifying whether to show messages.
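The threshold argument can be read as picking a quantile of all pairwise similarities between rows. A minimal sketch, assuming that arccos similarity is defined as 1 - acos(cosine) / pi (the formula is not stated here) and that the quantile is taken over all distinct row pairs. arccos_similarity() and similarity_cutoff() are hypothetical helpers.

```r
# Assumed definition: arccos similarity = 1 - acos(cosine) / pi, in [0, 1].
# Rows must have non-zero norm, otherwise the result contains NaN.
arccos_similarity <- function(embedding) {
  unit <- embedding / sqrt(rowSums(embedding^2))  # scale each row to unit length
  cosine <- tcrossprod(unit)                       # pairwise cosine similarities
  cosine <- pmin(pmax(cosine, -1), 1)              # clamp rounding error before acos()
  1 - acos(cosine) / pi
}

# Hypothetical helper: cutoff = requested quantile of the pairwise similarities
similarity_cutoff <- function(embedding, threshold = 0.95) {
  sim <- arccos_similarity(embedding)
  pairwise <- sim[upper.tri(sim)]  # each distinct pair once, no self-pairs
  unname(quantile(pairwise, probs = threshold))
}

emb <- rbind(c(1, 0), c(0, 1), c(1, 1), c(-1, 0))
similarity_cutoff(emb, threshold = 0.5)  # 0.5
```

A high threshold such as the default 0.95 therefore only groups pairs that are among the most similar 5% in the embedding, under the assumptions above.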
The function returns a matrix containing the grouped embedding. The matrix keeps ncol(embedding) columns but may have fewer rows, since each group is represented by a single row. With method = "identity", the matrix gains the attribute frequency, which contains the frequency table of each element in the original embedding. With method = "fuzzy", the matrix gains three new attributes: group_size, which is analogous to frequency; group_texts, which contains the texts assigned to each group; and group_min_sim, which gives the minimum arccos similarity among the texts in each group. In addition, with method = "fuzzy", the text column is replaced with generic group labels.
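To show how the fuzzy attributes relate to one another, the sketch below groups rows with complete-linkage hierarchical clustering cut at the quantile-based cutoff, so every pair inside a group has a similarity at or above that cutoff. This is a swapped-in technique, not necessarily what er_group does internally. fuzzy_group(), the assumed arccos formula, the mean-vector group representatives, and the "group_" label format are all assumptions.

```r
# Hypothetical sketch: complete-linkage clustering stands in for embeddeR's
# fuzzy grouping algorithm, which may work differently.
fuzzy_group <- function(embedding, threshold = 0.95) {
  stopifnot(is.matrix(embedding), nrow(embedding) >= 2)
  texts <- rownames(embedding)
  if (is.null(texts)) texts <- as.character(seq_len(nrow(embedding)))

  # Assumed arccos similarity: 1 - acos(cosine) / pi
  unit <- embedding / sqrt(rowSums(embedding^2))
  sim <- 1 - acos(pmin(pmax(tcrossprod(unit), -1), 1)) / pi
  diag(sim) <- 1  # self-similarity is 1 by definition
  cutoff <- unname(quantile(sim[upper.tri(sim)], probs = threshold))

  # Complete linkage: all pairs in a cluster have distance <= the cut height,
  # i.e. similarity >= cutoff
  tree <- stats::hclust(stats::as.dist(1 - sim), method = "complete")
  groups <- stats::cutree(tree, h = 1 - cutoff)
  ids <- sort(unique(groups))

  # Represent each group by the mean of its rows (an assumption)
  grouped <- rowsum(embedding, groups) / as.vector(table(groups))
  rownames(grouped) <- paste0("group_", ids)  # generic labels (assumed format)

  attr(grouped, "group_size") <- as.integer(table(groups))
  attr(grouped, "group_texts") <- unname(split(texts, groups))
  attr(grouped, "group_min_sim") <- vapply(ids, function(g) {
    idx <- which(groups == g)
    min(sim[idx, idx])  # singleton groups give 1
  }, numeric(1))
  grouped
}

emb <- rbind(a = c(1, 0), b = c(1, 0.01), c = c(0, 1), d = c(0.01, 1))
g <- fuzzy_group(emb, threshold = 0.7)
attr(g, "group_texts")  # a list holding c("a", "b") and c("c", "d")
```

Complete linkage is used here because it guarantees the group_min_sim of every group is at least the cutoff; single linkage would allow chains of loosely related texts to end up in one group.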
Wulff, D. U., Aeschbach, S., Hussain, Z., & Mata, R. (2024). embeddeR. PsyArXiv.