The function er_group condenses the embedding by grouping identical or highly similar objects (rows).
er_group(embedding, method = "identity", threshold = 0.95, verbose = FALSE)
embedding: a numeric matrix containing a text embedding.
method: a character string specifying the grouping method. One of c("identity", "fuzzy"). Default is "identity".
threshold: a numeric value specifying the threshold for method = "fuzzy". It defines the quantile of the arccos similarity distribution that is used as the cutoff for grouping embedding objects. Default is 0.95.
verbose: a logical specifying whether to show messages. Default is FALSE.
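The role of threshold can be illustrated with a minimal base R sketch. It assumes that arccos similarity is defined as 1 - acos(cosine) / pi and that the distribution is taken over all unordered pairs of rows; neither detail is stated here, and the toy data and variable names are hypothetical. The sketch only derives the cutoff and does not reproduce how er_group forms groups from it.

```r
# Toy embedding: 4 objects (rows) in 3 dimensions; values are made up.
embedding <- rbind(
  a = c(1, 0, 0),
  b = c(0.9, 0.1, 0),
  c = c(0, 1, 0),
  d = c(0, 0, 1)
)

# Cosine similarity between all pairs of rows.
norms <- sqrt(rowSums(embedding^2))
cosine <- tcrossprod(embedding) / outer(norms, norms)

# ASSUMPTION: arccos similarity = 1 - acos(cosine) / pi.
# Clamp to [-1, 1] to guard against floating-point overshoot.
arccos_sim <- 1 - acos(pmin(pmax(cosine, -1), 1)) / pi

# Distribution of pairwise similarities (each unordered pair once).
pairwise <- arccos_sim[upper.tri(arccos_sim)]

# threshold = 0.95 selects the 95% quantile of that distribution as cutoff.
threshold <- 0.95
cutoff <- quantile(pairwise, probs = threshold, names = FALSE)

# Pairs at or above the cutoff are candidates for grouping.
candidates <- which(arccos_sim >= cutoff & upper.tri(arccos_sim),
                    arr.ind = TRUE)
```

In this toy example only the pair (a, b) reaches the cutoff. A higher threshold makes grouping stricter, because fewer pairs exceed a higher quantile of the similarity distribution.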
The function returns a matrix containing the grouped embedding. The matrix still has ncol(embedding) dimensions, but it has fewer rows as a result of the grouping. With method = "identity", the matrix gains the attribute frequency, containing the frequency table of each element in the original embedding. With method = "fuzzy", the matrix gains the new attributes group_size, which is analogous to frequency; group_texts, which contains the texts assigned to each group; and group_min_sim, which shows the minimum arccos similarity of texts in a group. Furthermore, with method = "fuzzy", the text column will be replaced with generic group labels.
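The shape of the result for method = "identity" can be mimicked in base R. This is a rough sketch, not the package's implementation: it assumes that objects are identified by their row names (the texts) and that frequency is a table of those names. The toy data are hypothetical.

```r
# Toy embedding with repeated texts; row names play the role of the texts.
embedding <- rbind(
  apple  = c(0.1, 0.2),
  banana = c(0.3, 0.4),
  apple  = c(0.1, 0.2),
  apple  = c(0.1, 0.2)
)

# Count how often each text occurs in the original embedding.
frequency <- table(rownames(embedding))

# ASSUMPTION: identical objects are those with identical row names.
# Keep the first occurrence of each text.
grouped <- embedding[!duplicated(rownames(embedding)), , drop = FALSE]

# Attach the counts as an attribute, aligned with the remaining rows.
attr(grouped, "frequency") <- frequency[rownames(grouped)]
```

The grouped matrix keeps both columns of the original but has two rows instead of four, and attr(grouped, "frequency") records how many original rows each remaining row represents.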
Wulff, D. U., Aeschbach, S., Hussain, Z., & Mata, R. (2024). embeddeR. PsyArXiv.