er_group condenses an embedding by grouping identical or highly similar objects (rows).

er_group(embedding, method = "identity", threshold = 0.95, verbose = FALSE)

Arguments

embedding

a numeric matrix containing a text embedding.

method

a character string specifying the grouping method. One of c("identity","fuzzy"). Default is "identity".

threshold

a numeric value specifying the threshold for method = "fuzzy". It is interpreted as a quantile of the arccos similarity distribution, not as a raw similarity value; the similarity at that quantile is used as the cutoff for grouping embedding objects. Default is 0.95.

verbose

a logical specifying whether to show messages. Default is FALSE.
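
To illustrate how a quantile-based threshold behaves, the following base R sketch computes pairwise arccos similarities for a toy matrix and derives the cutoff at the 0.95 quantile. It is not the package's implementation: the toy data, the variable names, and the definition of arccos similarity as 1 - acos(cosine) / pi are assumptions made for illustration, and the package's definition may differ.

```r
# Toy embedding: 5 objects (rows) in 3 dimensions; all values are made up.
emb <- rbind(
  a = c(1.0, 0.0, 0.0),
  b = c(0.9, 0.1, 0.0),
  c = c(0.0, 1.0, 0.0),
  d = c(0.0, 0.9, 0.1),
  e = c(0.0, 0.0, 1.0)
)

# Cosine similarity between all pairs of rows.
unit <- emb / sqrt(rowSums(emb^2))
cos_sim <- tcrossprod(unit)

# Clamp to [-1, 1] so rounding error cannot produce NaN in acos().
cos_sim <- pmin(pmax(cos_sim, -1), 1)

# ASSUMPTION: arccos similarity defined as 1 - angle / pi,
# giving 1 for identical directions and 0 for opposite ones.
arccos_sim <- 1 - acos(cos_sim) / pi

# Distribution of similarities over distinct pairs (upper triangle only).
pair_sims <- arccos_sim[upper.tri(arccos_sim)]

# The threshold is a quantile of that distribution, not a raw similarity.
cutoff <- quantile(pair_sims, probs = 0.95, names = FALSE)

# Pairs at or above the cutoff would be candidates for grouping.
candidates <- which(arccos_sim >= cutoff & upper.tri(arccos_sim),
                    arr.ind = TRUE)
```

Because the cutoff depends on the distribution of similarities, the same threshold value can correspond to different raw similarity cutoffs in different embeddings.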

Value

The function returns a matrix containing the grouped embedding. The matrix still has ncol(embedding) columns, but it has fewer rows as a result of the grouping. With method = "identity", the matrix gains the attribute frequency, which contains the frequency table of each element in the original embedding. With method = "fuzzy", the matrix gains the new attributes group_size, which is analogous to frequency, group_texts, which contains the texts assigned to each group, and group_min_sim, which gives the minimum arccos similarity among the texts in a group. Furthermore, with method = "fuzzy", the text column is replaced with generic group labels.
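
The shape of the return value for method = "identity" can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the toy data, the key-based duplicate detection, and the storage of frequency as a named integer vector are hypothetical choices.

```r
# Toy embedding with duplicated rows; values and row names are made up.
emb <- rbind(
  happy = c(0.2, 0.8),
  sad   = c(0.9, 0.1),
  happy = c(0.2, 0.8),
  calm  = c(0.5, 0.5),
  happy = c(0.2, 0.8)
)

# Build one key per row so that identical rows share the same key.
keys <- apply(emb, 1, paste, collapse = "|")

# Keep only the first occurrence of each distinct row.
grouped <- emb[!duplicated(keys), , drop = FALSE]

# Count how often each distinct row occurred, in the order of the kept rows.
counts <- table(factor(keys, levels = unique(keys)))

# ASSUMPTION: frequency is stored as a named integer vector;
# the actual attribute in the package may be structured differently.
attr(grouped, "frequency") <- setNames(as.integer(counts), rownames(grouped))
```

In this sketch the number of columns is unchanged, the number of rows drops from five to the number of distinct rows, and the frequencies sum to the original row count.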

References

Wulff, D. U., Aeschbach, S., Hussain, Z., & Mata, R. (2024). embeddeR. PsyArXiv.

Examples

if (FALSE) {
# get and group embedding vectors
embedding <- er_embed(neo$text) %>%
  er_group()
}