The embedR package is an open-source R package for generating and analyzing text embeddings. It provides access to state-of-the-art open and paid APIs from Hugging Face, OpenAI, and Cohere to generate text embeddings and offers methods to group, project, relabel, and visualize them. The following provides an overview of the package's functions and data sets:
er_set_tokens
sets access tokens for the APIs of Hugging Face, OpenAI, and Cohere.
er_get_tokens
shows tokens that have been set during the current session.
er_embed
generates state-of-the-art text embeddings using the APIs from Hugging Face, OpenAI, and Cohere.
er_group
groups identical or highly similar embedding vectors to produce group-based embeddings.
er_project
projects embeddings into lower-dimensional spaces using MDS, UMAP, or PaCMAP.
er_compare_vectors
computes a similarity matrix containing the similarities of all pairs of embedding vectors.
er_compare_embeddings
computes the representational similarity of pairs of embeddings.
er_cluster
clusters the embedding vectors into larger groups using hierarchical clustering, DBSCAN, or Louvain clustering.
er_frame
generates a tibble from an embedding object, including any attributes attached to it.
er_infer_labels
uses state-of-the-art generative models from Hugging Face and OpenAI to generate category labels for groups of texts.
plot
produces a 2D scatter plot of embedding vectors (typically after projection) with options for customization.
neo
data set containing 300 items of the NEO personality questionnaire.
ai
data set containing 2,500 free associations of artificial intelligence provided by laypeople.
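To make the comparison, projection, and clustering steps listed above more concrete, the following is a minimal base-R sketch of the underlying computations. It does not call embedR and does not reflect its internals: the toy matrix `emb`, the helper `cosine_matrix`, the choice of cosine similarity, and the average-linkage setting are all assumptions made for illustration, since the overview does not state which metric or defaults the package uses.

```r
# Toy "embedding": 6 texts x 4 dimensions (made-up numbers, not API output)
set.seed(1)
emb <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("text_", 1:6), NULL))

# Hypothetical helper: cosine similarity of all pairs of row vectors
cosine_matrix <- function(x) {
  unit <- x / sqrt(rowSums(x^2))  # scale each row to unit length
  tcrossprod(unit)                # dot products of all row pairs
}
sim <- cosine_matrix(emb)

# Turn similarities into dissimilarities for the steps below
d <- as.dist(1 - sim)

# Classical MDS projection to two dimensions
proj <- cmdscale(d, k = 2)

# Hierarchical clustering, cut into two clusters
clusters <- cutree(hclust(d, method = "average"), k = 2)
```

Here `sim` plays the role of a pairwise similarity matrix, `proj` of a low-dimensional projection, and `clusters` of a cluster assignment per text. UMAP, PaCMAP, DBSCAN, and Louvain clustering need additional packages and are left out of this sketch.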
if (FALSE) {
# load package
library(embedR)
# set api tokens
er_set_tokens("openai" = "TOKEN",
              "huggingface" = "TOKEN",
              "cohere" = "TOKEN")
# generate embedding
embedding = neo$text %>%
  # generate text embedding
  er_embed(api = "openai")
# analyze embedding
result = embedding %>%
  # group similar texts
  er_group(method = "fuzzy") %>%
  # generate 2D projection
  er_project(method = "umap") %>%
  # cluster projection
  er_cluster(method = "louvain") %>%
  # produce data frame
  er_frame()
# re-label text groups, continuing from the data frame produced above
result = result %>%
  # relabel groups
  er_mutate(labels = label(group_texts,
                           api = "openai"))
# visualize
result %>% plot()
}