The embedR package is an open-source R package to generate and analyze state-of-the-art text embeddings. Providing access to free and paid APIs from Hugging Face, OpenAI, and Cohere, the package offers functions to generate, group, project, label, and visualize text embeddings.

General Information

The embedR package is developed by Dirk U. Wulff, with contributions from Samuel Aeschbach, Zak Hussain, and Rui Mata. It is published under the GNU General Public License.

An overview of the package can be found here or accessed from within R using ?embedR.


The development version can be installed via devtools::install_github("dwulff/embedR"). This requires prior installation of the devtools package.
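The installation step can be sketched as follows. The check for an existing devtools installation is an addition for convenience and is not part of the original instructions.

```r
# install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# install the development version of embedR from GitHub
devtools::install_github("dwulff/embedR")
```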


Use of this package can result in data protection violations. It contains functions that send data to external servers of Hugging Face, OpenAI, or Cohere.
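One way to reduce this risk is to mask obvious identifiers before any text leaves the local machine. The following is a minimal sketch in base R. The helper redact_texts() is hypothetical and not part of embedR, and its two patterns (e-mail addresses and long digit runs) are illustrative assumptions that will not catch all personal data.

```r
# Hypothetical helper (not part of embedR): mask simple identifiers in a
# character vector before sending it to an external API
redact_texts <- function(texts) {
  # replace anything that looks like an e-mail address
  texts <- gsub("[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}",
                "[EMAIL]", texts)
  # replace runs of seven or more digits (e.g., phone or ID numbers)
  texts <- gsub("[0-9]{7,}", "[NUMBER]", texts)
  texts
}

# made-up example texts
raw <- c("Contact me at jane.doe@example.org",
         "My ID is 123456789",
         "This is text 3")

clean <- redact_texts(raw)
print(clean)
```

The redacted vector can then be passed on in place of the raw texts. Whether such masking is sufficient depends on the data and the applicable regulations.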


# load package
library(embedR)

# vector of texts
texts = c("This is text 1", "This is text 2", ...)

# set api tokens
er_set_token("openai" = "TOKEN",
             "huggingface" = "TOKEN",
             "cohere" = "TOKEN")

# generate embedding
embedding = texts %>% 

  # generate text embedding
  er_embed(api = "openai") 

# analyze embedding  
result = embedding %>% 

  # group similar texts
  er_group(method = "fuzzy") %>% 
  # generate 2D projection
  er_project(method = "umap") %>% 
  # cluster projection
  er_cluster(method = "louvain") %>% 
  # produce data frame (the call is missing in the source; er_frame() is assumed)
  er_frame()

# re-label text groups
result = result %>% 

  # relabel groups
  er_mutate(labels = label(group_texts, 
                           api = "openai"))

# visualize
result %>% plot()
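To give an intuition for what "similar texts" means in the grouping step, the sketch below computes cosine similarity on a toy matrix in base R. It does not use embedR. The matrix values are made up, and the assumption that an embedding can be treated as a numeric matrix with one row per text is for illustration only. How er_group() measures similarity internally is not described here.

```r
# Toy stand-in for an embedding: one row per text, one column per dimension
# (values are made up; real embeddings have hundreds of dimensions)
toy <- rbind(
  "This is text 1" = c(1, 0, 1),
  "This is text 2" = c(1, 0, 1),
  "Something else" = c(0, 1, 0)
)

# cosine similarity between all pairs of rows
cosine_sim <- function(m) {
  unit <- m / sqrt(rowSums(m^2))  # scale each row to unit length
  unit %*% t(unit)                # dot products of unit-length rows
}

sim <- cosine_sim(toy)
print(round(sim, 2))
```

Texts whose rows point in the same direction receive a similarity near 1, while unrelated texts receive values near 0. A grouping step can use such a matrix to decide which texts belong together.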


