Infer category labels (er_infer_labels)

er_infer_labels infers category labels using generative large language models.

er_infer_labels(
  labels,
  api = "huggingface",
  model = NULL,
  role = "assistant",
  instruct = NULL,
  system = NULL,
  verbose = FALSE
)

Arguments

labels: a list of character vectors, each containing the example texts of one group.
api: a character string specifying the API. One of c("huggingface","openai","cohere"). Default is "huggingface".
model: a character string specifying the model name. Must match the model names on the corresponding APIs. See huggingface.co/models and platform.openai.com/docs/models/embeddings. Defaults to "meta-llama/Llama-2-70b-chat-hf" for api = "huggingface" and to "gpt-4" for api = "openai".
role: a character string specifying the role that replaces the placeholder "{role}" in the general system instruction to the model. Default is "assistant".
instruct: a character string specifying the instruction for the model. Must contain the placeholder "{examples}". Default is "Generate a specific and accurate one or two word category label that captures the common meaning of the following examples: {examples}. Place '@' before and after the category label.".
system: a character string specifying the general system instruction to the model. Default is "You are a helpful {role} who provides short, specific, and accurate category labels.".
verbose: a logical specifying whether to show messages. Default is FALSE.
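The role, system, and instruct arguments interact through two placeholders: "{role}" in the system instruction and "{examples}" in the instruction. The sketch below shows how the prompt for one group could be assembled from the documented defaults. The helper build_prompt() is hypothetical, and joining the examples with ", " is an assumption; the package's internal formatting may differ.

```r
# Hypothetical helper (not part of the package): fills the documented placeholders.
# Assumption: examples are joined with ", "; the package may format them differently.
build_prompt <- function(examples,
                         role = "assistant",
                         system = "You are a helpful {role} who provides short, specific, and accurate category labels.",
                         instruct = "Generate a specific and accurate one or two word category label that captures the common meaning of the following examples: {examples}. Place '@' before and after the category label.") {
  # The instruction must contain the {examples} placeholder
  if (!grepl("{examples}", instruct, fixed = TRUE)) {
    stop("`instruct` must contain the placeholder {examples}")
  }
  list(
    # {role} in the system instruction is replaced by the role argument
    system = gsub("{role}", role, system, fixed = TRUE),
    # {examples} in the instruction is replaced by the group's texts
    user = gsub("{examples}", paste(examples, collapse = ", "), instruct, fixed = TRUE)
  )
}

# Made-up example texts for one group
prompt <- build_prompt(c("I enjoy parties", "I talk to many people"))
cat(prompt$system, prompt$user, sep = "\n")
```

With the defaults, the system message becomes "You are a helpful assistant who provides short, specific, and accurate category labels."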

Value

The function returns a character vector of category labels, one for each element of labels.
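Because the default instruction asks the model to place '@' before and after the label, a raw model response has to be reduced to the text between the markers before a label can be returned. A minimal sketch of such a step follows; extract_label() is a hypothetical helper, and returning NA when no marked label is found is an assumption, not documented package behavior.

```r
# Hypothetical helper (not part of the package): keeps the text between the
# first pair of '@' markers in each response.
# Assumption: responses without a marked label yield NA.
extract_label <- function(responses) {
  vapply(responses, function(r) {
    m <- regmatches(r, regexpr("@[^@]+@", r))
    if (length(m) == 0) {
      NA_character_
    } else {
      # drop the markers and surrounding whitespace
      trimws(gsub("@", "", m, fixed = TRUE))
    }
  }, character(1), USE.NAMES = FALSE)
}

extract_label("Here is my suggestion: @Social Activity@")
```

The call above returns "Social Activity".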

Details

The models recommended for label inference, including the default models, are not free to use, and using them can incur significant costs. Costs depend on the size of the input texts and the number of labels inferred. The default Hugging Face model, meta-llama/Llama-2-70b-chat-hf, requires a paid monthly PRO subscription. The OpenAI models, including the default gpt-4 model, are billed by the number of tokens in the input and output.
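Because OpenAI models are billed per token, a rough budget can be estimated before running the function. The sketch below is hypothetical (estimate_cost() is not part of the package) and rests on several assumptions: one request per element of labels, roughly four characters per token as a rule of thumb for English text instead of a real tokenizer, a fixed prompt overhead, and a fixed number of output tokens per label. Prices must be supplied by the user from the provider's current price list.

```r
# Hypothetical estimator (not part of the package).
# Assumptions: one request per group, ~4 characters per token,
# a fixed prompt overhead in characters, and a fixed number of output tokens.
# Prices are per million tokens and must be supplied by the user.
estimate_cost <- function(labels, prompt_chars = 300, output_tokens = 10,
                          price_in_per_1m, price_out_per_1m) {
  # characters sent per request: the group's texts plus the prompt template
  input_chars <- vapply(labels, function(x) as.numeric(sum(nchar(x))), numeric(1)) +
    prompt_chars
  input_tokens <- ceiling(input_chars / 4)
  sum(input_tokens) / 1e6 * price_in_per_1m +
    length(labels) * output_tokens / 1e6 * price_out_per_1m
}

# Made-up groups and placeholder prices; substitute real values
groups <- list(c("I enjoy parties", "I talk to many people"), c("I worry a lot"))
estimate_cost(groups, price_in_per_1m = 1, price_out_per_1m = 2)
```

The estimate scales linearly with the number of groups, which is why labeling many small groups can cost more than the text volume alone suggests.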

To obtain the best possible labels, adjust the prompt arguments role, system, and instruct to the texts at hand.
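As an illustration of adjusting the prompt arguments, the sketch below defines a domain-specific role, system instruction, and instruction, checks that they keep the "{role}" and "{examples}" placeholders and the '@' markers, and passes them to er_infer_labels(). The prompt wording and the groups input are invented for this example. The call is wrapped in if (FALSE), as in the Examples section, because it requires API access and may incur costs.

```r
# Invented prompt wording, for illustration only
my_role <- "personality psychologist"
my_system <- "You are an expert {role} who names psychological constructs concisely."
my_instruct <- paste(
  "Name the personality trait that best summarizes these questionnaire items:",
  "{examples}. Place '@' before and after the trait name."
)

# Keep the placeholders so that role and the group texts are inserted
stopifnot(grepl("{role}", my_system, fixed = TRUE))
stopifnot(grepl("{examples}", my_instruct, fixed = TRUE))

# Invented input: a list of character vectors, one per group
groups <- list(
  c("I am the life of the party.", "I talk to a lot of different people at parties."),
  c("I get stressed out easily.", "I worry about things.")
)

if (FALSE) {
  labels <- er_infer_labels(
    groups,
    api = "openai",
    role = my_role,
    system = my_system,
    instruct = my_instruct
  )
}
```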

References

Wulff, D. U., Aeschbach, S., Hussain, Z., & Mata, R. (2024). embeddeR. PsyArXiv.

Examples

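if (FALSE) {
# load dplyr for the %>% pipe
library(dplyr)

# embed, group, project, and frame the texts, then infer one label per group
result <- er_embed(neo$text) %>%
  er_group() %>%
  er_project() %>%
  er_frame() %>%
  dplyr::mutate(group_labels = er_infer_labels(group_texts))
}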