Working with agents¶
Agents are the building blocks of every agentic workflow. Each one performs a specific task—like retrieving documents, calling a language model, or filtering results—and passes its output to the next step.
This tutorial walks through how to work with agents in detail, including how to:
- Add and configure agents on the canvas.
- Define prompts and context.
- Apply guardrails to enforce safety and compliance.
- Connect inputs and outputs between agents.
Whether you're running a simple LLM call or a full retrieval-augmented pipeline, every workflow is built from agents like these.
Add agents to your canvas¶
When you create a new workflow, the canvas opens with a Start and End node already placed. In the Nodes panel on the left, you’ll find the default agent types available in your workspace—for example, retrievers like Azure AI Search and LLMs like OpenAI or vLLM.
Agents fall into two broad categories today: retrievers that bring in relevant information, and large language models (LLMs) that generate or transform text. Most workflows use a combination of the two—for example, retrieving documents with Azure AI Retriever and then summarizing them with an LLM. Additional categories of agents may be added in the future.
The Nodes panel shows the default agent types, plus any preconfigured data connectors or LLMs shared with your workspace.
If your admin has shared preconfigured integrations with your workspace, you’ll also see additional data connectors and LLMs listed. These may have some fields locked depending on how the integration was created.
To add an agent, drag it from the Nodes panel onto the canvas. Select the agent to open its toolbar: use the settings icon to configure the agent, or the guardrails icon to enable guardrails.
Note
For preconfigured agents, some or all fields may be locked, depending on how the integration was created.
Once agents are on the canvas, the next step is to connect them so they work together as a flow. You can configure each agent’s behavior at any point, but many users find it easier to sketch the flow first and fine-tune settings afterward.
Connect agents in a flow¶
After adding agents to the canvas, you’ll need to connect their ports so data can flow between them:
- Drag from the output port of one node to the input port of the next.
- You can reference outputs from one node inside another. For example, you might take the output of the Start node or an Azure AI Retriever node and insert it into a downstream agent’s prompt or input field.
You can connect agents either before or after configuring them—the platform doesn’t enforce a strict order. Many users find it easiest to:
- Place agents on the canvas.
- Connect them to sketch the overall flow.
- Configure each agent’s settings and prompts.
This way, you have both the structure (data flow) and behavior (agent configuration) in place before running the workflow.
Once you know how to connect agents, you can also combine multiple agents of the same or different types. This makes workflows more flexible and allows you to tailor them to complex tasks.
Combine agents¶
You can use multiple integrations of the same type or mix and match to create flexible, agentic workflows.
| If you want to... | Use this approach |
|---|---|
| Pull documents or results from a search index | A single data connector (retriever) |
| Ask questions, generate summaries, or reason over text | A single LLM |
| Retrieve context and reason over it | Data connector + LLM in the same workflow |
| Use different data sources for different tasks | Multiple data connectors |
| Chain multiple reasoning steps or compare outputs | Multiple LLMs |
Examples
- Retrieve HR and Finance data from two indexes → summarize with an LLM.
- Use OpenAI for generation, vLLM for classification.
- Filter one retriever by region, another by department.
Configure agent behavior¶
Now that you’ve seen how to connect and combine agents, the next step is to configure each one. Each agent type includes a configuration panel tailored to its role. Broadly, agents fall into two categories:
- Retrievers (data connectors) bring external information into your workflow (e.g., Azure AI Retriever).
- Language models (LLMs) generate or transform text based on prompts and context (e.g., OpenAI Service or vLLM Service).
Together, these agents cover the core building blocks of most workflows: retrievers supply relevant context, and LLMs process that context into useful outputs.
Tip
Use a retriever when your workflow needs external knowledge, and use an LLM when you want to generate or transform text. Most workflows combine both.
Azure AI Retriever agent¶
The Azure AI Retriever agent queries an Azure AI Search index and returns ranked documents for use in downstream steps. Configure the following fields:
- Name: A custom label for this node in your workflow (e.g., Search Customer Records).
- API key: The authentication key for your Azure Search index.
- API version (optional): Defaults to the latest supported version; override if you need compatibility with an earlier release.
- Index name: The specific Azure Search index to query.
- Search service name: The name of the Azure Search service that hosts your index.
- TopK results (optional): The number of top-ranked results the retriever should return. A smaller value (e.g., 5) returns fewer, more focused results; a larger value (e.g., 20) may improve recall but can include more noise.
- Record Filter (optional): Apply a filter expression to narrow search results (e.g., by metadata field).
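For example, if your index includes metadata fields such as region and department (hypothetical field names used here for illustration), a Record Filter expression written in the OData filter syntax that Azure AI Search supports might look like:

region eq 'EMEA' and department eq 'HR'

This would limit results to documents tagged with both values before they reach downstream agents.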
The Query input is not set in this panel. Instead, it appears as a port on the node. Connect an upstream output (for example, `Start.output`) to the Query port to supply the search text. At runtime, the retriever uses that input to query the index and returns ranked results through its output port.
The configuration panel for the Azure AI Search Retriever agent.
OpenAI Service agent¶
The OpenAI Service agent connects to OpenAI-hosted models (such as GPT-4 or GPT-5) for text generation and processing. Configure the following fields:
- Node name: A custom label for this node in your workflow (e.g., Clinical Research Summarizer).
- Model name: Select which OpenAI model to call (e.g., `gpt-4`).
- Temperature (optional): Controls variability in the response. Lower values (0.0–0.3) make outputs more deterministic; higher values (0.7+) allow more creative or varied responses.
- Max tokens (optional): Sets the maximum length of the model’s output. Larger values allow longer answers but may increase latency and cost.
- API key: Authentication key for your OpenAI account.
- Context prompt: A set of instructions that defines how the model should behave. This is prepended to every request sent to the model.
Example context prompt
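For instance, a node like the Clinical Research Summarizer named above might use a prompt along these lines:

You are a clinical research assistant. Summarize the provided documents in clear, neutral language. Use only the information in the supplied context, and say "I don't know" if the context does not contain the answer.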
Ports
- Prompt: Connect an upstream node (e.g., Start node or a retriever) to provide the main content the model will process.
- Context: Optional. Use this if you want to pass additional dynamic context alongside the fixed context prompt.
- Output: Returns the generated text, which can be connected to the next node in the workflow.
The configuration panel for the OpenAI Service agent.
Tips on writing effective prompts
Context prompts strongly influence how agents behave. They’re especially important for LLM agents like OpenAI Service or vLLM, where small changes can lead to very different outputs. In general:
- Be explicit about the agent’s role and tone.
- Set clear rules ("Use only the provided context," "Avoid speculation").
- Use formatting like numbered instructions or caps for emphasis.
- Add fallback behavior ("Say 'I don’t know' if unsure").
- Keep prompts short enough to avoid truncation, especially with long user inputs.
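Putting these guidelines together, an illustrative context prompt might read:

You are a compliance review assistant. Respond in a formal, neutral tone.
1. Use ONLY the provided context; do not speculate.
2. Quote the relevant passage for every claim you make.
3. If the context does not contain the answer, say "I don't know."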
vLLM Service agent¶
The vLLM Service agent sends requests to a private inference endpoint, typically hosting open-source models such as LLaMA 2 or Mistral. Configure the following fields:
- Node name: A custom label for this node in your workflow (e.g., Private Model Summarizer).
- Model name: The name of the model exposed by your vLLM deployment (e.g., `llama-2-13b`).
- Temperature (optional): Controls variability in the response. Lower values (0.0–0.3) make outputs more deterministic; higher values (0.7+) allow more creative or varied responses.
- Max Tokens (optional): Sets the maximum length of the model’s output. Larger values allow longer answers but may increase latency.
- API URL: The full endpoint URL of your vLLM service.
- Context prompt: Instructions that define how the model should behave. This prompt is prepended to every request sent to the model.
The configuration panel for the vLLM Service agent.
Example context prompt
You are a helpful assistant. Rewrite the input text in plain language that a general audience can understand.
Ports
- Prompt: Connect an upstream node to provide the main content the model will process.
- Context: Optional. Use this if you want to pass additional dynamic context alongside the fixed context prompt.
- Output: Returns the generated text, which can be connected to the next node in the workflow.
Add guardrails for safety¶
Guardrails let you enforce safety, policy, or formatting constraints for each agent. Every agent type uses the same guardrails panel, so once you learn it, you can apply it anywhere.
Opaque guardrails are powered by NeMo Guardrails. You’ll define:
- Configuration (YAML): Includes the model your rails use.
- Input rails (Colang): Logic that runs before the node executes.
- Output rails (Colang): Logic that runs after the node executes.
Note
In this release, rails block execution if they return anything other than the original text or an empty string. This behavior may expand in future releases.
To enable guardrails:
- Select an agent node and click the guardrails icon on the node.
  The guardrails panel is the same for every agent: you can configure the YAML once, then add Colang rules for input and output as needed.
- Toggle Enable guardrails.
- Enter your config in one or more of the following sections:
  - Configuration: NeMo YAML config, including the LLM used by the rails.
  - Input rails: Colang code to validate or transform inputs.
  - Output rails: Colang code to validate or transform outputs.
You can chain multiple nodes with guardrails. Each node runs its rails independently as the workflow progresses.
Example: Configuration (YAML)
config:
  models:
    - type: main
      engine: openai
      model: gpt-4
      parameters:
        api_key: ${OPENAI_API_KEY}
  colang_version: "2.x"
Note
Guardrails are not supported with GPT-5 (model limitation). Use GPT-4 when rails are required.
Example: Input rails (Colang) — block PII
import core
import llm

flow main
  activate llm continuation

flow input rails $input_text
  $contains_pii = await check user utterance $input_text
  if $contains_pii
    bot say "Input blocked: PII detected."
    abort
  bot say $input_text
  abort

flow check user utterance $input_text -> $contains_pii
  $contains_pii = ... "Return True if the text contains PII, else False."
  return $contains_pii
Example: Output rails (Colang) — require JSON
import core
import llm

flow main
  activate llm continuation

flow output rails $model_output
  $is_json = ... "Return True if $model_output is valid JSON, else False."
  if not $is_json
    bot say "Output must be valid JSON."
    abort
  bot say $model_output
  abort
Note
Colang syntax and flow control are defined by NeMo Guardrails. The examples above show simple patterns, but for more advanced use cases, refer to the NeMo documentation.
Best practices¶
- Use a dedicated guardrails model (e.g., GPT-4) separate from your task LLM. Guardrails are not supported on GPT-5.
- Start simple: add either input or output rails first, then expand.
- Keep rails concise—long prompts slow the workflow.
- Log blocked inputs/outputs in production for auditing.
Troubleshooting¶
| Issue | Resolution |
|---|---|
| Rails never fire | Make sure your Colang defines a `flow main` and that code is placed in the correct box (Input or Output). |
| Everything is blocked | Remember: returning anything other than the original text or an empty string aborts execution. Echo the original text to allow it to pass. |
| Timeouts | Lower the LLM’s temperature or increase the SDK timeout. |
Notes
Guardrails apply per agent. You can combine them across the full workflow so every step meets your safety and compliance needs.