Fixing the Agent Handoff Problem in LlamaIndex's AgentWorkflow System
The position bias in LLMs is the root cause of the problem

LlamaIndex AgentWorkflow, as a brand-new multi-agent orchestration framework, still has some shortcomings. The most significant issue is that after an agent hands off control, the receiving agent fails to continue responding to user requests, causing the workflow to halt.
In today's article, I'll explore several experimental solutions to this problem with you and discuss the root cause behind it: the position bias issue in LLMs.
I've included all relevant source code at the end of this article. Feel free to read or modify it without needing my permission.
Introduction
My team and I have been experimenting with LlamaIndex AgentWorkflow recently. After some localization adaptations, we hope this framework can eventually run in our production system.
During the adaptation, we encountered many obstacles. I've documented these problem-solving experiences in my article series. You might want to read them first to understand the full context.
Today, I'll address the issue where, after the on-duty agent hands off control to the next agent, the receiving agent fails to continue responding to the user's most recent request.
Here's what happens:

After the handoff, the receiving agent doesn't immediately respond to the user's latest request - the user has to repeat their question.
Why should I care?
In this article, we'll examine this unique phenomenon and attempt to solve it from multiple perspectives, including developer recommendations and our own experience.
During this process, we'll take the opportunity to read through AgentWorkflow's excellent source code, having a conversation with its authors through their code to better understand Agentic AI design principles.
We'll also touch on LLM position bias for the first time, to understand how the position of messages in chat history affects LLM responses.
These insights aren't limited to LlamaIndex - they'll help us handle similar situations when working with other multi-agent orchestration frameworks.
Let's go.
The Developer-Recommended Solution
First, let's see what the developers say
Before we begin, if you need background on LlamaIndex AgentWorkflow, feel free to read my previous article:

In short, LlamaIndex AgentWorkflow builds upon the excellent LlamaIndex Workflow framework, encapsulating agent function calling, handoff, and other cutting-edge Agentic AI developments. It lets you focus solely on your agent's business logic.
In my previous article, I first mentioned the issue where agents fail to continue processing user requests after handoff.
Others have noticed this too. In this thread, someone referenced my article's solution when asking the developers about it. I'm glad I could help:
Developer Logan M proposed including the original user request in the handoff method's output to ensure the receiving agent continues processing.
Unfortunately, as of this writing, LlamaIndex's release version hasn't incorporated this solution yet.
So today's article starts with the developer's response - we'll try rewriting the handoff method implementation ourselves to include the original user request in the handoff output.
First attempt
Since this solution modifies the handoff method implementation, we don't need to rewrite the FunctionAgent code. Instead, we'll modify AgentWorkflow's implementation.
The handoff method is core to AgentWorkflow's handoff capability. It identifies which agent the LLM wants to hand off to and sets it in the context's next_agent. During workflow execution, this method is merged with the agent's tools and gets called via function calling when the LLM needs to hand off.
This is how AgentWorkflow implements multi-agent handoff.
In the original code, after handoff sets the next_agent, it returns a prompt as the tool call result to the receiving agent. The prompt looks like this:
DEFAULT_HANDOFF_OUTPUT_PROMPT = """
Agent {to_agent} is now handling the request due to the following reason: {reason}.
Please continue with the current request.
"""
This prompt includes {to_agent} and {reason} fields. But since the prompt goes to the receiving agent, {to_agent} isn't very useful. Unless {reason} contains the original user request, the receiving agent can't get relevant information from this prompt. That's why the developer suggested including the user request in the prompt output.
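For context, here's roughly what the stock handoff tool does - a simplified sketch reconstructed from the behavior described above, not verbatim library source:
async def handoff(ctx: Context, to_agent: str, reason: str) -> str:
    """Handoff control of that chat to the given agent."""
    # Simplified sketch of the stock tool, not verbatim library source.
    agents: list[str] = await ctx.get("agents")
    current_agent_name: str = await ctx.get("current_agent_name")
    if to_agent not in agents:
        valid_agents = ", ".join([x for x in agents if x != current_agent_name])
        return f"Agent {to_agent} not found. Please select a valid agent to hand off to. Valid agents: {valid_agents}"
    # Record the target agent in the context so the workflow switches to it.
    await ctx.set("next_agent", to_agent)
    handoff_output_prompt = PromptTemplate(DEFAULT_HANDOFF_OUTPUT_PROMPT)
    return handoff_output_prompt.format(to_agent=to_agent, reason=reason)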

Let's modify this method first.
We'll create an enhanced_agent_workflow.py file and write the modified HANDOFF_OUTPUT_PROMPT:
ENHANCED_HANDOFF_OUTPUT_PROMPT = """
Agent {to_agent} is now handling the request.
Check the previous chat history and continue responding to the user's request: {user_request}.
"""
Compared to the original, I added a requirement for the LLM to review chat history and included the user's most recent request.

Next, I rewrote the handoff method to return the new prompt:
async def handoff(ctx: Context, to_agent: str, user_request: str):
    """Handoff control of that chat to the given agent."""
    agents: list[str] = await ctx.get("agents")
    current_agent_name: str = await ctx.get("current_agent_name")
    if to_agent not in agents:
        valid_agents = ", ".join([x for x in agents if x != current_agent_name])
        return f"Agent {to_agent} not found. Please select a valid agent to hand off to. Valid agents: {valid_agents}"
    await ctx.set("next_agent", to_agent)
    handoff_output_prompt = PromptTemplate(ENHANCED_HANDOFF_OUTPUT_PROMPT)
    return handoff_output_prompt.format(to_agent=to_agent, user_request=user_request)
The rewrite is simple - I just changed the reason parameter to user_request and returned the new prompt. The LLM will handle everything else.
Since we modified the handoff source code, we also need to modify the AgentWorkflow code that calls this method.
The _get_handoff_tool method in AgentWorkflow calls handoff, so we'll implement an EnhancedAgentWorkflow subclass of AgentWorkflow and override _get_handoff_tool:
class EnhancedAgentWorkflow(AgentWorkflow):
    def _get_handoff_tool(
        self, current_agent: BaseWorkflowAgent
    ) -> Optional[AsyncBaseTool]:
        """Creates a handoff tool for the given agent."""
        agent_info = {cfg.name: cfg.description for cfg in self.agents.values()}
        configs_to_remove = []
        for name in agent_info:
            if name == current_agent.name:
                configs_to_remove.append(name)
            elif (
                current_agent.can_handoff_to is not None
                and name not in current_agent.can_handoff_to
            ):
                configs_to_remove.append(name)
        for name in configs_to_remove:
            agent_info.pop(name)
        if not agent_info:
            return None
        handoff_prompt = PromptTemplate(ENHANCED_HANDOFF_PROMPT)
        fn_tool_prompt = handoff_prompt.format(agent_info=str(agent_info))
        return FunctionTool.from_defaults(
            async_fn=handoff, description=fn_tool_prompt, return_direct=True
        )
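One note: ENHANCED_HANDOFF_PROMPT here is the tool-description prompt for the handoff tool itself (the counterpart of LlamaIndex's DEFAULT_HANDOFF_PROMPT), presumably defined in the same enhanced_agent_workflow.py file. The exact wording below is my own illustration, adjusted so the LLM knows to pass along the user's original request instead of a reason:
# Illustrative tool-description prompt; the exact wording is an assumption.
ENHANCED_HANDOFF_PROMPT = """
Useful for handing off control of the chat to another agent.
If you are not equipped to handle the user's request, or another agent is better suited,
hand off to that agent and pass along the user's original request verbatim.

Currently available agents:
{agent_info}
"""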
Our modifications are complete. Now let's write test code in example_2.py to verify our changes. (example_1.py contains the original AgentWorkflow test.)
I'll base the code on this user's scenario to recreate the situation.
We'll create two agents: search_agent and research_agent. search_agent searches the web and records notes, then hands off to research_agent, who writes a research report based on the notes.
search_agent:
search_agent = FunctionAgent(
    name="SearchAgent",
    description="You are a helpful search assistant.",
    system_prompt="""
    You're a helpful search assistant.
    First, you'll look up notes online related to the given topic and record these notes on the topic.
    Once the notes are recorded, you should hand over control to the ResearchAgent.
    """,
    tools=[search_web, record_notes],
    llm=llm,
    can_handoff_to=["ResearchAgent"],
)
research_agent:
research_agent = FunctionAgent(
    name="ResearchAgent",
    description="You are a helpful research assistant.",
    system_prompt="""
    You're a helpful research assistant.
    Based on the notes recorded in the conversation, write a research report on the given topic.
    """,
    llm=llm,
)
search_agent is a multi-tool agent that uses the search_web and record_notes methods:
search_web:
async def search_web(ctx: Context, query: str) -> str:
    """
    This tool searches the internet and returns the search results.
    :param query: user's original request
    :return: The search results.
    """
    tavily_client = AsyncTavilyClient()
    search_result = await tavily_client.search(str(query))
    return str(search_result)
record_notes:
async def record_notes(ctx: Context, notes: str, notes_title: str) -> str:
    """
    Useful for recording notes on a given topic. Your input should be notes with a title to save the notes under.
    """
    return f"{notes_title} : {notes}"
Finally, we'll use EnhancedAgentWorkflow to create a workflow and test our modifications:
workflow = EnhancedAgentWorkflow(
    agents=[search_agent, research_agent],
    root_agent=search_agent.name,
)

async def main():
    handler = workflow.run(user_msg="What is LlamaIndex AgentWorkflow, and what problems does it solve?")
    async for event in handler.stream_events():
        if isinstance(event, AgentOutput):
            print("=" * 70)
            print(f"🤖 {event.current_agent_name}")
            if event.response.content:
                console.print(Markdown(event.response.content or ""))
            else:
                console.print(event.tool_calls)

if __name__ == "__main__":
    asyncio.run(main())

After research_agent takes over, it recognizes the user request but still doesn't respond. Our attempt failed. 😭
My Proposed Solution
How I view this issue
In my previous article, I speculated about the cause:

As shown, FunctionAgent stores all chat messages in a MemoryBuffer - essentially a FIFO queue where user requests enter first.
After completing function calling based on user requests, FunctionAgent saves both the tool_call and the tool_call_result as chat messages in memory.
Each function call generates two messages. Multiple tool calls create even more messages.
This pushes the original user request deeper into the queue - either far from the latest message or, once MemoryBuffer's token limit is hit, out of the queue entirely.
Consequently, the LLM struggles to perceive the original request from chat history. I'll explain the technical reasons in the position bias section.
When the next agent takes over, it can't immediately respond to the user request.
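Schematically, the chat history right after a handoff looks something like this (the roles and contents below are illustrative, not an actual trace):
# Illustrative shape of the memory after search, note-taking, and handoff:
chat_history = [
    ChatMessage(role=MessageRole.USER, content="What is LlamaIndex AgentWorkflow...?"),  # original request, now far from the end
    ChatMessage(role=MessageRole.ASSISTANT, content="", additional_kwargs={"tool_calls": ["search_web(...)"]}),
    ChatMessage(role=MessageRole.TOOL, content="<search results>"),
    ChatMessage(role=MessageRole.ASSISTANT, content="", additional_kwargs={"tool_calls": ["record_notes(...)"]}),
    ChatMessage(role=MessageRole.TOOL, content="<recorded notes>"),
    ChatMessage(role=MessageRole.ASSISTANT, content="", additional_kwargs={"tool_calls": ["handoff(...)"]}),
    ChatMessage(role=MessageRole.TOOL, content="Agent ResearchAgent is now handling the request..."),  # what the receiving agent sees last
]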
So I tried a simple fix: After each handoff, I copy the original user request to the queue's end, ensuring the LLM notices it.

Second attempt
This attempt's code is in reordered_function_agent.py.
The implementation is simple: I subclass FunctionAgent as ReorderedFunctionAgent and override take_step.
class ReorderedFunctionAgent(FunctionAgent):
    @override  # from typing (Python 3.12+) or typing_extensions
    async def take_step(
        self,
        ctx: Context,
        llm_input: List[ChatMessage],
        tools: Sequence[AsyncBaseTool],
        memory: BaseMemory,
    ) -> AgentOutput:
        last_msg = llm_input[-1] and llm_input[-1].content
        state = await ctx.get("state", None)
        if last_msg and "handoff_result" in last_msg:
            # The last message is a handoff result: walk backward through the
            # history, find the user's most recent request, and append a copy
            # to the end of the input so the LLM can't miss it.
            for message in llm_input[::-1]:
                if message.role == MessageRole.USER:
                    last_user_msg = message
                    llm_input.append(last_user_msg)
                    break
        return await super().take_step(ctx, llm_input, tools, memory)
When I detect that the last message in llm_input is a handoff tool_call_result, I traverse backward to find the user's last request and append it to the queue's end.
To identify handoff tool_call_result messages, I manually pass a handoff_output_prompt during AgentWorkflow initialization, adding a "handoff_result:" string as a marker. The test code is in example_3.py:
workflow = AgentWorkflow(
    agents=[search_agent, research_agent],
    root_agent=search_agent.name,
    handoff_output_prompt=(
        "handoff_result: Due to {reason}, the user's request has been passed to {to_agent}. "
        "Please review the conversation history immediately and continue responding to the user's request."
    ),
)
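For the override to take effect, the receiving agent needs to be constructed as a ReorderedFunctionAgent rather than a FunctionAgent (the simplest approach is to use it for both agents). A sketch, assuming the agent definitions from example_2.py otherwise stay the same:
# Same agent definitions as before; only the class changes (sketch).
search_agent = ReorderedFunctionAgent(
    name="SearchAgent",
    description="You are a helpful search assistant.",
    system_prompt="...",  # unchanged from example_2.py
    tools=[search_web, record_notes],
    llm=llm,
    can_handoff_to=["ResearchAgent"],
)
research_agent = ReorderedFunctionAgent(
    name="ResearchAgent",
    description="You are a helpful research assistant.",
    system_prompt="...",  # unchanged from example_2.py
    llm=llm,
)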
Let's run the test:

This time, research_agent successfully detects and responds to the user request after taking over. But the result isn't perfect - it doesn't realize the web search and note-taking already happened. It thinks the research isn't complete, so the output is just a summary of partial notes rather than a final research report.
I believe this happens because after appending the user request to the end of ChatMemory, the previous tool_call information gets pushed further toward the front, causing the LLM to lose critical information.
Next, we'll examine the theoretical basis of this problem and propose an ultimate solution.
Theoretical Cause: Position Bias of LLMs
This issue of messages at the queue's front being ignored relates to a rarely discussed topic: position bias.
Since this isn't an academic discussion, I won't cite many research papers or delve deep into theory. If interested, search for "position bias of large language model."
I'll explain this phenomenon in simple terms:

Our text instructions to LLMs typically include two segments: the system_prompt and the chat history - collectively called the LLM's context.
LLMs have an attention weight decay mechanism. As context expands, attention weights for earlier information naturally decay.
When knowledge sits at the front of the chat history, its influence diminishes rapidly as new dialogue turns are added. Experiments show that in an 8k token context window, tokens in the first 10% of positions see their influence weight drop by over 60% on average. (Large Language Model Agent: A Survey on Methodology, Applications and Challenges - Junyu Luo et al., 2025)
System prompts are designed as global control signals, with information there having higher confidence (roughly a 3-5x weight difference).

Imagine entering a restaurant. You first notice the menu cover (system prompt) featuring special dishes and chef introductions, then seasonal items (latest chat history), and finally regular dishes. Delicacies hidden in regular dishes often get overlooked.
Understanding the cause leads us to the ultimate solution.
My Final Attempt
What I plan to do
Next, I'll walk you through my final attempt. First, here's what our project output looks like after implementing it:

After taking over, ResearchAgent not only continues processing the user request but also fully perceives the search notes, ultimately producing a perfect research report.
My solution approach: