Integrating LlamaIndex and DeepSeek-R1 for reasoning_content and Function Call Features

Empowering AgentWorkflow with a strong boost from DeepSeek-R1

Integrating LlamaIndex and DeepSeek-R1 for reasoning_content and Function Call Features. Image by Author

In today's article, I'll show you how to use LlamaIndex's AgentWorkflow to read the reasoning process from DeepSeek-R1's output and how to enable function call features for DeepSeek-R1 within AgentWorkflow.

All the source code discussed is available at the end of this article for you to read and modify freely.


Introduction

DeepSeek-R1 is incredibly useful. Most of my work is now done with its help, and major cloud providers support its deployment, making API access even easier.

Most open-source models support OpenAI-compatible API interfaces, and DeepSeek-R1 is no exception. However, unlike other generative models, DeepSeek-R1 includes a reasoning_content key in its output, which stores the Chain-of-Thought (CoT) reasoning process.
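
For example, when you call a DeepSeek-R1 endpoint through the plain OpenAI SDK, the reasoning shows up alongside content. Here's a minimal sketch (the base_url, api_key, and model name are placeholders for your own deployment):

from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
message = response.choices[0].message
print(message.reasoning_content)  # the Chain-of-Thought text
print(message.content)            # the final answer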

Additionally, DeepSeek-R1 doesn't support function calls, which makes developing agents with it quite challenging. (The official documentation claims that DeepSeek-R1 doesn't support structured output, but in my tests, it does. So, I won't cover that here.)

Why should I care?

While you can find a DeepSeek client on LlamaHub, it only supports DeepSeek-V3 and doesn't read the reasoning_content from DeepSeek-R1.

Moreover, LlamaIndex is now fully focused on building AgentWorkflow. This means that even if you only use the LLM as the final step in your workflow for content generation, it still needs to support handoff tool calls.

Every agent should have the ability to hand off control to another agent. Image by Author

In this article, I'll show you two different ways to extend LlamaIndex to read reasoning_content. We'll also figure out how to make DeepSeek-R1 support function calls so it can be used in AgentWorkflow.

Let's get started.


Modifying the OpenAILike Module for reasoning_content Support

Building a DeepSeek client

To allow AgentWorkflow to read reasoning_content, we first need to make LlamaIndex's model client capable of reading this key. Let's start with the client module and see how to read reasoning_content.

Previously, I wrote about connecting LlamaIndex to privately deployed models:

How to Connect LlamaIndex with Private LLM API Deployments
When your enterprise doesn’t use public models like OpenAI

In short, you can't directly use LlamaIndex's OpenAI module to connect to privately or cloud-deployed large language model APIs, as it restricts you to GPT models.

OpenAILike extends the OpenAI module by removing model type restrictions, allowing you to connect to any open-source model. We'll use this module to create our DeepSeek-R1 client.
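
For example, connecting to a self-hosted or cloud endpoint looks roughly like this (a sketch; the api_base, api_key, and model name depend entirely on your deployment):

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="deepseek-reasoner",               # whatever name your endpoint serves
    api_base="https://api.deepseek.com/v1",  # assumption: your deployment's URL
    api_key="...",
    is_chat_model=True,
)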

OpenAILike supports several methods, such as complete, predict, and chat. Given DeepSeek-R1's lengthy CoT reasoning, we typically use the chat method with streamed output, so we'll focus on extending the chat scenario today.

Since reasoning_content appears in the model's output, we need to find methods handling model output. In OpenAILike, _stream_chat and _astream_chat handle sync and async output, respectively.

We can create a DeepSeek client module, inheriting from OpenAILike, and override the _stream_chat and _astream_chat methods. Here's the code skeleton:

from typing import Any, Optional, Sequence

from llama_index.core.base.llms.types import (
    ChatMessage, ChatResponse, ChatResponseAsyncGen, ChatResponseGen
)
from llama_index.llms.openai.base import llm_retry_decorator  # path may vary by version
from llama_index.llms.openai_like import OpenAILike


class DeepSeek(OpenAILike):
    @llm_retry_decorator
    def _stream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseGen:
        ...

    @llm_retry_decorator
    async def _astream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseAsyncGen:
        ...

We'll also need to inherit ChatResponse and create a ReasoningChatResponse model to store the reasoning_content we read:

class ReasoningChatResponse(ChatResponse):
    reasoning_content: Optional[str] = None

Next, we'll implement the _stream_chat and _astream_chat methods. In the parent class, these methods are similar, using closures to build a Generator for structuring streamed output into a ChatResponse model.

The ChatResponse model contains parsed message, delta, and raw information. Since OpenAILike doesn't parse reasoning_content, we'll extract this key from raw later.

We can first call the parent method to get the Generator, then build our own closure method:

class DeepSeek(OpenAILike):
    @llm_retry_decorator
    def _stream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseGen:
        responses = super()._stream_chat(messages, **kwargs)

        def gen() -> ChatResponseGen:
            for response in responses:
                if processed := self._build_reasoning_response(response):
                    yield processed
        return gen()
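
The async variant mirrors this closely. Here's a sketch under the same assumptions:

class DeepSeek(OpenAILike):
    ...

    @llm_retry_decorator
    async def _astream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any
    ) -> ChatResponseAsyncGen:
        # The parent coroutine returns an async generator we can wrap.
        responses = await super()._astream_chat(messages, **kwargs)

        async def gen() -> ChatResponseAsyncGen:
            async for response in responses:
                if processed := self._build_reasoning_response(response):
                    yield processed

        return gen()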

Since both methods handle processing similarly, we'll create a _build_reasoning_response method to handle response.raw.

In streamed output, reasoning_content is found in delta, at the same level as content, as incremental output:

class DeepSeek(OpenAILike):
    ...

    @staticmethod
    def _build_reasoning_response(response: ChatResponse) \
            -> ReasoningChatResponse | None:
        if not (raw := response.raw).choices:
            return None
        if (delta := raw.choices[0].delta) is None:
            return None

        return ReasoningChatResponse(
            message=response.message,
            delta=response.delta,
            raw=response.raw,
            reasoning_content=getattr(delta, "reasoning_content", None),
            additional_kwargs=response.additional_kwargs
        )

With this, our DeepSeek client supporting reasoning_content streamed output is complete. The full code is available at the end of this article.

Next, let's use Chainlit to create a simple chatbot to verify our modifications.

Verifying reasoning_content output with Chainlit

Using Chainlit to display DeepSeek-R1's reasoning process is simple. Chainlit supports Step objects, collapsible page blocks that are perfect for displaying reasoning processes. Here's the final result:

The reasoning process of DeepSeek-R1. Image by Author

First, we use dotenv to read environment variables, then use our DeepSeek class to build a client, ensuring is_chat_model is set to True:

import chainlit as cl
from dotenv import load_dotenv
from llama_index.core.base.llms.types import ChatMessage

# Assumption: our DeepSeek class lives in a local module named deepseek_client.
from deepseek_client import DeepSeek

load_dotenv("../.env")

client = DeepSeek(
    model="deepseek-reasoner",
    is_chat_model=True,
    temperature=0.6
)
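
Before wiring it into a UI, a quick smoke test of the stream (a minimal sketch; the prompt is arbitrary):

for response in client.stream_chat(
        [ChatMessage(role="user", content="Why is the sky blue?")]
):
    # Reasoning chunks carry reasoning_content; answer chunks carry delta.
    if getattr(response, "reasoning_content", None):
        print(response.reasoning_content, end="", flush=True)
    else:
        print(response.delta or "", end="", flush=True)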

Now, let's call the model and use step to continuously output the reasoning process.

Since we're consuming a Generator to get reasoning_content, the @cl.step decorator isn't a good fit here. Instead, we use async with to create and drive a Step object:

@cl.on_message
async def main(message: cl.Message):
    responses = await client.astream_chat(
        messages=[
            ChatMessage(role="user", content=message.content)
        ]
    )

    output = cl.Message(content="")
    async with cl.Step(name="Thinking", show_input=False) as current_step:
        async for response in responses:
            if (reasoning_content := response.reasoning_content) is not None:
                await current_step.stream_token(reasoning_content)
            else:
                await output.stream_token(response.delta)

    await output.send()

As shown in the previous result, our DeepSeek client successfully outputs reasoning_content, allowing us to integrate AgentWorkflow with DeepSeek-R1.


Integrating AgentWorkflow and DeepSeek-R1

In my previous article, I detailed LlamaIndex's AgentWorkflow, an excellent multi-agent orchestration framework:

Diving into LlamaIndex AgentWorkflow: A Nearly Perfect Multi-Agent Orchestration Solution
And fix the issue where the agent can’t continue with past requests

In that article, I mentioned that if the model supports function calls, we can use FunctionAgent; if not, we need to use ReActAgent.

Today, I'll show you how to use LlamaIndex's ReActAgent to enable function call support for the DeepSeek-R1 model.

Enabling function calls with ReActAgent

Why insist on function call support? Can't DeepSeek just be a reasoning model for generating final answers?

Unfortunately, no, because of the handoff tool.

AgentWorkflow's standout feature is its agent handoff capability. When an agent decides it can't handle a user's request and needs to hand over control to another agent, it uses the handoff tool.

This means an agent must support function calls to enable handoff, even if it doesn't need to call any other tools.

You might point out that, according to the AgentWorkflow source code, if an agent is configured with can_handoff_to but the list is empty, the handoff tool won't be appended, so function call support shouldn't be needed.

When can_handoff_to is an empty list, the handoff_tool won't be concatenated. Image by Author

That's not the case. Under normal circumstances, we create agents based on FunctionAgent. FunctionAgent's take_step method calls the model client's astream_chat_with_tools and get_tool_calls_from_response methods, both of which depend on function calls. If our model doesn't support function calls, it'll throw an error:

When you use DeepSeek-R1 in FunctionAgent, it will throw an error. Image by Author

So what should we do?

Luckily, besides FunctionAgent, AgentWorkflow offers ReActAgent, designed for non-function call models.

ReActAgent implements a "think-act-observe" loop, suitable for complex tasks. But today, our focus is on using ReActAgent to enable DeepSeek-R1's function call feature.
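
Under the hood, ReAct prompts the model to express tool use as structured plain text, which the agent then parses and executes. A trace looks roughly like this (illustrative, not verbatim output):

Thought: I need current rose prices to answer this.
Action: search_web
Action Input: {"query": "red rose wholesale price Valentine's Day 2025"}
Observation: <the tool's result is fed back to the model here>
Thought: I can answer without using any more tools.
Answer: <final response>

Because the tool call is just text the model writes, no native function call API is required, which is exactly why this approach works with DeepSeek-R1.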

Let's demonstrate with a simple search agent script:

First, we define a method for web search:

from tavily import AsyncTavilyClient


async def search_web(query: str) -> str:
    """
    A tool for searching information on the internet.
    :param query: keywords to search
    :return: the information
    """
    client = AsyncTavilyClient()
    return str(await client.search(query))

Next, we configure an agent using ReActAgent, adding the search_web method to tools and passing the DeepSeek client we built earlier as llm. Since we only have one agent in our workflow, we won't configure can_handoff_to:

search_agent = ReActAgent(
    name="SearchAgent",
    description="A helpful agent",
    system_prompt="You are a helpful assistant that can answer any questions",
    tools=[search_web],
    llm=llm
)

We'll set up AgentWorkflow with the agent and write a main method to test it:

import asyncio

from llama_index.core.agent.workflow import AgentStream, AgentWorkflow

workflow = AgentWorkflow(
    agents=[search_agent],
    root_agent=search_agent.name
)


async def main():
    handler = workflow.run(user_msg="If I use a $1000 budget to buy red roses in bulk for Valentine's Day in 2025, how much money can I expect to make?")
    async for event in handler.stream_events():
        if isinstance(event, AgentStream):
            print(event.delta, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())

ReActAgent uses the search_web tool to fetch information. Image by Author

As seen, ReActAgent provides the method name search_web and parameters in Action and Action Input. The method executes successfully, indicating DeepSeek-R1 now supports function calls through ReActAgent. Hooray.

Enabling reasoning_content output with ReActAgent

If you run the previous code, you'll notice that it takes a long time from execution to result return. This is because, as a reasoning model, DeepSeek-R1 undergoes a lengthy CoT reasoning process before generating the final result.

But we extended our DeepSeek client to read reasoning_content and configured it in AgentWorkflow. So why is there no reasoning output?

ReActAgent uses AgentStream for streaming model results. However, AgentStream lacks a reasoning_content attribute, preventing client-returned reasoning from being packaged in AgentStream.
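
For reference, AgentStream exposes roughly these fields (paraphrased from LlamaIndex's source, mirroring the attributes our adapter will wrap; exact types may differ between versions):

class AgentStream(Event):
    delta: str
    response: str
    current_agent_name: str
    tool_calls: list
    raw: Optional[dict]

As you can see, there's no field for the reasoning text.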

Should we modify ReActAgent's source code, like we did with OpenAILike?

Actually, no. Earlier, we modified OpenAILike because the model client is a foundational module, and we needed it to parse reasoning_content from raw text.

But AgentStream has a raw property that includes the original message text, including reasoning_content. So, we just need to parse reasoning_content from raw after receiving the AgentStream message.

We just need to parse the reasoning_content from the AgentStream. Image by Author

We'll create a ReasoningStreamAdapter class, accepting an AgentStream instance in its constructor and exposing AgentStream attributes with @property, adding a reasoning_content attribute:

from llama_index.core.agent.workflow import AgentStream


class ReasoningStreamAdapter:
    def __init__(self, event: AgentStream):
        self.event = event

    @property
    def delta(self) -> str:
        return self.event.delta

    @property
    def response(self) -> str:
        return self.event.response

    @property
    def tool_calls(self) -> list:
        return self.event.tool_calls

    @property
    def raw(self) -> dict:
        return self.event.raw

    @property
    def current_agent_name(self) -> str:
        return self.event.current_agent_name

    @property
    def reasoning_content(self) -> str | None:
        raw = self.event.raw
        # Use .get() so a chunk without these keys doesn't raise a KeyError.
        if raw is None or not raw.get("choices"):
            return None
        if (delta := raw["choices"][0].get("delta")) is None:
            return None

        return delta.get("reasoning_content")

Usage looks like this:

async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        adapter = ReasoningStreamAdapter(event)
        if adapter.reasoning_content:
            print(adapter.reasoning_content, end="", flush=True)
    if isinstance(event, AgentOutput):
        print(event)

Testing the final result with a chat UI

After modifying DeepSeek-R1 for function call and reasoning_content, it's time to test our work. We'll use the Chainlit UI to observe our modifications:

Now, AgentWorkflow can smoothly output the reasoning_content. Image by Author

As shown, we ask the agent a complex task, and DeepSeek performs chain-of-thought reasoning while using the search_web tool to gather external information and return a solution.

Due to space constraints, the complete project code isn't included here, but you'll find it at the end of this article.


Further Reading: ReActAgent Implementation and Simplification

If we review the earlier implementation, we'll notice that agents based on ReActAgent generate a lot of output. This is normal because ReActAgent's reasoning-reflection mechanism requires multiple function calls to gather enough information to generate an answer.

But our goal was simply to find a way to make DeepSeek-R1 support function calls. So, is there a way to simplify the agent's output? To achieve this, we should first understand how ReActAgent works.

The parsing process of ReActAgent. Image by Author