vacmar01
Contributor

vacmar01 commented Sep 11, 2025

Add dspy.Stateful Module for Automatic History Management

Problem

Managing conversation history with dspy.History currently requires significant boilerplate:

  • Manually adding history: dspy.History = dspy.InputField() to signatures
  • Creating and maintaining history instances
  • Manually appending conversation turns after each call

Solution

dspy.Stateful is a zero-modification wrapper that automatically handles history management for any DSPy module.

Key Features:

  • Works with any module: Predict, ChainOfThought, ReAct, custom modules
  • Automatic signature enhancement: Uses signature.prepend() to add history fields
  • Transparent state management: Automatically maintains and updates conversation history

Usage

# Before: Manual history management
class QA(dspy.Signature):
    question: str = dspy.InputField()
    history: dspy.History = dspy.InputField()  # Manual addition
    answer: str = dspy.OutputField()

predict = dspy.Predict(QA)
history = dspy.History(messages=[])
question = "What's Python?"
outputs = predict(question=question, history=history)
history.messages.append({"question": question, **outputs})  # Manual update

# After: Automatic history management
qa = dspy.Predict("question -> answer")  # Original signature unchanged
stateful_qa = dspy.Stateful(qa)

response1 = stateful_qa(question="What's Python?")
response2 = stateful_qa(question="Is it fast?")  # Automatically has context

Implementation

The wrapper:

  • Deep copies the module to avoid modifying the original
  • Enhances all predictor signatures with history fields
  • Automatically injects and updates history on each forward pass
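The three wrapper steps above can be illustrated with a small, library-agnostic sketch. It is not the PR's implementation and deliberately avoids DSPy: `EchoQA` and `StatefulSketch` are hypothetical names, the history is a plain list of dicts rather than `dspy.History`, and the signature-enhancement step is reduced to passing a `history` keyword argument.

```python
import copy
from typing import Any, Callable


class EchoQA:
    """Hypothetical stand-in for a predictor that accepts a `history` kwarg."""

    def __init__(self) -> None:
        self.calls = 0  # internal state, used to show the deep copy is independent

    def __call__(self, question: str, history: list[dict[str, Any]]) -> dict[str, Any]:
        self.calls += 1
        # Report how many earlier turns were visible so context injection is observable.
        return {"answer": f"{question} (seen {len(history)} earlier turns)"}


class StatefulSketch:
    """Hypothetical wrapper: copies the module, injects history, records each turn."""

    def __init__(self, module: Callable[..., dict[str, Any]]) -> None:
        self.module = copy.deepcopy(module)  # leave the caller's module untouched
        self.messages: list[dict[str, Any]] = []

    def __call__(self, **inputs: Any) -> dict[str, Any]:
        # Pass a snapshot so the module cannot mutate the stored history by accident.
        outputs = self.module(**inputs, history=list(self.messages))
        # Record the turn as inputs merged with outputs, like {"question": ..., **outputs}.
        self.messages.append({**inputs, **outputs})
        return outputs


qa = EchoQA()
stateful_qa = StatefulSketch(qa)
first = stateful_qa(question="What's Python?")
second = stateful_qa(question="Is it fast?")
print(second["answer"])
```

Because each wrapper owns its own `messages` list, one wrapper instance corresponds to one conversation, which is exactly the assumption questioned later in the thread.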

This eliminates boilerplate while maintaining full compatibility with existing DSPy modules and patterns.

@ianyu93

ianyu93 commented Sep 11, 2025

I can see how this is a quality-of-life feature, but it is also confusing. It is basically a chat-app wrapper, whereas typically the application decides programmatically how much history to send. For many other patterns of dspy programs, say rewriting a query before a search, it is not especially helpful. Of course the user has to choose when to apply statefulness vs. statelessness, but I wonder how necessary this is.

That said, if it were extensible, e.g. able to plug in other modules or MCPs, then maybe it could be something. Not sure.

@NirantK
Contributor

NirantK commented Sep 11, 2025

I really like this as a quality of life feature!

Collaborator

@chenmoneygithub chenmoneygithub left a comment

Thanks for the PR!

@okhat I am not entirely sure whether we should add this. My personal opinion is that users should manage the conversation history themselves, because it is highly customizable, e.g., how to truncate the history, how to compress the conversation, and so on. But maybe you have seen this request from the community?

@rawwerks

as much as i like being able to brag about writing my own modules because there were no chat adapters back in the day, i think i can safely say that there is sufficient community interest around easier abstractions for stateful agents: https://x.com/raw_works/status/1965459911686631866

even if this PR isn't the right approach, why do we have to pass history back in manually as an input? https://deepwiki.com/search/how-does-dspy-handle-chat-hist_4be30063-0511-4977-bd04-f61a08b95af1

if you don't like the style of this PR, maybe there can just be a "stateful" boolean flag built in?

from my perspective, this should be as easy for the user as setting cache=True/False.

lots of good reasons you would or would not want cache. lots of good reasons you would or would not want chat history.

(2025 is the year of agents, btw)

@MaximeRivest
Contributor

I echo rawwerks here.

I encourage working even more on it to address some of the concerns mentioned through innovation and technical solutions, but if ChainOfThought has its place, I don't see why this one would not.

Unless DSPy decides to stay low-level. That is valid, but if so, it should be stated explicitly, in many places, so that the community knows where to contribute such things in the dspyverse.

@joelgrus

  1. in general I think that making modules stateful is a bad idea.

  2. in particular a stateful implementation of conversation history presumes that there is only one conversation happening at a time and that it's always talking to the same instance of the module. these may be fine assumptions if you're building a small demo application, but if you're building horizontally scaled agents it doesn't work. or even if you're just mixing up different calls to the same module.

  3. how does this feature interplay with dump_state / save / load / etc? is the history part of the state? does it get saved with the module? should it be?

  4. there are a lot of ways for a module to be "stateful", this is one I guess, but there could be a lot of others, so that makes me iffy about the name

  5. there's nothing here that really needs to be in the core library, I don't think? like it would be easy to publish something like this as dspy-stateful-history and then people can opt into it if they like it. (I have been thinking about a similar pattern for some of my quality-of-life tools).

  6. I would welcome tools to make managing history easier (although I don't find it particularly onerous the way it is), but I'm not convinced it's a good idea to have the module magically do it behind the scenes ("explicit is better than implicit")

@vacmar01
Contributor Author

Oh, I love this discussion :) really appreciate it.

The PR was more meant to gauge the interest in a feature like this. I'm sure that the implementation can be further enhanced.

But here are my two cents on why I think something like this is a good idea and actually in line with DSPy's goal/philosophy.

  • While this implementation abstracts away only one way to manage the conversation history, I'd argue that it's the most common one for conversational applications. I implemented this pattern over and over manually and always thought that this is too much boilerplate and too much ceremony to make multi-turn interactions work.
  • While something like Stateful may be specific to conversational interactions, there are other highly specific modules in DSPy, like ProgramOfThought, or ReAct, which is also a very specific way of building agents. By analogy, I don't see why a module that enables multi-turn interactions would be too "specific" or "high level".
  • Of course you wouldn't use this module everywhere. It is specialized towards a certain use case, just like the modules mentioned above or even something like Refine.
  • While I see the merit of publishing something like this as a separate library, and I'm always very conscious of scope creep, you can opt in or out of this feature even if it is part of the main library. If you don't need it, just don't use it. That's the beauty of Modules: they are by definition modular, and you can pick and choose whatever you need or like.

That being said, I can absolutely understand if this implementation is too specific or non-scalable or if the DSPy team wants to keep the history management more low level. In the end the core maintainers are the people knowing the library, the direction and the overall philosophy of the library best.

To address some points raised by @joelgrus:

@2: I imagine it being used by creating a new instance of the class for every new interaction, just as you would create a new dspy.History instance for every conversation.
@3: Currently the history is not saved by save; I would rather keep the history separate from the underlying LLM module. Maybe methods like .load_history and .save_history, which save and load the history as JSON or to a database, would be a good idea.
@4: You are absolutely right here. There are tons of ways modules can be stateful, just as there are tons of ways of managing history. But the module is not meant to fit every use case (see my points further above).
@5: This is an absolutely valid point. If the maintainers decide this is not a fit for the main library, I can totally understand (although I think there are arguments for why it may be a good fit for the main library, as stated above).
@6: DSPy itself is highly implicit in a lot of ways, and a lot happens behind the scenes (signatures becoming prompts, responses being parsed into Prediction instances, ChainOfThought altering the underlying signature by adding a reasoning field). I can absolutely understand why one would prefer a more explicit approach, but I do think that an implicit approach like this one aligns with the library's existing patterns.
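The `.save_history` / `.load_history` idea could look roughly like the following sketch. These are hypothetical methods on a hypothetical `ConversationStore` class, not part of DSPy; the sketch persists turns as JSON on disk, assumes every value in a turn is JSON-serializable, and makes `load_history` a classmethod as one possible design choice. A database-backed version would replace only the file I/O.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


class ConversationStore:
    """Hypothetical container for one conversation's turns, kept apart from any module."""

    def __init__(self, messages: list[dict[str, Any]] | None = None) -> None:
        self.messages: list[dict[str, Any]] = list(messages or [])

    def save_history(self, path: str | Path) -> None:
        # Persist only the turns; assumes every value in a turn is JSON-serializable.
        payload = json.dumps({"messages": self.messages}, indent=2)
        Path(path).write_text(payload, encoding="utf-8")

    @classmethod
    def load_history(cls, path: str | Path) -> ConversationStore:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        return cls(messages=data["messages"])


store = ConversationStore([{"question": "What's Python?", "answer": "A programming language."}])
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "conversation.json"
    store.save_history(path)
    restored = ConversationStore.load_history(path)
print(restored.messages)
```

Keeping the turns outside the module also sidesteps the question of whether `save` on the module should include them.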

Again, I really value this constructive discussion here :)

@joelgrus

so here's a very toy example of multiplayer, multi-turn QA running as an API (which seems like a very reasonable / common use case):

https://gist.github.com/joelgrus/eb144fda2d9b94429ba1ed1ca48e2861

I think that implementing conversation history as "stateful" modules precludes this kind of application, you'd have to somehow maintain / persist a copy of the module for every conversation (which seems bad!). And if you want to scale to multiple backend servers, then what?

and the non-"database" parts of history are actually very little code.


having written that out, there are two parts of the history that do feel unergonomic:

  1. history = dspy.History() feels like it should give you a clean history with no messages yet, but instead it gives an error
  2. history.messages.append(...) is awkward and a little abstraction-breaking. I'd be happier if there were something like history.append() and history.extend() (or, if you don't want people to pretend it's a list, history.add_message() or whatever). The point is that if you're going to have a first-class History, make it a first-class history, not just a container for a list: allow a max size, etc., etc.
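A first-class history along the lines of both points might look like this sketch. `ChatHistory` is hypothetical and does not describe the current `dspy.History` API: it starts empty when constructed without arguments, exposes `append`/`extend` instead of a raw list, and takes an optional `max_messages` cap implemented with `collections.deque(maxlen=...)`, which silently drops the oldest turns.

```python
from __future__ import annotations

from collections import deque
from typing import Any, Iterable, Iterator, Optional


class ChatHistory:
    """Hypothetical history: empty by default, append/extend, optional size cap."""

    def __init__(
        self,
        messages: Iterable[dict[str, Any]] = (),
        max_messages: Optional[int] = None,
    ) -> None:
        # deque(maxlen=None) is unbounded; with a cap, the oldest turns are dropped first.
        self._messages: deque[dict[str, Any]] = deque(messages, maxlen=max_messages)

    def append(self, message: dict[str, Any]) -> None:
        self._messages.append(message)

    def extend(self, messages: Iterable[dict[str, Any]]) -> None:
        self._messages.extend(messages)

    @property
    def messages(self) -> list[dict[str, Any]]:
        # Return a copy so callers cannot bypass append/extend.
        return list(self._messages)

    def __len__(self) -> int:
        return len(self._messages)

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return iter(list(self._messages))


history = ChatHistory(max_messages=2)
history.append({"question": "What's Python?", "answer": "A language."})
history.extend([
    {"question": "Is it fast?", "answer": "Fast enough for many tasks."},
    {"question": "Who made it?", "answer": "Guido van Rossum."},
])
print(len(history), history.messages[0]["question"])
```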

@ianyu93

ianyu93 commented Sep 12, 2025

So to echo @joelgrus a bit more (sorry), in a chat system, the following could also happen:

  • When a list of messages comes in, separate calls can be used to determine what should be short-term vs. long-term memory, in parallel with response generation
  • There's also compaction, when messages are at certain length, there would be a summarization call, and what's passed is [summarized_message + current message]
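The compaction step described above can be sketched as a single function. Everything here is hypothetical: `compact_context` and `toy_summarize` are illustrative names, the threshold is counted in messages rather than tokens, and in a real system `summarize` would be a separate LM call (for example a small DSPy module) rather than string joining.

```python
from typing import Any, Callable

Message = dict[str, Any]


def compact_context(
    history: list[Message],
    current: Message,
    summarize: Callable[[list[Message]], str],
    max_messages: int = 4,
) -> list[Message]:
    """Return the messages to send: the full history while it is short,
    otherwise [summarized_message, current_message]."""
    if len(history) < max_messages:
        return [*history, current]
    summary = {"summary": summarize(history)}
    return [summary, current]


def toy_summarize(messages: list[Message]) -> str:
    # Hypothetical stand-in for an LM summarizer: just lists the earlier questions.
    return "Earlier questions: " + "; ".join(m["question"] for m in messages)


turns = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(5)]
short = compact_context(turns[:2], {"question": "next?"}, toy_summarize)
long = compact_context(turns, {"question": "next?"}, toy_summarize)
print(long)
```

Whether compaction runs before every call or in the background alongside response generation is an application decision that this function leaves open.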

A proper chat system is actually a lot more complicated if we want intelligence, and the better practice is usually to make APIs stateless. I do think that while ProgramOfThought or ReAct are specialized LM interactions, they remain stateless, and programmatically that's a big difference, so I don't think they are exactly in the same category, since this takes abstraction to a different level. ProgramOfThought and ReAct still simulate how LMs can be invoked, whereas this is more of an application.

While it is painful and cumbersome to keep writing boilerplate (like passing in messages), I don't think application developers are usually that opposed to writing it, because you usually want more control on the application side. But I can see how there may be a class of things (as opposed to just modules) like this one that become "lightweight applications" (rather than just programs), especially for prototyping. I don't see how easy it would be to transition these to production, though.

@armoucar-neon

Really cool discussion here. This is a real pain point I have today and while I love DSPy, I particularly hate the dspy.ReAct implementation.

This isn't specific to dspy.ReAct, but having to pass chat history as a signature parameter feels wrong to me - it's like a hack.

Another point is how the history is built for dspy.ReAct specifically. By using 2 internal modules, the history looks messy. I don't know if this is just my OCD, but deviating from the standard format of how DSPy builds context and passing the trajectory as a JSON/dict is really ugly. Plus we're also passing garbage in the context window, where we have fields being passed as None (see attached image).

This personally bothers me a lot and I don't feel comfortable using DSPy in a multi-turn agent architecture. For any other LLM task, it's my library of choice, but for this specific use case, I still don't use it because I feel it lacks maturity.


@ianyu93

ianyu93 commented Sep 13, 2025

Someone was sharing their work on Discord: https://www.modaic.dev/
To quote them specifically:

we decided to create a medium for it, similar to Hugging Face, where the primitives like metrics and optimizers are first class citizens for the community to share.

I don't know if this is going to be it, but I do think that while there's interest on the application side, it can probably live outside of core.

@MaximeRivest
Contributor

Sorry in advance—I might sound a little grumpy. My motivation here is to defend the less experienced developers and beginners out there (the very folks I’ve spent the past few months introducing to dspy).

Many of you are questioning whether this should be in the core—without explaining why raising that question is even worth it. What are the real drawbacks of including it in the core, and why are those drawbacks more important than the benefits seen by those who want it? Please remember, not all Python users are app or software developers. There are many types of Python users, and it’s likely that other groups need the very things you dismiss. Please explain how its presence in the core would actively harm the Python users you’re most familiar with, and why that harm outweighs the benefits it might provide to other users you may be less familiar with.

Personally, I love dspy because it boldly removes boilerplate and distills AI programming to its essential parts. The history and current approaches to statefulness are very much at odds with that vision, and I dislike arguments like “it’s not that hard” or “you get more control”. If dspy were about control and ease, we would still be making raw HTTP requests. Also, no one suggested removing dspy.History: you can still reach for that control and code with dspy.History if you need or want to.

It can be quite discouraging to work on improving the statefulness problem in modules and programs when the conversation isn’t focused on how to solve it and the possible pitfalls, but instead on “it doesn’t belong here.” I keep telling everyone that dspy is open to pull requests and exists for everyone’s use. But the conversation here—and the one around template adapters—makes me reevaluate that belief.

I take the time to write this because I am 100% convinced that those questioning dspy.Stateful have very, very valid experiences, knowledge and perspectives that would help us reach the 'ideal' design of that module, and you have shared some of that, which is great! But I think it would help dspy and its community more if we all focused on identifying the problems in the dspy.Stateful module, brainstormed potential solutions, and only afterwards decided whether to give up because it is too hard for us, instead of shutting it down right out of the gate because dspy.History gives us control and it's 'not that hard'.

I have had to decide three times now to make libraries in the dspyverse rather than in core: ovllm, functai and attachments. I can see that not everything goes in core, but addressing statefulness / multi-turn interaction seems very core to me, and as proof of that, dspy.History exists and is in core. I simply believe we still have work to do and we should encourage that work. It appears to me that the solution will be some sort of module modifier, and this PR goes very much in that direction.

@ianyu93

ianyu93 commented Sep 13, 2025

@MaximeRivest Very fair points, and sorry for making so many assumptions over the threads. Also, thank you for holding the thread accountable. I'm going to take a stab at this.
Also, honestly, @vacmar01, thank you for starting this thread.

First, this obviously is still up for the maintainers, as opinionated / taste matters. Personally, I see the value of the motivation itself, even as part of dspy, but I'm unsure about its current form.

Complexing

In general, it comes down to complexing. Just because something is easy to write (usually the motivation to get rid of boilerplates), doesn't mean it's not complex. Complexing is to intertwine, and in design sense, it's intertwining different things together. For example, Streamlit and Gradio are libraries that make it easy, but they are complex by nature, because they complex so many different things (context, states, etc etc..) into single loops, making them not extensible for more sophisticated work.

DSPy at its core, as you mentioned, breaks down AI programming into the right level of primitives. Each primitive (Signature, Module, Adapters, ...) has a clear separation of concerns, somehow captures the right interaction point, and at its core is simple, because it is one thing for one job, which is what makes the primitives composable.

Modules like ProgramOfThought or ReAct are still stateless and still align with how modules make up workflows; they are just specific implementations of modules, like a Transformer block. The current PR feels complexing because having modules manage history is akin to remembering inputs: it adds an extra dimension of history management. So my initial thought was more along these lines: maybe it shouldn't be another module but a new class, or history management should be made extensible.

Another reason why the allergic reactions showed up is because it seems to force interaction with LMs in a very specific way, chat app. The examples given are also geared towards that. I have a feeling ProgramOfThought and ReAct are largely ignored because I'm not sure how often they would be used in applications, and they are treated more like implementations of their papers. The current form of Stateful is different: it is more like a lightweight application, so it seems to enforce what a chat-app pattern should look like. The reason it may feel harmful is that what's available in core affects how developers think about patterns, and chat-app patterns usually involve more than just passing past history.

Counter Points

Now, in the spirit of examining ourselves as well, I'm going to present some counter points.

We could argue, maybe in certain programs statefulness could be helpful. For example:

import dspy

class ContextRewrite(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.Predict(dspy.Signature("new_context: str, current_context: str | None -> merged_context: str"))

    def forward(self, summary: str):
        current_context = None
        # predictor.history holds the LM calls DSPy recorded for this predictor.
        if self.predictor.history:
            # Raw text of the previous LM response (may still contain adapter field markers).
            current_context = self.predictor.history[-1]["response"].choices[0].message.content
        return self.predictor(current_context=current_context, new_context=summary)

class LongPDFAnalyzer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarizer = dspy.Predict(dspy.Signature("chunk: str, context: str | None -> summary: str"))
        self.context_rewriter = ContextRewrite()
        self.context = None  # state carried across forward() calls

    def forward(self, pdf_chunk: str):
        summary = self.summarizer(chunk=pdf_chunk, context=self.context)
        # Pass the summary string, not the whole Prediction object.
        merged_context = self.context_rewriter(summary.summary)
        self.context = merged_context.merged_context
        return dspy.Prediction(summary=summary, context=merged_context)

This particular program doesn't work well, but you can see how maybe keeping context for long form extraction may be helpful, and having statefulness could be helpful, so I do think there's a wider argument that statefulness isn't just for a chat app.

Hmmm

Ok, as I type out more, maybe this is a fine module, and I don't think having a separate class to represent application makes sense anymore, because this implementation doesn't have to restrict to chat apps.

So then, maybe having more methods on managing history in different ways, such as take_n history, or compact to summarize previous history, would work?
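One way to make such methods pluggable is to treat each history strategy as a function from a list of turns to the list that actually gets sent. In this sketch `take_n`, `compact` and `build_inputs` are hypothetical names, and the summarizer passed to `compact` stands in for what would normally be an LM call.

```python
from typing import Any, Callable

Message = dict[str, Any]
HistoryPolicy = Callable[[list[Message]], list[Message]]


def take_n(n: int) -> HistoryPolicy:
    """Policy that keeps only the last n turns (n=0 keeps none)."""
    def policy(messages: list[Message]) -> list[Message]:
        return messages[max(len(messages) - n, 0):]
    return policy


def compact(summarize: Callable[[list[Message]], str], keep_last: int) -> HistoryPolicy:
    """Policy that folds all but the last keep_last turns into one summary turn."""
    def policy(messages: list[Message]) -> list[Message]:
        split = max(len(messages) - keep_last, 0)
        older, recent = messages[:split], messages[split:]
        if not older:
            return list(recent)
        return [{"summary": summarize(older)}, *recent]
    return policy


def build_inputs(messages: list[Message], policy: HistoryPolicy, **inputs: Any) -> dict[str, Any]:
    # A wrapper would call this before each forward pass; the policy is swappable.
    return {**inputs, "history": policy(messages)}


turns = [{"question": f"q{i}"} for i in range(4)]
print(build_inputs(turns, take_n(1), question="q4"))
```

Because every policy shares one signature, a wrapper could accept any of them (or a user-defined one) as a constructor argument instead of hard-coding a single strategy.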

@vacmar01 what are your thoughts on that?

@vacmar01
Contributor Author

vacmar01 commented Sep 13, 2025

To restate the motivation why I proposed something like Stateful:

  • One of the benefits of using DSPy is its awesome developer ergonomics. @MaximeRivest has a great example in his latest video of how little code you need to do stuff in DSPy compared to other "libraries". I find the ergonomics around dspy.History to be much worse than those of many other parts of DSPy, and this was one idea for how to improve them. I think we should absolutely discuss the implementation details, and I'm sure many improvements to my implementation are still possible. For what it's worth, I took some inspiration from the awesome Claudette library by answer.ai, which provides similar ergonomics in its Chat class.
  • The proposed approach is a module. You can use it or not. It does not change any other behaviour of the library and its current implementation is < 50 LoC (without tests), so it wouldn't add much bloat to the library anyway. Again, it's a pure add-on and you can use dspy.History manually like you did before. In that sense it's similar to dspy.ReAct, which can also be implemented manually and there are merits to do it manually as stated in the tutorial on tool use.

@ianyu93 To extend the implementation with additional functionality like loading histories, saving histories, compacting (maybe with a DSPy module) or a max_history_length (maybe as a parameter) are all great ideas.

Much of the discussion above is focused on my specific implementation (which is fair considering this is a concrete PR after all). But I think we need to focus the discussion more on whether any form of abstraction around history management is something that DSPy wants to provide or not.

In my opinion an ergonomics improvement would be very much in line with the overall philosophy of the library, but I'm open to other opinions. And I also think that a module (in whatever form) is the way to go.


Edit: On further thought: Maybe renaming this to something like dspy.Chat would make the scope and intended use of this module clearer.

@ziv-bakhajian

I tend to agree with @MaximeRivest and think about it from an ecosystem perspective. It is a DX + ecosystem-building discussion. As the framework matures, DX and ecosystem promotion need to be taken into account, favoring a guiding philosophy over being overly opinionated and closed. To build an ecosystem, one needs to take newcomers into account while introducing them to the philosophy driving the ecosystem.

The way I interpret it, dspy is the programmatic (declarative) framework for AI software. From a broader perspective, this discussion should be about the necessity of a State primitive and the statefulness of AI software.

As a heavy user of the framework (and a first-time contributor to the discussions here) who regularly builds pipelines and AI interfaces with dspy, I see statefulness as core to most sophisticated applications of the framework. The framework aims to provide the primitives for AI software and to implement common modules in a robust, "best-practice" manner. Thus, adding state-related primitives should be considered.

to quote @joelgrus

there are a lot of ways for a module to be "stateful", this is one I guess, but there could be a lot of others

I agree, there are.
Hence, the underlying implementation needs to treat state as a primitive, while the strategy of state management should be injectable by the developer, and common patterns and abstractions should be implemented in the framework.
The example introduced by @ianyu93 in the comment above is yet another example proving how smart DX utilizing state primitives can make lives easier.

Then, a simple history should be implemented as in the PR and can be used in examples for newcomers to start with stateful AI software and allow the community to support easy adoption for use cases requiring it.

BTW, ReAct and some of the core modules implement their own state in a hand-crafted manner, meaning state is already a way to promote the framework's own philosophy.


@vacmar01 thanks for promoting the need

@ianyu93

ianyu93 commented Sep 13, 2025

Yeah thanks for helping to wrap my head around. @vacmar01 I actually think Stateful is better now than Chat 😂 because there are non chat ways to utilize it

@ziv-bakhajian so if I understand correctly, you're thinking that there could be a new primitive, State, while this PR's implementation could be a simple example to start with?

@ziv-bakhajian

@ianyu93 Yes.

I haven't finished wrapping my head around the ergonomics, but a State base class with a parameter-based invocation at the beginning and an update at the end of a stateful module call (as signatures to be implemented) might be a start.

The parameter-based invocation means retrieving the inference state from the State, based on parameters forwarded by the stateful module's forward method.

To take @joelgrus's multi conversation example,

  1. the conversation_id will be forwarded to the module with the signature inputs.
  2. the (stateful) module in turn will invoke the state for the right history (the inference state) based on the conversation_id parameter.
  3. it will then feed it as additional parameter to the underlying module.
  4. the result, together with the parameters from the forward call, will then be provided to an update method that implements the boilerplate.
  5. and finally the result will be returned.

notice that I did not assume that the state is managed by the module as one might want to forward the state between modules and might prefer to provide the state as a parameter to the forward method rather than manage it in the module. IMO, both should be supported.
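The five steps above, including the option of passing the state per call instead of having the module own it, might be sketched like this. `State`, `ConversationState`, `StatefulModule` and `count_turns` are hypothetical names, the history lives in memory and grows without bound, and the toy module stands in for a DSPy predictor.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable, Optional


class State(ABC):
    """Hypothetical state primitive: look up the inference state, then update it."""

    @abstractmethod
    def load(self, **params: Any) -> list[dict[str, Any]]: ...

    @abstractmethod
    def update(self, inputs: dict[str, Any], outputs: dict[str, Any], **params: Any) -> None: ...


class ConversationState(State):
    """One in-memory history per conversation_id (unbounded)."""

    def __init__(self) -> None:
        self._histories: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)

    def load(self, *, conversation_id: str, **_: Any) -> list[dict[str, Any]]:
        return list(self._histories[conversation_id])

    def update(self, inputs: dict[str, Any], outputs: dict[str, Any], *,
               conversation_id: str, **_: Any) -> None:
        self._histories[conversation_id].append({**inputs, **outputs})


class StatefulModule:
    """Wraps a module; the state may be owned by the wrapper or passed per call."""

    def __init__(self, module: Callable[..., dict[str, Any]], state: Optional[State] = None) -> None:
        self.module = module
        self.state = state

    def __call__(self, *, conversation_id: str, state: Optional[State] = None,
                 **inputs: Any) -> dict[str, Any]:
        active = state if state is not None else self.state  # per-call state wins
        if active is None:
            raise ValueError("No state provided")
        history = active.load(conversation_id=conversation_id)           # step 2
        outputs = self.module(**inputs, history=history)                 # step 3
        active.update(inputs, outputs, conversation_id=conversation_id)  # step 4
        return outputs                                                   # step 5


def count_turns(question: str, history: list[dict[str, Any]]) -> dict[str, Any]:
    # Toy module: reports which turn of its conversation it is answering.
    return {"answer": f"turn {len(history) + 1}: {question}"}


qa = StatefulModule(count_turns, state=ConversationState())
qa(conversation_id="a", question="What's Python?")
qa(conversation_id="b", question="What's a Porsche?")
reply = qa(conversation_id="a", question="Is it fast?")
print(reply["answer"])
```

Swapping `ConversationState` for a database-backed subclass would change where histories live without touching the wrapper.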

There are a lot of signature manipulations and ergonomics to be decided. this is an initial trajectory to develop.

@rawwerks

rawwerks commented Sep 13, 2025

Another reason why the allergic reactions showed up is because it seems to force interaction with LMs in a very specific way, chat app.

it doesn't force anyone to use dspy as a chat app if you default to dspy being stateless.

despite the last three years of people saying "chat isn't the right ux for ai" - there are about a trillion dollars of capital that disagrees with this. not to mention that 99.9% of the general public equates "ai" with "chat".

i don't understand why it's so painful to the dspy team to acknowledge that a lot of people want chat, and that managing a message history array simply isn't that hard (for you), but is just hard enough (for devs) to turn them away from dspy.

i think you are smart enough to implement this in a way that doesn't accidentally blow up the rest of dspy

it ultimately boils down to this: who is dspy for?

@joelgrus

(obvious caveat: I am not part of the dspy team, just an enthusiast, so all this is just like my opinion, man. but also I was for several years a core engineer working on the allennlp library, so I do have a lot of hard-won experience about the importance of being very thoughtful and deliberate about adding new features and abstractions to widely-used open-source libraries)

It seems like there are a few different things going on here:

  1. Should dspy.History be easier to use / should it be easier to make multi-turn chat apps? It seems like a lot of people think the answer to these questions is yes. There are a huge number of ways these could be solved, some in the core library, some as companion libraries. (again keeping in mind as per my example above that in many real applications you would need to manage / persist multiple conversation histories at the same time, possibly in a distributed way)

  2. Should there be a more explicit notion of "state" in dspy? (By way of comparison, a langgraph graph is fundamentally designed around a State object that gets passed around the graph and modified). To me this seems more at odds with the design / philosophy of dspy. One key idea of dspy is that our primitives are Signatures. (question, history -> answer) rather than "state transformations". Obviously as in our examples there are a lot of times where we have some kind of state that we maintain / update as we call Modules, but is there really a missing abstraction there? That's not obvious to me. If we think there is a missing abstraction, let's try to figure out exactly what it is, and what's the right way to build it, and how it fits in with the rest of the dspy model.

  3. Is it a good idea to make Modules stateful? Here my instincts tell me no. Another key idea behind dspy is that a Module is something you optimize using labeled examples. What would this mean when the Module is stateful? (whether that state is "conversation history" or something else)? What happens when you try to optimize this Stateful module? What should happen when you optimize a "stateful" model? What happens when you save or load a stateful module? How does statefulness interplay with async? etc etc. I think these are deep, hard library design (and maybe philosophical) questions that should be thought through deliberately. This is a lot of added complexity for (potentially) a little bit of developer ergonomics.

@joelgrus

joelgrus commented Sep 13, 2025

Here's a toy stateless version of Predict that automatically handles history, via an injected HistoryRepository. This is not a perfect solution, but something like this sidesteps a number of the problems listed above:

from collections import defaultdict
from abc import ABC, abstractmethod

import dspy

dspy.configure(lm=dspy.LM('gemini/gemini-2.5-flash-lite'))


class HistoryRepository(ABC):
    """
    Abstract base class for history repositories.
    You could store it in memory, a database, or any other storage system.
    """
    @abstractmethod
    def __getitem__(self, key: str) -> dspy.History:
        raise NotImplementedError()

    @abstractmethod
    def __setitem__(self, key: str, value: dspy.History) -> None:
        raise NotImplementedError()


class InMemoryHistoryRepository(HistoryRepository):
    """
    defaultdict by key, grows without bound!
    """
    def __init__(self) -> None:
        self.histories = defaultdict(lambda: dspy.History(messages=[]))
    
    def __getitem__(self, key: str) -> dspy.History:
        return self.histories[key]

    def __setitem__(self, key: str, value: dspy.History) -> None:
        self.histories[key] = value


class PredictWithHistory(dspy.Predict):
    """
    A Predict module that automatically manages conversation history.
    Stateless and allows for injectable history repository.
    """
    def __init__(
            self,
            signature,
            history_repository: HistoryRepository,
            history_key_field: str = "user_id",
            callbacks=None,
            **config) -> None:
        super().__init__(signature, callbacks=callbacks, **config)
        self.history_repository = history_repository
        self.history_key_field = history_key_field

        self.signature = self.signature.prepend(
            name="history",
            field=dspy.InputField(),
            type_=dspy.History
        )

    def forward(self, **kwargs):
        if self.history_key_field not in kwargs:
            raise ValueError(f"Missing history key field '{self.history_key_field}' in input arguments.")

        history_key = kwargs.pop(self.history_key_field)
        history = self.history_repository[history_key]
        kwargs["history"] = history

        res = super().forward(**kwargs)

        # Build history entry
        turn = {k: v for k, v in kwargs.items() if k != "history"}
        if isinstance(res, dspy.Prediction):
            turn.update(dict(res))
        elif isinstance(res, dict):
            turn.update(res)
        else:
            turn["output"] = res

        history.messages.append(turn)
        self.history_repository[history_key] = history

        return res

    async def aforward(self, **kwargs):
        raise NotImplementedError("Asynchronous version is not implemented yet.")

# Your code starts here    

repo = InMemoryHistoryRepository()

qa = PredictWithHistory(
    "question -> answer",
    history_repository=repo
)

response_a_1 = qa(question="What's Python the programming language?", user_id="user_a")
print(response_a_1)

response_b_1 = qa(question="What's a Porsche?", user_id="user_b")
print(response_b_1)

response_a_2 = qa(question="Is it fast?", user_id="user_a")
print(response_a_2)

response_b_2 = qa(question="Is it fast?", user_id="user_b")
print(response_b_2)

(This creates other problems once you start optimizing, since the signature now has "secret" fields that you have to account for, but maybe that's OK.)
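The `InMemoryHistoryRepository` above notes that it grows without bound. One way to cap it, sketched here under the assumption that keeping only the most recent N turns per key is acceptable, is to store each conversation in a `collections.deque` with `maxlen`. To stay dependency-free, turns are plain dicts rather than `dspy.History` objects, and `BoundedHistoryStore` is a hypothetical name.

```python
from collections import deque
from typing import Any


class BoundedHistoryStore:
    """Hypothetical per-key history store that keeps only the last `max_turns` turns."""

    def __init__(self, max_turns: int = 10) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.max_turns = max_turns
        self._turns: dict[str, deque] = {}

    def get(self, key: str) -> list[dict[str, Any]]:
        # Return a copy so callers cannot mutate stored state by accident;
        # unknown keys yield an empty history without creating an entry
        return list(self._turns.get(key, ()))

    def append(self, key: str, turn: dict[str, Any]) -> None:
        if key not in self._turns:
            # deque(maxlen=...) silently drops the oldest turn once the cap is reached
            self._turns[key] = deque(maxlen=self.max_turns)
        self._turns[key].append(dict(turn))


store = BoundedHistoryStore(max_turns=2)
store.append("user_a", {"question": "What's Python?", "answer": "A language."})
store.append("user_a", {"question": "Is it fast?", "answer": "It depends."})
store.append("user_a", {"question": "Is it typed?", "answer": "Dynamically."})
print([t["question"] for t in store.get("user_a")])  # ['Is it fast?', 'Is it typed?']
print(store.get("user_b"))  # []
```

This only caps the length of each conversation; the number of keys can still grow, so a real backend would also need some eviction or expiry policy.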

@MaximeRivest
Contributor

I’m still exploring, but it may be useful for others who are also experimenting to note that dspy already “tracks” traces and histories (not to be confused with dspy.History). This happens through:

  • dspy.clients.base_lm.GLOBAL_HISTORY

  • <program>.history

  • <lm>.history

  • dspy.settings.trace

trace.pop(0)

self.update_history(entry)

My intuition is that the “best” solution should take these into account. It seems to me that the statefulness most of us want is in fact an 'automatic' selection -> cloning -> (possibly) modifying -> growing of a previously existing trace.

PS: It seems like those could also be mined to build tooling that helps produce synthetic training sets.
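One possible reading of the "selection -> cloning -> modifying -> growing" idea, sketched in plain Python: traces are stored per conversation key, and a new call selects the previous trace, deep-copies it so the stored original is never mutated, optionally modifies the copy, and appends the new step. The trace shape (a list of `(predictor_name, inputs, outputs)` tuples) is an assumption loosely modeled on how `dspy.settings.trace` records steps; `grow_trace`, `TraceStep`, and `Trace` are hypothetical names, not DSPy APIs.

```python
import copy
from typing import Any, Callable, Optional

# Assumed trace shape: a list of (predictor_name, inputs, outputs) tuples
TraceStep = tuple[str, dict[str, Any], dict[str, Any]]
Trace = list[TraceStep]


def grow_trace(
    traces: dict[str, Trace],
    key: str,
    new_step: TraceStep,
    modify: Optional[Callable[[Trace], Trace]] = None,
) -> Trace:
    """Select the trace for `key`, clone it, optionally modify it, then grow it."""
    selected = traces.get(key, [])      # selection
    cloned = copy.deepcopy(selected)    # cloning: the previous trace object is untouched
    if modify is not None:              # optional modification, e.g. truncation
        cloned = modify(cloned)
    cloned.append(new_step)             # growing
    traces[key] = cloned
    return cloned


traces: dict[str, Trace] = {}
grow_trace(traces, "user_a", ("qa", {"question": "What's Python?"}, {"answer": "A language."}))
snapshot = traces["user_a"]
grow_trace(
    traces,
    "user_a",
    ("qa", {"question": "Is it fast?"}, {"answer": "It depends."}),
    modify=lambda t: t[-1:],  # keep only the most recent prior step
)
print(len(snapshot), len(traces["user_a"]))  # 1 2
```

Cloning before growing means anyone still holding the earlier trace (for example, to mine it for training data) sees it unchanged.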

@MaximeRivest
Contributor

And Refine.py shows us how to scope traces.


@BenMcH
Contributor

BenMcH commented Sep 13, 2025

I'd like to throw out a few opinions as someone who has been watching from the sidelines. My initial knee-jerk reaction was "Why would I want this?" I was against modules that add statefulness because I believed they were the wrong direction in a lot of cases, but I took a step back and evaluated why I would not use it.

Statelessness has been ingrained as a core principle of building APIs, and it is what I have adhered to while building AI solutions with clients. There are a few reasons for statelessness in API design:

  1. It allows for (easier) load balancing and scaling. Can you load balance with stateful applications? Sure. Sticky sessions exist but are more challenging to implement successfully.
  2. Stateful APIs tend to leak abstractions: a freshly deployed instance behaves differently from one that has built up state.

These 2 issues are not identical but are closely related.
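To make the stateless alternative concrete, here is a minimal sketch in which the server keeps no conversation state at all: the client sends the full history with each request and gets the updated history back, so any replica behind a load balancer can serve any turn. `handle_turn` and `fake_predict` are hypothetical stand-ins for a request handler and an LM-backed predictor.

```python
from typing import Any, Callable

Turn = dict[str, Any]


def handle_turn(
    history: list[Turn],
    question: str,
    predict: Callable[[list[Turn], str], str],
) -> tuple[str, list[Turn]]:
    """Pure request handler: all state arrives with the request and leaves with the response."""
    answer = predict(history, question)
    # Build a new list instead of mutating the caller's history
    new_history = [*history, {"question": question, "answer": answer}]
    return answer, new_history


def fake_predict(history: list[Turn], question: str) -> str:
    # Stand-in for an LM call; just reports how much context it received
    return f"{question} [prior_turns={len(history)}]"


# Each call could land on a different replica; none of them remembers anything
answer1, history = handle_turn([], "What's Python?", fake_predict)
answer2, history = handle_turn(history, "Is it fast?", fake_predict)
print(answer2)  # Is it fast? [prior_turns=1]
```

The cost of this design is that the request payload grows with the conversation, which is the trade-off a stateful wrapper avoids.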

Now here's where I had a realization and a change of heart. Just because I might not use this feature in its current form does not mean that it should be rejected outright. It solves a real problem that others face, and I think it is worth talking about. Persistence and history can be a challenging problem. As an example, LangChain and LangGraph have at least two methods of saving and loading chat histories. In LangChain, message stores simply store previous messages, which is effectively what this PR aims to make a drop-in feature. In LangGraph, thread persistence is much more heavy-handed but can store all of your program's state at a given point in time, encompassing not just the messages but also internal graph state (in DSPy, this would be module state).

I would love to see what this feature could become given enough time and attention from the community. Perhaps we could have a generic dspy.Storage or some other root pattern that lets us create plugins for any underlying storage, whether it is the file system, Postgres, or any other data store.
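As a rough illustration of the "generic storage with plugins" idea, the sketch below defines a small `typing.Protocol` that any backend can satisfy, plus an in-memory and a JSON-file backend. `HistoryStorage`, `MemoryStorage`, `JsonFileStorage`, `load`, `save`, and `record_turn` are all hypothetical names; a Postgres backend would implement the same two methods.

```python
import json
from pathlib import Path
from typing import Any, Protocol

Messages = list[dict[str, Any]]


class HistoryStorage(Protocol):
    """Hypothetical plugin interface: any backend with load/save can hold histories."""

    def load(self, key: str) -> Messages: ...
    def save(self, key: str, messages: Messages) -> None: ...


class MemoryStorage:
    def __init__(self) -> None:
        self._data: dict[str, Messages] = {}

    def load(self, key: str) -> Messages:
        # Copy so callers cannot mutate stored state directly
        return list(self._data.get(key, []))

    def save(self, key: str, messages: Messages) -> None:
        self._data[key] = list(messages)


class JsonFileStorage:
    """Stores one JSON file per key under `root`; keys are assumed to be filename-safe."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / f"{key}.json"

    def load(self, key: str) -> Messages:
        path = self._path(key)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, key: str, messages: Messages) -> None:
        self._path(key).write_text(json.dumps(messages), encoding="utf-8")


def record_turn(storage: HistoryStorage, key: str, turn: dict[str, Any]) -> Messages:
    # Code written against the protocol works unchanged with any backend
    messages = storage.load(key)
    messages.append(turn)
    storage.save(key, messages)
    return messages
```

A module like the `PredictWithHistory` example earlier in the thread could take any such backend as a constructor argument instead of hard-coding where history lives.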

An Idea for the Community

Does it make sense for the community surrounding DSPy to create something akin to langchain_community, essentially a community-driven set of experiments, modules, etc. that, for whatever reason, are not yet ready to enter the main repo? @MaximeRivest has done some great work with things like the attachments library, but with many external libraries hosted across separate repos, it becomes harder to find tools that solve your problems unless you are involved in the community at a deep level.

@chenmoneygithub
Collaborator

Thanks all for the discussion, this is very informative and engaging.

Why did we choose dspy.History instead of automatic management?

First I want to share why we went with dspy.History to manage the conversation history. With dspy.History we get the following functionality:

  • History becomes part of the module's input, so it's visible to DSPy optimizers. If it were not an input, the few-shot examples the optimizers build would not include the history.
  • DSPy can format history across different adapters.
  • Users can customize history management, and DSPy stays low-level.

Should we add automatic conversation management, and how?

Given the strong community interest, I think it’s worth exploring. My view is that this belongs at the dspy.Predict level (since that’s our LM interface) rather than at the dspy.Module level. We could perhaps proceed with something like dspy.Predict(Signature, auto_history=False), where setting auto_history=True makes the history managed automatically.
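To make the `auto_history` proposal concrete without touching DSPy internals, here is a plain-Python sketch of the intended behavior: a wrapper around any callable that accepts a `history` keyword. With `auto_history=False` the caller passes history explicitly (the current `dspy.History` pattern); with `auto_history=True` the wrapper keeps and grows it. `AutoHistory` and `fake_qa` are hypothetical, and a real `dspy.Predict` flag, if adopted, may behave differently.

```python
from typing import Any, Callable

Messages = list[dict[str, Any]]


class AutoHistory:
    """Hypothetical sketch of an `auto_history` switch around a history-aware predictor."""

    def __init__(self, predict: Callable[..., dict[str, Any]], auto_history: bool = False) -> None:
        self.predict = predict
        self.auto_history = auto_history
        self.messages: Messages = []  # only used when auto_history=True

    def __call__(self, **inputs: Any) -> dict[str, Any]:
        if not self.auto_history:
            # Explicit mode: the caller must supply history, as with dspy.History today
            if "history" not in inputs:
                raise ValueError("history is required when auto_history=False")
            return self.predict(**inputs)
        if "history" in inputs:
            raise ValueError("do not pass history when auto_history=True")
        # Pass a snapshot so the predictor cannot mutate stored state
        outputs = self.predict(history=list(self.messages), **inputs)
        # Record inputs and outputs as one turn, mirroring the manual append pattern
        self.messages.append({**inputs, **outputs})
        return outputs


def fake_qa(history: Messages, question: str) -> dict[str, Any]:
    # Stand-in for an LM-backed predictor
    return {"answer": f"{question} [prior_turns={len(history)}]"}


qa = AutoHistory(fake_qa, auto_history=True)
qa(question="What's Python?")
print(qa(question="Is it fast?")["answer"])  # Is it fast? [prior_turns=1]
```

Note that in automatic mode the wrapper instance itself holds the conversation, so one instance corresponds to one conversation; that is exactly the statefulness trade-off raised earlier in the thread.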

cc @okhat

dspy-community, dspyverse, or something similar?

To me, this isn’t really about repo ownership - it’s about ecosystem building. We’ve discussed spinning up a repo (e.g. dspy-agent) with higher-level APIs, while keeping dspy itself lean and low-level, which is similar to the relationship between Keras and TensorFlow.

Progress has been a bit slow due to bandwidth, but this direction is definitely on our radar. Please stay tuned!

@vacmar01
Contributor Author

vacmar01 commented Sep 16, 2025

I think the analogy between TensorFlow and Keras (or also Fast.ai and PyTorch) is a solid one and I see merit in keeping DSPy on the "TensorFlow" or "PyTorch" level.

Considering the discussion in this thread, I vote for a separate repo like dspy-agent that incorporates higher-level utilities for building (stateful) agents with DSPy (maybe not analogous to Keras, but more to torchvision or similar). I think there are many more commonly used agent patterns (human-in-the-loop, for example) that currently need to be built from lower-level abstractions in DSPy. I see a need for higher-level abstractions for LLM agents and would like to help build them.

10 participants