
Your Simple Guide to RAG in Spring AI: Give Your App a "Memory"
Build an AI with memory using Spring AI and RAG. Learn how to connect ChatClient to a Vector Database like PGVector for context-aware answers.
Munaf Badarpura
October 23, 2025
You've built your first GenAI app with Spring AI in the previous article. You're using the ChatClient to tell jokes and answer questions. But you quickly hit a wall: it's stateless.
It knows about the world, but it knows nothing about you. It can't answer questions about your company's technical documents, your product catalog, or your personal notes. If you ask it a question and then ask a follow-up, it has no idea what you're talking about.
So, how do you give your AI a "memory"? How can you make it an expert on your specific data?
The answer is Retrieval-Augmented Generation (RAG), powered by a Vector Database.
If you've ever felt that building AI "memory" involved complex data pipelines and Python-heavy tools, you're not alone. The Spring team saw this and, once again, they simplified it.
In this guide, we'll pick up where we left off. We'll show you how to take your simple ChatClient and connect it to a "memory" so it can answer questions about your own documents. No data science degree needed—just your Spring Boot app and the VectorStore interface.
Let's build.
1. What is RAG? (And What's a "Vector Database"?)
First, let's clear up the terms.
- Vector Database (or Vector Store): Think of this as your AI's long-term memory. A normal database finds exact matches (like WHERE name = 'John'). A vector database finds similar ideas. It stores "embeddings", the numerical fingerprints for text we talked about in the last guide.
- Retrieval-Augmented Generation (RAG): This is the process of giving the AI the right memories before it answers a question.
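To make "finds similar ideas" a bit more concrete, here is a minimal sketch of how two fingerprints can be compared using cosine similarity, one common similarity measure. The class name, the tiny three-number vectors, and the labels are all made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```java
import java.util.Map;

/** Toy illustration of similarity search over hand-made "embeddings". */
final class CosineSimilaritySketch {

    /** Cosine similarity: 1.0 means same direction, 0.0 means unrelated. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction to compare
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the label of the stored vector that is closest to the query vector. */
    static String nearest(Map<String, double[]> stored, double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : stored.entrySet()) {
            double score = cosine(entry.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical 3-dimensional fingerprints, invented for this example.
        Map<String, double[]> stored = Map.of(
            "maven build files", new double[] {0.9, 0.1, 0.0},
            "freeing a busy port", new double[] {0.0, 0.2, 0.9});
        double[] query = {0.8, 0.3, 0.1}; // pretend fingerprint of a pom.xml question
        System.out.println(nearest(stored, query)); // prints "maven build files"
    }
}
```

Notice there is no exact match anywhere: the query vector simply points in roughly the same direction as one stored vector, and that is what the search returns.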
The RAG process is a simple, three-step "cheat sheet" you give the AI:
- Retrieve (Search): When you ask a question (e.g., "How do I configure a pom.xml?"), your app first searches the vector database. It asks, "Find me the document 'fingerprints' that are most similar to this question's 'fingerprint'."
- Augment (Stuff): The database returns the matching text (e.g., "A 'pom.xml' file manages a project's dependencies..."). Your app then "stuffs" this text into your prompt using a PromptTemplate.
- Generate (Answer): You send the new, combined prompt to the ChatClient: "Using this information: [A 'pom.xml' file manages...]... please answer this question: [How do I configure a pom.xml?]"
The AI then generates an answer using the exact context you provided. It's no longer guessing; it's answering based on your data.
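Here is the three-step flow as a tiny, framework-free sketch. It is not Spring AI code: the class, the word-overlap "search" (a crude stand-in for real embedding similarity), the second sample document, and the stub model are all assumptions made so the example runs on plain Java.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class RagFlowSketch {

    /** Lower-cases text and splits it into a set of words (a crude stand-in for an embedding). */
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("[^a-z0-9]+")));
        result.remove(""); // split can yield an empty token at the start
        return result;
    }

    /** Step 1, Retrieve: pick the document sharing the most words with the question. */
    static String retrieve(String question, List<String> documents) {
        Set<String> questionWords = words(question);
        String best = ""; // empty context if nothing overlaps
        int bestOverlap = 0;
        for (String doc : documents) {
            Set<String> shared = words(doc);
            shared.retainAll(questionWords);
            if (shared.size() > bestOverlap) {
                bestOverlap = shared.size();
                best = doc;
            }
        }
        return best;
    }

    /** Step 2, Augment: stuff the retrieved text into the prompt. */
    static String augment(String context, String question) {
        return "Using this information: [" + context + "] please answer this question: [" + question + "]";
    }

    /** Step 3, Generate: hand the combined prompt to the model (any String-to-String function here). */
    static String answer(String question, List<String> documents, Function<String, String> model) {
        return model.apply(augment(retrieve(question, documents), question));
    }

    public static void main(String[] args) {
        List<String> documents = List.of(
            "A 'pom.xml' file manages a project's dependencies using Maven.",
            "To free port 8080 on Windows, find the process with netstat and stop it with taskkill.");
        // Stub model that just echoes what it was asked; a real app would call an LLM here.
        Function<String, String> stubModel = prompt -> "MODEL RECEIVED: " + prompt;
        System.out.println(answer("How do I configure a pom.xml?", documents, stubModel));
    }
}
```

The point to take away is the order of operations: the model never sees the bare question, only the question plus whatever the retrieval step found.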
2. Why Use Spring AI for This?
You guessed it: abstraction.
Think about Spring Data. You don't write code for PostgreSQL, then different code for MySQL. You just use JpaRepository.
Spring AI provides the VectorStore interface.
You write your code against this one interface (vectorStore.add(...), vectorStore.similaritySearch(...)), and Spring AI handles the rest. You can start with a simple in-memory store for testing, then switch to a production-grade database like PostgreSQL/PGVector, Chroma, or Weaviate just by changing a dependency and your application.properties.
Your Java code doesn't change. That's the power of Spring.
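If the "code against one interface" idea feels abstract, this small sketch shows the shape of it. ToyVectorStore and everything around it are hypothetical, simplified stand-ins written for this example; they are not Spring AI's actual types or method signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a vector-store abstraction. */
interface ToyVectorStore {
    void add(List<String> documents);
    List<String> similaritySearch(String query);
}

/** Implementation 1: keeps documents in a list and matches on any shared longer word. */
final class InMemoryToyStore implements ToyVectorStore {
    private final List<String> docs = new ArrayList<>();

    public void add(List<String> documents) {
        docs.addAll(documents);
    }

    public List<String> similaritySearch(String query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            for (String word : query.toLowerCase().split("\\s+")) {
                // Skip very short words such as "is" or "a" to reduce noise.
                if (word.length() > 3 && doc.toLowerCase().contains(word)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }
}

/** Implementation 2: a canned fake, handy in unit tests. */
final class FixedAnswerToyStore implements ToyVectorStore {
    private final List<String> canned;

    FixedAnswerToyStore(List<String> canned) {
        this.canned = canned;
    }

    public void add(List<String> documents) {
        // Ignores writes on purpose; it always returns the canned result.
    }

    public List<String> similaritySearch(String query) {
        return canned;
    }
}

/** Calling code sees only the interface, so either implementation can be injected. */
final class DocsService {
    private final ToyVectorStore store;

    DocsService(ToyVectorStore store) {
        this.store = store;
    }

    String firstHit(String query) {
        List<String> hits = store.similaritySearch(query);
        return hits.isEmpty() ? "(nothing found)" : hits.get(0);
    }
}
```

DocsService never mentions a concrete store, so swapping the backing implementation is a wiring decision rather than a code change. That is the same benefit the VectorStore interface is meant to give you.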
3. Prerequisites: What You Need
Let's get our tools ready. This guide assumes you already have a basic Spring AI project set up with a ChatClient (like in the last guide).
- Java 21+ & Spring Boot 3.2+: Same as before.
- An AI Model API Key: You still need your OpenAI key (or Ollama setup). The VectorStore needs an EmbeddingModel (named EmbeddingClient in early pre-1.0 milestones of Spring AI) to turn your documents into "fingerprints," and the ChatClient needs a chat model to generate the final answer.
- A Vector Database: You need a place to store your vectors.
  - For a quick start: Spring AI provides a SimpleVectorStore that runs in-memory. It's the "H2" of vector databases: great for tests, but it forgets everything on restart.
  - For a real app (what we'll use): Let's use PostgreSQL with the PGVector extension. It's a robust, production-ready choice that many Java developers already use.
If you have Docker, you can run PGVector in seconds:
4. The Core Setup: Dependencies & Configuration
Let's tell our project to use PGVector.
pom.xml Dependencies
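You need to add the starter for the vector database you chose. We'll add the PGVector starter. The artifact name depends on your Spring AI version: in Spring AI 1.0 it is spring-ai-starter-vector-store-pgvector, while earlier milestones used spring-ai-pgvector-store-spring-boot-starter.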
application.properties Configuration
Now, open src/main/resources/application.properties and add the new configuration.
That's it! When you run your app, Spring Boot will auto-configure a VectorStore bean that's connected to your PostgreSQL database.
5. Step 1: Loading Your Data (The "Store" Step)
We have an empty "memory." Let's add some documents to it.
Spring AI provides DocumentReader utilities to load data from files. For this example, let's just create a Document object manually. We can use a CommandLineRunner to load our data once on application startup.
Run your app. You'll see the "Documents loaded" message. Your data is now "fingerprinted" and saved in the vector_store table in PostgreSQL!
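One thing to watch: Spring Boot runs a CommandLineRunner on every startup, and a PGVector-backed store keeps its rows between runs, so loading the same documents again may leave you with duplicates. A common way around this is to give each document a stable ID derived from its content. The sketch below uses a plain Map as a stand-in for the store; whether your real store overwrites an existing ID is something to verify for your setup. The second document's text is invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class IdempotentLoadSketch {

    /** Same content always yields the same ID (name-based UUID over the UTF-8 bytes). */
    static String stableId(String content) {
        return UUID.nameUUIDFromBytes(content.getBytes(StandardCharsets.UTF_8)).toString();
    }

    /** Stand-in for a store keyed by document ID: putting an existing ID overwrites it. */
    static void load(Map<String, String> store, List<String> contents) {
        for (String content : contents) {
            store.put(stableId(content), content);
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        List<String> docs = List.of(
            "A 'pom.xml' file manages a project's dependencies using Maven.",
            "Hypothetical second note about freeing port 8080 on Windows.");
        load(store, docs); // first startup
        load(store, docs); // second startup: same IDs, so nothing is duplicated
        System.out.println(store.size()); // prints 2
    }
}
```

An alternative is to check whether the data is already present before loading at all; the stable-ID approach just has the advantage of also handling partially loaded data.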
6. Step 2: Asking a "Memory-Aware" Question (The Manual RAG)
Now let's build a controller that uses this "memory." We'll do the RAG process manually first so you can see every step.
Now, run your app and go to http://localhost:8080/ai/rag?query=what is a pom.xml.
The AI should give you an answer grounded in the stored document, along the lines of: "A 'pom.xml' file manages a project's dependencies using Maven." It's not guessing; it's using the exact document we gave it. Now try asking http://localhost:8080/ai/rag?query=how do I kill port 8080. It should come back with the specific Windows commands we loaded!
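Browsers quietly encode the spaces in those URLs for you. If you call the endpoint from code instead (a test, a script, another service), you need to encode the query yourself. Here is a small sketch using only the JDK; the class and method names are made up for this example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryUrlSketch {

    /** Builds the request URI with the query parameter form-encoded (spaces become '+'). */
    static URI ragUri(String baseUrl, String query) {
        return URI.create(baseUrl + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(ragUri("http://localhost:8080/ai/rag", "what is a pom.xml"));
        // prints http://localhost:8080/ai/rag?query=what+is+a+pom.xml
    }
}
```

The server side decodes the parameter again, so your controller still receives the original question text.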
7. The Real Spring Way: Simplifying with RetrievalAugmentor
The code above works, but it's a bit verbose. We're manually wiring together the VectorStore and the ChatClient. This is Spring—there must be a simpler way.
And there is.
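Spring AI provides a RetrievalAugmentor that can be plugged directly into your ChatClient bean (in Spring AI 1.0 this role is played by advisors such as QuestionAnswerAdvisor and RetrievalAugmentationAdvisor, so check the class name against the version you use). It does the RAG "Retrieve" and "Augment" steps for you automatically.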
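Conceptually, "plugging in" retrieval means wrapping the chat call so that every prompt is enriched before it reaches the model. The sketch below shows that wrapping idea with plain Java functions. It is an illustration of the pattern only, not Spring AI's implementation, and all names in it are hypothetical.

```java
import java.util.function.Function;

final class AugmentingWrapperSketch {

    /**
     * Wraps a chat call (prompt in, answer out) so that each question is
     * combined with retrieved context before the underlying call runs.
     */
    static Function<String, String> withRetrieval(
            Function<String, String> chat,
            Function<String, String> retriever) {
        return question -> {
            String context = retriever.apply(question);                 // Retrieve
            String prompt = "Context: " + context + "\nQuestion: " + question; // Augment
            return chat.apply(prompt);                                  // Generate
        };
    }

    public static void main(String[] args) {
        // Stubs standing in for a vector search and an LLM call.
        Function<String, String> retriever =
            q -> "A 'pom.xml' file manages a project's dependencies using Maven.";
        Function<String, String> chat = prompt -> "MODEL RECEIVED:\n" + prompt;

        Function<String, String> ragChat = withRetrieval(chat, retriever);
        System.out.println(ragChat.apply("what is a pom.xml"));
    }
}
```

The caller of ragChat just asks a question; the retrieval and prompt stuffing happen inside the wrapper. That is why the controller can shrink so much once the wiring lives in configuration.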
Let's refactor.
1. Create a Configuration class for your ChatClient:
2. Now, rewrite your Controller to be much simpler:
If you run this and call http://localhost:8080/ai/simple-rag?query=what is a pom.xml, you get the exact same context-aware answer.
We've hidden all the complexity behind Spring's auto-configuration, just as it should be.
Conclusion: Your App Now Has a Memory
We've come a long way. You've now gone from a simple, stateless chatbot to an intelligent assistant that can answer specific questions about your data.
- You learned that RAG is the process of retrieving data to "augment" a prompt.
- You learned that a Vector Database (like PGVector) is the "memory" that stores document "fingerprints."
- You used the VectorStore interface to add your own documents and search them.
- You saw the "manual" RAG process (Retrieve, Augment, Generate).
- Finally, you used a RetrievalAugmentor to let Spring AI do all the heavy lifting for you.
You can now build applications that provide real value: customer support bots that know your manuals, research assistants that know your private notes, or e-commerce bots that know your product catalog.
You're a Spring developer, and now you're building truly intelligent, context-aware applications. Go build something that knows things.
