Your Simple Guide to RAG in Spring AI: Give Your App a "Memory"

    Build an AI with memory using Spring AI and RAG. Learn how to connect ChatClient to a Vector Database like PGVector for context-aware answers.

    Munaf Badarpura

    October 23, 2025

    8 min read

    You've built your first GenAI app with Spring AI in the previous article. You're using the ChatClient to tell jokes and answer questions. But you quickly hit a wall: it's stateless.

    It knows about the world, but it knows nothing about you. It can't answer questions about your company's technical documents, your product catalog, or your personal notes. If you ask it a question and then ask a follow-up, it has no idea what you're talking about.

    So, how do you give your AI a "memory"? How can you make it an expert on your specific data?

    The answer is Retrieval-Augmented Generation (RAG), powered by a Vector Database.

    If you've ever felt that building AI "memory" involved complex data pipelines and Python-heavy tools, you're not alone. The Spring team saw this and, once again, they simplified it.

    In this guide, we'll pick up where we left off. We'll show you how to take your simple ChatClient and connect it to a "memory" so it can answer questions about your own documents. No data science degree needed—just your Spring Boot app and the VectorStore interface.

    Let's build.

    1. What is RAG? (And What's a "Vector Database"?)

    First, let's clear up the terms.

    • Vector Database (or Vector Store): Think of this as your AI's long-term memory. A normal database finds exact matches (like WHERE name = 'John'). A vector database finds similar ideas. It stores "embeddings"—the numerical fingerprints for text we talked about in the last guide.
    • Retrieval-Augmented Generation (RAG): This is the process of giving the AI the right memories before it answers a question.
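
    To make "finds similar ideas" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors are invented for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions, and a vector store may use a different distance measure.

```java
// Toy illustration only: the vectors below are made up, not real embeddings.
final class CosineSimilarityDemo {

    /** Returns the cosine of the angle between two equal-length vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical "fingerprints" for a question and two documents.
        double[] question = {0.9, 0.1, 0.0};
        double[] mavenDoc = {0.8, 0.2, 0.1};
        double[] portDoc = {0.0, 0.1, 0.9};

        // The document pointing in nearly the same direction scores higher.
        System.out.printf("question vs mavenDoc: %.3f%n", cosine(question, mavenDoc));
        System.out.printf("question vs portDoc:  %.3f%n", cosine(question, portDoc));
    }
}
```

    A score close to 1 means the two vectors point the same way; a score close to 0 means they have little in common. A similarity search ranks stored documents by a score like this.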

    The RAG process is a simple, three-step "cheat sheet" you give the AI:

    1. Retrieve (Search): When you ask a question (e.g., "How do I configure a pom.xml?"), your app first searches the vector database. It asks, "Find me the document 'fingerprints' that are most similar to this question's 'fingerprint'."
    2. Augment (Stuff): The database returns the matching text (e.g., "A 'pom.xml' file manages a project's dependencies..."). Your app then "stuffs" this text into your prompt using a PromptTemplate.
    3. Generate (Answer): You send the new, combined prompt to the ChatClient: "Using this information: [A 'pom.xml' file manages...]... please answer this question: [How do I configure a pom.xml?]"

    The AI then generates an answer using the exact context you provided. It's no longer guessing; it's answering based on your data.
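
    The three steps can be sketched in plain Java before any Spring AI code is involved. Everything here is a stand-in: the class name ToyRag is hypothetical, word overlap replaces real embeddings, and the "Generate" step is left as a printed prompt because no model is called.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical toy, not Spring AI: it only shows the shape of Retrieve -> Augment -> Generate.
final class ToyRag {

    /** Stand-in for an embedding: the set of lowercase words in the text. */
    static Set<String> words(String text) {
        Set<String> result = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Stand-in for vector similarity: shared words divided by all words (Jaccard overlap). */
    static double similarity(String a, String b) {
        Set<String> common = new TreeSet<>(words(a));
        common.retainAll(words(b));
        Set<String> union = new TreeSet<>(words(a));
        union.addAll(words(b));
        return union.isEmpty() ? 0.0 : (double) common.size() / union.size();
    }

    /** Step 1, Retrieve: return the topK documents most similar to the question. */
    static List<String> retrieve(List<String> docs, String question, int topK) {
        return docs.stream()
                .sorted(Comparator.comparingDouble((String d) -> similarity(d, question)).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    /** Step 2, Augment: stuff the retrieved text into the prompt. */
    static String augment(List<String> context, String question) {
        return "Using this information:\n" + String.join("\n", context)
                + "\n\nPlease answer this question: " + question;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Spring Boot is a Java framework for microservices.",
                "A 'pom.xml' file manages a project's dependencies using Maven.");
        String question = "How do I configure a pom.xml?";

        List<String> context = retrieve(docs, question, 1);
        String prompt = augment(context, question);

        // Step 3, Generate: a real app would send this prompt to the model.
        System.out.println(prompt);
    }
}
```

    The rest of this guide replaces each toy piece with the real thing: the VectorStore handles retrieval, a PromptTemplate handles the stuffing, and the ChatClient generates the answer.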

    2. Why Use Spring AI for This?

    You guessed it: abstraction.

    Think about Spring Data. You don't write code for PostgreSQL, then different code for MySQL. You just use JpaRepository.

    Spring AI provides the VectorStore interface.

    You write your code against this one interface (vectorStore.add(...), vectorStore.similaritySearch(...)), and Spring AI handles the rest. You can start with a simple in-memory store for testing, then switch to a production-grade database like PostgreSQL/PGVector, Chroma, or Weaviate just by changing a dependency and your application.properties.

    Your Java code doesn't change. That's the power of Spring.

    3. Prerequisites: What You Need

    Let's get our tools ready. This guide assumes you already have a basic Spring AI project set up with a ChatClient (like in the last guide).

    1. Java 21+ & Spring Boot 3.2+: Same as before.
    2. An AI Model API Key: You still need your OpenAI key (or Ollama setup). The VectorStore needs an EmbeddingModel (called EmbeddingClient in older Spring AI builds) to turn your documents into "fingerprints," and the ChatClient needs a chat model to generate the final answer.
    3. A Vector Database: You need a place to store your vectors.
      • For a quick start: Spring AI provides a SimpleVectorStore that runs in-memory. It's the "H2" of vector databases—great for tests, but it forgets everything on restart.
      • For a real app (what we'll use): Let's use PostgreSQL with the PGVector extension. It's a robust, production-ready choice that many Java developers already use.

    If you have Docker, you can run PGVector in seconds:

        docker run -d --name pgvector -p 5432:5432 \
          -e POSTGRES_DB=mydb \
          -e POSTGRES_USER=myuser \
          -e POSTGRES_PASSWORD=mypass \
          ankane/pgvector

    4. The Core Setup: Dependencies & Configuration

    Let's tell our project to use PGVector.

    pom.xml Dependencies

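    You need to add the starter for the vector database you chose. With the 1.0.0-M1 BOM used here, the PGVector starter is spring-ai-pgvector-store-spring-boot-starter and the OpenAI starter is spring-ai-openai-spring-boot-starter. Milestone builds are published to the Spring Milestones repository, so that repository is declared as well. Starter names changed in later Spring AI releases, so check the reference documentation for the version you use.

        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
            </dependency>
            <dependency>
                <groupId>org.postgresql</groupId>
                <artifactId>postgresql</artifactId>
                <scope>runtime</scope>
            </dependency>
        </dependencies>

        <dependencyManagement>
            <dependencies>
                <dependency>
                    <groupId>org.springframework.ai</groupId>
                    <artifactId>spring-ai-bom</artifactId>
                    <version>1.0.0-M1</version>
                    <type>pom</type>
                    <scope>import</scope>
                </dependency>
            </dependencies>
        </dependencyManagement>

        <!-- Needed for milestone (-M) versions of Spring AI -->
        <repositories>
            <repository>
                <id>spring-milestones</id>
                <url>https://repo.spring.io/milestone</url>
            </repository>
        </repositories>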

    application.properties Configuration

    Now, open src/main/resources/application.properties and add the new configuration.

        # Your OpenAI API Key (for Chat and Embeddings)
        spring.ai.openai.api-key=${OPENAI_API_KEY}

        # --- This is the new part for our Vector Store ---
        # 1. Standard Spring Data Source config
        spring.datasource.url=jdbc:postgresql://localhost:5432/mydb
        spring.datasource.username=myuser
        spring.datasource.password=mypass
        spring.datasource.driver-class-name=org.postgresql.Driver

        # 2. PGVector settings (Spring AI uses the data source above)
        spring.ai.vectorstore.pgvector.index-type=HNSW
        # Some Spring AI versions only create the 'vector_store' table when
        # schema initialization is switched on explicitly
        spring.ai.vectorstore.pgvector.initialize-schema=true

    That's it! When you run your app, Spring Boot will auto-configure a VectorStore bean that's connected to your PostgreSQL database.

    5. Step 1: Loading Your Data (The "Store" Step)

    We have an empty "memory." Let's add some documents to it.

    Spring AI provides DocumentReader utilities to load data from files. For this example, let's just create a Document object manually. We can use a CommandLineRunner to load our data once on application startup.

        import org.springframework.ai.document.Document;
        import org.springframework.ai.vectorstore.VectorStore;
        import org.springframework.boot.CommandLineRunner;
        import org.springframework.stereotype.Component;

        import java.util.List;

        @Component
        public class DataLoader implements CommandLineRunner {

            private final VectorStore vectorStore;

            public DataLoader(VectorStore vectorStore) {
                this.vectorStore = vectorStore;
            }

            @Override
            public void run(String... args) throws Exception {
                System.out.println("Loading documents into the vector store...");

                // In a real app, you'd load this from a file or S3
                List<Document> documents = List.of(
                    new Document("Spring Boot is a Java framework for microservices."),
                    new Document("The 'spring-boot-starter-web' dependency is for building REST APIs."),
                    new Document("A 'pom.xml' file manages a project's dependencies using Maven."),
                    new Document("To kill port 8080 on Windows, use 'netstat -aon | findstr 8080' and then 'taskkill /PID <PID> /F'.")
                );

                // This one call turns text into embeddings and saves them
                this.vectorStore.add(documents);

                System.out.println("Documents loaded.");
            }
        }

    Run your app. You'll see the "Documents loaded" message. Your data is now "fingerprinted" and saved in the vector_store table in PostgreSQL!
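
    One caveat with loading on startup: a CommandLineRunner runs on every launch, and a Document created from text alone gets a fresh random ID each time, so restarting the app embeds and stores the same four sentences again. A minimal sketch of one way around this, assuming you are happy to identify a document by its text, is to derive a stable ID from the content. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: derives a repeatable ID from a document's text.
final class StableDocumentIds {

    /** Builds a name-based UUID from the text, so equal text always yields an equal ID. */
    static String idFor(String content) {
        return UUID.nameUUIDFromBytes(content.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        String text = "A 'pom.xml' file manages a project's dependencies using Maven.";

        // The same text produces the same ID on every run.
        System.out.println(idFor(text));
        System.out.println(idFor(text).equals(idFor(text)));
    }
}
```

    Spring AI's Document also has a constructor that takes an explicit ID along with the content and a metadata map, so an ID like this can be passed in when building the list. Whether a repeated ID overwrites the existing row or is rejected depends on the vector store implementation, so verify the behavior for the store you use.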

    6. Step 2: Asking a "Memory-Aware" Question (The Manual RAG)

    Now let's build a controller that uses this "memory." We'll do the RAG process manually first so you can see every step.

        import org.springframework.ai.chat.client.ChatClient;
        import org.springframework.ai.chat.prompt.Prompt;
        import org.springframework.ai.chat.prompt.PromptTemplate;
        import org.springframework.ai.document.Document;
        import org.springframework.ai.vectorstore.VectorStore;
        import org.springframework.web.bind.annotation.GetMapping;
        import org.springframework.web.bind.annotation.RequestParam;
        import org.springframework.web.bind.annotation.RestController;

        import java.util.List;
        import java.util.Map;
        import java.util.stream.Collectors;

        @RestController
        public class RagController {

            private final ChatClient chatClient;
            private final VectorStore vectorStore;

            public RagController(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
                this.chatClient = chatClientBuilder.build();
                this.vectorStore = vectorStore;
            }

            @GetMapping("/ai/rag")
            public String ask(@RequestParam String query) {

                // 1. RETRIEVE (Search)
                // Find documents similar to the user's query
                List<Document> similarDocs = vectorStore.similaritySearch(query);

                // 2. AUGMENT (Stuff)
                // Get the text content from the documents
                // (newer Spring AI releases rename getContent() to getText())
                String context = similarDocs.stream()
                    .map(Document::getContent)
                    .collect(Collectors.joining(System.lineSeparator()));

                // Create the "cheat sheet" prompt
                String templateString = """
                    Using the information provided below, please answer the user's question.
                    If the information is not sufficient, say so.

                    Information:
                    {context}

                    Question:
                    {query}
                    """;

                PromptTemplate promptTemplate = new PromptTemplate(templateString);
                Prompt prompt = promptTemplate.create(Map.of(
                    "context", context,
                    "query", query
                ));

                // 3. GENERATE (Answer)
                // Send the combined prompt to the AI through the fluent ChatClient API
                return chatClient.prompt(prompt)
                    .call()
                    .content();
            }
        }

    Now, run your app and go to http://localhost:8080/ai/rag?query=what is a pom.xml.

    The AI should answer along the lines of: "A 'pom.xml' file manages a project's dependencies using Maven." It's not guessing; it's drawing on the documents we loaded. Now try asking http://localhost:8080/ai/rag?query=how do I kill port 8080. It should give you the specific Windows commands we loaded!
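
    A browser quietly encodes the spaces in those URLs for you. If you call the endpoint from code or a script, the query text needs encoding first. Here is a small sketch using only the JDK; the QueryUrlBuilder class is a hypothetical helper, not part of Spring.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building a request URL with an encoded query parameter.
final class QueryUrlBuilder {

    /** Appends the question as a form-encoded "query" parameter. */
    static String ragUrl(String baseUrl, String question) {
        return baseUrl + "?query=" + URLEncoder.encode(question, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/ai/rag?query=what+is+a+pom.xml
        System.out.println(ragUrl("http://localhost:8080/ai/rag", "what is a pom.xml"));
    }
}
```

    URLEncoder uses form-style encoding, so spaces become "+" signs, which the server decodes back to spaces when it reads the query parameter.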

    7. The Real Spring Way: Simplifying with QuestionAnswerAdvisor

    The code above works, but it's a bit verbose. We're manually wiring together the VectorStore and the ChatClient. This is Spring, so there must be a simpler way.

    And there is.

    Spring AI provides advisors that can be plugged directly into your ChatClient. The one built for RAG is QuestionAnswerAdvisor: it performs the "Retrieve" and "Augment" steps for you on every request.

    Let's refactor.

    1. Create a Configuration class for your ChatClient:

        import org.springframework.ai.chat.client.ChatClient;
        import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
        import org.springframework.ai.vectorstore.SearchRequest;
        import org.springframework.ai.vectorstore.VectorStore;
        import org.springframework.context.annotation.Bean;
        import org.springframework.context.annotation.Configuration;

        @Configuration
        public class AiConfig {

            // This creates the "automatic" RAG-enabled ChatClient
            @Bean
            public ChatClient chatClient(ChatClient.Builder builder, VectorStore vectorStore) {

                // 1. Create the advisor.
                // The package and constructor shown match the 1.0.0-M1 BOM used above;
                // later Spring AI releases moved this class, so check your version's docs.
                var advisor = new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults());

                // 2. Add it to the ChatClient's builder
                return builder
                    .defaultAdvisors(advisor) // This is the magic!
                    .build();
            }
        }

    2. Now, rewrite your Controller to be much simpler:

        import org.springframework.ai.chat.client.ChatClient;
        import org.springframework.web.bind.annotation.GetMapping;
        import org.springframework.web.bind.annotation.RequestParam;
        import org.springframework.web.bind.annotation.RestController;

        @RestController
        public class SimpleRagController {

            private final ChatClient chatClient;

            // The ChatClient bean we defined in AiConfig is injected
            // This client *already knows* how to do RAG
            public SimpleRagController(ChatClient chatClient) {
                this.chatClient = chatClient;
            }

            @GetMapping("/ai/simple-rag")
            public String ask(@RequestParam String query) {

                // That's it.
                // Spring AI automatically finds similar docs from the
                // VectorStore and "augments" the prompt before sending it.
                return chatClient.prompt()
                    .user(query)
                    .call()
                    .content();
            }
        }

    If you run this and call http://localhost:8080/ai/simple-rag?query=what is a pom.xml, you get the exact same context-aware answer.

    We've moved the retrieval and prompt-stuffing out of the controller and behind a single advisor on the ChatClient, just as it should be.
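
    If the object passed to defaultAdvisors feels like a black box, it helps to picture it as a wrapper around the model call that rewrites the user's text before it is sent. The sketch below shows that idea in plain Java. It is not Spring AI's implementation: the names are hypothetical, the context is a fixed list instead of a vector search, and the "model" just echoes what it receives.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the advisor idea: rewrite the prompt, then call the model.
final class AdvisorSketch {

    /** Wraps a model call so every user message passes through the advisor first. */
    static Function<String, String> withAdvisor(UnaryOperator<String> advisor,
                                                Function<String, String> model) {
        return userText -> model.apply(advisor.apply(userText));
    }

    /** Toy advisor: prepends fixed context (a real one would query a vector store). */
    static UnaryOperator<String> contextAdvisor(List<String> context) {
        return userText -> "Context:\n" + String.join("\n", context)
                + "\n\nQuestion: " + userText;
    }

    public static void main(String[] args) {
        // Stand-in for the model: it returns the prompt it was given.
        Function<String, String> echoModel = prompt -> "MODEL RECEIVED:\n" + prompt;

        Function<String, String> client = withAdvisor(
                contextAdvisor(List.of("A 'pom.xml' file manages a project's dependencies using Maven.")),
                echoModel);

        // The caller passes only the question; the context is added on the way in.
        System.out.println(client.apply("what is a pom.xml"));
    }
}
```

    This is why the controller shrinks to a single call chain: the caller supplies only the question, and the wrapper adds the context on every request.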

    Conclusion: Your App Now Has a Memory

    We've come a long way. You've now gone from a simple, stateless chatbot to an intelligent assistant that can answer specific questions about your data.

    • You learned that RAG is the process of retrieving data to "augment" a prompt.
    • You learned that a Vector Database (like PGVector) is the "memory" that stores document "fingerprints."
    • You used the VectorStore interface to add your own documents and search them.
    • You saw the "manual" RAG process (Retrieve, Augment, Generate).
    • Finally, you plugged an advisor into the ChatClient to let Spring AI do the retrieval and prompt-stuffing for you.

    You can now build applications that provide real value: customer support bots that know your manuals, research assistants that know your private notes, or e-commerce bots that know your product catalog.

    You're a Spring developer, and now you're building truly intelligent, context-aware applications. Go build something that knows things.
