Unlocking CrewAI Knowledge Feature: A Practical Guide with Examples

CrewAI’s Knowledge system lets developers and businesses enhance AI agents with contextual, domain-specific data. This post walks through how to use the feature effectively, with real-world examples and actionable insights.

Contents

What is Knowledge in CrewAI?
Supported Knowledge Sources
Setting Up Knowledge Sources
Basic Configuration
Advanced Configuration
1. Chunking & Embeddings
Custom Knowledge Sources
Quickstart Example: Using a String-Based Knowledge Source
Expanding Your Horizons: File-Based Knowledge Sources
Custom Knowledge Source: PDF Source Example
Final Thoughts

What is Knowledge in CrewAI?

The Knowledge system allows AI agents to access and use external data sources, such as PDFs, CSVs, or APIs, during task execution. Think of it as equipping your agents with a dynamic reference library that lets them ground responses in factual information and improve decision-making.

Key Benefits:

Domain-Specific Expertise: Agents can access specialized data (e.g., product manuals, financial reports).
Real-Time Context: Maintain continuity across interactions, such as customer support conversations.
Flexibility: Supports structured (CSV, JSON) and unstructured (PDF, text) data.
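
To make the "reference library" idea concrete, here is a minimal, standard-library-only sketch of retrieval: given a question, pick the stored chunk that looks most relevant. CrewAI itself relies on embeddings and a vector store for this step; the word-count cosine similarity below is only a stand-in, and the function names and sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical knowledge chunks, invented for this example
chunks = [
    "The headset battery lasts about two hours per charge.",
    "Refunds are accepted within 30 days of purchase.",
]
print(retrieve("How long does the battery last?", chunks))
```

The retrieved chunk is what would be passed to the agent as context, so its answer is grounded in the stored text rather than in the model's general knowledge.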

Supported Knowledge Sources

CrewAI supports a wide range of knowledge sources, which can be broadly categorized as follows:

Text Sources: Raw strings, text files, and PDFs.
Structured Data: CSV, Excel, and JSON documents.
Custom Sources: Easily extendable to incorporate APIs or any other data by inheriting from the base knowledge source class.

This versatility means you can choose the right type of content for your agents’ tasks, whether you’re building a support agent or a research assistant.

Setting Up Knowledge Sources

Basic Configuration

Folder Structure: Create a knowledge directory in your project root and place files there (e.g., knowledge/report.pdf).
Define Sources: Use built-in classes like PDFKnowledgeSource or CSVKnowledgeSource to load documents.

from crewai import Crew
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Load a PDF from the knowledge directory
# (file_paths takes a list; the older file_path argument is deprecated in recent releases)
pdf_source = PDFKnowledgeSource(
    file_paths=["report.pdf"],  # Relative to the knowledge folder
    chunk_size=4000,            # Split into 4000-character chunks
    chunk_overlap=200           # Overlap chunks for context retention
)

# Add to your Crew (researcher, writer, and task are assumed to be defined elsewhere)
crew = Crew(
    agents=[researcher, writer],
    tasks=[task],
    knowledge_sources=[pdf_source]
)

Note: If you encounter metadata errors (e.g., "Expected metadata to be a non-empty dict"), add placeholder metadata to the source, such as metadata={"title": "dummy"}.

Advanced Configuration

1. Chunking & Embeddings

Chunking: Adjust chunk_size and chunk_overlap to balance context retention and processing efficiency.
Embeddings: Use providers like Google (text-embedding-004) or OpenAI for vector storage.
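
To see what chunk_size and chunk_overlap do, here is a minimal sketch of character-based chunking with overlap. It is an illustration under the assumption of simple sliding windows, not a copy of CrewAI's internals; the function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each window starts chunk_size - chunk_overlap characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Tiny sizes so the overlap is easy to see
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij', 'j']
```

Each chunk repeats the last character of the previous one, which is the "context retention" the overlap buys. A larger overlap keeps more shared context but produces more chunks to embed and store.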

Example: Custom Embeddings

crew = Crew(
    ...
    embedder={
        "provider": "google",
        "config": {"model": "text-embedding-004", "api_key": "YOUR_KEY"}
    }
)
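
Rather than hardcoding the key as in the snippet above, you may prefer to read it from the environment. The helper below is a small sketch of that idea; the function name build_embedder_config and the GOOGLE_API_KEY variable name are assumptions chosen for this example, not names CrewAI requires.

```python
import os


def build_embedder_config(env_var: str = "GOOGLE_API_KEY") -> dict:
    """Build the embedder dict, reading the API key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable before creating the crew")
    return {
        "provider": "google",
        "config": {"model": "text-embedding-004", "api_key": api_key},
    }
```

The returned dictionary has the same shape as the embedder argument shown above, so it can be passed as embedder=build_embedder_config() when constructing the Crew.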

Custom Knowledge Sources

Extend BaseKnowledgeSource to integrate real-time data.

Example: Space News API Integration

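from crewai import Agent
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    def load_content(self) -> str:
        # Fetch recent articles and fail loudly on HTTP errors
        response = requests.get("https://api.spaceflightnewsapi.net/v4/articles", timeout=10)
        response.raise_for_status()
        articles = response.json()["results"]
        return self._format_articles(articles)

    def _format_articles(self, articles) -> str:
        return "\n".join(f"{article['title']}: {article['summary']}" for article in articles)

    # Recent CrewAI versions declare validate_content and add as abstract
    # methods on BaseKnowledgeSource; check the version you have installed.
    def validate_content(self) -> str:
        return self.load_content()

    def add(self) -> None:
        # Chunk the fetched text and hand the chunks to the storage layer
        self.chunks.extend(self._chunk_text(self.load_content()))
        self._save_documents()

# Assign to an agent (goal and backstory are required; the values here are illustrative)
agent = Agent(
    role="Space News Analyst",
    goal="Answer questions about recent space news.",
    backstory="You follow spaceflight news closely.",
    knowledge_sources=[SpaceNewsKnowledgeSource()]
)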
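
Because load_content depends on a live API, it can help to exercise the formatting step offline. The sketch below is a standalone variant of _format_articles; the sample articles are made up for illustration, and the fallback strings for missing fields are an assumption rather than anything the API or CrewAI prescribes.

```python
def format_articles(articles: list[dict]) -> str:
    """Join articles into one 'title: summary' line each, tolerating missing fields."""
    lines = []
    for article in articles:
        title = article.get("title") or "Untitled"
        summary = article.get("summary") or "No summary available"
        lines.append(f"{title}: {summary}")
    return "\n".join(lines)


# Hypothetical sample data shaped like the API's "results" list
sample_articles = [
    {"title": "Launch scheduled", "summary": "A rocket is set to lift off next week."},
    {"title": "Probe update"},  # summary missing on purpose
]
print(format_articles(sample_articles))
```

Using .get with a fallback keeps one malformed article from raising a KeyError and discarding the whole batch.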

Quickstart Example: Using a String-Based Knowledge Source

Let’s start with a simple example. Imagine you have a snippet of text about a user, and you want your agent to answer questions using that information. The following code demonstrates how to set up a string-based knowledge source:

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source with user data
content = "User's name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Initialize an LLM with a deterministic setting
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent that leverages this knowledge
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

# Define a task where the agent answers a user question
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

# Create a crew and attach the knowledge source
crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
)

# Kick off the crew with a specific question
result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
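
The {question} placeholder in the task description is filled from the inputs dictionary passed to kickoff. Conceptually this resembles Python's str.format; the sketch below uses plain string formatting as a stand-in and is not CrewAI's actual interpolation code.

```python
description = "Answer the following questions about the user: {question}"
inputs = {"question": "What city does John live in and how old is he?"}

# Stand-in for the substitution that happens before the task runs
rendered = description.format(**inputs)
print(rendered)
```

This is why the key in inputs must match the placeholder name in the description exactly.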

Expanding Your Horizons: File-Based Knowledge Sources

Beyond raw strings, CrewAI supports various file formats to suit different data needs:

Text Files: Use the TextFileKnowledgeSource to load data from .txt files.
PDFs: The PDFKnowledgeSource helps your agent extract information from PDF documents.
CSV, Excel, and JSON: Use their respective knowledge sources to integrate structured data seamlessly.

For instance, if you want to extract information from a CSV file containing product details, simply instantiate the CSVKnowledgeSource with the path to your file and add it to your crew’s knowledge sources.
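
Under the hood, a structured file still has to become text before it can be chunked and embedded. The standard-library sketch below illustrates that idea by flattening CSV rows into lines of text. CSVKnowledgeSource does this kind of work for you; the function name csv_to_text and the product data are invented for this example.

```python
import csv
import io


def csv_to_text(csv_data: str) -> str:
    """Flatten CSV rows into newline-separated lines of space-joined cells."""
    reader = csv.reader(io.StringIO(csv_data))
    return "\n".join(" ".join(row) for row in reader)


# Hypothetical product data
sample = "name,price\nWidget,9.99\nGadget,19.50\n"
print(csv_to_text(sample))
```

Keeping the header row in the output gives the model some hint about what each column means, which tends to matter more for CSVs than for prose documents.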

Custom Knowledge Source: PDF Source Example

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Initialize the PDF knowledge source with a file path
pdf_source = PDFKnowledgeSource(
    file_paths=["meta_quest_manual.pdf"]
)

@CrewBase
class MetaQuestKnowledge():
    """MetaQuestKnowledge crew"""

    # Configurations for agents and tasks are stored in external YAML files.
    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def meta_quest_expert(self) -> Agent:
        # Create an agent using the configuration for the meta quest expert.
        # The agent will leverage the PDF knowledge source during tasks.
        return Agent(
            config=self.agents_config['meta_quest_expert'],
            verbose=True
        )

    @task
    def answer_question_task(self) -> Task:
        # Define a task that is responsible for answering user questions.
        # Task details are provided in the YAML configuration.
        return Task(
            config=self.tasks_config['answer_question_task'],
        )

    @crew
    def crew(self) -> Crew:
        """Creates the MetaQuestKnowledge crew"""
        # Assemble the crew by collecting all agents and tasks.
        # The PDF knowledge source is added to allow agents to use the content
        # of the PDF when processing queries.
        return Crew(
            agents=self.agents,  # Automatically populated by the @agent decorator
            tasks=self.tasks,    # Automatically populated by the @task decorator
            process=Process.sequential,
            verbose=True,
            knowledge_sources=[
                pdf_source
            ]
        )

Final Thoughts

By integrating a PDF knowledge source, you empower your AI agents with the ability to extract and use real-world data from documents. This example illustrates how to set up a clean, modular Crew using CrewAI—leveraging external configurations and the power of a PDF knowledge source. Whether you’re building a support bot, a research assistant, or any task-specific agent, this approach ensures that your agents remain well-informed and contextually accurate.

Happy coding, and may your AI projects be ever more knowledgeable!
