CrewAI's Knowledge system is a game-changer for developers and businesses looking to enhance AI agents with contextual, domain-specific data. This blog dives into how to leverage this feature effectively, complete with real-world examples and actionable insights.
What is Knowledge in CrewAI?
The Knowledge system allows AI agents to access and utilize external data sources (like PDFs, CSVs, or APIs) during task execution. Think of it as equipping your agents with a dynamic reference library, enabling them to ground responses in factual information and improve decision-making.
Key Benefits:
- Domain-Specific Expertise: Agents can access specialized data (e.g., product manuals, financial reports).
- Real-Time Context: Maintain continuity across interactions, such as customer support conversations.
- Flexibility: Supports structured (CSV, JSON) and unstructured (PDF, text) data.
Supported Knowledge Sources
CrewAI supports a wide range of knowledge sources, which can be broadly categorized as follows:
- Text Sources: Raw strings, text files, and PDFs.
- Structured Data: CSV, Excel, and JSON documents.
- Custom Sources: Easily extendable to incorporate APIs or any other data by inheriting from the base knowledge source class.
This versatility means you can choose the right type of content for your agents' tasks, whether you're building a support agent or a research assistant.
Setting Up Knowledge Sources
Basic Configuration
- Folder Structure: Create a knowledge directory in your project root and place files there (e.g., knowledge/report.pdf).
- Define Sources: Use built-in classes like PDFKnowledgeSource or CSVKnowledgeSource to load documents.
from crewai import Crew
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Load a PDF from the knowledge directory
pdf_source = PDFKnowledgeSource(
    file_paths=["report.pdf"],  # Relative to the knowledge folder
    chunk_size=4000,            # Split into 4000-character chunks
    chunk_overlap=200           # Overlap chunks for context retention
)

# Add to your Crew (researcher, writer, and task are assumed to be defined elsewhere)
crew = Crew(
    agents=[researcher, writer],
    tasks=[task],
    knowledge_sources=[pdf_source]
)
Note: If you encounter metadata errors (e.g., Expected metadata to be a non-empty dict), add dummy metadata like metadata={"title": "dummy"}.
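Because file paths are given relative to the knowledge folder, a small pre-flight check can catch a missing or misplaced file before the crew runs. The sketch below is a hypothetical helper (resolve_knowledge_path is not part of CrewAI) and assumes the folder is named knowledge and sits directly under the project root, as described above.

```python
from pathlib import Path


def resolve_knowledge_path(file_name: str, project_root: str = ".") -> Path:
    """Return the full path of a file inside <project_root>/knowledge.

    Hypothetical helper for illustration; not a CrewAI function.
    Raises FileNotFoundError if the file does not exist.
    """
    path = Path(project_root) / "knowledge" / file_name
    if not path.is_file():
        raise FileNotFoundError(f"Expected knowledge file at {path}")
    return path


# Example call (raises FileNotFoundError if knowledge/report.pdf is absent):
# resolve_knowledge_path("report.pdf")
```

Running a check like this at startup turns a confusing load-time failure into a clear error message that names the expected location.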
Advanced Configuration
1. Chunking & Embeddings
- Chunking: Adjust chunk_size and chunk_overlap to balance context retention and processing efficiency.
- Embeddings: Use providers like Google (text-embedding-004) or OpenAI for vector storage.
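To make the effect of chunk_size and chunk_overlap concrete, here is a minimal character-based splitter. It is an illustrative sketch only: chunk_text is a made-up name, and CrewAI's internal splitting logic may differ in detail.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Small numbers make the overlap easy to see
chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij", "j"]
```

With the 4000/200 settings used earlier, a 10,000-character document would produce three chunks under this scheme (starting at characters 0, 3800, and 7600). A larger overlap preserves more context across chunk boundaries but increases the number of chunks to embed.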
Example: Custom Embeddings
crew = Crew(
    ...
    embedder={
        "provider": "google",
        "config": {"model": "text-embedding-004", "api_key": "YOUR_KEY"}
    }
)
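Why do embeddings matter? Each chunk is turned into a vector, and at query time the chunks whose vectors are closest to the query vector are retrieved. The sketch below illustrates that idea with cosine similarity over toy three-dimensional vectors. The vectors, chunk names, and helper functions are made up for illustration, and the similarity metric actually used by the vector store behind CrewAI is an assumption here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings
chunk_vectors = {
    "pricing section": [0.9, 0.1, 0.0],
    "warranty section": [0.1, 0.9, 0.1],
}
query_vector = [0.8, 0.2, 0.0]
best = top_k(query_vector, chunk_vectors)
# best == ["pricing section"]
```

Real embedding models produce vectors with hundreds or thousands of dimensions, but the ranking principle is the same.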
Custom Knowledge Sources
Extend BaseKnowledgeSource to integrate real-time data.
Example: Space News API Integration
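from crewai import Agent
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests

# Note: depending on your CrewAI version, BaseKnowledgeSource may also require
# validate_content() and add() to be implemented; check the docs for your release.
class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    def load_content(self):
        # Fetch the latest articles and fail early on HTTP errors
        response = requests.get("https://api.spaceflightnewsapi.net/v4/articles", timeout=10)
        response.raise_for_status()
        articles = response.json()["results"]
        return self._format_articles(articles)

    def _format_articles(self, articles):
        # One "title: summary" line per article
        return "\n".join(f"{article['title']}: {article['summary']}" for article in articles)

# Assign to an agent (goal and backstory are illustrative values)
agent = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately",
    backstory="You follow spaceflight news closely.",
    knowledge_sources=[SpaceNewsKnowledgeSource()]
)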
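API responses are rarely as tidy as the happy path assumes, so the formatting step is worth testing on its own, without a network call. The following sketch is a standalone variant of the formatting logic that tolerates missing or empty fields. The sample payload is made up, and only the title and summary field names are taken from the example above.

```python
def format_articles(articles: list[dict]) -> str:
    """Turn a list of article dicts into text, one article per line."""
    lines = []
    for article in articles:
        # Treat missing keys and None values the same as empty strings
        title = (article.get("title") or "").strip()
        summary = (article.get("summary") or "").strip()
        if not title:
            continue  # skip entries with no usable title
        lines.append(f"{title}: {summary}" if summary else title)
    return "\n".join(lines)


# Made-up payload for illustration
sample_articles = [
    {"title": "Launch scheduled", "summary": "Liftoff is planned for Friday."},
    {"title": "Untitled draft", "summary": ""},
    {"summary": "Orphan summary with no title."},
]
formatted = format_articles(sample_articles)
```

Keeping the formatter a pure function makes it easy to unit test and to reuse from load_content.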
Quickstart Example: Using a String-Based Knowledge Source
Let's start with a simple example. Imagine you have a snippet of text about a user, and you want your agent to answer questions using that information. The following code demonstrates how to set up a string-based knowledge source:
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source with user data
content = "User's name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Initialize an LLM with a deterministic setting
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent that leverages this knowledge
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

# Define a task where the agent answers a user question
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

# Create a crew and attach the knowledge source
crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
)

# Kick off the crew with a specific question
result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
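The {question} placeholder in the task description is filled from the inputs dictionary passed to kickoff. Conceptually, this resembles Python's own str.format, as the sketch below shows. It is an analogy rather than CrewAI's actual interpolation code, and missing_inputs is a hypothetical helper for spotting placeholders that have no matching input.

```python
import string


def missing_inputs(template: str, inputs: dict) -> set[str]:
    """Return placeholder names in template that have no key in inputs."""
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return fields - set(inputs)


description_template = "Answer the following questions about the user: {question}"
inputs = {"question": "What city does John live in and how old is he?"}

# Check for unfilled placeholders before substituting
unfilled = missing_inputs(description_template, inputs)

# str.format fills each {name} placeholder from the matching key
description = description_template.format(**inputs)
```

Checking for unfilled placeholders up front gives a clearer error than discovering a literal {question} in the agent's prompt.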
Expanding Your Horizons: File-Based Knowledge Sources
Beyond raw strings, CrewAI supports various file formats to suit different data needs:
- Text Files: Use the TextFileKnowledgeSource to load data from .txt files.
- PDFs: The PDFKnowledgeSource helps your agent extract information from PDF documents.
- CSV, Excel, and JSON: Use their respective knowledge sources to integrate structured data seamlessly.
For instance, if you want to extract information from a CSV file containing product details, simply instantiate the CSVKnowledgeSource with the path to your file and add it to your crew's knowledge sources.
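Structured data has to become text before it can be chunked and embedded. To build intuition for what that might look like, the sketch below flattens CSV rows into "column: value" lines using only the standard library. This is an illustration of the general idea, not CSVKnowledgeSource's actual implementation, and the product data is made up.

```python
import csv
import io


def csv_to_text(csv_content: str) -> str:
    """Render each CSV row as 'column: value' pairs on one line."""
    reader = csv.DictReader(io.StringIO(csv_content))
    rows = []
    for row in reader:
        rows.append(", ".join(f"{column}: {value}" for column, value in row.items()))
    return "\n".join(rows)


# Made-up product data for illustration
sample_csv = "name,price\nHeadset,299\nController,69\n"
product_text = csv_to_text(sample_csv)
# product_text == "name: Headset, price: 299\nname: Controller, price: 69"
```

Keeping the column name next to each value means a chunk still makes sense when retrieved on its own, away from the header row.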
File-Based Knowledge Source: PDF Example
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Initialize the PDF knowledge source with a file path
pdf_source = PDFKnowledgeSource(
    file_paths=["meta_quest_manual.pdf"]
)

@CrewBase
class MetaQuestKnowledge():
    """MetaQuestKnowledge crew"""

    # Configurations for agents and tasks are stored in external YAML files.
    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def meta_quest_expert(self) -> Agent:
        # Create an agent using the configuration for the meta quest expert.
        # The agent will leverage the PDF knowledge source during tasks.
        return Agent(
            config=self.agents_config['meta_quest_expert'],
            verbose=True
        )

    @task
    def answer_question_task(self) -> Task:
        # Define a task that is responsible for answering user questions.
        # Task details are provided in the YAML configuration.
        return Task(
            config=self.tasks_config['answer_question_task'],
        )

    @crew
    def crew(self) -> Crew:
        """Creates the MetaQuestKnowledge crew"""
        # Assemble the crew by collecting all agents and tasks.
        # The PDF knowledge source is added to allow agents to use the content
        # of the PDF when processing queries.
        return Crew(
            agents=self.agents,  # Automatically populated by the @agent decorator
            tasks=self.tasks,  # Automatically populated by the @task decorator
            process=Process.sequential,
            verbose=True,
            knowledge_sources=[
                pdf_source
            ]
        )
Final Thoughts
By integrating a PDF knowledge source, you empower your AI agents with the ability to extract and use real-world data from documents. This example illustrates how to set up a clean, modular Crew using CrewAI, leveraging external configurations and the power of a PDF knowledge source. Whether you're building a support bot, a research assistant, or any task-specific agent, this approach ensures that your agents remain well-informed and contextually accurate.
Happy coding, and may your AI projects be ever more knowledgeable!