Unleashing the Power of Low Code

CrewAI's Knowledge system is a game-changer for developers and businesses looking to enhance AI agents with contextual, domain-specific data. This blog dives into how to leverage this feature effectively, complete with real-world examples and actionable insights.


What is Knowledge in CrewAI?

The Knowledge system allows AI agents to access and utilize external data sources (like PDFs, CSVs, or APIs) during task execution. Think of it as equipping your agents with a dynamic reference library, enabling them to ground responses in factual information and improve decision-making.

Key Benefits:

  • Domain-Specific Expertise: Agents can access specialized data (e.g., product manuals, financial reports).
  • Real-Time Context: Maintain continuity across interactions, such as customer support conversations.
  • Flexibility: Supports structured (CSV, JSON) and unstructured (PDF, text) data.

Supported Knowledge Sources

CrewAI supports a wide range of knowledge sources, which can be broadly categorized as follows:

  • Text Sources: Raw strings, text files, and PDFs.
  • Structured Data: CSV, Excel, and JSON documents.
  • Custom Sources: Easily extendable to incorporate APIs or any other data by inheriting from the base knowledge source class.

This versatility means you can choose the right type of content for your agents' tasks, whether you're building a support agent or a research assistant.

Setting Up Knowledge Sources

Basic Configuration

  1. Folder Structure: Create a knowledge directory in your project root and place files there (e.g., knowledge/report.pdf).
  2. Define Sources: Use built-in classes like PDFKnowledgeSource or CSVKnowledgeSource to load documents.

from crewai import Crew
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Load a PDF from the knowledge directory
pdf_source = PDFKnowledgeSource(
    file_paths=["report.pdf"],  # Relative to the knowledge folder
    chunk_size=4000,            # Split into 4000-character chunks
    chunk_overlap=200           # Overlap chunks for context retention
)

# Add to your Crew (researcher, writer, and task are assumed to be defined elsewhere)
crew = Crew(
    agents=[researcher, writer],
    tasks=[task],
    knowledge_sources=[pdf_source]
)
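
Because file names are resolved relative to the knowledge folder, a misplaced file is an easy mistake to make. Here is a minimal sketch of a pre-flight check using only the standard library. Note that missing_knowledge_files is a hypothetical helper written for this post, not part of CrewAI:

```python
from pathlib import Path


def missing_knowledge_files(file_names, knowledge_dir="knowledge"):
    """Return the names that do not exist inside the knowledge directory."""
    base = Path(knowledge_dir)
    return [name for name in file_names if not (base / name).is_file()]
```

Calling missing_knowledge_files(["report.pdf"]) before building your sources returns an empty list when everything is in place, and the list of missing names otherwise.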

Note: If you encounter metadata errors (e.g., Expected metadata to be a non-empty dict), add dummy metadata like metadata={"title": "dummy"}.


Advanced Configuration

1. Chunking & Embeddings

  • Chunking: Adjust chunk_size and chunk_overlap to balance context retention and processing efficiency.
  • Embeddings: Use providers like Google (text-embedding-004) or OpenAI for vector storage.
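
To get a feel for what chunk_size and chunk_overlap do, here is a small sketch of character-based chunking with overlap. It is an illustration written for this post (chunk_text is a hypothetical function), and it may not match CrewAI's internal splitting exactly:

```python
def chunk_text(text, chunk_size=4000, chunk_overlap=200):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new chunk starts (chunk_size - chunk_overlap) characters after the last
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

Each chunk repeats the last character of the previous one, which is the overlap at work. Larger overlaps keep more shared context between neighbouring chunks at the cost of storing more text.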

Example: Custom Embeddings

crew = Crew(
    ...
    embedder={
        "provider": "google",
        "config": {"model": "text-embedding-004", "api_key": "YOUR_KEY"}
    }
)
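
Why do embeddings matter? They turn each chunk into a vector so the chunk closest to a question can be found by similarity. The sketch below is a toy stand-in that uses plain word counts instead of a real embedding model. All names in it (toy_embed, cosine_similarity, most_relevant) and the sample sentences are made up for illustration, and none of it is CrewAI code:

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant(query, chunks):
    """Return the chunk whose vector is closest to the query vector."""
    query_vec = toy_embed(query)
    return max(chunks, key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)))


chunks = [
    "the headset battery lasts about two hours",
    "refunds are accepted within thirty days",
]
print(most_relevant("how long does the battery last", chunks))
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step follows the same idea: embed the question, compare it with the stored chunks, and hand the best matches to the agent.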

Custom Knowledge Sources

Extend BaseKnowledgeSource to integrate real-time data.

Example: Space News API Integration

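from crewai import Agent
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    def load_content(self):
        # Fetch recent articles and flatten them into one block of text
        response = requests.get("https://api.spaceflightnewsapi.net/v4/articles", timeout=10)
        response.raise_for_status()
        articles = response.json()["results"]
        return self._format_articles(articles)

    def _format_articles(self, articles):
        return "\n".join(f"{article['title']}: {article['summary']}" for article in articles)

    def validate_content(self):
        # Required by the base class; nothing extra to validate in this sketch
        pass

    def add(self):
        # Required by the base class: chunk the text and save it to storage
        self.chunks.extend(self._chunk_text(self.load_content()))
        self._save_documents()

# Assign to an agent (goal and backstory are placeholder values)
agent = Agent(
    role="Space News Analyst",
    goal="Answer questions about recent space news.",
    backstory="You follow spaceflight news closely.",
    knowledge_sources=[SpaceNewsKnowledgeSource()]
)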
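
Live APIs do not always return every field, so the formatting step is worth hardening. Here is a standalone sketch that can be tried without any network call. The format_articles function and the sample data are made up for illustration and only mimic the shape of the "results" list used above:

```python
def format_articles(articles):
    """Turn article dicts into 'title: summary' lines, skipping untitled entries."""
    lines = []
    for article in articles:
        title = (article.get("title") or "").strip()
        summary = (article.get("summary") or "").strip()
        if title:
            lines.append(f"{title}: {summary}" if summary else title)
    return "\n".join(lines)


# Made-up sample data, not real API output
sample = [
    {"title": "Launch scheduled", "summary": "Liftoff is planned for next week."},
    {"title": "Probe update"},
    {"summary": "No title here."},
]
print(format_articles(sample))
```

The entry without a summary is kept as a bare title, and the entry without a title is dropped, so one incomplete article does not break the whole knowledge load.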

Quickstart Example: Using a String-Based Knowledge Source

Let's start with a simple example. Imagine you have a snippet of text about a user, and you want your agent to answer questions using that information. The following code demonstrates how to set up a string-based knowledge source:

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source with user data
content = "The user's name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Initialize an LLM with a deterministic setting
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent that leverages this knowledge
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

# Define a task where the agent answers a user question
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

# Create a crew and attach the knowledge source
crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
)

# Kick off the crew with a specific question
result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
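
The {question} placeholder in the task description is filled from the inputs dictionary at kickoff. CrewAI handles that substitution internally; the plain-Python sketch below only illustrates the idea using str.format:

```python
description = "Answer the following questions about the user: {question}"
inputs = {"question": "What city does John live in and how old is he?"}

# str.format fills each {name} placeholder from the matching dictionary key
rendered = description.format(**inputs)
print(rendered)
```

The key in inputs has to match the placeholder name in the description, so a typo on either side is a likely culprit when a question never reaches the agent.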

Expanding Your Horizons: File-Based Knowledge Sources

Beyond raw strings, CrewAI supports various file formats to suit different data needs:

  • Text Files: Use the TextFileKnowledgeSource to load data from .txt files.
  • PDFs: The PDFKnowledgeSource helps your agent extract information from PDF documents.
  • CSV, Excel, and JSON: Use their respective knowledge sources to integrate structured data seamlessly.

For instance, if you want to extract information from a CSV file containing product details, simply instantiate the CSVKnowledgeSource with the path to your file and add it to your crew's knowledge sources.
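
If your knowledge folder mixes file types, a small lookup can tell you which source class each file calls for. This is a hypothetical convenience written for this post, not a CrewAI feature. The Excel and JSON class names are assumed to follow the same naming pattern as the ones mentioned above, so check them against your installed version:

```python
from pathlib import Path

# Extension -> knowledge source class name (Excel and JSON names are assumptions)
SOURCE_BY_EXTENSION = {
    ".txt": "TextFileKnowledgeSource",
    ".pdf": "PDFKnowledgeSource",
    ".csv": "CSVKnowledgeSource",
    ".xlsx": "ExcelKnowledgeSource",
    ".json": "JSONKnowledgeSource",
}


def source_class_name(file_name):
    """Return the knowledge source class name for a file, based on its extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix not in SOURCE_BY_EXTENSION:
        raise ValueError(f"No knowledge source known for {suffix!r} files")
    return SOURCE_BY_EXTENSION[suffix]


print(source_class_name("products.csv"))  # CSVKnowledgeSource
```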

Putting It Together: PDF Knowledge Source Example

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# Initialize the PDF knowledge source with a file path
pdf_source = PDFKnowledgeSource(
    file_paths=["meta_quest_manual.pdf"]
)

@CrewBase
class MetaQuestKnowledge():
    """MetaQuestKnowledge crew"""

    # Configurations for agents and tasks are stored in external YAML files.
    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def meta_quest_expert(self) -> Agent:
        # Create an agent using the configuration for the meta quest expert.
        # The agent will leverage the PDF knowledge source during tasks.
        return Agent(
            config=self.agents_config['meta_quest_expert'],
            verbose=True
        )

    @task
    def answer_question_task(self) -> Task:
        # Define a task that is responsible for answering user questions.
        # Task details are provided in the YAML configuration.
        return Task(
            config=self.tasks_config['answer_question_task'],
        )

    @crew
    def crew(self) -> Crew:
        """Creates the MetaQuestKnowledge crew"""
        # Assemble the crew by collecting all agents and tasks.
        # The PDF knowledge source is added to allow agents to use the content
        # of the PDF when processing queries.
        return Crew(
            agents=self.agents,  # Automatically populated by the @agent decorator
            tasks=self.tasks,    # Automatically populated by the @task decorator
            process=Process.sequential,
            verbose=True,
            knowledge_sources=[
                pdf_source
            ]
        )

Final Thoughts

By integrating a PDF knowledge source, you empower your AI agents with the ability to extract and use real-world data from documents. This example illustrates how to set up a clean, modular Crew using CrewAI, leveraging external configurations and the power of a PDF knowledge source. Whether you're building a support bot, a research assistant, or any task-specific agent, this approach ensures that your agents remain well-informed and contextually accurate.

Happy coding, and may your AI projects be ever more knowledgeable!

Hey there, I'm Satish Prasad, and I've got a Master's Degree (MCA) from NIT Kurukshetra. With over 12 years in the game, I've been diving deep into Data Analytics, Data Warehouse, ETL, Production Support, Robotic Process Automation (RPA), and Intelligent Automation. I've hopped around various IT firms, hustling in functions like Investment Banking, Mutual Funds, Logistics, Travel, and Tourism. My jam? Building over 100 Production Bots to amp up efficiency. Let's connect! Join me in exploring the exciting realms of Data Analytics, RPA, and Intelligent Automation. It's been a wild ride, and I'm here to share insights, stories, and tech vibes that'll keep you in the loop. Catch you on the flip side.