Introduction to local knowledge graphs

Mini lab - Building a local Knowledge Graph RAG with Neo4j, LangChain, and Ollama 🦙

In this lab, we will build a small knowledge graph from Wikipedia, load it into Neo4j, and query it in natural language using the LangChain library and a local LLM served by Ollama.

We will try to replicate the knowledge graph RAG architecture presented on the official Neo4j blog, using a local LLM served through the Ollama app.

Learning goals 🎯

Use an external API (Diffbot)
Extract entities and relationships from text into a knowledge graph
Load graph data into Neo4j and inspect it with Cypher
Experiment with Cypher and LangChain

👉 Upload your results to your personal PUBLIC GitHub repository. As usual, this is a personal project.

You can deliver this lab as a Jupyter notebook (optionally with Python scripts), but you must have a notebook at the root of your GitHub repository that demonstrates your solution.

Prerequisites & setup

Python 3.10+
Neo4j available locally (via Docker)
Ollama running locally with at least one model pulled (e.g., llama3:8b)
A FREE Diffbot API key for entity/relationship extraction (NO CREDIT CARD REQUIRED, ⚠️ do not enter your credit card details ⚠️). You can set it as an environment variable DIFFBOT_API_KEY, or edit the notebook cell that initializes DiffbotGraphTransformer.
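Before going further, a quick sanity check of the local environment can save debugging time later. The sketch below uses a hypothetical helper, check_prerequisites (not part of any library), that only verifies the Python version and that DIFFBOT_API_KEY is set; it does not check Neo4j or Ollama.

```python
import os
import sys


def check_prerequisites(min_python=(3, 10), env_var="DIFFBOT_API_KEY"):
    """Hypothetical helper: return a list of problems (empty list = all good)."""
    problems = []
    # The lab asks for Python 3.10 or newer.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    # Read the Diffbot key from the environment instead of hard-coding it
    # in a notebook that will end up in a public repository.
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("⚠️", problem)
```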

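1) Create and activate a virtual environment (for example: python -m venv .venv, then source .venv/bin/activate on Linux/macOS or .venv\Scripts\activate on Windows).

2) Install the given dependencies: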

pip install -U langchain langchain-experimental langchain-openai langchain-neo4j neo4j wikipedia langchain-community langchain-text-splitters
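Since mismatched LangChain package versions are a common source of warnings, it can help to print what is actually installed in the notebook's environment. This is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and langchain-core is included even though it is not in the pip command because it is installed as a dependency.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip command above, plus langchain-core,
# which is pulled in as a dependency of the other langchain packages.
REQUIRED = [
    "langchain", "langchain-core", "langchain-experimental", "langchain-openai",
    "langchain-neo4j", "langchain-community", "langchain-text-splitters",
    "neo4j", "wikipedia",
]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:28} {ver or 'NOT INSTALLED'}")
```

Copying the printed versions into a requirements.txt as name==version lines is one way to pin them once everything works together.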

3) Start Neo4j with the docker-compose file below (e.g., run docker compose up -d in the folder that contains it):

services:
  neo4j:
    image: neo4j:5.20
    environment:
      NEO4J_AUTH: neo4j/password
      NEO4J_PLUGINS: '["apoc","graph-data-science","bloom"]'
      NEO4J_dbms_security_procedures_unrestricted: "apoc.*,gds.*"
      NEO4J_server_config_strict__validation_enabled: "false"
      NEO4J_dbms_default__database: shop
    ports: ["7474:7474","7687:7687"]
    volumes:
      - ./neo4j/data:/data
      - ./neo4j/import:/import
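Once the container is up, a short connectivity check with the official neo4j Python driver confirms that the credentials and database name are right. In this sketch the defaults mirror the compose file (neo4j/password, Bolt on port 7687); note that NEO4J_dbms_default__database: shop makes the default database "shop" rather than "neo4j". The environment variable names follow the convention langchain-neo4j also reads, and count_nodes is a hypothetical helper.

```python
import os

# Defaults mirror the docker-compose file; override them with environment variables.
NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.environ.get("NEO4J_USERNAME", "neo4j")
NEO4J_PASSWORD = os.environ.get("NEO4J_PASSWORD", "password")
# The compose file sets the default database to "shop", not "neo4j".
NEO4J_DATABASE = os.environ.get("NEO4J_DATABASE", "shop")


def count_nodes(uri=NEO4J_URI, user=NEO4J_USER, password=NEO4J_PASSWORD,
                database=NEO4J_DATABASE):
    """Hypothetical helper: connect, verify, and return the number of nodes."""
    # Imported lazily so this cell loads even before the driver is installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        driver.verify_connectivity()  # raises if Neo4j is unreachable or auth fails
        records, _, _ = driver.execute_query(
            "MATCH (n) RETURN count(n) AS n", database_=database
        )
        return records[0]["n"]


if __name__ == "__main__":
    print("nodes in database:", count_nodes())
```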

4) Start Ollama and pull a model:

ollama serve
# in another terminal
ollama pull llama3:8b
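To confirm from the notebook that Ollama is serving and the model is pulled, one option is Ollama's HTTP endpoint GET /api/tags on its default port 11434, which lists local models. The sketch below assumes that default address; the helper names are hypothetical. A model pulled as plain llama3 is listed as llama3:latest, so match the exact tag you pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used by `ollama serve`


def model_names(tags_payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(wanted, tags_payload):
    """True if a pulled model has exactly the name `wanted`, e.g. "llama3:8b"."""
    return wanted in model_names(tags_payload)


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Call the local Ollama server; raises URLError if it is not running."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print("llama3:8b available:", has_model("llama3:8b", fetch_tags()))
```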

Walkthrough

Ensure the core packages are installed inside the notebook environment (watch out for dependency conflicts).
Load Wikipedia articles about a famous person or topic with LangChain's WikipediaLoader, then extract entities and relationships from the documents with the Diffbot API (DiffbotGraphTransformer).
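The loading and extraction steps can be sketched as follows, assuming WikipediaLoader from langchain-community and DiffbotGraphTransformer from langchain-experimental. The helper names get_diffbot_key and load_and_extract are hypothetical, and the small max_docs default is an assumption meant to keep the number of Diffbot calls low on a free key.

```python
import os


def get_diffbot_key():
    """Read the Diffbot key from the environment (never commit it to a public repo)."""
    key = os.environ.get("DIFFBOT_API_KEY")
    if not key:
        raise RuntimeError("Set the DIFFBOT_API_KEY environment variable first")
    return key


def load_and_extract(query="Satoshi Nakamoto", max_docs=2):
    """Load Wikipedia pages for `query` and convert them into GraphDocuments."""
    # Imported lazily so this cell loads before the dependencies are installed.
    from langchain_community.document_loaders import WikipediaLoader
    from langchain_experimental.graph_transformers import DiffbotGraphTransformer

    docs = WikipediaLoader(query=query, load_max_docs=max_docs).load()
    transformer = DiffbotGraphTransformer(diffbot_api_key=get_diffbot_key())
    # Each GraphDocument holds the extracted nodes, relationships and source document.
    return transformer.convert_to_graph_documents(docs)


if __name__ == "__main__":
    graph_documents = load_and_extract()
    for gd in graph_documents:
        print(len(gd.nodes), "nodes,", len(gd.relationships), "relationships")
```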

You can choose any person or topic, for example WikipediaLoader(query="Satoshi Nakamoto").
Connect to Neo4j and ingest the graph: clear the database first if it is not empty, then add the graph documents extracted by Diffbot.
Then query the relationships around Satoshi Nakamoto (or your chosen topic or person).
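A minimal ingestion sketch, assuming langchain-neo4j's Neo4jGraph and the compose file's credentials. Because the compose file makes "shop" the default database, connect passes database="shop" explicitly (Neo4jGraph otherwise falls back to "neo4j"). Both helper names are hypothetical; reset_and_ingest accepts any object with the same query/add_graph_documents/refresh_schema methods.

```python
def connect(url="bolt://localhost:7687", username="neo4j", password="password",
            database="shop"):
    """Hypothetical helper: open a Neo4jGraph on the compose file's database."""
    from langchain_neo4j import Neo4jGraph  # lazy import

    return Neo4jGraph(url=url, username=username, password=password, database=database)


def reset_and_ingest(graph, graph_documents):
    """Empty the database, load the GraphDocuments, and return the node count."""
    # DETACH DELETE removes every node together with its relationships.
    graph.query("MATCH (n) DETACH DELETE n")
    graph.add_graph_documents(graph_documents)
    # Refresh the cached schema so later Cypher prompts see the new labels/types.
    graph.refresh_schema()
    return graph.query("MATCH (n) RETURN count(n) AS nodes")[0]["nodes"]
```

In a notebook this would typically be called as reset_and_ingest(connect(), graph_documents), where graph_documents is the output of the Diffbot extraction. Wiping the database first keeps reruns from piling up duplicate nodes.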

Run a Cypher query to inspect the relationships of :Person nodes related to, for example, “Satoshi Nakamoto”.
Explore and analyze the graph: list node labels, inspect raw relationships, and visualize the graph (e.g., in Neo4j Browser at http://localhost:7474).
Write and run several Cypher exploration queries (e.g., most connected people, interests, employment, locations, competitors, founded-by).
Verify that the target entity exists and see how it is represented. If multiple “Satoshi Nakamoto” nodes exist, how would you choose “the one”?
Write a reusable Python function that queries all relationships for a person.
Create a GraphCypherQAChain with a custom Cypher prompt and a ChatOllama model.
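The exploration steps and the reusable relationship function can be sketched as parameterized Cypher run through graph.query (for example a Neo4jGraph). All names here are hypothetical, and the queries assume Diffbot's conventions: labels such as :Person, a name property, and an id holding a URI or the name. Check these against your own schema first.

```python
# Assumes Diffbot conventions: :Person labels, a `name` property, and an `id`
# property holding a URI or the entity name.
EXPLORATION_QUERIES = {
    "labels": "CALL db.labels() YIELD label RETURN label ORDER BY label",
    "relationship_counts": (
        "MATCH ()-[r]->() RETURN type(r) AS type, count(*) AS n ORDER BY n DESC"
    ),
    "most_connected_people": (
        "MATCH (p:Person)-[r]-() "
        "RETURN coalesce(p.name, p.id) AS person, count(r) AS degree "
        "ORDER BY degree DESC LIMIT 10"
    ),
}

# All :Person nodes matching a name, most connected first: one way to pick
# "the one" among duplicates. COUNT { } needs a recent Neo4j 5 (compose uses 5.20).
CANDIDATES_QUERY = """
MATCH (p:Person)
WHERE p.name = $name OR p.id = $name
RETURN elementId(p) AS element_id, p.id AS id, p.name AS name,
       COUNT { (p)--() } AS degree
ORDER BY degree DESC
"""

# Every relationship touching the person, in either direction.
RELATIONSHIPS_QUERY = """
MATCH (p:Person)-[r]-(other)
WHERE p.name = $name OR p.id = $name
RETURN coalesce(p.name, p.id) AS person,
       type(r) AS relationship,
       CASE WHEN startNode(r) = p THEN 'out' ELSE 'in' END AS direction,
       coalesce(other.name, other.id) AS other_name
ORDER BY relationship, other_name
"""


def run_exploration(graph):
    """Run each exploration query and return {query_name: rows}."""
    return {name: graph.query(cypher) for name, cypher in EXPLORATION_QUERIES.items()}


def find_candidates(graph, name):
    """List nodes that could be the target entity, most connected first."""
    return graph.query(CANDIDATES_QUERY, params={"name": name})


def get_person_relationships(graph, name):
    """Reusable helper: all relationships for a :Person matched by name or id."""
    return graph.query(RELATIONSHIPS_QUERY, params={"name": name})
```

Passing the name as the $name parameter, rather than formatting it into the query string, avoids quoting problems with names that contain apostrophes and keeps user input out of the Cypher text.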

What elements in the custom prompt help the model generate correct Cypher for your schema?
Wrap the chain invocation in an ask_graph(question) helper function to simplify later use.
Implement get_entity_subgraph and summarize_entity functions that build structured context from the graph and prompt an LLM for summaries.
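One possible shape for the chain and the ask_graph wrapper is sketched below, assuming GraphCypherQAChain from langchain-neo4j and ChatOllama from langchain-ollama (falling back to the deprecated class in langchain-community, which is in the pip command). The prompt wording and the names build_chain and make_ask_graph are illustrative assumptions. Literal braces in the Cypher example must be doubled ({{ }}) so the template formatter does not treat them as variables.

```python
CYPHER_TEMPLATE = """Task: write a Cypher query for a Neo4j graph database.
Use only the node labels, relationship types and properties in the schema below.
Match people by their `name` property, e.g. MATCH (p:Person {{name: "Satoshi Nakamoto"}}).
Combine relationship types with unions such as [:EMPLOYEE_OR_MEMBER_OF|WORK_RELATIONSHIP].
Return only the Cypher query, with no explanation.

Schema:
{schema}

Question: {question}
Cypher query:"""


def build_chain(graph, model="llama3:8b"):
    """Create a GraphCypherQAChain backed by a local Ollama model."""
    from langchain_core.prompts import PromptTemplate
    from langchain_neo4j import GraphCypherQAChain

    try:  # dedicated package, if installed
        from langchain_ollama import ChatOllama
    except ImportError:  # deprecated fallback shipped with langchain-community
        from langchain_community.chat_models import ChatOllama

    llm = ChatOllama(model=model, temperature=0)
    prompt = PromptTemplate(input_variables=["schema", "question"], template=CYPHER_TEMPLATE)
    return GraphCypherQAChain.from_llm(
        llm,
        graph=graph,
        cypher_prompt=prompt,
        validate_cypher=True,           # corrects relationship directions using the schema
        allow_dangerous_requests=True,  # explicit opt-in: generated Cypher runs on your DB
        verbose=True,
    )


def make_ask_graph(chain):
    """Return an ask_graph(question) function bound to a given chain."""
    def ask_graph(question):
        # GraphCypherQAChain takes its input under "query" and answers under "result".
        return chain.invoke({"query": question})["result"]
    return ask_graph
```

In the notebook, ask_graph = make_ask_graph(build_chain(graph)) gives the single-argument helper the lab asks for. The schema placeholder, the explicit matching property, and the one worked example are the parts of the prompt that tie the model's Cypher to this particular graph.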

What information from the graph would you add/remove to improve the quality and factuality of summaries?
Question: Does the LLM hallucinate dates or facts? How would you constrain it to avoid this?
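A grounded summarization sketch: get_entity_subgraph turns the entity's direct neighborhood into (subject, predicate, object) triples, and summarize_entity sends only those triples to the model with an instruction not to add unlisted dates or facts. The function bodies, the prompt wording and the 50-row limit are assumptions, and llm is any LangChain chat model (e.g. a ChatOllama with temperature=0). Prompting like this reduces hallucination but does not guarantee it away, so comparing the summary against the listed facts is still worthwhile.

```python
SUBGRAPH_QUERY = """
MATCH (e)-[r]-(other)
WHERE e.name = $name OR e.id = $name
RETURN coalesce(e.name, e.id) AS source, type(r) AS rel,
       coalesce(other.name, other.id) AS target,
       startNode(r) = e AS outgoing
LIMIT $limit
"""


def get_entity_subgraph(graph, name, limit=50):
    """Return the entity's direct neighborhood as (subject, predicate, object) triples."""
    rows = graph.query(SUBGRAPH_QUERY, params={"name": name, "limit": limit})
    triples = []
    for row in rows:
        # Keep the real edge direction so "A FOUNDED B" is not read as "B FOUNDED A".
        if row["outgoing"]:
            triples.append((row["source"], row["rel"], row["target"]))
        else:
            triples.append((row["target"], row["rel"], row["source"]))
    return triples


def build_summary_prompt(name, triples):
    """Structured context: one fact per line, plus explicit grounding rules."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in triples)
    return (
        f"Summarize what is known about {name} using ONLY the facts below.\n"
        "Do not add dates, numbers or events that are not listed. "
        "If the facts do not cover something, say it is unknown.\n\n"
        f"Facts:\n{facts}\n"
    )


def summarize_entity(graph, llm, name):
    """Summarize an entity from graph facts; skip the LLM when there are none."""
    triples = get_entity_subgraph(graph, name)
    if not triples:
        return f"No facts found in the graph for {name}."
    return llm.invoke(build_summary_prompt(name, triples)).content
```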

Some tips

Dependency warnings: If you see langchain-core incompatibilities, align versions across langchain, langchain-community, langchain-openai, langchain-neo4j, langchain-experimental, and langchain-text-splitters. Pinning versions in a requirements.txt file will help you 🥸
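Neo4j schema: If queries return empty results, verify that the labels and properties you use (:Person, .id, .name) match the ones created by your extractor. A Neo4j extension for VS Code can help you browse the database.
Relationship unions: To match several relationship types at once, prefer a type union such as MATCH (p)-[:EMPLOYEE_OR_MEMBER_OF|WORK_RELATIONSHIP]->(o). In Neo4j 5, repeating the colon (|:WORK_RELATIONSHIP) is deprecated.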
Ollama: ensure ollama serve is running and your model is pulled. If responses are weak, try a larger model or adjust the prompt
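For the schema tip, a quick programmatic check can list which expected labels or property keys are missing. This sketch relies on the built-in procedures db.labels() and db.propertyKeys(); the helper name and the default expectations are assumptions. It is a coarse check: property keys can remain registered after the data using them has been deleted.

```python
def missing_schema_items(graph, labels=("Person",), properties=("id", "name")):
    """Hypothetical helper: report expected labels/property keys not found in the DB."""
    found_labels = {
        row["label"] for row in graph.query("CALL db.labels() YIELD label RETURN label")
    }
    found_props = {
        row["propertyKey"]
        for row in graph.query("CALL db.propertyKeys() YIELD propertyKey RETURN propertyKey")
    }
    return {
        "labels": sorted(set(labels) - found_labels),
        "properties": sorted(set(properties) - found_props),
    }
```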

Happy coding and remember: RTFM 🤗