Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems

AI-generated keywords: Graph-based Retrieval Augmented Generation, Enterprise Environments, Knowledge Graph Construction, Lightweight Subgraph Retrieval, Large Language Models

AI-generated Key Points

- Proposed a scalable and cost-efficient framework for deploying GraphRAG in enterprise environments
- Introduced two core innovations:
  - Dependency-based knowledge graph construction pipeline that eliminates reliance on large language models (LLMs)
  - Lightweight graph retrieval strategy for high-recall, low-latency subgraph extraction
- Evaluated the framework on SAP datasets for legacy code migration with strong empirical performance improvements over traditional RAG baselines
- Dependency-based construction approach achieved comparable performance to LLM-generated knowledge graphs while reducing costs and improving scalability
- Highlighted scalability by eliminating dependence on large language models for knowledge graph construction
- Future investigations needed to address limitations such as missing context-dependent or implicit relations not directly expressed in surface syntax
- Plan to evaluate generalizability of the method to other settings beyond SAP-specific domains by testing on broader public benchmarks like HotpotQA
- Study presents a promising path for scaling GraphRAG systems in real-world enterprise applications without prohibitive resource requirements, enabling practical, explainable, and domain-adaptable retrieval-augmented reasoning systems.


Authors: Congmin Min, Rhea Mathew, Joyce Pan, Sahil Bansal, Abbas Keshavarzi, Amar Viswanathan Kannan

arXiv: 2507.03226v1 (cs.AI)

License: CC BY-NC-SA 4.0

Abstract: We propose a scalable and cost-efficient framework for deploying Graph-based Retrieval Augmented Generation (GraphRAG) in enterprise environments. While GraphRAG has shown promise for multi-hop reasoning and structured retrieval, its adoption has been limited by the high computational cost of constructing knowledge graphs using large language models (LLMs) and the latency of graph-based retrieval. To address these challenges, we introduce two core innovations: (1) a dependency-based knowledge graph construction pipeline that leverages industrial-grade NLP libraries to extract entities and relations from unstructured text, completely eliminating reliance on LLMs; and (2) a lightweight graph retrieval strategy that combines hybrid query node identification with efficient one-hop traversal for high-recall, low-latency subgraph extraction. We evaluate our framework on two SAP datasets focused on legacy code migration and demonstrate strong empirical performance. Our system achieves up to 15% and 4.35% improvements over traditional RAG baselines based on LLM-as-Judge and RAGAS metrics, respectively. Moreover, our dependency-based construction approach attains 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) while significantly reducing cost and improving scalability. These results validate the feasibility of deploying GraphRAG systems in real-world, large-scale enterprise applications without incurring prohibitive resource requirements, paving the way for practical, explainable, and domain-adaptable retrieval-augmented reasoning.
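The "94%" figure in the abstract is the ratio of the two quoted scores. A minimal arithmetic check, using only the two numbers given above (the variable names are illustrative, not taken from the paper):

```python
# Scores quoted in the abstract, in percent.
dependency_kg = 61.87  # dependency-based knowledge graph
llm_kg = 65.83         # LLM-generated knowledge graph

relative = dependency_kg / llm_kg    # share of LLM-KG performance retained
gap_points = llm_kg - dependency_kg  # absolute gap in percentage points

print(f"retained: {relative:.1%}, gap: {gap_points:.2f} points")
```

In other words, the dependency-based graph gives up roughly four percentage points in exchange for removing LLM calls from graph construction.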

Submitted to arXiv on 04 Jul. 2025


Results of the summarizing process for the arXiv paper: 2507.03226v1

Comprehensive Summary

In this study, we propose a scalable and cost-efficient framework for deploying GraphRAG in enterprise environments. Adoption of GraphRAG has been limited by the high computational cost of constructing knowledge graphs with large language models (LLMs) and by the latency of graph-based retrieval. To address these challenges, we introduce two core innovations: a dependency-based knowledge graph construction pipeline that eliminates reliance on LLMs by using industrial-grade NLP libraries for entity and relation extraction from unstructured text, and a lightweight graph retrieval strategy that combines hybrid query node identification with efficient one-hop traversal for high-recall, low-latency subgraph extraction.

We evaluate our framework on two SAP datasets focused on legacy code migration. Our system achieves improvements of up to 15% and 4.35% over traditional RAG baselines on LLM-as-Judge and RAGAS metrics, respectively. Our dependency-based construction approach attains 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%) while reducing cost and improving scalability.

Future investigations are needed to address limitations such as missing context-dependent or implicit relations that are not directly expressed in surface syntax. We also plan to evaluate the generalizability of our method beyond SAP-specific domains by testing it on broader public benchmarks like HotpotQA.

In conclusion, our study presents a promising path for scaling GraphRAG systems in real-world enterprise applications without prohibitive resource requirements. By combining efficient knowledge graph construction from unstructured text with lightweight subgraph retrieval, we pave the way for practical, explainable, and domain-adaptable retrieval-augmented reasoning systems in large-scale enterprise environments.
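To make the idea of dependency-based construction concrete, here is a minimal sketch of turning a dependency parse into (subject, relation, object) triples. It is not the authors' pipeline: the `Token` class, the `extract_triples` helper, the single subject-verb-object rule, and the hand-written parse are all assumptions for illustration. In practice the parse would come from an NLP library; the labels `nsubj`, `dobj`, and `obj` follow common dependency-label conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical container for one parsed token."""
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of the syntactic head (the root points to itself)
    pos: str   # coarse part-of-speech tag


def extract_triples(tokens):
    """Emit a triple for every verb that has both a subject and a direct object."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep in ("dobj", "obj")]
        for subj in subjects:
            for obj in objects:
                triples.append((subj, tok.text, obj))
    return triples


# Hand-written parse of "The report calls the function." (illustrative only).
parse = [
    Token("The", "det", 1, "DET"),
    Token("report", "nsubj", 2, "NOUN"),
    Token("calls", "ROOT", 2, "VERB"),
    Token("the", "det", 4, "DET"),
    Token("function", "dobj", 2, "NOUN"),
]

triples = extract_triples(parse)
print(triples)
```

A rule like this only captures relations that appear directly in the syntax, which is consistent with the limitation noted above about implicit or context-dependent relations.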


Layman's Summary

- A new way to use GraphRAG in big companies was suggested.
- Two important ideas were introduced: building a knowledge graph without needing big language models, and quickly finding the small parts of the graph that matter for a question.
- The new method was tested on two SAP datasets about moving old (legacy) code to newer systems, and it worked better than older methods.
- The new way of building graphs worked almost as well as using big language models, but cost less and was easier to scale up.
- More research is needed to make sure the new method works in different situations.

Definitions

- Scalable: Able to grow or change size easily.
- Cost-efficient: Saving money by using resources wisely.
- Framework: A structure or plan for doing something.
- Graph: A structure that records things and the connections between them.
- Enterprise environments: Big businesses or organizations where many people work together.

Blog article

GraphRAG (Graph-based Retrieval Augmented Generation) has shown promise for multi-hop reasoning and structured retrieval. However, its adoption in enterprise environments has been limited by the high computational cost of constructing knowledge graphs with large language models (LLMs) and by the latency of graph-based retrieval. In the paper "Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems", the authors address these challenges with two core innovations.

The first innovation is a dependency-based knowledge graph construction pipeline that eliminates reliance on LLMs. It uses industrial-grade NLP libraries to extract entities and relations from unstructured text. This reduces cost and improves scalability while attaining 94% of the performance of LLM-generated knowledge graphs (61.87% vs. 65.83%).

The second innovation is a lightweight graph retrieval strategy that combines hybrid query node identification with efficient one-hop traversal. This approach allows for high-recall, low-latency subgraph extraction, addressing the retrieval latency that has held back graph-based methods.

To evaluate their framework, the authors use two SAP datasets focused on legacy code migration. The system achieves improvements of up to 15% and 4.35% over traditional RAG baselines on LLM-as-Judge and RAGAS metrics, respectively.

One key aspect of this study is its focus on real-world enterprise applications. By eliminating dependence on large language models for knowledge graph construction and implementing an efficient retrieval strategy, the authors pave the way for practical, explainable, and domain-adaptable reasoning systems in large-scale enterprise environments.

Some limitations remain for future work. A dependency-based approach can miss context-dependent or implicit relations that are not directly expressed in surface syntax. The authors also plan to test the method on broader public benchmarks like HotpotQA to evaluate its generalizability beyond SAP-specific domains.

In conclusion, the paper presents a promising path for scaling GraphRAG systems in real-world enterprise applications without prohibitive resource requirements, by combining efficient knowledge graph construction from unstructured text with lightweight subgraph retrieval.
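The retrieval idea described above (pick query nodes in a hybrid way, then expand one hop) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the toy triples, the helper names, the 0.5 threshold, and the word-overlap cosine (a simple stand-in for embedding similarity) are all hypothetical.

```python
import math
from collections import Counter

# Toy knowledge graph as (subject, relation, object) triples; contents are made up.
TRIPLES = [
    ("report", "calls", "function module"),
    ("function module", "reads", "customer table"),
    ("customer table", "replaced_by", "business partner table"),
    ("invoice form", "uses", "print program"),
]


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query_nodes(query, nodes, threshold=0.5):
    """Hybrid seed selection: exact mention in the query OR similarity above a threshold."""
    lowered = query.lower()
    query_vec = bag_of_words(query)
    exact = {n for n in nodes if n in lowered}
    similar = {n for n in nodes if cosine(query_vec, bag_of_words(n)) >= threshold}
    return exact | similar


def one_hop(seeds, triples):
    """Keep every triple that touches a seed node (one-hop neighbourhood)."""
    return [t for t in triples if t[0] in seeds or t[2] in seeds]


nodes = {t[0] for t in TRIPLES} | {t[2] for t in TRIPLES}
seeds = query_nodes("table for customer data", nodes)
subgraph = one_hop(seeds, TRIPLES)
print(seeds, subgraph)
```

The query here never mentions "customer table" verbatim, so the exact-match path finds nothing and the similarity path supplies the seed. Limiting traversal to one hop keeps the extracted subgraph small, which is the kind of trade-off a low-latency retrieval strategy relies on.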

Created on 01 Aug. 2025



