Disentangling Knowledge Acquisition of LLMs through Direct Corpus Exploration

Bibliographic Information

Other Title
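  • 事前学習コーパスの直接検索による LLM の知識獲得の構造理解 (English: Understanding the Structure of LLM Knowledge Acquisition through Direct Search of the Pre-training Corpus)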

Description

<p>While Large Language Models (LLMs) have demonstrated impressive knowledge acquisition during pre-training, the mechanisms of this process remain poorly understood. Previous research has established a correlation between the frequency of knowledge instances in the training corpus and the degree of knowledge acquisition. However, existing methodologies suffer from two key limitations: insufficient experimental validation of the effect of frequency, and inadequate consideration of conflicting knowledge within the training data. To address these gaps, we directly investigate the pre-training corpus to unravel the knowledge acquisition process in LLMs. Our experiments demonstrate that a higher frequency of knowledge instances leads to more robust knowledge acquisition. Furthermore, we find that conflicting knowledge instances within the corpus affect the degree of knowledge acquisition. Notably, our analysis suggests the existence of latent conflicts that may hinder knowledge acquisition even when no conflict is apparent at the surface level.</p>
