CiteAgent

Traditional social science research (SSR) faces several limitations, including restricted experimentation and limited applicability across diverse contexts. Experiments are often confined to tightly controlled lab environments or lack comprehensive environmental control in real-world settings, restricting the ability to gain nuanced insights into the causal mechanisms underlying human behaviors.

To address these challenges, we introduce a novel LLM-agent-based simulation platform, the CiteAgent framework, designed to simulate academic behavior with a focus on modeling the formation and evolution of citation networks. CiteAgent offers the following advantages: (1) Realistic Citation Network Modeling: Captures real-world phenomena in citation dynamics. (2) Controlled Experimental Environment: Allows researchers to systematically adjust academic environments. (3) Scalable Simulations for Social Science Research: Supports extensive and reproducible simulations that facilitate hypothesis testing and validation.

Figure 1: CiteAgent Framework Workflow

🛠️ Setup

Before we get started, please configure your OpenAI API keys in the file located at LLMGraph\llms\default_model_configs.json. The format should be as follows:

{
    "model_type": "openai_chat",
    "config_name": "gpt-3.5-turbo-0125",
    "model_name": "gpt-3.5-turbo-0125",
    "api_key": "sk-.*",
    "generate_args": {
        "max_tokens": 2000,
        "temperature": 0.8
    },
    "client_args": {
        "base_url": ""
    }
}
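A quick sanity check on the config entry before launching a simulation can catch a forgotten API key early. The sketch below is not part of CiteAgent: the helper names `check_model_config` and `check_config_file` are hypothetical, treating the four top-level string fields as required is an assumption based on the example entry, and the file is assumed to hold either one such object or a list of them.

```python
import json
from pathlib import Path

# Keys taken from the example entry above; treating them as required is an assumption.
REQUIRED_KEYS = ("model_type", "config_name", "model_name", "api_key")


def check_model_config(entry: dict) -> list[str]:
    """Return a list of human-readable problems found in one config entry."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    # "sk-.*" is the placeholder from the example, not a usable key.
    if entry.get("api_key") in ("", "sk-.*"):
        problems.append("api_key is empty or still the placeholder")
    temperature = entry.get("generate_args", {}).get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature outside the 0-2 range accepted by the OpenAI API")
    return problems


def check_config_file(path: str) -> list[str]:
    """Load the JSON file and check every entry (one object or a list of objects)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    entries = data if isinstance(data, list) else [data]
    return [problem for entry in entries for problem in check_model_config(entry)]


if __name__ == "__main__":
    example = {
        "model_type": "openai_chat",
        "config_name": "gpt-3.5-turbo-0125",
        "model_name": "gpt-3.5-turbo-0125",
        "api_key": "sk-.*",
        "generate_args": {"max_tokens": 2000, "temperature": 0.8},
        "client_args": {"base_url": ""},
    }
    for problem in check_model_config(example):
        print(problem)
```

Running `check_config_file` on your own copy of the config file returns an empty list when no problems are found.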

Next, set up the experiment environment and install the required packages by running: pip install -r requirements.txt

📦 Usage

We offer three seed networks enriched with text features for author and paper: Cora, Citeseer, and LLM_Agent.

To begin constructing a citation graph, please specify the task_name and config_name:

config_name: Controls the academic environment setup in CiteAgent.
task_name: Choose from "cora", "citeseer", or "llm_agent_*" (where you specify the corresponding seed network).

Then, execute the following commands:

# Build the citation graph using the Cora dataset
python main.py --task cora --config <template_config_name> --build 

# Build the citation graph using the Citeseer dataset
python main.py --task citeseer --config <template_config_name> --build 

# Build the citation graph using the LLM_Agent dataset
python main.py --task llm_agent_1 --config <template_config_name> --build

Make sure to adjust the task_name according to the seed network you wish to use.
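To build several seed networks with the same configuration, the commands above can be driven from a small script. This is only a sketch: it assumes it is run from the repository root next to main.py, the helper names `build_command` and `build_all` are hypothetical, and `template_example` is a placeholder rather than a config name shipped with the repository. Only the `--task`, `--config`, and `--build` flags shown above are used.

```python
import subprocess
import sys


def build_command(task_name: str, config_name: str) -> list[str]:
    """Assemble the documented build command for one seed network."""
    return [sys.executable, "main.py", "--task", task_name,
            "--config", config_name, "--build"]


def build_all(task_names, config_name, dry_run=True):
    """Print (dry run) or execute the build command for each task in turn."""
    commands = [build_command(task, config_name) for task in task_names]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing build instead of continuing silently.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # "template_example" is a hypothetical placeholder; substitute your own config name.
    build_all(["cora", "citeseer", "llm_agent_1"], "template_example", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to confirm the task and config names before spending API calls.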

Template Configuration

To customize the simulation, adjust the configuration file found at LLMGraph\tasks\llm_agent_1\configs\template_*.

We offer support for multiple scholarly search engines, including Generated Papers, Arxiv, and Google Scholar. Change the online_retriever_kwargs field to specify the search engine you wish to use.

🧪 Experiments

For the experiments outlined in the paper, we provide a script for execution.

Download the Datasets:

citation

Arrange the downloaded data as follows:

tasks/
├── citeseer/
│   ├── data/
│   ├── configs/
├── citeseer_1/
├── cora/
├── cora_1/
├── llm_agent/
├── llm_agent_*/
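A short check of the directory layout can confirm that the datasets were unpacked where the scripts expect them. The sketch below is an illustration only: the helper `missing_task_dirs` is hypothetical, and it assumes that every task folder contains the same `data/` and `configs/` subfolders shown for `citeseer/`, which the tree above does not state explicitly.

```python
from pathlib import Path

# Subfolders shown under citeseer/ in the tree; assumed to apply to every task.
EXPECTED_SUBDIRS = ("data", "configs")


def missing_task_dirs(tasks_root, task_names):
    """Return 'task/subdir' strings for every expected folder that is absent."""
    root = Path(tasks_root)
    missing = []
    for task in task_names:
        for subdir in EXPECTED_SUBDIRS:
            if not (root / task / subdir).is_dir():
                missing.append(f"{task}/{subdir}")
    return missing


if __name__ == "__main__":
    # The location of tasks/ relative to the working directory is an assumption.
    for entry in missing_task_dirs("tasks", ["citeseer", "cora", "llm_agent"]):
        print("missing:", entry)
```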

Run Simulation Experiments:

Start the launchers in one terminal:
```
python start.py --start_server
```
Then run the simulation experiments in another terminal:
```
python start.py 
```
Run Evaluation Metrics for Simulation Experiments:
```
python evaluate.py
```
Visualize Experimental Results: Please refer to evaluate/Graph/readme.md for detailed instructions.

✅ Results

CiteAgent simulates key phenomena in citation networks, including power-law degree distributions and citational distortion. To analyze the mechanisms underlying these phenomena, we propose two LLM-based paradigms for social science research that examine human referencing behavior: LLM-SA (Synthetic Analysis) and LLM-CA (Counterfactual Analysis). Additional simulations and analyses of other phenomena are provided in the paper.

Power Law Distribution

The degree distribution of citation networks often follows a power-law distribution[1], reflecting a scale-free characteristic. Citation networks generated by the CiteAgent framework replicate this property, exhibiting realistic scale-free behavior that closely mirrors real-world citation dynamics.

Figure 2: Power Law Distribution
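For a rough check of the scale-free property on a generated graph, the exponent of the in-degree distribution can be estimated directly from an edge list. The sketch below is not the evaluation code used in this repository: `in_degrees` and `estimate_exponent` are hypothetical helpers, the edge list is assumed to be `(citing, cited)` pairs, and the estimator is the common approximate maximum-likelihood formula for discrete power laws, alpha ≈ 1 + n / Σ ln(d_i / (d_min − 0.5)).

```python
import math
from collections import Counter


def in_degrees(edges):
    """Count citations received by each paper from (citing, cited) pairs."""
    return Counter(cited for _citing, cited in edges)


def estimate_exponent(degrees, d_min=1):
    """Approximate MLE of alpha for P(d) ~ d^(-alpha), using degrees >= d_min."""
    if d_min < 1:
        raise ValueError("d_min must be at least 1")
    tail = [d for d in degrees if d >= d_min]
    if not tail:
        raise ValueError("no degrees at or above d_min")
    # Each term is positive because d >= d_min > d_min - 0.5.
    log_sum = sum(math.log(d / (d_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Toy edge list for illustration only; real input would come from a generated graph.
    toy_edges = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p1"), ("p4", "p3")]
    counts = in_degrees(toy_edges)
    print("estimated exponent:", estimate_exponent(counts.values(), d_min=1))
```

The estimate is only meaningful on graphs with many nodes, and the choice of `d_min` strongly affects the result, so treat it as a diagnostic rather than a replacement for the analysis in the paper.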

Citational Distortion

Citational distortion, which captures biases in citation practices[2], is also simulated within the CiteAgent framework. Through interactions among LLM-based agents, CiteAgent reproduces this phenomenon.

Figure 3: Citational Distortion

References

[1] Barabási A L, Albert R. Emergence of scaling in random networks[J]. Science, 1999, 286(5439): 509-512.
[2] Gomez C J, Herman A C, Parigi P. Leading countries in global science increasingly receive more citations than other countries doing similar research[J]. Nature Human Behaviour, 2022, 6(7): 919-929.
