Research Assistant
Long-horizon LLM Memory and Personalization Benchmark
Used LLMs to generate large-scale synthetic user logs from simple personal profiles, keeping the logs consistent and natural over long horizons. Designed evaluation pipelines that assess agent capabilities in memory, reasoning, and information aggregation, and ran experiments comparing multiple baselines.