Towards Efficient Long-context Modeling in Large Language Models
Chien Nguyen
Committee: Thien Nguyen (chair), Thanh Nguyen, Yu Wang
(May 2025)
Keywords: Information Extraction, Large Language Models

Effectively processing and utilizing information across long text sequences is a fundamental challenge in advancing Natural Language Processing (NLP). Tasks like document-level information extraction inherently require models to understand context that spans beyond single sentences, often throughout an entire document. Our recent work on Document-level Event Argument Extraction (EAE) achieved state-of-the-art performance by leveraging contextualized soft prompts and aggregating relevant document context. However, this method, like many other powerful NLP models, relies on underlying Large Language Models (LLMs) with fixed and relatively short context windows, typically limited to a few thousand tokens. This report presents this state-of-the-art EAE work and highlights how its success, despite these context constraints, reveals the critical need for LLM architectures capable of handling significantly longer contexts efficiently. We then introduce Taipan, a novel hybrid LLM architecture we developed that combines the efficiency of State Space Models (Mamba-2) with the expressive power of Selective Attention Layers. Taipan is designed to efficiently model dependencies and retrieve information across context lengths of up to one million tokens. We demonstrate Taipan’s superior performance on long-context retrieval and extrapolation tasks, showing its potential to overcome the context bottleneck faced by current state-of-the-art models and enable future advances in tasks like document-level EAE at unprecedented scales.
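
To make the hybrid design concrete, the sketch below shows one way an SSM-style sequence mixer can be interleaved with a selective attention layer that routes only gate-selected tokens through attention. This is a minimal, hypothetical illustration, not Taipan's actual implementation: the gating network, the fixed threshold, and the depthwise convolution standing in for a Mamba-2 layer are all assumptions made for brevity.

```python
# Schematic sketch of a hybrid SSM + selective-attention block.
# NOT Taipan's real code: the gate, threshold, and the depthwise conv
# used as a stand-in for a Mamba-2 layer are illustrative assumptions.
import torch
import torch.nn as nn

class SelectiveAttention(nn.Module):
    """Apply attention output only to tokens a learned gate scores highly."""
    def __init__(self, d_model: int, n_heads: int = 8, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Linear(d_model, 1)  # per-token importance score
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = torch.sigmoid(self.gate(x))               # (B, T, 1)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        mask = (scores > self.threshold).float()           # select tokens
        # Selected tokens get an attention update; the rest pass through.
        return x + mask * attn_out

class HybridBlock(nn.Module):
    """SSM-style mixer (conv stand-in) followed by selective attention."""
    def __init__(self, d_model: int):
        super().__init__()
        # Placeholder for a Mamba-2 layer; a real model would use an SSM here.
        self.ssm = nn.Conv1d(d_model, d_model, kernel_size=4,
                             padding=3, groups=d_model)
        self.sel_attn = SelectiveAttention(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise conv over the time axis, truncated to keep causality.
        h = self.ssm(self.norm1(x).transpose(1, 2))[..., : x.size(1)]
        x = x + h.transpose(1, 2)
        return self.sel_attn(self.norm2(x))

x = torch.randn(2, 128, 256)               # (batch, seq_len, d_model)
print(HybridBlock(256)(x).shape)            # torch.Size([2, 128, 256])
```

Under this kind of design, most tokens are handled by the linear-time SSM path, and the quadratic attention cost is paid only for the subset of tokens the gate selects, which is what makes scaling toward very long contexts plausible.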