Event start: 2026-06-26T18:00:00Z
Event end: 2026-06-26T19:00:00Z
Location: Duncan Hall
Speaker: Abhinav Jain, Doctoral Candidate
Thesis Defense
Department: Computer Science
Location: Duncan Hall DH 3076
Foundation models have demonstrated remarkable generalisation capabilities; however, transitioning them from broad pre-training to rigorous, domain-specific deployment exposes severe systemic bottlenecks. This dissertation addresses four critical challenges limiting their utility in specialised domains: insufficient grounding in formal domains, the prohibitive computational overhead of concurrently adapting to multiple domains, the inability to iteratively learn from experience, and the context constraints associated with high-dimensional multimodal perception. To systematically overcome these challenges, the dissertation presents complementary mechanisms spanning the model lifecycle, each targeting a distinct bottleneck in efficient domain-specific deployment.
First, immediately following pre-training, this work introduces a coarse-tuning stage via Reinforcement Learning with Coordinated Feedback (RLCF). This stage strictly grounds the model in the syntactic and semantic rules of formal domains without requiring costly verification of generated responses through domain-specific logical evaluators.
Second, transitioning to the fine-tuning phase, the research tackles the server-side memory bottlenecks inherent in customising these grounded models for millions of concurrent users. By formulating Low-Rank Prompt Adaptation (LoPA), the computational and storage burdens of scaling highly personalised, domain-specific deployments are significantly mitigated.
Third, shifting to active inference in domains with dynamic environments, the dissertation details RAG-Modulo. This framework facilitates continuous, post-deployment learning by constructing an experiential memory of model-environment interactions and formal verifier feedback, enabling the model to iteratively refine its decision-making.
Finally, as environments and experiential memories expand to include high-dimensional visual data, standard uniform frame sampling quickly exhausts the context limits of foundation models, rendering them ineffective for long videos. To overcome this inference bottleneck, SIFT (Selective Indexing for Filtered Temporalities), a video indexing framework, is proposed. By dynamically sampling from a query-conditioned index, SIFT isolates highly relevant temporal evidence, enabling systems to filter extensive multimodal contexts while strictly adhering to computational budgets.
Zoom Link: