Pronouns: she/her
My name in Chinese: 施 惟佳
Email: swj0419@uw.edu
Hi!
I am Weijia Shi, a PhD student in Computer Science at the University of Washington, advised by Prof. Luke Zettlemoyer and Prof. Noah A. Smith. I have been a visiting researcher at Meta AI, working with Scott Yih.
Research Interests
My main research focuses on natural language processing and machine learning. I am particularly interested in retrieval-augmented LMs and trustworthy AI. My goal is to build LMs that are able to communicate with external knowledge and personal data securely and robustly.
What’s NEW
Office hours: Starting November 2023, I will be holding office hours (1-2 hours a week) dedicated to offering mentorship and advice to undergraduate and master's students. If you want to chat about research or graduate school applications, please fill out the form.
Honored to be selected as a 2023 Machine Learning Rising Star.
Two workshops accepted to *CL conferences. Stay tuned!
The 3rd Workshop on Knowledge Augmented Methods for NLP (ACL 2024)
Workshop on Customizable NLP (EMNLP 2024)
Organized 2nd Workshop on Knowledge Augmented Methods for NLP (KDD 2023)
Selected Publications
Please see my Google Scholar or Semantic Scholar profiles for the full list.
(*: equal contribution)
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Shangbin Feng, Weijia Shi, Yuyang Bai, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov.
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
Detecting Pretraining Data from Large Language Models
Weijia Shi*, Anirudh Ajith*, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min*, Suchin Gururangan*, Eric Wallace, Weijia Shi, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer.
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding.
Weijia Shi*, Xiaochuang Han*, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih.
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Scott Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?
Weijia Shi*, Xiaochuang Han*, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer
EMNLP, 2023. [paper]
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
kNN-Prompt: Nearest Neighbor Zero-Shot Inference
Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer
Invited Talks
2024/03: Meta AI, AI reading group
Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries
2024/02: Google Research
Title: Detecting Pretraining Data from Large Language Models
2024/02: Cohere
Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries
2024/01: Google, NLP reading group
Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries
2023/12: KAIST, IBS Data Science Group
Title: Detecting Pretraining Data from Large Language Models
2023/03: Microsoft Cognitive Service Research Group
Title: REPLUG: Retrieval-Augmented Black-Box Language Models
Research Experience
University of Washington, 09/2020–Present. Ph.D. student, supervised by Luke Zettlemoyer and Noah A. Smith
Meta AI, 06/2022–Present. Visiting Researcher, supervised by Scott Yih
University of Pennsylvania, 05/2019–09/2019. Research Intern, supervised by Dan Roth
UCLA, 04/2018–06/2020. Research Assistant, supervised by Kai-Wei Chang and Adnan Darwiche