Meta GenAI Evaluations
I founded and lead the centralized Evaluations org of research scientists and engineers that is responsible for evaluation strategy, development, tooling, and execution for all GenAI foundation models (e.g. Llama).
Our work spans all modalities (text, image, audio, video, etc.), capabilities (safety, reasoning, intelligence, agentic behaviors, etc.), and evaluation types (automated, AI-assisted, human, etc.).
|
|
Snorkel AI
Building Snorkel Flow, the AI data development platform for the enterprise.
For more details on the company, visit snorkel.ai. For more details on my role at the company, visit my LinkedIn profile.
|
|
Snorkel: Research Library
Snorkel
is a system for rapidly creating, modeling, and
managing training data. It is the flagship implementation
of the data programming paradigm for supporting weak
supervision resources. Collaborators and active users
include more than forty major technical and medical organizations, including Google, Microsoft, Intel, Toshiba, JPL, Alibaba, and Stanford Medicine.
VLDB 2018 (oral)
"Best of VLDB"
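The data programming idea behind Snorkel can be illustrated with a minimal sketch: users write noisy heuristic labeling functions (LFs) over unlabeled data, and their votes are combined into training labels. The names below are hypothetical and the combination step is simple majority voting; Snorkel itself learns LF accuracies with a generative label model.

```python
# Illustrative sketch of data programming: several noisy labeling
# functions vote on unlabeled examples, and their votes are combined
# into training labels. Names are hypothetical, not the Snorkel API.

SPAM, NOT_SPAM, ABSTAIN = 1, 0, -1

def lf_contains_offer(text):
    return SPAM if "free offer" in text.lower() else ABSTAIN

def lf_has_greeting(text):
    return NOT_SPAM if text.lower().startswith("hi ") else ABSTAIN

def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 3 else ABSTAIN

LFS = [lf_contains_offer, lf_has_greeting, lf_many_exclamations]

def majority_label(text):
    """Combine LF votes by majority vote; Snorkel instead models
    each LF's accuracy and correlations to denoise the votes."""
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(majority_label("Claim your FREE OFFER now!!!"))  # 1 (SPAM)
```

The labels produced this way are noisy, but at scale they can train a discriminative end model that generalizes beyond the heuristics.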
|
|
Snorkel MeTaL: Weak Supervision
for Multi-Task Learning
We extend Snorkel to multi-task supervision and multi-task
learning (MTL). In particular, we are interested in the
massive multi-task learning regime, where we
have large numbers of tasks and labels of varying types,
granularities, and accuracies. Using Snorkel MeTaL,
we achieved new state-of-the-art scores on the GLUE Benchmark
and four of its component tasks.
AAAI 2019 (oral), DEEM (SIGMOD) 2018 (oral)
|
|
Software 2.0: Learning-Centric Software Stacks
Driven by accuracy improvements and deployment advantages, many organizations have begun to shift toward learning-centric
software stacks.
That is, "Software 1.0" code with explicit instructions written by programmers is being replaced by "Software 2.0" code that is written in the weights of neural networks.
In this paradigm, training data becomes the primary interface through which developers interact with their Software 2.0 systems.
This requires a new level of scalability, control, and efficiency when it comes to generating and shaping training sets.
We are exploring the ramifications of this new programming model and building the tools to support it.
CIDR 2019 (oral)
|
|
Self-Feeding Chatbots
Most of the conversations a chatbot sees over its lifetime happen after it's deployed.
These conversations aren't typically usable as training data, but give the chatbot the
right tools and it can learn from those too! We introduced a multi-task "self-feeding"
chatbot that knows how to extract new training examples from the conversations it
participates in to improve itself further after it's deployed.
ACL 2019
|
|
Generating Titles for Web Tables
We introduce a framework for generating titles for tables
that are displayed out of their original context. We use a
pointer-generator network, a sequence-to-sequence model that is
capable of both generating tokens and copying tokens from the
input (such as rare and out-of-vocabulary words), resulting in titles
that are both relevant and readable and reducing hallucinations.
The Web Conference (WWW) 2019
|
|
Babble Labble: Learning from Natural Language Explanations
We explore collecting natural language explanations
for why annotators give the labels they do and
parsing these into executable functions, which can then
be used to generate noisy labels for large amounts of
unlabeled data. The resulting probabilistically labeled
training dataset can then be used to train a powerful
downstream discriminative model for the task at hand.
We find that utilizing these natural language explanations
allows real-world users to train classifiers with
comparable F1 scores up to 100 times faster than when they
provide just labels.
ACL 2018 (oral), NeurIPS 2017 Demo
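The Babble Labble pipeline can be sketched in miniature: a natural-language explanation for a label is parsed into an executable function, which is then applied to unlabeled examples to produce noisy labels. The toy grammar below (a single regex pattern) is a hypothetical stand-in for the real semantic parser.

```python
# Toy sketch: parse a natural-language explanation into an
# executable labeling function. The one-pattern "grammar" here is
# a hypothetical stand-in for Babble Labble's semantic parser.

import re

ABSTAIN = -1

def parse_explanation(explanation, label):
    """Handle explanations of the form
    "because the phrase 'X' appears": emit a function returning
    `label` when X occurs in the text, and ABSTAIN otherwise."""
    m = re.search(r"the phrase '([^']+)' appears", explanation)
    if m is None:
        return None  # explanation not covered by the toy grammar
    phrase = m.group(1).lower()
    return lambda text: label if phrase in text.lower() else ABSTAIN

lf = parse_explanation("because the phrase 'his wife' appears", label=1)
print(lf("Tom and his wife Ann"))  # 1
print(lf("Tom and Ann"))           # -1
```

Each parsed function then acts as a labeling function over the unlabeled corpus, so one explanation supervises many examples rather than one.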
|
|
Fonduer: Knowledge Base Construction from Richly Formatted Data
We introduce an information extraction framework that utilizes
multiple representations of the data (structural, tabular,
visual, and textual) to achieve state-of-the-art performance
in four real-world extraction tasks. Our framework is currently
in use commercially at Alibaba and with law enforcement agencies
fighting online human trafficking.
SIGMOD 2018 (oral)
|
|
A Machine-Compiled Database of Genome-Wide Association Studies (Nature Comms)
Using the multi-modal parsing and extraction tools from Fonduer and learning and inference tools from
Snorkel, we construct a knowledge base of genotype/phenotype associations extracted from the text
and tables in ~600 open-access papers from PubMed Central. Our system expands existing manually
curated databases by approximately 20% with 92% precision.
Bio-Ontologies 2017, NeurIPS 2017 MLCB Workshop
[pdf]
|
|
Collective Supervision of Topic Models for Predicting Surveys with Social Media
We use topic models to correlate social media messages
with survey outcomes and to provide an interpretable
representation of the data. Rather than rely on
fully unsupervised topic models, we use existing aggregated
survey data to inform the inferred topics, a class
of topic model supervision referred to as collective supervision.
AAAI 2016
|
|
Recommender Systems for the Department of Defense and Intelligence Community
With an internal committee of 20 MIT and DoD researchers,
I spearheaded the construction of this report, which
formalizes the components and complexities of recommender systems
and surveys their existing and potential uses in the
Department of Defense and U.S. Intelligence community.
MITLL Journal 2016
[pdf]
|
QALF: Information Extraction
for the Long Tail via Question Answering
We use a Question Answering (QA) model as a flexible means of
converting domain expertise expressed as natural language
into weak supervision resources (labeling functions, or LFs).
Preliminary results suggest that with as few as a dozen user inputs
(domain-relevant questions), we can quickly build first-order
extractors for new relations that lack distant supervision resources.
|
|
L-dominance: An approximate-domination mechanism for adaptive resolution of Pareto frontiers
We propose a mechanism called L-dominance (based on the
Lamé curve) which promotes adaptive resolution of solutions
on the Pareto frontier for evolutionary multi-objective
optimization algorithms.
SMO Journal, AIAA ASM 2015, Honors Thesis
Best Student Paper
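For context, the standard Pareto-dominance check that L-dominance generalizes can be sketched as follows: under minimization, point a dominates point b when a is no worse in every objective and strictly better in at least one. L-dominance enlarges this axis-aligned dominated region to one bounded by a Lamé curve, adapting the resolution of retained solutions; the sketch below shows only the standard check, not the Lamé-curve variant.

```python
# Standard Pareto dominance (minimization), the baseline that
# L-dominance relaxes with a Lame-curve-shaped dominated region.

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated points."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(pts))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```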
|
|
Reducing Shock Interactions in a High Pressure Turbine via 3D Aerodynamic Shaping
We show that the shock wave reflections inside a turbine
engine can be approximated by calculating the 3D surface
normal projections of the airfoils. Using a genetic algorithm,
we produce superior airfoil geometries (with respect to
high cycle fatigue failure) four orders of magnitude faster
than the traditional CFD-based approach.
AIAA Journal, AIAA ASM 2014
Best Student Paper
|
|
The Smart Normal Constraint Method for Directly Generating a Smart Pareto Set
We introduce the Smart Normal Constraint (SNC) method, the
first method capable of directly generating a smart Pareto
set (a Pareto set in which the density of solutions varies
such that regions of significant tradeoff have the greatest
resolution). This is accomplished by iteratively updating
an approximation of the design space geometry, which is
used to guide subsequent searches in the design space.
SMO Journal, AIAA MDO 2013
|
|
Usage Scenarios for Design Space Exploration with a Dynamic Multiobjective Optimization Formulation
We investigate three usage scenarios for formulation space
exploration, building on previous work that introduced a new
way to formulate multi-objective problems, allowing a designer
to change or update design objectives, constraints, and variables
in a fluid manner that promotes exploration.
RiED Journal, ASME DETC 2012
Best Paper
[pdf]
|