## 1999

Mark Dras. Tree Adjoining Grammar and the Reluctant Paraphrasing of Text. *PhD Thesis*, Macquarie University, Australia. 1999.

## 2001

Ulrich Germann, Mike Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. Fast Decoding and Optimal Decoding for Machine Translation. In *Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics* (ACL-2001), pages 228-235, 2001.

A good decoding algorithm is critical to the success of any statistical machine translation system. The decoder's job is to find the translation that is most likely according to a set of previously learned parameters (and a formula for combining them). Since the space of possible translations is extremely large, typical decoding algorithms are only able to examine a portion of it, thus risking missing good solutions. Unfortunately, examining more of the space leads to unacceptably slow decoding. In this paper, we compare the speed and output quality of a traditional stack-based decoding algorithm with two new decoders: a fast but non-optimal greedy decoder and a slow but optimal decoder that treats decoding as an integer-programming optimization problem.

## 2004

Dan Roth and Wen-tau Yih. A Linear Programming Formulation for Global Inference in Natural Language Tasks. In *Proceedings of the Eighth Conference on Computational Natural Language Learning* (CoNLL-2004), pages 1-8, 2004.

Given a collection of discrete random variables representing outcomes of learned local predictors in natural language, e.g., named entities and relations, we seek an optimal global assignment to the variables in the presence of general (non-sequential) constraints. Examples of these constraints include the type of arguments a relation can take, and the mutual activity of different relations, etc. We develop a linear programming formulation for this problem and evaluate it in the context of simultaneously learning named entities and relations. Our approach allows us to efficiently incorporate domain and task specific constraints at decision time, resulting in significant improvements in the accuracy and the "human-like" quality of the inferences.
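
The flavor of this global inference can be sketched in miniature: with a handful of variables one can enumerate assignments directly, which is what the ILP does efficiently at scale. The label sets, local scores, and the `lives_in` constraint below are invented for illustration, not taken from the paper.

```python
from itertools import product

# Invented local classifier scores (log-probabilities) for two entities
# and the relation between them.
entity_scores = {
    "e1": {"person": -0.2, "location": -1.8},
    "e2": {"person": -1.5, "location": -0.4},
}
relation_scores = {"lives_in": -0.3, "none": -1.4}

def consistent(l1, l2, rel):
    # Constraint: lives_in requires a person subject and a location object.
    return rel != "lives_in" or (l1 == "person" and l2 == "location")

def best_global_assignment():
    # Enumerate all joint assignments, keep only the constraint-satisfying
    # ones, and return the highest-scoring one (the ILP's argmax).
    best, best_score = None, float("-inf")
    for l1, l2, rel in product(entity_scores["e1"], entity_scores["e2"],
                               relation_scores):
        if not consistent(l1, l2, rel):
            continue
        score = (entity_scores["e1"][l1] + entity_scores["e2"][l2]
                 + relation_scores[rel])
        if score > best_score:
            best, best_score = (l1, l2, rel), score
    return best
```

The point of the global view is that the constraint can veto a locally attractive label: each classifier is still local, but only jointly consistent assignments compete.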

Vasin Punyakanok, Dan Roth, Wen-tau Yih, and Dav Zimak. Semantic Role Labeling Via Integer Linear Programming Inference. In *Proceedings of the International Conference on Computational Linguistics* (COLING-2004), pages 1346-1352, 2004.

We present a system for the semantic role labeling task. The system combines a machine learning technique with an inference procedure based on integer linear programming that supports the incorporation of linguistic and structural constraints into the decision process. The system is tested on the data provided in CoNLL-2004 shared task on semantic role labeling and achieves very competitive results.

Ernst Althaus, Nikiforos Karamanis, and Alexander Koller. Computing Locally Coherent Discourses. In *Proceedings of the Annual Meeting on Association for Computational Linguistics* (ACL-2004), pages 399-406, 2004.

We present the first algorithm that computes optimal orderings of sentences into a locally coherent discourse. The algorithm runs very efficiently on a variety of coherence measures from the literature. We also show that the discourse ordering problem is NP-complete and cannot be approximated.
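
At toy scale, the ordering problem the authors formalize can be illustrated by exhaustive search over permutations, scoring each ordering by the summed coherence of adjacent sentence pairs. The pairwise scores below are invented; real instances need exact solvers such as the branch-and-cut approach in the paper, since the general problem is NP-complete.

```python
from itertools import permutations

# Invented pairwise coherence scores: coherence[(i, j)] is how well
# sentence j reads immediately after sentence i.
coherence = {
    (0, 1): 0.9, (1, 0): 0.1,
    (0, 2): 0.2, (2, 0): 0.3,
    (1, 2): 0.8, (2, 1): 0.2,
}

def best_ordering(n):
    # Exhaustive search over all n! orderings; feasible only for tiny n.
    def score(order):
        return sum(coherence[a, b] for a, b in zip(order, order[1:]))
    return max(permutations(range(n)), key=score)
```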

Ulrich Germann, Mike Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. Fast Decoding and Optimal Decoding for Machine Translation. *Artificial Intelligence*, 154 (1-2), pages 127-143, 2004.

A good decoding algorithm is critical to the success of any statistical machine translation system. The decoder's job is to find the translation that is most likely according to a set of previously learned parameters (and a formula for combining them). Since the space of possible translations is extremely large, typical decoding algorithms are only able to examine a portion of it, thus risking missing good solutions. Unfortunately, examining more of the space leads to unacceptably slow decoding.

In this paper, we compare the speed and output quality of a traditional stack-based decoding algorithm with two new decoders: a fast but non-optimal greedy decoder and a slow but optimal decoder that treats decoding as an integer-programming optimization problem.

## 2005

Tomasz Marciniak and Michael Strube. Beyond the Pipeline: Discrete Optimization in NLP. In *Proceedings of the Ninth Conference on Computational Natural Language Learning* (CoNLL-2005), pages 136-145, 2005.

We present a discrete optimization model based on a linear programming formulation as an alternative to the cascade of classifiers implemented in many language processing systems. Since NLP tasks are correlated with one another, sequential processing does not guarantee optimal solutions. We apply our model in an NLG application and show that it performs better than a pipeline-based system.

Peter Koomen, Vasin Punyakanok, Dan Roth and Wen-tau Yih. Generalized Inference with Multiple Semantic Role Labeling Systems. In *Proceedings of the Ninth Conference on Computational Natural Language Learning: Shared Task* (CoNLL-2005), pages 181-184, 2005.

We present an approach to semantic role labeling (SRL) that takes the output of multiple argument classifiers and combines them into a coherent predicate-argument output by solving an optimization problem. The optimization stage, which is solved via integer linear programming, takes into account both the recommendation of the classifiers and a set of problem-specific constraints, and is thus used both to clean the classification results and to ensure structural integrity of the final role labeling. We illustrate a significant improvement in overall SRL performance through this inference.

Tzong-Han Tsai, Chia-Wei Wu, Yu-Chun Lin, and Wen-Lian Hsu. Exploiting Full Parsing Information to Label Semantic Roles Using an Ensemble of ME and SVM via Integer Linear Programming. In *Proceedings of the Ninth Conference on Computational Natural Language Learning: Shared Task* (CoNLL-2005), pages 233-236, 2005.

In this paper, we propose a method that exploits full parsing information by representing it as features of argument classification models and as constraints in integer linear programs. In addition, to take advantage of SVM-based and Maximum Entropy-based argument classification models, we incorporate their scoring matrices, and use the combined matrix in the above-mentioned integer linear programs. The experimental results show that full parsing information not only increases the F-score of argument classification models by 0.7%, but also effectively removes all labeling inconsistencies, which increases the F-score by 0.64%. The ensemble of SVM and ME also boosts the F-score by 0.77%. Our system achieves an F-score of 76.53% on the development set and 76.38% on the WSJ test set.

Vasin Punyakanok, Dan Roth and Wen-tau Yih. The Necessity of Syntactic Parsing for Semantic Role Labeling. In *Proceedings of the International Joint Conference on Artificial Intelligence* (IJCAI-2005), pages 1117-1123, 2005.

We provide an experimental study of the role of syntactic parsing in semantic role labeling. Our conclusions demonstrate that syntactic parse information is clearly most relevant in the very first stage – the pruning stage. In addition, the quality of the pruning stage cannot be determined solely based on its recall and precision. Instead it depends on the characteristics of the output candidates that make downstream problems easier or harder. Motivated by this observation, we suggest an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves the performance.

Vasin Punyakanok, Dan Roth, Wen-tau Yih and Dav Zimak. Learning and Inference over Constrained Output. In *Proceedings of the International Joint Conference on Artificial Intelligence* (IJCAI-2005), pages 1124-1129, 2005.

We study learning structured output in a discriminative framework where values of the output variables are estimated by local classifiers. In this framework, complex dependencies among the output variables are captured by constraints and dictate which global labels can be inferred. We compare two strategies, learning independent classifiers and inference based training, by observing their behaviors in different conditions. Experiments and theoretical justification lead to the conclusion that using inference based learning is superior when the local classifiers are difficult to learn but may require many examples before any discernible difference can be observed.

Dan Roth and Wen-tau Yih. Integer Linear Programming Inference for Conditional Random Fields. In *Proceedings of the International Conference on Machine Learning* (ICML-2005), pages 737-744, 2005.

Inference in Conditional Random Fields and Hidden Markov Models is done using the Viterbi algorithm, an efficient dynamic programming algorithm. In many cases, general (non-local and non-sequential) constraints may exist over the output sequence, but cannot be incorporated and exploited in a natural way by this inference procedure. This paper proposes a novel inference procedure based on integer linear programming (ILP) and extends CRF models to naturally and efficiently support general constraint structures. For sequential constraints, this procedure reduces to simple linear programming as the inference process. Experimental evidence is supplied in the context of an important NLP problem, semantic role labeling.
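
A minimal sketch of the idea, with invented scores: a linear-chain objective (emission plus transition scores, as in a CRF) decoded under a non-local cardinality constraint that Viterbi cannot express. Here brute-force enumeration stands in for the ILP solver.

```python
from itertools import product

labels = ["O", "ARG"]
# Invented per-token emission scores and a transition score table.
emissions = [{"O": 0.1, "ARG": 0.8},
             {"O": 0.2, "ARG": 0.9},
             {"O": 0.9, "ARG": 0.3}]
transitions = {("O", "O"): 0.1, ("O", "ARG"): 0.2,
               ("ARG", "O"): 0.2, ("ARG", "ARG"): 0.0}

def constrained_argmax(max_args=1):
    # Non-local constraint: at most `max_args` ARG labels in the sequence.
    # Viterbi cannot enforce this; an ILP (or, at toy scale, enumeration) can.
    best, best_score = None, float("-inf")
    for seq in product(labels, repeat=len(emissions)):
        if seq.count("ARG") > max_args:
            continue
        score = sum(e[l] for e, l in zip(emissions, seq))
        score += sum(transitions[a, b] for a, b in zip(seq, seq[1:]))
        if score > best_score:
            best, best_score = seq, score
    return best
```

With the looser budget the chain prefers two ARG tokens; tightening the global constraint to one changes which single token gets labeled, which no purely local decoding rule reproduces.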

Manfred Klenner. Extracting Predicate Structures from Parse Trees. In *Proceedings of the International Conference on Recent Advances in Natural Language Processing* (RANLP 2005), 2005.

We evaluate two different approaches for extracting predicate structures from parse trees. We compare the results of a rule-based algorithm incorporating decision tree learning to an integer linear programming (ILP) approach with an underlying statistical model. It turns out that the rule-based approach yields higher precision but lower recall than the ILP approach. Both approaches achieve precision rates of more than 90%.

## 2006

Regina Barzilay and Mirella Lapata. Aggregation via Set Partitioning for Natural Language Generation. In *Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics* (HLT-NAACL-2006), pages 359-366, 2006.

The role of aggregation in natural language generation is to combine two or more linguistic structures into a single sentence. The task is crucial for generating concise and readable texts. We present an efficient algorithm for automatically learning aggregation rules from a text and its related database. The algorithm treats aggregation as a set partitioning problem and uses a global inference procedure to find an optimal solution. Our experiments show that this approach yields substantial improvements over a clustering-based model which relies exclusively on local information.

James Clarke and Mirella Lapata. Constraint-Based Sentence Compression: An Integer Programming Approach. In *Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions* (COLING/ACL-2006), pages 144-151, 2006.

The ability to compress sentences while preserving their grammaticality and most of their meaning has recently received much attention. Our work views sentence compression as an optimisation problem. We develop an integer programming formulation and infer globally optimal compressions in the face of linguistically motivated constraints. We show that such a formulation allows for relatively simple and knowledge-lean compression models that do not require parallel corpora or large scale resources. The proposed approach yields results comparable and in some cases superior to the state of the art.
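
A toy sketch of compression as constrained optimization: each word gets a binary keep/drop variable, and an invented dependency constraint forbids keeping a modifier without its head. The words, retention scores, and head links are all made up for illustration, and enumeration stands in for the integer program.

```python
from itertools import product

words = ["He", "became", "a", "very", "famous", "writer"]
# Invented word-retention scores (e.g., from a significance model).
score = [0.9, 0.8, 0.3, 0.1, 0.6, 0.9]
# Invented dependency heads: if word i is kept, head[i] must be kept too.
head = {0: 1, 2: 5, 3: 4, 4: 5, 5: 1}

def best_compression(max_len=4):
    # Enumerate all 2^n keep/drop vectors (the ILP's binary variables),
    # filter by length budget and the no-orphaned-modifier constraint.
    best, best_val = None, float("-inf")
    for keep in product([0, 1], repeat=len(words)):
        if sum(keep) > max_len:
            continue
        if any(k and not keep[head[i]]
               for i, k in enumerate(keep) if i in head):
            continue
        val = sum(s * k for s, k in zip(score, keep))
        if val > best_val:
            best, best_val = keep, val
    return [w for w, k in zip(words, best) if k]
```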

Sebastian Riedel and James Clarke. Incremental Integer Linear Programming for Non-projective Dependency Parsing. In *Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2006), pages 129-137, 2006.

Integer Linear Programming has recently been used for decoding in a number of probabilistic models in order to enforce global constraints. However, in certain applications, such as non-projective dependency parsing and machine translation, formulating the complete decoding problem as a single integer linear program renders it intractable to solve. We present an approach which solves the problem incrementally, thereby avoiding the creation of intractable integer linear programs. This approach is applied to Dutch dependency parsing, and we show how the addition of linguistically motivated constraints can yield a significant improvement over the state of the art.
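
The incremental strategy can be caricatured on a two-token dependency example: decode without cycle constraints, and whenever the solution contains a cycle, add a constraint ruling it out and re-solve. The greedy step below (forbidding the weakest arc of a violated cycle) is a simplification of the paper's method, which adds the violated constraint to the ILP and re-solves exactly; the arc scores are invented.

```python
def find_cycle(heads):
    # heads maps each token to its chosen head; follow head links from
    # every start token and report the first cycle encountered, if any.
    for start in heads:
        seen, node = [], start
        while node in heads and node not in seen:
            seen.append(node)
            node = heads[node]
        if node in seen:
            return seen[seen.index(node):]
    return None

def incremental_parse(scores):
    # scores[i][h]: score of attaching token i to head h (0 is the root).
    forbidden = set()  # incrementally accumulated forbidden (token, head) arcs
    while True:
        heads = {i: max((h for h in scores[i] if (i, h) not in forbidden),
                        key=lambda h: scores[i][h])
                 for i in scores}
        cycle = find_cycle(heads)
        if cycle is None:
            return heads
        # Violated acyclicity constraint: forbid the cycle's weakest arc.
        worst = min(cycle, key=lambda i: scores[i][heads[i]])
        forbidden.add((worst, heads[worst]))
```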

Philip Bramsen, Pawan Deshpande, Yoong Keok Lee, and Regina Barzilay. Inducing Temporal Graphs. In *Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2006), pages 189-198, 2006.

We consider the problem of constructing a directed acyclic graph that encodes temporal relations found in a text. The unit of our analysis is a temporal segment, a fragment of text that maintains temporal coherence. The strength of our approach lies in its ability to simultaneously optimize pairwise ordering preferences and global constraints on the graph topology. Our learning method achieves 83% F-measure in temporal segmentation and 84% accuracy in inferring temporal relations between two segments.

Yejin Choi, Eric Breck, and Claire Cardie. Joint Extraction of Entities and Relations for Opinion Recognition. In *Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2006), pages 431-439, 2006.

We present an approach for the joint extraction of entities and relations in the context of opinion recognition and analysis. We identify two types of opinion-related entities — expressions of opinions and sources of opinions — along with the linking relation that exists between them. Inspired by Roth and Yih (2004), we employ an integer linear programming approach to solve the joint opinion recognition task, and show that global, constraint-based inference can significantly boost the performance of both relation extraction and the extraction of opinion-related entities. Performance further improves when a semantic role labeling system is incorporated. The resulting system achieves F-measures of 79 and 69 for entity and relation extraction, respectively, improving substantially over prior results in the area.

Manfred Klenner. Grammatical Role Labeling with Integer Linear Programming. In *Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Conference Companion* (EACL-2006), pages 187-190, 2006.

In this paper, we present a formalization of grammatical role labeling within the framework of Integer Linear Programming (ILP). We focus on the integration of sub-categorization information into the decision making process. We present a first empirical evaluation that achieves competitive precision and recall rates.

## 2007

Ryan McDonald. A Study of Global Inference Algorithms in Multi-Document Summarization. In *Proceedings of the European Conference on Information Retrieval* (ECIR-2007), pages 557-564, 2007.

In this work we study the theoretical and empirical properties of various global inference algorithms for multi-document summarization. We start by defining a general framework for inference in summarization. We then present three algorithms: the first is a greedy approximate method, the second a dynamic programming approach based on solutions to the knapsack problem, and the third an exact algorithm that uses an Integer Linear Programming formulation of the problem. We empirically evaluate all three algorithms and show that, relative to the exact solution, the dynamic programming algorithm provides near-optimal results with preferable scaling properties.
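
The knapsack-style dynamic program can be sketched as follows, with sentences reduced to invented (length, relevance) pairs and a total length budget. This is a generic 0/1 knapsack DP illustrating the second algorithm's flavor, not the paper's exact system.

```python
def summarize(sentences, budget):
    """0/1 knapsack DP: sentences is a list of (length, relevance) pairs.

    Returns (best relevance, indices of chosen sentences) within budget.
    """
    # best[b] = (score, chosen indices) achievable with total length <= b.
    best = [(0.0, [])] * (budget + 1)
    for i, (length, relevance) in enumerate(sentences):
        new = best[:]
        for b in range(length, budget + 1):
            prev_score, prev_chosen = best[b - length]
            if prev_score + relevance > new[b][0]:
                new[b] = (prev_score + relevance, prev_chosen + [i])
        best = new
    return best[budget]
```

Unlike the greedy method, the DP is guaranteed optimal for this additive objective; the paper's ILP is needed once the objective adds interaction terms (e.g., redundancy penalties between selected sentences).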

Pascal Denis and Jason Baldridge. Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming. In *Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics - Human Language Technology Conference* (NAACL-HLT-2007), pages 236-243, 2007.

Standard pairwise coreference resolution systems are subject to errors resulting from their performing anaphora identification as an implicit part of coreference resolution. In this paper, we propose an integer linear programming (ILP) formulation for coreference resolution which models anaphoricity and coreference as a joint task, such that each local model informs the other for the final assignments. This joint ILP formulation provides F-score improvements of 3.7-5.3% over a base coreference classifier on the ACE datasets.

Pawan Deshpande, Regina Barzilay and David R. Karger. Randomized Decoding for Selection-and-Ordering Problems. In *Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics - Human Language Technology Conference* (NAACL-HLT-2007), pages 444-451, 2007.

The task of selecting and ordering information appears in multiple contexts in text generation and summarization. For instance, methods for title generation construct a headline by selecting and ordering words from the input text. In this paper, we investigate decoding methods that simultaneously optimize selection and ordering preferences. We formalize decoding as a task of finding an acyclic path in a directed weighted graph. Since the problem is NP-hard, finding an exact solution is challenging. We describe a novel decoding method based on a randomized color-coding algorithm. We prove bounds on the number of color-coding iterations necessary to guarantee any desired likelihood of finding the correct solution. Our experiments show that the randomized decoder is an appealing alternative to a range of decoding algorithms for selection-and-ordering problems, including beam search and Integer Linear Programming.

Manfred Klenner. Shallow Dependency Labeling. In *Proceedings of the Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions* (ACL-2007), pages 201-204, 2007.

We present a formalization of dependency labeling with Integer Linear Programming. We focus on the integration of subcategorization into the decision-making process, where the various subcategorization frames of a verb compete with each other. A maximum entropy model provides the weights for ILP optimization.

James Clarke and Mirella Lapata. Modelling Compression with Discourse Constraints. In *Proceedings of the Conference on Empirical Methods in Natural Language Processing and on Computational Natural Language Learning* (EMNLP-CoNLL-2007), pages 1-11, 2007.

Sentence compression holds promise for many applications ranging from summarisation to subtitle generation. The task is typically performed on isolated sentences without taking the surrounding context into account, even though most applications would operate over entire documents. In this paper we present a discourse informed model which is capable of producing document compressions that are coherent and informative. Our model is inspired by theories of local coherence and formulated within the framework of Integer Linear Programming. Experimental results show significant improvements over a state-of-the-art discourse agnostic approach.

Manfred Klenner. Enforcing Consistency on Coreference Sets. In *Recent Advances in Natural Language Processing* (RANLP), pages 323-328, 2007.

We show that intra-sentential binding constraints can act as global constraints that, via transitivity, enforce consistency on coreference sets. This yields about a 5% increase in performance. In our model, the probabilities of a baseline classifier for coreference resolution are used as weights in an optimization model. The underlying integer linear programming (ILP) specification is straightforward: binding constraints give rise to (local) exclusiveness of markables, and transitivity propagates these restrictions and enforces the re-computation of coreference sets.

Dan Roth and Wen-tau Yih. Global Inference for Entity and Relation Identification via a Linear Programming Formulation. *Introduction to Statistical Relational Learning*, 2007.

Pascal Denis. New Learning Models for Robust Reference Resolution. *PhD Thesis*, University of Texas at Austin, USA. 2007.

## 2008

James Clarke and Mirella Lapata. Global Inference for Sentence Compression: An Integer Linear Programming Approach. *Journal of Artificial Intelligence Research* (JAIR), 31, pages 399-429, 2008.

Sentence compression holds promise for many applications ranging from summarization to subtitle generation. Our work views sentence compression as an optimization problem and uses integer linear programming (ILP) to infer globally optimal compressions in the presence of linguistically motivated constraints. We show how previous formulations of sentence compression can be recast as ILPs and extend these models with novel global constraints. Experimental results on written and spoken texts demonstrate improvements over state-of-the-art models.

James Clarke. Global Inference for Sentence Compression: An Integer Linear Programming Approach. *PhD Thesis*, University of Edinburgh, UK. 2008.

Vasin Punyakanok, Dan Roth and Wen-tau Yih. The Importance of Syntactic Parsing and Inference in Semantic Role Labeling. *Computational Linguistics* 34(2), pages 257-287, 2008.

We present a general framework for semantic role labeling. The framework combines a machine learning technique with an integer linear programming based inference procedure, which incorporates linguistic and structural constraints into a global decision process. Within this framework, we study the role of syntactic parsing information in semantic role labeling. We show that full syntactic parsing information is, by far, most relevant in identifying the argument, especially in the very first stage – the pruning stage. Surprisingly, the quality of the pruning stage cannot be determined solely based on its recall and precision. Instead, it depends on the characteristics of the output candidates that determine the difficulty of the downstream problems. Motivated by this observation, we propose an effective and simple approach of combining different semantic role labeling systems through joint inference, which significantly improves its performance.

Our system has been evaluated in the CoNLL-2005 shared task on semantic role labeling, and achieves the highest F1 score among 19 participants.

Katja Filippova and Michael Strube. Dependency Tree Based Sentence Compression. In *Proceedings of the International Natural Language Generation Conference* (INLG-2008), pages 25-32, 2008.

We present a novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees. An automatic evaluation shows that our method obtains results comparable or superior to the state of the art. We demonstrate that the choice of the parser affects the performance of the system. We also apply the method to German and report the results of an evaluation with humans.

John DeNero and Dan Klein. The Complexity of Phrase Alignment Problems. In *Proceedings of the Annual Meeting of the Association for Computational Linguistics - Human Language Technology Conference, Short Papers* (ACL-HLT-2008), pages 25-28, 2008.

Many phrase alignment models operate over the combinatorial space of bijective phrase alignments. We prove that finding an optimal alignment in this space is NP-hard, while computing alignment expectations is #P-hard. On the other hand, we show that the problem of finding an optimal alignment can be cast as an integer linear program, which provides a simple, declarative approach to Viterbi inference for phrase alignment models that is empirically quite efficient.

Jenny Rose Finkel and Christopher D. Manning. Enforcing Transitivity in Coreference Resolution. In *Proceedings of the Annual Meeting of the Association for Computational Linguistics - Human Language Technology Conference, Short Papers* (ACL-HLT-2008), pages 45-48, 2008.

A desirable quality of a coreference resolution system is the ability to handle transitivity constraints, such that even if it places high likelihood on a particular mention being coreferent with each of two other mentions, it will also consider the likelihood of those two mentions being coreferent when making a final assignment. This is exactly the kind of constraint that integer linear programming (ILP) is ideal for, but, surprisingly, previous work applying ILP to coreference resolution has not encoded this type of constraint. We train a coreference classifier over pairs of mentions, and show how to encode this type of constraint on top of the probabilities output from our pairwise classifier to extract the most probable legal entity assignments. We present results on two commonly used datasets which show that enforcement of transitive closure consistently improves performance, including improvements of up to 3.6% using the b3 scorer, and up to 16.5% using cluster f-measure.
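
A tiny illustration of why transitivity matters, with invented pairwise probabilities: thresholding each pair independently at 0.5 would link mentions 0-1 and 0-2 but not 1-2, an intransitive outcome. Scoring whole partitions, which are transitive by construction, can flip the weak pair. Enumeration of cluster labelings stands in for the ILP here.

```python
from itertools import product
from math import log

# Invented pairwise coreference probabilities for mentions 0, 1, 2.
p = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.4}

def best_transitive_assignment():
    # Assign each mention a cluster id; a clustering is transitively
    # closed by construction, so only legal entity assignments compete.
    pairs = list(p)
    best, best_score = None, float("-inf")
    for labels in product(range(3), repeat=3):
        score = sum(log(p[i, j]) if labels[i] == labels[j]
                    else log(1 - p[i, j])
                    for i, j in pairs)
        if score > best_score:
            coref = frozenset((i, j) for i, j in pairs
                              if labels[i] == labels[j])
            best, best_score = coref, score
    return best
```

Here the jointly best clustering merges all three mentions, overriding the locally sub-0.5 pair (1, 2), which is the effect the transitivity constraints are designed to produce.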

Wanxiang Che, Zhenghua Li, Yuxuan Hu, Yongqiang Li, Bing Qin, Ting Liu, and Sheng Li. A Cascaded Syntactic and Semantic Dependency Parsing System. In *Proceedings of the 12th Conference on Computational Natural Language Learning* (CoNLL-2008) Shared Task, pages 238-242, 2008.

We describe our CoNLL 2008 Shared Task system in this paper. The system includes two cascaded components: a syntactic and a semantic dependency parser. A first-order projective MSTParser is used as our syntactic dependency parser. In order to overcome the shortcoming of MSTParser, that it cannot model more global information, we add a relabeling stage after the parsing to distinguish some confusable labels, such as ADV, TMP, and LOC. Besides adding predicate identification and classification stages, our semantic dependency parsing simplifies the traditional four-stage semantic role labeling pipeline into two stages: a maximum entropy based argument classification and an ILP-based post inference. Finally, we achieve an overall labeled macro F1 of 82.66, which ranked second in the closed challenge.

Katja Filippova and Michael Strube. Sentence Fusion via Dependency Graph Compression. In *Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2008), pages 177-185, 2008.

We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.

Dan Goldwasser and Dan Roth. Transliteration as Constrained Optimization. In *Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2008), pages 353-362, 2008.

This paper introduces a new method for identifying named-entity (NE) transliterations in bilingual corpora. Recent works have shown the advantage of discriminative approaches to transliteration: given two strings (ws, wt) in the source and target language, a classifier is trained to determine if wt is the transliteration of ws. This paper shows that the transliteration problem can be formulated as a constrained optimization problem and thus take into account contextual dependencies and constraints among character bi-grams in the two strings. We further explore several methods for learning the objective function of the optimization problem and show the advantage of learning it discriminatively. Our experiments show that the new framework results in over 50% improvement in translating English NEs to Hebrew.

Nathanael Chambers and Dan Jurafsky. Jointly Combining Implicit Constraints Improves Temporal Ordering. In *Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2008), pages 698-706, 2008.

Previous work on ordering events in text has typically focused on local pairwise decisions, ignoring globally inconsistent labels. However, temporal ordering is the type of domain in which global constraints should be relatively easy to represent and reason over. This paper presents a framework that informs local decisions with two types of implicit global constraints: transitivity (A before B and B before C implies A before C) and time expression normalization (e.g. last month is before yesterday). We show how these constraints can be used to create a more densely-connected network of events, and how global consistency can be enforced by incorporating these constraints into an integer linear programming framework. We present results on two event ordering tasks, showing a 3.6% absolute increase in the accuracy of before/after classification over a pairwise model.
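
The transitivity constraint the paper exploits can be sketched as a closure-plus-consistency check over before/after edges. This is a simplification: the paper encodes such constraints inside an ILP objective rather than applying them as a post-hoc check.

```python
def transitive_closure(before):
    """Expand a set of (a, b) 'a before b' edges under transitivity."""
    closed = set(before)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                # a before b and b before d implies a before d.
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def consistent(before):
    # A labeling is globally inconsistent if closure derives both
    # 'a before b' and 'b before a' for some pair.
    closed = transitive_closure(before)
    return not any((b, a) in closed for a, b in closed)
```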

Sujith Ravi and Kevin Knight. Attacking Decipherment Problems Optimally with Low-Order N-gram Models. In *Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing* (EMNLP-2008), pages 812-819, 2008.

We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy varies as a function of cipher length and n-gram order. We also make an empirical investigation of Shannon’s (1949) theory of uncertainty in decipherment.
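
At toy scale, the guarantee of never overlooking a decipherment key can be reproduced by exhaustive search over substitution keys scored with a letter bigram language model; the ILP provides the same guarantee without enumerating keys. The three-letter alphabets and bigram probabilities below are invented.

```python
from itertools import permutations
from math import log

# Invented plaintext bigram log-probabilities; unseen bigrams get a floor.
lm = {("a", "b"): log(0.7), ("b", "a"): log(0.5)}
FLOOR = log(0.01)

def decipher(ciphertext, cipher_alphabet="xyz", plain_alphabet="abc"):
    """Try every substitution key and keep the best-scoring decipherment."""
    def lm_score(text):
        return sum(lm.get(pair, FLOOR) for pair in zip(text, text[1:]))
    best, best_score = None, float("-inf")
    for perm in permutations(plain_alphabet):
        key = dict(zip(cipher_alphabet, perm))
        plaintext = "".join(key[c] for c in ciphertext)
        score = lm_score(plaintext)
        if score > best_score:
            best, best_score = plaintext, score
    return best
```

For a 26-letter alphabet the 26! keys make enumeration hopeless, which is exactly why the paper's integer-programming encoding of the key and bigram choices is interesting.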