Date: Mon, 18 Feb 2002 11:15:20 -0600
From: Shabbir Ali Syed
To: CSCE976 Handin Account
cc: choueiry@cse.unl.edu
Subject: Minutes of NLP talk

CSCE 976 Class Minutes
Presenter: Amy
Scribe: Shabbir A. Syed

Material discussed:
(1) Chapter 23 of AIMA.
(2) Church, K.W. and Hovy, E.H. 1993. Good Applications for Crummy Machine Translation.

INTRODUCTION:
The presentation is organized under the headings History, Applications, Significance, Difficulties, Successes, Goals, Ideas (to achieve those goals), and Important Definitions.

HISTORY:
Proposals for building machine translators predate computers themselves. A dictionary-lookup system was built at Birkbeck College, London, in 1948. Ideas of machine translation date back to the 1700s and were first patented in the 1930s. American interest was sparked by Warren Weaver, who, drawing on WWII code-breaking, proposed treating translation as decoding; these early efforts were rather unsuccessful. TAUM-METEO, in use from 1977, was the first widespread MT system and claimed 97% accuracy.

APPLICATIONS:
(1) Machine translation.
(2) Databases.
(3) Information retrieval.

SIGNIFICANCE:
NLP deals with connected discourse rather than isolated sentences: a discourse correctly interprets a sequence of sentences, and all practical applications involve discourse, not individual sentences. A document written in one language can be viewed as a coded version of another language; once the code is broken, translation is fairly simple.

DIFFICULTIES:
NLP has practical applications, but none does a great job in an open-ended domain. Choosing a good interpretation of a sentence requires evidence from many sources. It was soon realized that translation is extremely difficult and complex, and that it is hard even for people to encode knowledge of their own native language. A dictionary needs 20K to 100K words, and grammars run from 100 to 10K rules. Direct word-for-word translation is ambiguous, as is the source language itself, so translation requires a good understanding of the text.
Databases worked well for a time, but GUIs have now largely replaced NLP interfaces to databases, and NLP has moved on to text interpretation. More advanced NLP techniques have not yielded significantly better results. Parsing involves large dictionaries, on the order of 10K words; it requires a defined grammar, requires that sentences follow that grammar, and requires the ability to deal with words not in the dictionary. Lexicons are expensive to develop and not readily shared.

SUCCESSES:
A limited domain allows for a limited vocabulary and grammar, and easier disambiguation and understanding. In Machine-Aided Translation (MAT), a machine produces a first draft and a person proofreads it for clarity. Database access was the first success for NLP: circa 1970, many such systems were in use on mainframes. NLP interfaces to databases were developed to save mainframe operators the work of accessing data through complicated programs. One example is LUNAR, which was never put into real operation but in one test had a 78% success rate. Another is CHAT, which allows queries of a geographical nature. In branches of information retrieval such as text categorization, NLP has a 90% success rate. NLP works better for text categorization than for general IR because the categories are fixed. Automated categorization is much faster than humans, and automated systems now perform most of the work. Statistical methods are now more common for text summarization, which extracts the main meaning and expresses it more briefly. Lexicons are the current trend in parsing: the text is tokenized with morphological analysis (inflectional, derivational, and compound), a dictionary lookup is done on each token, and error recovery yields a better understanding of sentences.

GOALS:
(1) To achieve some level of understanding through communication in natural language.
(2) To understand sentences by syntactic analysis.

IDEAS:
(1) NLP applications are all similar in that they require some level of understanding.
(2) Understanding applies to data, queries, and sentences; understanding sentences requires parsing.

IMPORTANT DEFINITIONS:
Morphological analysis: the process of describing a word in terms of the prefixes, suffixes, and root form that comprise it.
Inflectional: deals with changes to a word due to context, e.g., a plural.
Derivational: derives a word of a different type from another word, e.g., "shortness" is derived from "short" plus the suffix "-ness".
Discourse: refers to all processes of natural language understanding that attempt to understand a text or dialogue. To understand discourse, one must track the structure of an unfolding text or dialogue and interpret each new utterance with respect to the proper context, taking into account the real-world setting of the utterance as well as the linguistic context built up by the utterances preceding it (AI Encyclopedia [2]).
Coherent discourse has the following relations:
- Enablement
- Evaluation
- Causal
- Elaboration
- Explanation

COMMENTS OF STUDENTS:
Robert Glaubius asked how NLP handles ambiguity; the answer is by backtracking search. Dan Buettner assisted in reading a German word, illustrating machine-translation difficulties. Cory Lueninghoener interpreted a set of sentences as a discourse-understanding example. In the cologne-and-Paris example, he asked why the attention does not shift to the cologne when we buy it. Amy answered that Paris and home are locations while cologne is a thing, so the interpretation depends on context; she also said that Enablement is the relation that allows buying the cologne in this example. Dan Buettner argued that the Causal and Explanation discourse relations are similar, to which Praveen and Xu Lin agreed; Amy explained the distinction with a K-Mart example.
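The lexicon-based parsing pipeline mentioned in the minutes (tokenize, morphological analysis, dictionary lookup on each token) can be sketched minimally as below. The tiny lexicon and suffix rules here are illustrative assumptions for the sketch, not taken from any system discussed in class.

```python
# Minimal sketch of a lexicon-based pipeline: look each token up
# directly, and if that fails, strip one inflectional or derivational
# suffix and look up the root. LEXICON and SUFFIXES are hypothetical.

LEXICON = {"cat": "noun", "short": "adj", "walk": "verb"}

# (suffix, kind): inflectional suffixes change a word's form ("cats"),
# derivational suffixes build a word of a different type ("shortness").
SUFFIXES = [("ness", "derivational"), ("ing", "inflectional"),
            ("ed", "inflectional"), ("s", "inflectional")]

def analyze(token):
    """Return (root, part_of_speech, suffix_kind), or None if unknown."""
    if token in LEXICON:                      # direct dictionary lookup
        return (token, LEXICON[token], None)
    for suffix, kind in SUFFIXES:             # try stripping one affix
        if token.endswith(suffix):
            root = token[: -len(suffix)]
            if root in LEXICON:
                return (root, LEXICON[root], kind)
    return None                               # error recovery would go here

for word in "cats walking shortness".split():
    print(word, "->", analyze(word))
```

A real system would need the 20K-100K-word dictionary mentioned above, handling of compounds and prefixes, and genuine error recovery for out-of-dictionary words; this sketch only shows the lookup-and-strip idea behind the inflectional/derivational definitions.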