Posted by: pearlblue | June 18, 2008

Martin Kay: Publications and Relevant Articles

Martin Kay has been described as a leading figure in computational linguistics from the 1950s to the present. His work helped establish it as a field of scientific research and development.

Here is a selection of his publications, most recent first:

  • 2005 Martin Kay: A Life of Language. Computational Linguistics 31(4): 425-438 (2005)
  • 2004 Martin Kay: Substring Alignment Using Suffix Trees. CICLing 2004: 275-282
  • 1997 Martin Kay: The Proper Place of Men and Machines in Language Translation. Machine Translation 12(1-2): 3-23 (1997)
  • 1997 Martin Kay: It’s Still the Proper Place. Machine Translation 12(1-2): 35-38 (1997)
  • 1996 Martin Kay: Chart Generation. ACL 1996: 200-204
  • 1994 Mark Johnson, Martin Kay: Parsing and Empty Nodes. Computational Linguistics 20(2): 289-300 (1994)
  • 1994 Ronald M. Kaplan, Martin Kay: Regular Models of Phonological Rule Systems. Computational Linguistics 20(3): 331-378 (1994)
  • 1993 Martin Kay, Martin Röscheisen: Text-Translation Alignment. Computational Linguistics 19(1): 121-142 (1993)
  • 1992 Martin Kay: Ongoing directions in Computational Linguistics. COLING 1992
  • 1990 Mark Johnson, Martin Kay: Semantic Abstraction and Anaphora. COLING 1990: 17-27
  • 1987 Martin Kay: Nonconcatenative Finite-State Morphology. EACL 1987: 2-10
  • 1985 Lauri Karttunen, Martin Kay: Structure Sharing with Binary Trees. ACL 1985: 133-136
  • 1984 Martin Kay: The Dictionary Server. COLING 1984: 461-
  • 1984 Martin Kay: Functional Unification Grammar: A Formalism For Machine Translation. COLING 
  • 1984 Martin Kay: Unification in Grammar. Natural Language Understanding and Logic Programming Workshop 1984: 233-240
  • 1982 Martin Kay: Machine Translation. American Journal of Computational Linguistics 8(2): 74-78 (1982)
  • 1979 Martin Kay: Syntactic Process. ACL 1979
  • 1977 Daniel G. Bobrow, Ronald M. Kaplan, Martin Kay, Donald A. Norman, Henry S. Thompson, Terry Winograd: GUS, A Frame-Driven Dialog System. Artif. Intell. 8(2): 155-173 (1977)
  • 1974 Robert Balzer, Norton Greenfeld, Martin Kay, William Mann, Walter Ryder, David Wilczynski, Albert L. Zobrist: Domain-Independent Automatic Programming. IFIP Congress 1974: 326-330
  • 1962 Martin Kay: Rules of Interpretation – An Approach to the Problem of Computation in the Semantics of Natural Language. IFIP Congress 1962: 318-322


Some of his most important articles:

Regular models of phonological rule systems. This paper presents a set of mathematical and computational tools for manipulating and reasoning about regular languages and regular relations and argues that they provide a solid basis for computational phonology. It shows in detail how this framework applies to ordered sets of context-sensitive rewriting rules and also to grammars in Koskenniemi’s two-level formalism. This analysis provides a common representation of phonological constraints that supports efficient generation and recognition by a single simple interpreter. 
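To make the idea of an ordered set of context-sensitive rewriting rules concrete, here is a minimal sketch. It is not the paper's method: the paper compiles such rules into regular relations (finite-state transducers), whereas this sketch simply applies regular-expression substitutions one after another. The three rules and the forms "kaNpat" and "kaNtat" are toy assumptions chosen for illustration.

```python
import re

# Toy ordered rewrite rules (hypothetical, for illustration only):
#   1. N -> m  when followed by p
#   2. N -> n  elsewhere
#   3. p -> m  when preceded by m
RULES = [
    (re.compile(r"N(?=p)"), "m"),
    (re.compile(r"N"), "n"),
    (re.compile(r"(?<=m)p"), "m"),
]


def apply_rules(form, rules=RULES):
    """Apply each rule, in order, to the output of the previous rule."""
    for pattern, replacement in rules:
        form = pattern.sub(replacement, form)
    return form


print(apply_rules("kaNpat"))  # kaNpat -> kampat -> kammat
print(apply_rules("kaNtat"))  # kaNtat -> kantat
```

The order matters: rule 3 only fires on "kaNpat" because rule 1 has already produced the "m" it needs as left context.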

Text-translation alignment. We present an algorithm for aligning texts with their translations that is based only on internal evidence. The relaxation process rests on a notion of which word in one text corresponds to which word in the other text that is essentially based on the similarity of their distributions. It exploits a partial alignment of the word level to induce a maximum likelihood alignment of the sentence level, which is in turn used, in the next iteration, to refine the word level estimate. The algorithm appears to converge to the correct sentence alignment in only a few iterations.
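The notion of word correspondence "based on the similarity of their distributions" can be sketched as follows. This is only an illustration under stated assumptions: it scores word pairs with a Dice coefficient over a list of candidate sentence pairs, and it does not reproduce the paper's exact measure or its iterative relaxation between word level and sentence level. The function name `dice_scores` and the sample sentences are invented for the example.

```python
from collections import defaultdict
from itertools import product


def dice_scores(sentence_pairs):
    """Score (source word, target word) pairs by how similarly the two
    words are distributed over the candidate sentence pairs."""
    src_count = defaultdict(int)   # sentence pairs containing the source word
    tgt_count = defaultdict(int)   # sentence pairs containing the target word
    both = defaultdict(int)        # sentence pairs containing both words
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        for word in src_words:
            src_count[word] += 1
        for word in tgt_words:
            tgt_count[word] += 1
        for s, t in product(src_words, tgt_words):
            both[(s, t)] += 1
    # Dice coefficient: 2 * joint count / (sum of individual counts)
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in both.items()
    }


# Hypothetical candidate sentence pairs (English / Spanish).
pairs = [
    ("the house", "la casa"),
    ("the school", "la escuela"),
    ("my house", "mi casa"),
]
scores = dice_scores(pairs)
print(scores[("house", "casa")])  # always co-occur
print(scores[("house", "la")])    # co-occur only once
```

In an iterative scheme of the kind the abstract describes, high-scoring word pairs would then be used to confirm or reject candidate sentence pairs, and the scores would be recomputed on the refined set.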

A logical version of functional grammar. Kay’s functional-unification grammar notation is a way of expressing grammars which relies on very few primitive notions. The primary syntactic structure is the feature structure, which can be visualised as a directed graph with arcs labeled by attributes of a constituent, and the primary structure-building operation is unification. In this paper we propose a mathematical formulation of FUG, using logic to give a precise account of the strings and the structures defined by any grammar written in this notation.
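Unification of feature structures can be sketched in a few lines. This is a simplified illustration, not the formulation proposed in the paper: feature structures are represented as nested dictionaries rather than directed graphs, so shared substructures (reentrancy) are not modelled. The feature names `cat`, `agr`, `num` and `per` are assumptions made for the example.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if the two structures
    assign conflicting values to the same attribute.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # conflict somewhere below this attribute
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values unify only if they are equal.
    return a if a == b else None


noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
compatible = {"agr": {"num": "sg", "per": 3}}
conflicting = {"agr": {"num": "pl"}}

print(unify(noun_phrase, compatible))
print(unify(noun_phrase, conflicting))
```

The first call merges the two descriptions into one structure carrying all the information from both; the second fails because the `num` values clash.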



Posted by: pearlblue | June 17, 2008

Martin Kay: Biography

Prof. Martin Kay is a computer scientist known for his work in computational linguistics. He is a professor at Stanford University and Honorary Professor at Saarland University. He was born and grew up in Great Britain. In 1958 he started work at the Cambridge Language Research Unit, and he received his M.A. from Trinity College, Cambridge, in 1961. Kay is one of the pioneers of computational linguistics and machine translation. He was responsible for introducing the notion of chart parsing in computational linguistics, and the notion of unification in linguistics generally.


With Ron Kaplan, he pioneered research and application development in finite-state morphology. He has been a longtime contributor to, and critic of, work on machine translation. In 1961 he moved to the Rand Corporation in Santa Monica, California, where he became head of research in linguistics and machine translation.

In his seminal paper “The Proper Place of Men and Machines in Language Translation”, Kay argued for MT systems that were integrated into the human translation process. He was a reviewer and critic of EUROTRA, Verbmobil, and many other MT projects. Kay is a former Chair of the Association for Computational Linguistics and the ongoing Chair of the International Committee on Computational Linguistics.

He left Rand in 1972 to become Chair of the Department of Computer Science at the University of California, Irvine. In 1974 he moved to the Xerox Palo Alto Research Center as a Research Fellow. In 1985, while retaining his position at Xerox PARC, he joined the faculty of Stanford University half-time. He is currently Professor of Linguistics at Stanford University and Honorary Professor of Computational Linguistics at Saarland University.

His achievements include the development of chart parsing and functional unification grammar and major contributions to the application of finite state automata in computational phonology and morphology. He is also regarded as a leading authority on machine translation.

His honors include an honorary Doctor of Philosophy from Gothenburg University and the 2005 Association for Computational Linguistics’ Lifetime Achievement Award. He is the permanent chairman of the International Committee on Computational Linguistics.



Posted by: pearlblue | May 11, 2008

Differences between the following specialized terms (Q.3)

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

Computer-assisted translation, computer-aided translation, or CAT is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process.

Multilingual content management deals with multilingual websites, which are usually a mixture of global and local content. Local content presents no particular content management issues; global content, which has to be translated across all language locales, does. Deciding where multiple language versions of content are going to be required and where content can be maintained separately for different locales is a critical decision that will affect how a site should be maintained and what it will cost.

Translation is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the source text, and the language it is to be translated into is called the target language; the final product is sometimes called the “target text.”



Posted by: pearlblue | May 9, 2008

Translation examples by MT system (Q.3)

I have used Translendium to translate this text into different languages:

“Cuando uno es niño, el mundo cabe en el interior de una nuez. En términos geográficos nuestro universo comprende un área reducida que engloba casa, colegio y en el mejor de los casos el barrio en que viven abuelos y primos. en mi caso particular, el mundo cabía holgadamente en una porción del barrio de Flores, comprendida entre la intersección de Boyacá y Avellaneda (mi casa) y la mismísima Plaza Flores, frente a la cual se alzaba mi colegio.” Kamchatka. Marcelo Figueras. 2003

English: When one is child, the world fits in the interior of a walnut. In geographical terms our universe understands a reduced area that comprises home|house, school and in the best of the cases the neighborhood in which grandparents live and cousins. in my particular|private case, the world fit loosely in a portion of the neighborhood of Flores, comprised between the intersection of Boyacá and Hazel Wood (my home|house) and the very same Flowers Post|Square, in front of which my school stood up.

French: Quand quelqu’un est garçon, le monde entre dans l’intérieur d’une noix. Dans des termes géographiques notre univers comprend une aire|domaine|zone réduite qu’englobe maison, école|collège et dans le meilleur des cas le quartier dans lequel des grands-parents vivent et cousins. dans mon cas particulier, le monde entrait largement dans une portion du quartier de Flores, comprise entre l’intersection de Boyacá et Coudraie (ma maison) et la même Place Fleurs, en face de laquelle mon école|collège se levait.

German: Wenn einer Kind ist, passt die Welt im Inneren einer Nuss. In geographischen Begriffen versteht unser Universum einen reduzierten Bereich|Fläche, der Haus umfasst, Schule und in dem besten der Fälle das Stadtviertel, in dem Großeltern leben, und Cousins. in meinem besonderen Fall passte die Welt weit in einem Teil des Stadtviertels von Flores, zwischen dem Schnittpunkt von Boyacá und Haselbusch (mein Haus) und dem sehr gleichen Platz Blumen verstanden|beinhaltet, gegenüber von dem meine Schule sich erhob.

Catalan: Quan un és nen, el món cap en l’interior d’una nou. En termes geogràfics el nostre univers comprèn una àrea reduïda que engloba casa, col·legi i en el millor dels casos el barri en el qual viuen avis i cosins. en el meu cas particular, el món cabia folgadament en una porció del barri de Flores, compresa entre la intersecció de Boyacá i Avellaneda (la meva casa) i la mateixíssima Plaça Flors, davant la qual s’alçava el meu col·legi.

Galician: Cando un é neno, o mundo cabe no interior dunha noz. En termos xeográficos o noso universo comprende unha área reducida que engloba casa, colexio e no mellor dos casos o barrio en que viven avós e primos. no meu caso particular, o mundo cabía folgadamente nunha porción do barrio de Flores, comprendida entre a intersección de Boyacá e Abeledo (a miña casa) e a mismísima Plaza Flores, fronte á cal se alzaba o o meu colexio.



The Framework for Machine Translation Evaluation in ISLE (FEMTI) is a resource that helps MT evaluators define contextual evaluation plans. FEMTI consists of two interrelated classifications or taxonomies. The first lists the characteristics of the contexts of use that may apply to MT systems. The second lists the possible characteristics of an MT system, along with the metrics that have been proposed to measure them.

According to the FEMTI report, the characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation.

The main characteristics are the following:

  • Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.
  • Document routing or sorting: The purpose of document routing / sorting is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.
  • Information extraction or summarization: The purpose of information extraction or summarization is to extract some portion(s) of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.



Posted by: pearlblue | May 1, 2008

Explanation of the Topics (Q.2)

The first topic I am going to explain is “Speech Recognition”, which is one of the topic areas of The Association for Computational Linguistics and Natural Language Processing (Columbus, Ohio).

Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to machine-readable input (for example, to keypresses, using the binary code for a string of character codes). The term voice recognition may also be used to refer to speech recognition, but more precisely refers to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said.

Speech recognition applications include voice dialing (e.g., “Call home”), call routing (e.g., “I would like to make a collect call”), domotic appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and in aircraft cockpits (usually termed Direct Voice Input).

The second topic is “Dialogue systems”, which is one of the topic areas of The Association for Computational Linguistics and Natural Language Processing (Columbus, Ohio).

A dialog system is a computer system intended to converse with a human, with a coherent structure. Dialog systems have employed text, speech, graphics, haptics, gestures and other modes for communication on both the input and output channel. An architecture for a typical spoken dialog system is shown in the figure below.

What does and does not constitute a dialog system may be debatable. The typical GUI wizard does engage in some sort of dialog, but it includes very few of the common dialog system components, and dialog state is trivial.

The last topic is “Pragmatics”, which is one of the topic areas of The Association for Computational Linguistics and Natural Language Processing (Columbus, Ohio).

Pragmatics is the study of the ability of natural language speakers to communicate more than that which is explicitly stated. The ability to understand another speaker’s intended meaning is called pragmatic competence. An utterance describing pragmatic function is described as metapragmatic. Another perspective is that pragmatics deals with the ways we reach our goal in communication. Suppose a person wanted to ask someone else to stop smoking. This can be achieved by using several utterances. The person could simply say, ‘Stop smoking, please!’ which is direct and with clear semantic meaning; alternatively, the person could say, ‘Whew, this room could use an air purifier’ which implies a similar meaning but is indirect and therefore requires pragmatic inference to derive the intended meaning.

Pragmatics is regarded as one of the most challenging aspects for language learners to grasp, and can only truly be learned with experience.


Posted by: pearlblue | April 30, 2008

Recent Research Topics on Human Language Technology (Q. 2)

This post covers the research topics listed on the major Human Language Technology sites.

The German Research Center for Artificial Intelligence is working on the following themes:

  • Exploiting – and automatically extending – ontologies for content processing.
  • Tighter integration of shallow and deep techniques in processing.
  • Enriching deep processing with statistical methods.
  • Combining language checking with structuring tools in document authoring.
  • Document indexing for German and English.
  • Automatically associating recognized information with related information and thus building up collective knowledge.
  • Automatically structuring and visualizing extracted information.
  • Processing information encoded in multiple languages, among them Chinese and Japanese.

    The Edinburgh Language Technology Group works on the following areas:

    • Combining Shallow Semantics and Domain Knowledge (EASIE).
    • Text Mining for Biomedical Content Curation (TXM).
    • Cross-lingual Multi-agent Retail Comparison (CROSSMARC).
    • Smart Qualitative Data: Methods and Community tools for Data Mark-up (SQUAD).
    • Machine Learning for Named Entity Recognition (SEER).
    • Integrated Models and Tools for Fine-Grained Prosody in Discourse (Synthesis).
    • Joint Action Science and Technology (JAST).
    • AMI Consortium projects that are developing technologies for meeting browsing and for assisting people who participate in meetings from a remote location.
    • Study of how pairs collaborate when planning a route on a map (Collaborating using diagrams).


    The Common Language Resources and Technology Infrastructure (CLARIN) wants to achieve a number of goals:

    • They need a broad and deep understanding of the goals of CLARIN by everyone involved. Yet they cannot assume that the knowledge is already sufficiently spread.
    • They need to start the interaction with everyone involved and interested and to take up the comments and ideas from all the experts.
    • They need to spread the relevant messages about the different layers of work involved in setting up a research infrastructure, in particular because it involves aspects that have not yet been a topic of general discussion in the field.
    • They need to create a positive atmosphere and an enthusiasm which will be important to meet their challenging goals.
    • They need to start the actual work in the working groups and invite all experts to participate.
    • Of course those who are partners in the EC funded project need to understand the rules of the game. In particular the double funding scheme – national and EC funding – needs careful attention from all of them. Other members need to be informed about the national groups.


    The Association for Computational Linguistics and Natural Language Processing (Columbus, Ohio) invites student researchers to submit their work to the workshop. The research being presented can come from any topic area within computational linguistics including, but not limited to, the following topic areas:
    • pragmatics, discourse, semantics, syntax and the lexicon
    • phonetics, phonology and morphology
    • linguistic, mathematical and psychological models of language
    • information retrieval, information extraction, question answering
    • summarization and paraphrasing
    • speech recognition, speech synthesis
    • corpus-based language modeling
    • multi-lingual processing, machine translation, translation aids
    • spoken and written natural language interfaces, dialogue systems
    • multi-modal language processing, multimedia systems
    • message and narrative understanding systems

    Posted by: pearlblue | March 19, 2008

    Definition of Human Language Technologies (Q.1)

    There are many definitions of Human Language Technologies which can be found on the Net. The free encyclopedia Wikipedia refers to the term as Natural Language Processing (NLP) and defines it as follows:

    “It is a sub-field of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural-language-generation systems convert information from computer databases into normal-sounding human language. Natural-language-understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

    There is another definition written by Hans Uszkoreit in his study “What is Language Technology?” He says:

    “It comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics.”

    Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas at Austin. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas. From 1982 to 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, Ca. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistics and Logical Methods for the Understanding of German Texts). During this time, he also taught at the University of Stuttgart.

    We can cite some of his recent publications:

    • Uszkoreit, H. (2007) Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.
    • Bertomeu, N., H. Uszkoreit, A. Frank, H.-U. Krieger, B. Jörg (2006) Contextual phenomena and thematic relations in database QA dialogues: results from a Wizard-of-Oz Experiment. Proceedings of the HLT-NAACL 2006 Workshop on Interactive Question Answering, New York.
    • Aslan I., F. Xu, H. Uszkoreit, A. Krüger, and J. Steffen (2005) COMPASS2008: Multimodal, multilingual and crosslingual interaction for mobile tourist guide applications. To appear in Proceedings of Intelligent Technologies for Interactive Entertainment (Intetain) 2005.
    • Uszkoreit, H. & B. Joerg (2003) A Virtual Information Center for Language Technology: Ontology, Datastructure, Realization. In: Nordic Language Technology Yearbook, Museum Tusculanums Forlag, Copenhagen.
    • Oepen, S., D. Flickinger, J. Tsujii, and H. Uszkoreit. (2002). Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing. CSLI Publications, Stanford, 2002
    Some European Research Centres for Human Language Technologies:
    1. National Centre for Language Technology (NCLT)
    2. Language Technology Documentation Centre (Finland)
    3. Edinburgh Language Technology Group (LTG)
    4. Language Technology Lab


    Posted by: pearlblue | February 24, 2008

    Microsoft opens up to the competition

    The computing giant will give other companies access to previously secret data about its software to make it easier to develop programs.

    After its so-far unsuccessful attempt to buy Yahoo! outright, the computing giant announced this Thursday that it will allow access to previously secret information about its programs in order to make them more compatible and so ease the development of both independent and commercial software.

    Company officials explained in Redmond (USA) that the information will be free for those who create non-commercial programs, while those who want to do business will have to “make a small payment”.

    “Our goal is to promote greater interoperability (compatibility between products from different manufacturers) and greater choice for our customers and for developers, by making our products more transparent and sharing more information about our technologies,” said Steve Ballmer, Microsoft’s chief executive.

    To that end the company will follow four principles: ensuring open connections, promoting data portability, increasing support for standards-setting, and having a more open relationship with customers and the industry.


    Source: EL CORREO, Friday, February 22, 2008

    Posted by: pearlblue | February 9, 2008

    The world’s thinnest notebook: MacBook Air


    MacBook Air is ultrathin, ultraportable, and ultra unlike anything else. But you don’t lose inches and pounds overnight. It’s the result of rethinking conventions. Of multiple wireless innovations. And of breakthrough design. With MacBook Air, mobile computing suddenly has a new standard.

    MacBook Air is nearly as thin as your index finger. Practically every detail that could be streamlined has been. Yet it still has a 13.3-inch widescreen LED display, full-size keyboard, and large multi-touch trackpad. It’s incomparably portable without the usual ultraportable screen and keyboard compromises.

    The incredible thinness of MacBook Air is the result of numerous size- and weight-shaving innovations. From a slimmer hard drive to strategically hidden I/O ports to a lower-profile battery, everything has been considered and reconsidered with thinness in mind.

    MacBook Air is designed and engineered to take full advantage of the wireless world. A world in which 802.11n Wi-Fi is now so fast and so available, people are truly living untethered – buying and renting movies online, downloading software, and sharing and storing files on the web.


    Source: Consulting date, 19 January 2008
