Subject: distributing and accessing linguistic resources

**********************
Call for Participation
***********************************************
Distributing and Accessing Linguistic Resources
***********************************************

May 27th

This workshop is part of the First International Conference on Language Resources and Evaluation at the University of Granada, May 26th to 30th 1998 (see http://ceres.ugr.es/~rubio/elra.html for details and how to register).

The workshop will discuss ways to increase the efficacy of linguistic resource distribution and programmatic access, and work towards the definition of a new method for these tasks based on distributed processing and object-oriented modelling, with deployment on the WWW.

Organizers: Yorick Wilks, Wim Peters, Hamish Cunningham, Remi Zajac

Provisional programme
--------------------

Panel discussion: Distributing and accessing linguistic resources
  Khalid Choukri, Eduard Hovy, Judith Klavans, Yorick Wilks, Antonio Zampolli

Full papers:

Common formats of MT user dictionaries and environments for exchanging them as a part of AAMT activities
  S. Kamei, E. Itoh, M. Fujii, T. Hirai, Y. Saitoh, M. Takahashi, T. Hiyama, K. Muraki
  NEC / Toshiba / Sharp / Fujitsu / Kyushu Matsushita, Japan

Distributed thesaurus storage and access in a cultural domain application
  S. Boutsis, B. Georgantopoulos, S. Piperidis
  Institute for Language and Speech Processing, Athens

Linguistic research utilizing the EDR electronic dictionary as a linguistic resource
  T. Ogino
  EDR, Japan

Corpus-based research using the Internet
  D. Broeder, H. Brugman, A. Russel, P. Wittenburg, R. Piepenbrock
  Max Planck Institute for Psycholinguistics / CELEX Centre for Lexical Expertise, Nijmegen

An architecture for distributed NLP objects
  R. Zajac
  New Mexico State University

A new model for language resource access and distribution
  W. Peters, H. Cunningham, Y. Wilks, C. McCauley
  University of Sheffield

Posters:

TRACTOR: TELRI research archive of computational tools and resources
  R. Krishnamurthy
  University of Birmingham

The CUE corpus access tool
  O. Mason
  University of Birmingham

Web-surfing the lexicon
  D. Cabrero, M. Vilares, L. Docampo, S. Sotelo
  Ramon Pineiro Research Centre / Universities of Coruna and Santiago

Exploring distributed MT
  O. Streiter, A. Schmidt-Wigger, U. Reuther, C. Pease
  IAI Saarbruecken

A proposal for an on-line lexical database
  P. Cassidy
  MICRA, Inc.

Workshop scope and aims
----------------------

In general the reuse of NLP data resources (such as lexicons or corpora) has exceeded that of algorithmic resources (such as lemmatisers or parsers). However, there are still two barriers to data resource reuse:

1) each resource has its own representation syntax and corresponding programmatic access mode (e.g. SQL for CELEX, C or Prolog for WordNet, SGML for the BNC);
2) resources must generally be installed locally to be usable (and of course precisely how this happens, what operating systems are supported etc. varies from case to case).

The consequences of 1) are that although resources share some structure in common (lexicons are organised around words, for example) this commonality is wasted when it comes to using a new resource (the developer has to learn everything afresh each time), and that work which seeks to investigate or exploit commonalities between resources (e.g. to link several lexicons to an ontology) has to first build a layer of access routines on top of each resource.
So, for example, if we wish to do task-based evaluation of lexicons by measuring the relative performance of an information extraction system with different instantiations of lexical resource, we might end up writing code to translate several different resources into SQL or SGML.

The consequence of 2) is that there is no way to "try before you buy": no way to examine a data resource for its suitability for your needs before licensing it. Correspondingly, there is no way for a resource provider to expose limited access to their products for advertising purposes, or gain revenue through piecemeal supply of sections of a resource.

This workshop will discuss ways to overcome these barriers. The proposers will discuss a new method for distributing and accessing language resources involving the development of a common programmatic model of the various resource types, implemented in CORBA IDL and/or Java, along with a distributed server for non-local access. This model is being designed as part of the GATE project (General Architecture for Text Engineering: http://www.dcs.shef.ac.uk/research/groups/nlp/gate/) and goes under the provisional title of an Active CREOLE Server. (CREOLE: Collection of REusable Objects for Language Engineering. Currently CREOLE supports only algorithmic objects, but will be extended to data objects.)

A common model of language data resources would be a set of inheritance hierarchies making up a forest or set of graphs. At the top of the hierarchies would be very general abstractions from resources (e.g. lexicons are about words); at the leaves would be data items that were specific to individual resources. Programmatic access would be available at all levels, allowing the developer to select an appropriate level of commonality for each application.
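
As a rough illustration of such a hierarchy, the Java sketch below shows one root abstraction, one intermediate level, and two leaf resources, with a client written against the root only. All type and method names (Lexicon, SynonymLexicon, ToyDictionary, ToyThesaurus, coverage) are hypothetical and invented for this example; they are not part of GATE or CREOLE, and the toy in-memory data merely stands in for real resources.

```java
import java.util.List;
import java.util.Map;

/** Root abstraction (hypothetical): every lexicon is organised around words. */
interface Lexicon {
    String name();

    /** Returns the entries recorded for a word, or an empty list if there are none. */
    List<String> entriesFor(String word);
}

/** Intermediate level (hypothetical): lexicons that also record synonymy. */
interface SynonymLexicon extends Lexicon {
    List<String> synonymsOf(String word);
}

/** Leaf: a toy word-to-gloss dictionary standing in for a real resource. */
final class ToyDictionary implements Lexicon {
    private final Map<String, List<String>> glosses;

    ToyDictionary(Map<String, List<String>> glosses) {
        this.glosses = glosses;
    }

    @Override
    public String name() {
        return "toy-dictionary";
    }

    @Override
    public List<String> entriesFor(String word) {
        return glosses.getOrDefault(word, List.of());
    }
}

/** Leaf: a toy thesaurus standing in for a second, differently shaped resource. */
final class ToyThesaurus implements SynonymLexicon {
    private final Map<String, List<String>> synonyms;

    ToyThesaurus(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    @Override
    public String name() {
        return "toy-thesaurus";
    }

    @Override
    public List<String> entriesFor(String word) {
        // A headword counts as its own single entry if the thesaurus knows it.
        return synonyms.containsKey(word) ? List.of(word) : List.of();
    }

    @Override
    public List<String> synonymsOf(String word) {
        return synonyms.getOrDefault(word, List.of());
    }
}

final class ResourceForestDemo {
    /** Written against the root only, so it works unchanged for any lexicon. */
    static long coverage(Lexicon lexicon, List<String> words) {
        return words.stream().filter(w -> !lexicon.entriesFor(w).isEmpty()).count();
    }

    public static void main(String[] args) {
        Lexicon dictionary = new ToyDictionary(
                Map.of("bank", List.of("financial institution", "river edge")));
        SynonymLexicon thesaurus = new ToyThesaurus(
                Map.of("bank", List.of("shore"), "big", List.of("large")));
        List<String> words = List.of("bank", "big", "parser");

        // Root-level access: the same code compares both resources.
        for (Lexicon lexicon : List.of(dictionary, thesaurus)) {
            System.out.println(lexicon.name() + " covers "
                    + coverage(lexicon, words) + " of " + words.size() + " words");
        }
        // Lower-level access: only resources at this level offer synonyms.
        System.out.println(thesaurus.synonymsOf("big"));
    }
}
```

The coverage method is the kind of code a task-based comparison of lexicons might need: it depends only on the most general level, while a client that needs synonyms opts into the more specific SynonymLexicon level.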
Note that although an exciting element of the work could be to provide algorithms to dynamically merge common resources, what we're suggesting initially is not to develop anything substantively new, but simply to improve access to existing resources. This is not a new standards initiative, but a way to build on previous initiatives.

Of course, the production of a common model that fully expressed all the subtleties of all resources would be a large undertaking, but we believe that it can be done incrementally, with useful results at each stage. Early versions will stop decomposing the object structure of resources at a fairly high level, leaving the developer to handle the data structures native to the resources at the leaves of the forest. There should still be a substantial benefit in uniform access to higher level structures.

Program committee
----------------

Yorick Wilks
Hamish Cunningham
Wim Peters
Remi Zajac
Roberta Catizone
Paola Velardi
Maria Teresa Pazienza
Roberto Basili
Bran Boguraev
Sergei Nirenburg
James Pustejovsky
Ralph Grishman
Christiane Fellbaum
