Subject: AMTA Workshop on Embedded MT Systems - CFP

WORKSHOP ANNOUNCEMENT
--------------------

Workshop on Embedded MT Systems
Call for Papers

Design, Construction, and Evaluation of Systems with an MT Component

Wednesday, October 28, 1998
(preceding the AMTA 98 conference)
Sheraton Bucks County Hotel, Langhorne, Pennsylvania

Introduction

As the strengths and weaknesses of machine translation (MT) engines have become better understood and accepted, there has been a marked increase in the development of computer systems with an embedded MT component. One consequence of this shift to "embedded MT" is that researchers, developers, and users have begun pushing the limits on the input that such systems will accept for translation. In so doing, a new class of problems has surfaced: any input, whether it appears in physical form on paper, in electronic form online, or mixed in with another modality such as graphics or video, will bring with it some unknown mix of noisy natural language data as well as non-linguistic data. How are systems with an MT component to be designed and evaluated given the challenge this input brings?

The objective of this workshop is to examine and evaluate techniques for adjusting this "linguistic impedance mismatch" between the real-world input and the natural language input expected by various MT engines. Thus the workshop will focus on computational approaches to preprocessing system input for MT engines and on statistical methods for evaluating systems with an embedded MT component.

Linguistic Preprocessing in Image Data

For researchers working with image data, an effort is currently underway to augment OCR (optical character recognition) engines with linguistic data as they recognize and convert bitmap data into characters, similar to what has already been done in speech recognition with linguistic data in HMMs (hidden Markov models).
Other OCR researchers have also experimented with image-level early topic detection using word-shape recognition. In principle, this could provide a first-step filtering of documents into a more homogeneous MT input set, a desirable goal for MT evaluation. Thus we expect that individuals working with or intending to incorporate OCR into their computer systems will be interested in this new area.

Linguistic Preprocessing in Online Data

For those working with online input, even though the characters are already present, there often still remains the task of preprocessing meaningful, symbolic character strings that are not a part of the text to be translated. For some systems, the rules for identifying and encapsulating or removing such strings may need to be hand-crafted over time as MT engine limitations surface. For others, a combination of hand-crafted rules and statistically trained NL models has worked. Many have observed that HTML annotations, alphanumeric items, and spreadsheet and word processing codes are harder to weed out than originally expected.

Research efforts with low-density and less commonly taught languages, as well as more common ones, encounter a substantial problem with variation in spelling conventions and transcription preferences. This is frequently the case, for example, for natural languages that are primarily spoken and not written. Researchers working on this class of problem have built variants on spell checkers (SC), components that standardize words to one orthography (spelling convention) before submitting them to an MT engine. An idea that has arisen for this component is to build in an option to adjust the level of SC correction, as would be relevant when input after OCR nonetheless varies from very noisy to relatively clean.

Evaluation of Embedded MT Systems

Among those working on statistical methods for evaluating systems with an embedded MT component, we have seen two distinct trends.
One group of statisticians has begun looking for appropriate models from outside the world of MT evaluation, examining efforts by others to take distinct metrics for components and combine them into an overall system-level metric using fuzzy mathematics. Another group of researchers is looking instead at developing a one-dimensional scale for ranking MT engines along a continuum defined by system-level function. That approach, for example, might rank one engine as good enough for filtering documents, while another engine deemed more linguistically robust would be ranked higher because it could generate a good enough initial translation for subsequent post-editing. We welcome other functional evaluations of MT components and computer systems with embedded MT components as well.

Submissions

Submitters are invited to send in a short paper, not more than 5 pages, addressing one or more of the three areas discussed above. Papers should define the problem in an embedded MT system that is the focus of the work, describe the embedded MT system design (a simple sketch) with sample input data where relevant, and present their approach to the problem. Work at various stages of completion is acceptable; we expect the current status of the work to be made clear. Submission of end-to-end output of an embedded MT system is especially encouraged. The papers will be collected and distributed to participants of the workshop.

Ideally, the result of the workshop will be a clearer delineation of:

(1) the range of linguistic preprocessing problems,
(2) the range of designs in embedded MT systems,
(3) how these problems are treated in different embedded MT systems, and
(4) the metrics that are being used to evaluate these systems and their components.

Dates

Notice of interest in participation:  July 10, 1998 (to voss@arl.mil)
Position paper submission:            August 10, 1998
Notifications:                        September 10, 1998
Final copies of papers:               October 10, 1998
Workshop:                             October 28, 1998

With your notice of interest, please identify which of the three areas you intend to address: preprocessing in image data, preprocessing in online data, or evaluation of embedded MT systems.

Submissions may be in printed or electronic form. Submissions should be sent to:

Clare Voss
Army Research Laboratory
AMSRL-IS-CI
2800 Powder Mill Road
Adelphi, MD 20783
phone:  (301) 394-5615
fax:    (301) 394-3903
e-mail: voss@arl.mil

The registration fee for the conference is $50. Non-presenters will be accepted on a first-come, first-served basis. We strongly encourage the participation of embedded MT system users, as well as members of the research and development communities.

After July 11, 1998, a copy of the call, the registration form, and further update information will be available via a link at: <http://rpstl.arl.mil/isb/>

Florence Reeder            | phone: (703) 883-7156
The MITRE Corporation      |        (703) 883-6750 (secretary)
MS W640                    | fax:   (703) 883-1279
1820 Dolley Madison Blvd.  | email: reeder@azrael.mitre.org
McLean, VA 22102           |
