Search results for "computer.software_genre"

showing 10 items of 3858 documents

Data Augmentation for Pipeline-Based Speech Translation

2020

Pipeline-based speech translation methods may suffer from errors in the output of the speech recognition system. It is therefore crucial that machine translation systems are trained to be robust against such noise. In this paper, we propose two methods of parallel data augmentation for developing pipeline-based speech translation systems. The first method uses a speech processing workflow to introduce errors, and the second generates commonly found suffix errors with a rule-based method. We show that the two methods in combination significantly improve speech translation quality, by 1.87 BLEU points over a baseline system.

Machine translation; Computer science; Pipeline (computing); media_common.quotation_subject; Speech recognition; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; speech translation; Speech processing; computer.software_genre; neural machine translation; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; robustness to errors; Workflow; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Speech translation; Quality (business); Noise (video); Suffix; computer; media_common
Human Language Technologies – The Baltic Perspective - Proceedings of the Ninth International Conference Baltic HLT 2020
researchProduct

Semantic Word Error Rate for Sentence Similarity

2016

Sentence similarity measures have applications in several tasks, including Machine Translation, Paraphrase Identification, Speech Recognition, Question Answering and Text Summarization. However, measures designed for these tasks aim to assess equivalence rather than resemblance, which partly departs from the human cognition of similarity. While this is reasonable for these activities, it hinders the applicability of sentence similarity measures to other tasks. We therefore propose a new sentence similarity measure specifically designed for resemblance evaluation, in order to cover these fields better. Experimental results are discussed.

Machine translation; Computer science; Speech recognition; Word error rate; 02 engineering and technology; computer.software_genre; Paraphrase; 030507 speech-language pathology & audiology; 03 medical and health sciences; Semantic similarity; Artificial Intelligence; LSA; Word Error Rate; 0202 electrical engineering electronic engineering information engineering; sentence resemblance; Equivalence (formal languages); Latent Semantic Analysis; Semantic Word Error Rate; sentence similarity measure; SWER; business.industry; Latent semantic analysis; Sentence Similarity; Semantic Computing; Cognition; Automatic summarization; Computer Networks and Communications; word relatedness; 020201 artificial intelligence & image processing; Artificial intelligence; 0305 other medical science; business; computer; Natural language processing; WER; Information Systems
2016 IEEE Tenth International Conference on Semantic Computing (ICSC)
researchProduct
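
For orientation, the plain word error rate (WER) that this entry's keywords refer to can be sketched as a word-level edit distance divided by the reference length. This is only the standard baseline metric, not the semantic variant the paper proposes; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level Levenshtein distance divided by reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because every mismatch costs the same, "mat" replaced by "rug" is penalised exactly as much as "mat" replaced by "car", which is the kind of limitation a resemblance-oriented measure would aim to address.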

Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation

2020

Neural machine translation systems are typically trained on curated corpora and break when faced with non-standard orthography or punctuation. Resilience to spelling mistakes and typos, however, is crucial as machine translation systems are used to translate texts of informal origin, such as chat conversations, social media posts and web pages. We propose a simple generative noise model to generate adversarial examples of ten different types. We use these to augment machine translation systems’ training data and show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data, while baseline systems’ performance drops by…

Machine translation; Computer science; business.industry; computer.software_genre; Translation (geometry); Consistency (database systems); Robustness (computer science); Web page; Noise (video); Artificial intelligence; business; computer; Sentence; Orthography; Natural language processing
researchProduct
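
The idea of augmenting training data with synthetic orthographic and punctuation noise can be illustrated with a minimal sketch. The three operations below (dropping a letter, swapping adjacent letters, stripping punctuation) are hypothetical stand-ins chosen for illustration; the abstract mentions ten noise types but does not list them, and the function name `add_noise` and its parameters are likewise assumptions.

```python
import random
import string


def add_noise(text: str, rate: float, rng: random.Random) -> str:
    """Return a noisy copy of `text`.

    At each position, with probability `rate`, one of three hypothetical
    operations is attempted: drop a letter, swap two adjacent letters,
    or strip a punctuation mark. If the chosen operation does not apply
    to the current character, the character is kept unchanged.
    """
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if rng.random() < rate:
            op = rng.choice(("drop", "swap", "strip_punct"))
            if op == "drop" and c.isalpha():
                i += 1
                continue
            if (op == "swap" and i + 1 < len(chars)
                    and c.isalpha() and chars[i + 1].isalpha()):
                out.extend((chars[i + 1], c))
                i += 2
                continue
            if op == "strip_punct" and c in string.punctuation:
                i += 1
                continue
        out.append(c)
        i += 1
    return "".join(out)
```

In an augmentation setting, the noisy string would replace the source side of a parallel sentence pair while the target side stays clean; passing a seeded `random.Random` keeps the generated corpus reproducible.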

Source-Target Mapping Model of Streaming Data Flow for Machine Translation

2017

Streaming information flow allows identification of linguistic similarities between language pairs in real time, as it relies on pattern recognition of grammar rules, semantics and pronunciation, especially when analyzing so-called international terms, the syntax of the language family, and the transitivity of tenses between the languages. Overall, it provides backbone translation knowledge for building an automatic translation system that facilitates processing any of various abstract entities which combine to specify underlying phonological, morphological, semantic and syntactic properties of linguistic forms and that act as the targets of linguistic rules and operations in a source language foll…

Machine translation; Deep linguistic processing; business.industry; Computer science; pattern recognition; data mining; Transfer-based machine translation; computer.software_genre; Semantics; machine translation; Universal Networking Language; Rule-based machine translation; Computer-assisted translation; streaming data flow; Artificial intelligence; Language family; natural language processing; business; computer; Natural language processing
researchProduct

Translingual text mining for identification of language pair phenomena

2016

Translingual Text Mining (TTM) is an innovative natural language processing technology for building multilingual parallel corpora, processing machine translation, contextual knowledge acquisition, information extraction, query profiling, language modeling, contextual word sensing, creating feature test sets, and a variety of other purposes. The Keynote Lecture will discuss opportunities and challenges of this computational technology. In particular, the focus will be on identification of language pair phenomena and their applications to building a holistic language model, which is a novel tool for processing machine translation, supporting professional translations, evaluation of tran…

Machine translation; Language identification; Computer science; business.industry; 05 social sciences; similarity metrics; 02 engineering and technology; computer.software_genre; 050105 experimental psychology; computational linguistics; multilingual information retrieval; Universal Networking Language; Cache language model; Language technology; 0202 electrical engineering electronic engineering information engineering; Computer-assisted translation; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; information extraction; Language model; Artificial intelligence; business; computer; Language industry; Natural language processing
2016 Sixth International Conference on Innovative Computing Technology (INTECH)
researchProduct

Outline for a Relevance Theoretical Model of Machine Translation Post-editing

2018

Translation process research (TPR) has advanced in recent years to a state which allows us to study “in great detail what source and target text units are being processed, at a given point in time, to investigate what steps are involved in this process, what segments are read and aligned and how this whole process is monitored” (Alves 2015, p. 32). We have sophisticated statistical methods and powerful tools to produce a better and more detailed understanding of the underlying cognitive processes that are involved in translation. Following Jakobsen (2011), who suspects that we may soon be in a situation which allows us to develop a computational model of human translation, Alve…

Machine translation; Point (typography); business.industry; Computer science; Process (engineering); Cognition; computer.software_genre; Translation (geometry); Relevance (information retrieval); Target text; Artificial intelligence; State (computer science); business; computer; Natural language processing
researchProduct

Monolingual and cross-lingual intent detection without training data in target languages

2021

Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on the assumptions that (1) training data can be obtained by machine-translating it from another language

Machine translation; TK7800-8360; Computer Networks and Communications; Computer science; PT languages; 0211 other engineering and technologies; 02 engineering and technology; computer.software_genre; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; DE; German; FR; LT; LV; 0202 electrical engineering electronic engineering information engineering; EN DE FR LT LV PT languages; monolingual and cross-lingual experiments; Electrical and Electronic Engineering; 021110 strategic defence & security studies; business.industry; Cosine similarity; Latvian; 020206 networking & telecommunications; Lithuanian; Eager learning; word and sentence transformers; language.human_language; Lazy learning; Hardware and Architecture; Control and Systems Engineering; Signal Processing; language; EN; Artificial intelligence; Electronics; business; computer; Sentence; Natural language processing; BERT
researchProduct

2014

Classification of large data sets is widely used in many industrial applications. Classifying large data sets efficiently, accurately, and robustly is challenging, as they always contain numerous instances in a high-dimensional feature space. To deal with this problem, in this paper we present an online Logdet divergence based metric learning (LDML) model that exploits the power of metric learning. We first generate a Mahalanobis matrix by learning from the training data with the LDML model. Meanwhile, we propose a compressed representation for the high-dimensional Mahalanobis matrix to reduce the computational complexity of each iteration. The final Mahalanobis mat…

Mahalanobis distance; Training set; Applied Mathematics; Feature vector; High dimensional; computer.software_genre; Computation complexity; Data mining; Benchmark data; Classifier (UML); computer; Algorithm; Analysis; Mathematics
Abstract and Applied Analysis
researchProduct
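
As background for the entry above, a Mahalanobis matrix M defines the distance d(x, y) = sqrt((x - y)^T M (x - y)). The sketch below only evaluates that distance for a given M; how the LDML model learns or compresses M is not described in the truncated abstract and is not attempted here. The function name `mahalanobis_distance` and the list-of-lists matrix representation are assumptions for illustration.

```python
import math
from typing import Sequence


def mahalanobis_distance(
    x: Sequence[float],
    y: Sequence[float],
    m: Sequence[Sequence[float]],
) -> float:
    """Distance sqrt((x - y)^T M (x - y)) for a given matrix M.

    Assumption: `m` is a square, symmetric, positive semi-definite matrix
    whose size matches the length of `x` and `y`.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Compute M @ diff row by row, then take the dot product with diff.
    m_diff = [sum(row[j] * diff[j] for j in range(len(diff))) for row in m]
    squared = sum(d * v for d, v in zip(diff, m_diff))
    return math.sqrt(squared)
```

With the identity matrix this reduces to the Euclidean distance; a learned M reweights and correlates the feature dimensions so that instances of the same class end up closer together.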

Decision Committee Learning with Dynamic Integration of Classifiers

2000

Decision committee learning has demonstrated spectacular success in reducing classification error from learned classifiers. These techniques develop a classifier in the form of a committee of subsidiary classifiers. The combination of outputs is usually performed by majority vote. Voting, however, has a shortcoming: it is unable to take into account local expertise. When a new instance is difficult to classify, the average classifier will give a wrong prediction, and the majority vote is then more likely to result in a wrong prediction. Instead of voting, dynamic integration of classifiers can be used, which is based on the assumption that each committee member is best inside certain subar…

Majority rule; Boosting (machine learning); business.industry; Computer science; Feature vector; media_common.quotation_subject; Machine learning; computer.software_genre; Random subspace method; ComputingMethodologies_PATTERNRECOGNITION; Voting; Artificial intelligence; AdaBoost; business; computer; Classifier (UML); Information integration; media_common
researchProduct
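
The contrast between majority voting and locally informed selection can be sketched as follows. This is a generic dynamic-selection illustration under stated assumptions, not the integration method of the paper: the helper names (`majority_vote`, `dynamic_selection`), the use of a held-out validation set, and the k-nearest-neighbour notion of "local" are all hypothetical choices.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by the largest number of committee members."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def dynamic_selection(
    classifiers: List[Classifier],
    x: Sequence[float],
    val_points: List[Sequence[float]],
    val_labels: List[int],
    k: int = 3,
) -> int:
    """Use the member that makes the fewest errors on the k validation
    points nearest to x (a simple stand-in for "local expertise")."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = sorted(
        range(len(val_points)), key=lambda i: sq_dist(val_points[i], x)
    )[:k]

    def local_errors(clf: Classifier) -> int:
        return sum(clf(val_points[i]) != val_labels[i] for i in nearest)

    best = min(classifiers, key=local_errors)
    return best(x)
```

If two of three members always predict class 0 and only the third is accurate for positive inputs, the vote returns 0 for a positive instance, while the selection step picks the locally accurate member and returns 1.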

Dynamic Integration of Decision Committees

2000

Decision committee learning has demonstrated outstanding success in reducing classification error with an ensemble of classifiers. In a way, a decision committee is a classifier formed upon an ensemble of subsidiary classifiers. Voting, which is commonly used to produce the final decision of committees, has a shortcoming, however: it is unable to take into account local expertise. When a new instance is difficult to classify, it easily happens that only a minority of the classifiers succeed, and majority voting will quite probably result in a wrong classification. We suggest that dynamic integration of classifiers be used instead of majority voting in decision committees. Our…

Majority rule; Boosting (machine learning); business.industry; Computer science; media_common.quotation_subject; Machine learning; computer.software_genre; Knowledge acquisition; ComputingMethodologies_PATTERNRECOGNITION; Voting; Information system; Artificial intelligence; AdaBoost; business; Classifier (UML); computer; Information integration; media_common
researchProduct