Search results for "Natural language"

showing 10 items of 650 documents

Facilitating terminology translation with target lemma annotations

2021

Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence. In the day-to-day work of professional translators, however, this is seldom the case, as translators work with bilingual glossaries where terms are given in their dictionary forms; finding the right target language form is part of the translation process. We argue that the requirement for a priori specified target language forms is unrealistic and impedes the practical applicability of previous work. In this work, we propose to train machine translation systems using a source-side data augmentatio…

FOS: Computer and information sciences; Lemma (mathematics); Computer Science - Computation and Language; Machine translation; Process (engineering); Computer science; business.industry; Latvian; Term (logic); Translation (geometry); computer.software_genre; language.human_language; Terminology; language; Artificial intelligence; business; Computation and Language (cs.CL); computer; Natural language processing; Sentence

researchProduct

Effectiveness of Data-Driven Induction of Semantic Spaces and Traditional Classifiers for Sarcasm Detection

2019

Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language, especially on social media, but they represent two serious issues for automated text understanding. Many labeled corpora have been extracted from several sources to accomplish this task, and it seems that sarcasm is conveyed in different ways for different domains. Nonetheless, very little work has been done on comparing different methods among the available corpora. Furthermore, each author usually collects and uses their own datasets to evaluate their own method. In this paper, we show that sarcasm detection can be tackled by applying classical machine learning algorithms to input te…

FOS: Computer and information sciences; Linguistics and Language; Computer Science - Machine Learning; Computer science; media_common.quotation_subject; Semantic space; Machine Learning (stat.ML); 02 engineering and technology; computer.software_genre; Language and Linguistics; Task (project management); Data-driven; Machine Learning (cs.LG); Artificial Intelligence; Statistics - Machine Learning; 020204 information systems; Everyday language; 0202 electrical engineering electronic engineering information engineering; Social media; natural language processing; media_common; Computer Science - Computation and Language; Sarcasm; Settore INF/01 - Informatica; business.industry; irony detection; Irony; machine learning; semantic spaces; 020201 artificial intelligence & image processing; Artificial intelligence; business; Irony detection; semantic space; computer; Computation and Language (cs.CL); Software; Natural language processing; sarcasm detection

researchProduct

An LP-based hyperparameter optimization model for language modeling

2018

To find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the model's hyperparameters. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find per…
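The grid-search baseline mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's LP model: the add-k smoothed unigram model, the toy corpora, and the names `perplexity` and `grid_search` are all hypothetical. Perplexity is computed as the exponential of the negative mean log-probability of the held-out tokens.

```python
import math
from collections import Counter


def perplexity(train_tokens, heldout_tokens, k):
    """Perplexity of an add-k smoothed unigram model (k > 0) on held-out tokens."""
    counts = Counter(train_tokens)
    # Assumption: the vocabulary is the union of both token sets.
    vocab = set(train_tokens) | set(heldout_tokens)
    total = len(train_tokens) + k * len(vocab)
    log_prob = sum(math.log((counts[t] + k) / total) for t in heldout_tokens)
    return math.exp(-log_prob / len(heldout_tokens))


def grid_search(train_tokens, heldout_tokens, grid):
    """Return the (k, perplexity) pair with the lowest perplexity on the grid."""
    scored = ((k, perplexity(train_tokens, heldout_tokens, k)) for k in grid)
    return min(scored, key=lambda pair: pair[1])


# Toy data, purely illustrative.
train = "the cat sat on the mat the dog sat".split()
heldout = "the dog sat on the mat".split()
grid = [0.01, 0.1, 0.5, 1.0, 2.0]
best_k, best_pp = grid_search(train, heldout, grid)
print(best_k, best_pp)
```

Grid search evaluates the cost function only at the listed candidate values, so its result is the best point on the grid and not necessarily the true minimizer.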

FOS: Computer and information sciences; Mathematical optimization; Perplexity; Linear programming; Computer science; Machine Learning (stat.ML); 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Theoretical Computer Science; Nonlinear programming; Machine Learning (cs.LG); Random search; Simplex algorithm; Search algorithm; Statistics - Machine Learning; 0202 electrical engineering electronic engineering information engineering; FOS: Mathematics; Mathematics - Optimization and Control; 0105 earth and related environmental sciences; Hyperparameter; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Computer Science - Learning; Hardware and Architecture; Optimization and Control (math.OC); Hyperparameter optimization; 020201 artificial intelligence & image processing; Language model; Software; Information Systems

researchProduct

Pattern statistics in faro words and permutations

2021

We study the distribution and the popularity of some patterns in $k$-ary faro words, i.e. words over the alphabet $\{1, 2, \ldots, k\}$ obtained by interlacing the letters of two nondecreasing words of lengths differing by at most one. We present a bijection between these words and dispersed Dyck paths (i.e. Motzkin paths with all level steps on the $x$-axis) with a given number of peaks. We show how the bijection maps statistics of consecutive patterns of faro words into linear combinations of other pattern statistics on paths. Then, we deduce enumerative results by providing multivariate generating functions for the distribution and the popularity of patterns of length at most three. Fina…
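The definition of a faro word given in this abstract can be made concrete with a small sketch. It assumes the interlacing starts with the word that is at least as long (u1 v1 u2 v2 ...), which the abstract does not state explicitly; the function names are hypothetical. Under that assumption, a word is a faro word exactly when its letters at even positions and its letters at odd positions each form a nondecreasing sequence.

```python
from itertools import product


def faro_shuffle(u, v):
    """Interlace u and v as u[0] v[0] u[1] v[1] ...; needs len(u) - len(v) in {0, 1}."""
    if len(u) - len(v) not in (0, 1):
        raise ValueError("lengths must satisfy len(u) - len(v) in {0, 1}")
    out = []
    for i, letter in enumerate(u):
        out.append(letter)
        if i < len(v):
            out.append(v[i])
    return tuple(out)


def is_nondecreasing(w):
    """True if each letter is at most the next one."""
    return all(a <= b for a, b in zip(w, w[1:]))


def is_faro_word(w):
    """True if the even-indexed and odd-indexed letters of w are each nondecreasing."""
    return is_nondecreasing(w[0::2]) and is_nondecreasing(w[1::2])


def faro_words(k, n):
    """All k-ary faro words of length n, by brute force over {1, ..., k}^n."""
    return [w for w in product(range(1, k + 1), repeat=n) if is_faro_word(w)]


print(faro_shuffle((1, 2, 2), (1, 3)))
print(len(faro_words(2, 3)))
```

The brute-force enumeration is exponential in the word length and is meant only for checking small cases by hand.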

FOS: Computer and information sciences; Multivariate statistics; Distribution (number theory); Discrete Mathematics (cs.DM); Interlacing; 0102 computer and information sciences; 02 engineering and technology; [INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]; 01 natural sciences; Theoretical Computer Science; Combinatorics; Statistics; [MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]; 05A05 (Primary) 05A15 05A19 68R15 (Secondary); 0202 electrical engineering electronic engineering information engineering; FOS: Mathematics; Discrete Mathematics and Combinatorics; Mathematics - Combinatorics; Linear combination; Mathematics; Discrete mathematics; Mathematics::Combinatorics; 020206 networking & telecommunications; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Derangement; 010201 computation theory & mathematics; Bijection; Combinatorics (math.CO); Alphabet; Computer Science::Formal Languages and Automata Theory; Computer Science - Discrete Mathematics

researchProduct

RIGA at SemEval-2016 Task 8: Impact of Smatch Extensions and Character-Level Neural Translation on AMR Parsing Accuracy

2016

Two extensions to the AMR smatch scoring script are presented. The first extension combines the smatch scoring script with the C6.0 rule-based classifier to produce a human-readable report on the error pattern frequency observed in the scored AMR graphs. This first extension results in a 4% gain over the state-of-the-art CAMR baseline parser by adding to it a manually crafted wrapper fixing the identified CAMR parser errors. The second extension combines a per-sentence smatch with an ensemble method for selecting the best AMR graph among the set of AMR graphs for the same sentence. This second modification automatically yields a further 0.4% gain when applied to outputs of two nondeterministic…

FOS: Computer and information sciences; Parsing; Computer Science - Computation and Language; Computer science; business.industry; 02 engineering and technology; Extension (predicate logic); computer.software_genre; SemEval; Set (abstract data type); Nondeterministic algorithm; 020204 information systems; Test set; Classifier (linguistics); 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; Sentence

researchProduct

On prefix normal words and prefix normal forms

2016

A $1$-prefix normal word is a binary word with the property that no factor has more $1$s than the prefix of the same length; a $0$-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of $1$s and $0$s (a given Parikh vector). Each binary word has an associated set of Parikh vectors of the factors of the word. Using prefix normal words, we provide a characterization of the equivalence class of binary words having the same set of Parikh vectors of their factors. We prove that the language of prefix normal words is not context-free and is strictly contai…
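The defining property of a 1-prefix normal word stated in this abstract can be checked directly. The following is a minimal quadratic-time sketch, assuming words are given as Python strings over the characters '0' and '1'; the function name is hypothetical and no claim is made that this is how the authors test the property.

```python
def is_prefix_normal(w):
    """True if no factor of the binary string w has more '1's than the prefix of equal length."""
    n = len(w)
    # ones[i] holds the number of '1's in w[:i], so any factor count is a difference.
    ones = [0]
    for c in w:
        ones.append(ones[-1] + (c == "1"))
    for length in range(1, n + 1):
        prefix_ones = ones[length]
        for start in range(n - length + 1):
            if ones[start + length] - ones[start] > prefix_ones:
                return False
    return True


print(is_prefix_normal("110100"))  # every factor is dominated by the prefix
print(is_prefix_normal("101101"))  # the factor "11" beats the prefix "10"
```

For example, "101101" fails because its factor "11" contains two 1s while the prefix of length two, "10", contains only one.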

FOS: Computer and information sciences; Prefix code; Prefix normal word; Pre-necklace; Discrete Mathematics (cs.DM); General Computer Science; Formal Languages and Automata Theory (cs.FL); Binary number; Computer Science - Formal Languages and Automata Theory; Context (language use); Binary language; Lyndon words; 0102 computer and information sciences; 02 engineering and technology; Prefix grammar; prefix normal forms; Kraft's inequality; Characterization (mathematics); Lyndon word; 01 natural sciences; Prefix normal form; enumeration; Theoretical Computer Science; FOS: Mathematics; 0202 electrical engineering electronic engineering information engineering; Mathematics - Combinatorics; Mathematics; Discrete mathematics; prefix normal words prefix normal forms binary languages binary jumbled pattern matching pre-necklaces Lyndon words enumeration; binary jumbled pattern matching; Settore INF/01 - Informatica; Computer Science (all); pre-necklaces; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); prefix normal words; Prefix; 010201 computation theory & mathematics; 020201 artificial intelligence & image processing; Combinatorics (math.CO); binary languages; Computer Science::Formal Languages and Automata Theory; Word (group theory); Computer Science - Discrete Mathematics; Theoretical Computer Science

researchProduct

Open and Closed Prefixes of Sturmian Words

2013

A word is closed if it contains a proper factor that occurs both as a prefix and as a suffix but does not have internal occurrences, otherwise it is open. We deal with the sequence of open and closed prefixes of Sturmian words and prove that this sequence characterizes every finite or infinite Sturmian word up to isomorphisms of the alphabet. We then characterize the combinatorial structure of the sequence of open and closed prefixes of standard Sturmian words. We prove that every standard Sturmian word, after swapping its first letter, can be written as an infinite product of squares of reversed standard words.
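The open/closed distinction defined in this abstract can be sketched as follows. Two assumptions are made here that the abstract does not spell out: words of length at most one are treated as closed (the empty word acting as the border), and only the longest nonempty border is tested, since any shorter border also occurs inside the longest one and therefore has an internal occurrence. The function names are hypothetical.

```python
def occurrences(w, u):
    """Start positions of all (possibly overlapping) occurrences of u in w."""
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def is_closed(w):
    """True if some proper factor occurs in w only as a prefix and as a suffix."""
    if len(w) <= 1:
        return True  # assumed convention: the empty word and single letters are closed
    for m in range(len(w) - 1, 0, -1):
        u = w[:m]
        if w.endswith(u):
            # u is the longest nonempty border; it must occur exactly twice.
            return occurrences(w, u) == [0, len(w) - m]
    return False  # no nonempty border, so the word is open


def open_closed_sequence(w):
    """String of 'c'/'o' marking each nonempty prefix of w as closed or open."""
    return "".join("c" if is_closed(w[:i]) else "o" for i in range(1, len(w) + 1))


print(open_closed_sequence("abaab"))
```

Applied to the prefixes of a word, `open_closed_sequence` produces the kind of open/closed sequence the abstract studies, here for a short finite example only.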

FOS: Computer and information sciences; Sequence; Fibonacci number; Discrete Mathematics (cs.DM); Formal Languages and Automata Theory (cs.FL); Sturmian word; Structure (category theory); Sturmian word; Infinite product; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Computer Science - Formal Languages and Automata Theory; 68R15; Combinatorics; Prefix; Computer Science::Discrete Mathematics; Combinatorics on words Sturmian word; FOS: Mathematics; Mathematics - Combinatorics; Closed words; Combinatorics (math.CO); Suffix; Word (group theory); Computer Science::Formal Languages and Automata Theory; Mathematics; Computer Science - Discrete Mathematics

researchProduct

Minimal forbidden factors of circular words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word.…

FOS: Computer and information sciences; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; General Computer Science; Discrete Mathematics (cs.DM); Finite automaton; Settore INF/01 - Informatica; Formal Languages and Automata Theory (cs.FL); Factor automaton; Computer Science - Formal Languages and Automata Theory; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Circular word; Fibonacci word; Minimal forbidden factor; Theoretical Computer Science; Computer Science::Formal Languages and Automata Theory; Computer Science - Discrete Mathematics

researchProduct

A Freely Available Morphological Analyzer, Disambiguator and Context Sensitive Lemmatizer for German

1998

In this paper we present Morphy, an integrated tool for German morphology, part-of-speech tagging and context-sensitive lemmatization. Its large lexicon of more than 320,000 word forms plus its ability to process German compound nouns guarantee a wide morphological coverage. Syntactic ambiguities can be resolved with a standard statistical part-of-speech tagger. By using the output of the tagger, the lemmatizer can determine the correct root even for ambiguous word forms. The complete package is freely available and can be downloaded from the World Wide Web.

FOS: Computer and information sciences; Spectrum analyzer; Root (linguistics); Morphology (linguistics); Computer Science - Computation and Language; Computer science; business.industry; Lemmatisation; Context (language use); computer.software_genre; Lexicon; Syntax; language.human_language; German; H.3.4; Noun; language; Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; Word (computer architecture)

researchProduct

Semantic Computing of Moods Based on Tags in Social Media of Music

2014

Social tags inherent in online music services such as Last.fm provide a rich source of information on musical moods. The abundance of social tags makes this data highly beneficial for developing techniques to manage and retrieve mood information, and enables study of the relationships between music content and mood representations with data substantially larger than that available for conventional emotion research. However, no systematic assessment has been done on the accuracy of social tags and derived semantic models at capturing mood information in music. We propose a novel technique called Affective Circumplex Transformation (ACT) for representing the moods of music tracks in an interp…

FOS: Computer and information sciences; Vocabulary; Computer science; Music information retrieval; media_common.quotation_subject; Semantic analysis (machine learning); Moods; computer.software_genre; Affect (psychology); Semantics; Computer Science - Information Retrieval; Semantic computing; Music information retrieval; Affective computing; media_common; Social and Information Networks (cs.SI); ta113; Probabilistic latent semantic analysis; Social tags; business.industry; Computer Science - Social and Information Networks; Multimedia (cs.MM); Semantic analysis; Computer Science Applications; Mood; Computational Theory and Mathematics; Web mining; ta6131; Vector space model; Artificial intelligence; Genres; business; computer; Computer Science - Multimedia; Information Retrieval (cs.IR); Music; Natural language processing; Prediction.; Information Systems; IEEE Transactions on Knowledge and Data Engineering

researchProduct