Search results for "Natural language"

showing 10 items of 650 documents

The computation of word associations

2002

It is shown that basic language processes such as the production of free word associations and the generation of synonyms can be simulated using statistical models that analyze the distribution of words in large text corpora. According to the law of association by contiguity, the acquisition of word associations can be explained by Hebbian learning. The free word associations as produced by subjects on presentation of single stimulus words can thus be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. The reason is that synony…
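The first-order idea in this abstract can be illustrated with a minimal sketch. It assumes a fixed co-occurrence window and ranks candidates by raw co-occurrence count; the window size, the scoring and the function names are illustrative assumptions, not the association measure used in the paper.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count how often each pair of distinct words occurs within `window` tokens."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] != word:
                counts[word][tokens[j]] += 1
    return counts


def predict_association(counts, stimulus):
    """Return the word that co-occurs most often with the stimulus, or None."""
    neighbours = counts.get(stimulus)
    if not neighbours:
        return None
    return neighbours.most_common(1)[0][0]


# Toy corpus (hypothetical data, for illustration only).
toy_corpus = "bread and butter bread with butter butter on bread cold milk".split()
counts = cooccurrence_counts(toy_corpus, window=2)
print(predict_association(counts, "bread"))
```

A realistic setup would presumably normalize the counts by word frequency so that very common words do not dominate the ranking.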

Text corpus; Syntagmatic analysis; business.industry; Computer science; Synonym; Speech recognition; Statistical model; computer.software_genre; Production (computer science); Artificial intelligence; business; Association (psychology); computer; Natural language processing; Word (computer architecture)

Proceedings of the 19th international conference on Computational linguistics -

researchProduct

Revisiting corpus creation and analysis tools for translation tasks

2016

Many translation scholars have proposed the use of corpora to allow professional translators to produce high-quality texts that read like originals. Yet the diffusion of this methodology has been modest, one reason being that software for corpus analysis has been developed with the linguist in mind: it is generally complex and cumbersome, offering many advanced features but lacking the usability and the specific features that translators need. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual …

Text corpus; Translation; Professionalization; Traducción; Linguistics and Language; Literature and Literary Theory; Computer science; translation; Corpus tools; Monolingual corpus; computer.software_genre; Profesionalización; Language and Linguistics; Terminology; Domain (software engineering); Example-based machine translation; Corpus linguistics; monolingual corpus; professionalization; corpus tools; Concordancer; Corpus monolingüe; Terminology extraction; business.industry; lcsh:Translating and interpreting; Usability; lcsh:P306-310; Herramientas de corpus; Artificial intelligence; business; computer; Natural language processing

Cadernos de Tradução


Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts

2005

As has been shown recently, it is possible to automatically discover the senses of an ambiguous word by statistically analyzing its contextual behavior in a large text corpus. However, this kind of research is still at an early stage. The results need to be improved and there is considerable disagreement on methodological issues. For example, although most researchers use clustering approaches for word sense induction, it is not clear what statistical features the clustering should be based on. Whereas so far most researchers cluster global co-occurrence vectors that reflect the overall behavior of a word in a corpus, in this paper we argue that it is more appropriate to use local context v…

Text corpus; business.industry; Computer science; Context (language use); computer.software_genre; Word sense; Word-sense induction; Artificial intelligence; business; Cluster analysis; computer; Natural language processing; Word (computer architecture); Strengths and weaknesses


A Controllable Text Simplification System for the Italian Language

2021

Text simplification is a non-trivial task that aims at reducing the linguistic complexity of written texts. Researchers have studied the problem by proposing new methodologies for English, but other languages, such as Italian, remain almost unexplored. In this paper, we contribute to Automated Text Simplification research by presenting a deep learning-based system, inspired by a state-of-the-art system for English, capable of simplifying Italian texts. The system has been trained and tested on the Italian version of Newsela; it has shown promising results, achieving a SARI value of 30.17.

Text simplification; Computer science; Text simplification; 02 engineering and technology; English language; computer.software_genre; Task (project management); 03 medical and health sciences; 0302 clinical medicine; Linguistic sequence complexity; Deep Learning; 0202 electrical engineering electronic engineering information engineering; Value (semiotics); Natural Language Processing; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Deep Neural Networks; Settore INF/01 - Informatica; business.industry; Deep learning; Italian language; 030221 ophthalmology & optometry; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 020201 artificial intelligence & image processing; Artificial intelligence; State (computer science); business; computer; Natural language processing


On parsing optimality for dictionary-based text compression—the Zip case

2013

Dictionary-based compression schemes have been the most commonly used data compression schemes since they appeared in the foundational 1977 paper of Ziv and Lempel, whose scheme is generally referred to as LZ77. Their work is the basis of Zip, gZip, 7-Zip and many other compression utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have abandoned the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented, filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results …
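The greedy approach mentioned in this abstract can be sketched as follows. This is a textbook-style LZ77 parse into (offset, length, next character) triples, not the Deflate format and not the bit-optimal parsers the survey covers; the window size and the function names are assumptions made for illustration.

```python
def greedy_lz77_parse(text, window=32):
    """Greedily parse `text` into (offset, length, next_char) triples.

    At each position the longest match starting inside the sliding window
    is taken; the match may overlap the current position.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so a literal next_char always remains.
            while i + length < n - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        phrases.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the text by copying `length` characters from `offset` back."""
    out = []
    for offset, length, next_char in phrases:
        for _ in range(length):
            out.append(out[-offset])
        out.append(next_char)
    return "".join(out)


sample = "abracadabra abracadabra"
assert lz77_decode(greedy_lz77_parse(sample)) == sample
```

Greedy parsing always takes the longest available match; it does not account for how many bits each phrase costs once encoded, which is the gap that bit-optimal parsing is meant to address.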

Theoretical computer science; Computer science; Data_CODINGANDINFORMATIONTHEORY; Top-down parsing; computer.software_genre; Theoretical Computer Science; Parsing optimality; Compression (functional analysis); Discrete Mathematics and Combinatorics; Lossless compression; Parsing; LZ77 algorithm; Settore INF/01 - Informatica; Deflate algorithm; business.industry; Dictionary-based text compression; Computational Theory and Mathematics; Data compression; DEFLATE; Compression ratio; Artificial intelligence; business; computer; Natural language processing; Bottom-up parsing; Data compression

Journal of Discrete Algorithms


Shrinking language models by robust approximation

2002

We study the problem of reducing the size of a language model while preserving recognition performance (accuracy and speed). A successful approach has been to represent language models by weighted finite-state automata (WFAs). Analogues of classical automata determinization and minimization algorithms then provide a general method to produce smaller but equivalent WFAs. We extend this approach by introducing the notion of approximate determinization. We provide an algorithm that, when applied to language models for the North American Business task, achieves 25-35% size reduction compared to previous techniques, with negligible effects on recognition time and accuracy.

Theoretical computer science; Finite-state machine; Nested word; Computer science; Quantum finite automata; Automata theory; Language model; Algorithm; Natural language; Automaton

Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)


High Locality Representations for Automated Programming

2011

We study the locality of the genotype-phenotype mapping used in grammatical evolution (GE). GE is a variant of genetic programming that can evolve complete programs in an arbitrary language using a variable-length binary string. In contrast to standard GP, which applies search operators directly to phenotypes, GE uses an additional mapping and applies search operators to binary genotypes. Therefore, there is a large semantic gap between genotypes (binary strings) and phenotypes (programs or expressions). The case study shows that the mapping used in GE has low locality leading to low performance of standard mutation operators. The study at hand is an example of how basic design principles o…

Theoretical computer science; business.industry; Computer science; Locality; Parse tree; Genetic programming; computer.software_genre; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Grammatical evolution; Local search (optimization); Edit distance; Artificial intelligence; Heuristics; business; computer; Natural language processing; Semantic gap


LeSSS: Learned Shared Semantic Spaces for Relating Multi-Modal Representations of 3D Shapes

2015

In this paper, we propose a new method for structuring multi-modal representations of shapes according to semantic relations. We learn a metric that links semantically similar objects represented in different modalities. First, 3D-shapes are associated with textual labels by learning how textual attributes are related to the observed geometry. Correlations between similar labels are captured by simultaneously embedding labels and shape descriptors into a common latent space in which an inner product corresponds to similarity. The mapping is learned robustly by optimizing a rank-based loss function under a sparseness prior for the spectrum of the matrix of all classifiers. Second, we extend …

Theoretical computer science; business.industry; Computer science; Rank (computer programming); Cognitive neuroscience of visual object recognition; computer.software_genre; Computer Graphics and Computer-Aided Design; Product (mathematics); Similarity (psychology); Line (geometry); Metric (mathematics); Collaborative filtering; Embedding; Artificial intelligence; business; computer; Natural language processing

Computer Graphics Forum


Tally languages accepted by Monte Carlo pushdown automata

1997

Quite often, difficult (and sometimes even undecidable) problems become easily decidable for tally languages, i.e., languages over a single-letter alphabet. For instance, the class of languages recognizable by 1-way nondeterministic pushdown automata equals the class of context-free languages, but the class of tally languages recognizable by 1-way nondeterministic pushdown automata contains only regular languages [LP81]. We prove that languages over a one-letter alphabet accepted by randomized one-way 1-tape Monte Carlo pushdown automata are regular. However, Monte Carlo pushdown automata can be much more concise than deterministic 1-way finite state automata.

TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES; Nested word; Theoretical computer science; Computational complexity theory; Computer science; Deterministic pushdown automaton; Turing machine; symbols.namesake; Regular language; Computer Science::Logic in Computer Science; Quantum finite automata; Nondeterministic finite automaton; Discrete mathematics; Finite-state machine; Deterministic context-free language; Computability; Deterministic context-free grammar; Context-free language; Pushdown automaton; Abstract family of languages; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Cone (formal languages); Embedded pushdown automaton; Undecidable problem; Nondeterministic algorithm; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; Deterministic finite automaton; symbols; Computer Science::Programming Languages; Alphabet; Computer Science::Formal Languages and Automata Theory


Automata and forbidden words

1998

Let L(M) be the (factorial) language avoiding a given anti-factorial language M. We design an automaton accepting L(M) and built from the language M. The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word ν, the automaton turns out to be the factor automaton of ν (the minimal automaton accepting the set of factors of ν). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a nontrivial upper bound on the number of minimal forbidden words of a word.

TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES; [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]; Büchi automaton; 0102 computer and information sciences; 02 engineering and technology; ω-automaton; 01 natural sciences; Theoretical Computer Science; Combinatorics; Deterministic automaton; 0202 electrical engineering electronic engineering information engineering; Two-way deterministic finite automaton; Nondeterministic finite automaton; Mathematics; Powerset construction; Levenshtein automaton; 020206 networking & telecommunications; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Nonlinear Sciences::Cellular Automata and Lattice Gases; Computer Science Applications; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; 010201 computation theory & mathematics; Signal Processing; Probabilistic automaton; Computer Science::Programming Languages; Computer Science::Formal Languages and Automata Theory; Information Systems
