Search results for "text corpus"

showing 4 items of 14 documents

The computation of word associations

2002

It is shown that basic language processes, such as producing free word associations and generating synonyms, can be simulated with statistical models that analyze the distribution of words in large text corpora. According to the law of association by contiguity, the acquisition of word associations can be explained by Hebbian learning. The free word associations that subjects produce when presented with single stimulus words can thus be predicted by applying first-order statistics to the frequencies of word co-occurrences observed in texts. Synonyms can also be generated from co-occurrence data, but this requires second-order statistics. The reason is that synony…

Text corpus, Syntagmatic analysis, Computer science, Synonym, Speech recognition, Statistical model, Production (computer science), Artificial intelligence, Association (psychology), Natural language processing, Word (computer architecture)

Proceedings of the 19th international conference on Computational linguistics -

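The abstract of "The computation of word associations" contrasts first-order statistics (direct co-occurrence counts, used to predict free associations) with second-order statistics (similarity of co-occurrence patterns, used for synonyms). The minimal Python sketch below illustrates that distinction under stated assumptions: a fixed token window, raw co-occurrence counts in place of whatever association measure the paper actually uses, and cosine similarity between co-occurrence vectors. All function names are hypothetical and this is not the author's implementation.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count how often each word occurs within `window` positions of every other word."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts


def first_order_associations(counts, stimulus, top_n=3):
    """First-order statistics: rank the direct co-occurrence partners of `stimulus`.

    Assumption: raw counts stand in for a proper association measure.
    """
    return [w for w, _ in counts.get(stimulus, Counter()).most_common(top_n)]


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def second_order_neighbours(counts, word, top_n=3):
    """Second-order statistics: rank words whose co-occurrence vectors resemble `word`'s."""
    target = counts.get(word, Counter())
    scores = {other: cosine(target, vec) for other, vec in counts.items() if other != word}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

On a toy corpus in which "car" and "automobile" never appear next to each other but share the same neighbours, the first-order ranking for "car" does not contain "automobile", while the second-order ranking places it first. In practice raw counts are usually weighted (for example with PMI or a log-likelihood score) before ranking.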

Revisiting corpus creation and analysis tools for translation tasks

2016

Many translation scholars have proposed using corpora to help professional translators produce high-quality texts that read like originals. Yet this methodology has spread only modestly, partly because corpus analysis software has been developed with the linguist in mind: it is generally complex and cumbersome, offering many advanced features but lacking the usability and the specific features that translators need. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual …

Text corpus, Translation, Professionalization, Linguistics and Language, Literature and Literary Theory, Computer science, Corpus tools, Monolingual corpus, Language and Linguistics, Terminology, Domain (software engineering), Example-based machine translation, Corpus linguistics, Concordancer, Terminology extraction, Translating and interpreting, Usability, Artificial intelligence, Natural language processing

Cadernos de Tradução


Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts

2005

As has been shown recently, it is possible to automatically discover the senses of an ambiguous word by statistically analyzing its contextual behavior in a large text corpus. However, this kind of research is still at an early stage. The results need to be improved and there is considerable disagreement on methodological issues. For example, although most researchers use clustering approaches for word sense induction, it is not clear what statistical features the clustering should be based on. Whereas so far most researchers cluster global co-occurrence vectors that reflect the overall behavior of a word in a corpus, in this paper we argue that it is more appropriate to use local context v…

Text corpus, Computer science, Context (language use), Word sense, Word-sense induction, Artificial intelligence, Cluster analysis, Natural language processing, Word (computer architecture), Strengths and weaknesses

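The abstract of "Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts" argues for clustering local context vectors (one per occurrence of the ambiguous word) rather than a single global co-occurrence vector per word. The truncated abstract does not say which features or clustering algorithm are used, so the Python sketch below assumes bag-of-words windows, cosine similarity and average-linkage agglomerative clustering. The function names and the toy corpus are hypothetical; this is an illustration of the idea, not the paper's method.

```python
import math
from collections import Counter


def local_contexts(tokens, target, window=2):
    """Build one bag-of-words vector per occurrence of `target` from its nearby words."""
    contexts = []
    for i, word in enumerate(tokens):
        if word == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Skip the occurrence itself and punctuation-only tokens.
            contexts.append(Counter(tokens[j] for j in range(lo, hi) if j != i and tokens[j].isalpha()))
    return contexts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_contexts(vectors, n_senses):
    """Assumed algorithm: average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of occurrence indices (one cluster per induced sense).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise cosine similarity between the members of two clusters.
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_senses:
        # Merge the most similar pair of clusters.
        x, y = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

For a toy corpus in which "bank" occurs twice near "river"/"water" and twice near "money"/"loan", asking for two senses groups the first two occurrences together and the last two together. Because each occurrence keeps its own vector, the ambiguity is resolved per occurrence, which a single global vector for "bank" could not do.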

Citizen science against linguistic prejudice: the Milmots.eu project

2017

Citizen science is a good mechanism for conducting research in demolinguistics, since it involves the general public in participation and data collection while also fostering a scientific culture in society. The Milmots project is an interactive tool created to disseminate, learn about and share our lexicon, our words, associated with our towns and our counties (comarques). The large volume of words and expressions collected has provided ideal material for a sociolinguistic study showing that the social variables of age, dialect area and the town's number of inhabitants have a significant influence on …

Citizen science, Demolinguistics, Text corpus, Linguistic corpus, Lexical normativity, Normativity

Llengua, societat i comunicació
