RESEARCH PRODUCT
Readability and the Web
Ludger Martin, Thomas Gottron
subject
060201 languages & linguistics; Measure (data warehouse); Information retrieval; lcsh:T58.5-58.64; Relation (database); lcsh:Information technology; Computer Networks and Communications; Computer science; business.industry; 06 humanities and the arts; 02 engineering and technology; Readability; World Wide Web; web document readability; content extraction; corpus statistics; 0602 languages and literature; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 0202 electrical engineering electronic engineering information engineering; Web application; 020201 artificial intelligence & image processing; Bias correction; business
description
Readability indices measure how easy or difficult it is to read and comprehend a text. In this paper, we look at the relation between readability indices and web documents from two different perspectives. On the one hand, we analyse how to reliably measure the readability of web documents by applying content extraction techniques and incorporating a bias correction. On the other hand, we investigate how web-based corpus statistics can be used to measure readability in a novel and language-independent way.
year | journal | country | edition | language |
---|---|---|---|---|
2012-03-12 | Future Internet | | | |