
RESEARCH PRODUCT

Is there a formula for formulaic language?

Richard S. Forsyth; Łukasz Grabowski

subject

Register (sociolinguistics); Measure (data warehouse); Index (publishing); Phraseology; Text types; Variety (linguistics); Productivity (linguistics); Linguistics; Mathematics; Term (time)

description

Abstract: This paper focuses on detecting and measuring traces of "formulaic language". For this purpose, we test a number of computational formulae that quantify the degree to which a text type incorporates inflexible sequences of words. We assess these candidate indices using a number of reference corpora representing a wide variety of text types, both routine and creative. We adopt the concept of "phrase-frame" proposed by Fletcher (2002–2007) as a means of exploring phraseological pattern variability. To date, there have been few studies explicitly addressing this issue, with the exception of Roemer (2010). We examine ten productivity indices, including Roemer's VPR, the Herfindahl-Hirschman index, Simpson's diversity index and relative Shannon entropy. We report that a novel measure, which we term Hapaxity, best meets our criteria, and show how this index of micro-productivity (in phrase-frames) may be used to assess macro-productivity (in text registers), thus quantifying an important aspect of a register’s reliance on formulaic subsequences.
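
To make three of the indices named in the abstract concrete, here is a minimal Python sketch that computes them for the fillers observed in the variable slot of a single phrase-frame. It is an illustration under stated assumptions, not the authors' implementation: the function name productivity_indices, the example frame "at the * of" and its fillers are hypothetical, Simpson's index is taken in its Gini-Simpson form (1 minus the sum of squared shares), and relative Shannon entropy is taken as entropy divided by the log of the number of distinct fillers. Roemer's VPR and the paper's Hapaxity measure are left out because this record does not define them.

```python
import math
from collections import Counter


def productivity_indices(fillers):
    """Return standard concentration/diversity indices for a slot's fillers.

    `fillers` is a list of the words observed in the variable slot of one
    phrase-frame (hypothetical input format, assumed for this sketch).
    """
    if not fillers:
        raise ValueError("at least one filler is required")

    counts = Counter(fillers)
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]

    # Herfindahl-Hirschman index: sum of squared shares (1.0 = one filler only).
    hhi = sum(p * p for p in shares)

    # Simpson's diversity, assumed here in the Gini-Simpson form 1 - sum(p^2).
    simpson = 1.0 - hhi

    # Shannon entropy normalised by its maximum, log(number of distinct fillers).
    entropy = -sum(p * math.log(p) for p in shares)
    distinct = len(counts)
    relative_entropy = entropy / math.log(distinct) if distinct > 1 else 0.0

    return {"hhi": hhi, "simpson": simpson, "relative_entropy": relative_entropy}


# Hypothetical fillers for the frame "at the * of".
fixed = productivity_indices(["end", "end", "end", "end"])
varied = productivity_indices(["end", "top", "time", "beginning"])
print(fixed)
print(varied)
```

Under these definitions a fully fixed slot gives maximal concentration and zero diversity, while a slot in which every filler differs gives the opposite, which is the contrast between inflexible and productive patterns that the abstract describes.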

https://doi.org/10.1515/psicl-2015-0019