Rule based machine translation of spreadsheet formulas to natural language expressions
Abstract
Errors in spreadsheets are a pervasive problem in both business and other real-life settings. Most errors in spreadsheets are formula-based. Spreadsheet formulas are normally specified using alphanumeric cell addresses and can only be understood if one associates the referenced cells with their labels. This increases the mental effort (cognitive load) required to comprehend a spreadsheet formula and hence makes formula comprehension error-prone. Translating traditional spreadsheet formulas to structured, higher-level, problem-domain-oriented forms based on the labels of the cells referenced in a formula is one approach that has been proposed to ease formula comprehension. This research work, however, provides a technique to automatically translate traditional spreadsheet formulas to natural language expressions in English using rule-based machine translation. Rule-based machine translation is knowledge-based machine translation that retrieves rules from bilingual dictionaries to translate source language expressions to target language expressions. This research work has three key contributions. First, an algorithm was designed for automatically translating traditional spreadsheet formulas into natural language expressions based on devised translation rules. Second, a prototype software tool that implements the designed algorithm was developed. Lastly, through a user study, the utility of natural language representations of spreadsheet formulas for spreadsheet debugging was demonstrated. In the study, it was found that natural language representation of formulas results in an improvement in debugging performance, in terms of the percentage of errors found, that is not statistically significant (Z = -1.414, p = 0.157).
However, it was also found that natural language representation of formulas results in a statistically significant improvement in debugging performance in terms of the speed of detecting each error: spreadsheet users took significantly less time to detect each error with translations than without (Z = -2.521, p = 0.012). The mean time to locate each error was 97.1 seconds with translations and 201.7 seconds without, roughly half the time.
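To make the idea of label-based, rule-driven translation concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm or the prototype tool described in this work. The rule tables `OPERATOR_RULES` and `FUNCTION_RULES`, the function `translate_formula`, and the `labels` mapping from cell addresses to labels are all hypothetical names assumed for illustration, and the sketch covers only simple arithmetic operators, numbers, cell references, and a single-range function call.

```python
import re

# Hypothetical translation rules (assumptions for illustration only):
# each formula symbol maps to an English phrase.
OPERATOR_RULES = {"+": "plus", "-": "minus", "*": "multiplied by", "/": "divided by"}
FUNCTION_RULES = {"SUM": "the sum of", "AVERAGE": "the average of"}

# One named group per token type; ranges are tried before single cells so
# that "B2:B4" is not split into two separate cell references.
TOKEN_PATTERN = re.compile(
    r"(?P<range>[A-Z]+\d+:[A-Z]+\d+)"
    r"|(?P<cell>[A-Z]+\d+)"
    r"|(?P<func>[A-Z]+)(?=\()"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<op>[-+*/])"
    r"|(?P<paren>[()])"
)


def translate_formula(formula, labels):
    """Translate a simple formula into English using the labels of its cells.

    `labels` maps upper-case cell addresses (e.g. "B2") to their labels.
    Cells without a known label are left as raw addresses.
    """
    words = []
    for match in TOKEN_PATTERN.finditer(formula.lstrip("=").upper()):
        kind, text = match.lastgroup, match.group()
        if kind == "cell":
            words.append(labels.get(text, text))
        elif kind == "range":
            start, end = text.split(":")
            words.append(f"{labels.get(start, start)} through {labels.get(end, end)}")
        elif kind == "func":
            words.append(FUNCTION_RULES.get(text, text))
        elif kind == "op":
            words.append(OPERATOR_RULES[text])
        elif kind == "number":
            words.append(text)
        # Parentheses are dropped in this sketch, so grouping is not
        # preserved in the English output; a real translator must handle it.
    return " ".join(words)


print(translate_formula("=B2*C2", {"B2": "Unit Price", "C2": "Quantity"}))
# prints: Unit Price multiplied by Quantity
```

Under these assumptions, a reader sees "Unit Price multiplied by Quantity" instead of `=B2*C2` and no longer has to look up what cells B2 and C2 contain, which is the source of cognitive load the abstract describes.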