I wrote this paper with Yongning Zhang for a course I am taking called “Advanced Topics in Human-Computer Interaction: Experimental Methods in HCI”. I would like to thank Dr. Edward Lank for his help.
Do language-checking tools improve the document quality of non-native speakers?
ABSTRACT
Text editors help users create and share documents. We focus on one of their most popular features: language-checking tools. Specifically, we ask whether language-checking tools improve the quality of documents written by non-native speakers of English. We present a quantitative study that measures the effects of language-checking tools on documents, and a qualitative study of how non-native speakers use these tools. Results show that language-checking tools do not significantly improve the quality of their documents. However, users still trust them.
INTRODUCTION
In the past decades, language-checking tools have been integrated into different text editors and have been widely adopted by computer users. Recently, with the help of Web 2.0 technology, several web-based editors are able to provide spelling and grammar checking in real time. Users do not need to install any language-checking tools on their computers.
Most language-checking tools are designed to scan the text for spelling and grammar mistakes, such as fragments, run-on sentences, subject-verb disagreement, passive voice, double words, and split infinitives. Such mistakes are flagged with colored wavy lines, highlighted backgrounds, or underlines in order to attract the user’s attention. Additionally, a suggestion is provided for every mistake. The entire process tries to help users improve the quality of their documents and relieve them of manual checking.
Unfortunately, two common phenomena severely affect the performance of current language-checking tools: false negatives and false positives. False negatives are true errors that the language-checking tools fail to detect [2]. False positives are problems that the language-checking tools flag but that are not errors [2].
Both false negatives and false positives are non-trivial problems for language-checking tools. For example, Kies [3] discovered that language-checking tools can identify only six of the twenty most common grammar mistakes [1]. False negatives might therefore cause users to overlook mistakes that could easily be identified by manual checking. As a result, the document quality is low. False positives can also be a problem, not only because perfectly acceptable words or passages are flagged erroneously, but also because the language-checking tools provide imperfect suggestions. Erroneous suggestions can either distort the true meaning of a sentence or create new mistakes [2].
In order to fully utilize language-checking tools, a user needs enough verbal skill to address the false negatives and false positives generated by such tools. But the question is: what happens if a user has not mastered the language well, as is the case for many non-native speakers?
In this paper, we replicate the study of Galletta et al. [2] to examine whether and how language-checking tools can help improve the document quality of non-native speakers. Our work consists of a quantitative study that evaluates the performance of language-checking tools used by non-native speakers, and a qualitative study that reveals how much users trust such tools. Our results show that although language-checking tools are trusted by most non-native speakers, they do not significantly improve document quality.
RELATED WORK
At first, it is tempting to assume that language-checking tools improve document quality. Language-checking tools automatically correct spelling and grammar errors.
However, Galletta et al. showed that the presence of language-checking tools in a text-processing system decreases the quality of a document. Galletta et al. asked their participants to edit a business letter using Microsoft Word 2003 [2]. The language-checking tools were turned on for half the subjects and turned off for the other half [2]. Task performance was based on three types of errors: correctly identified errors, false negatives, and false positives.
Nonetheless, this study has limitations. First, Galletta et al.’s experiment has too much internal validity. Second, the test scores cannot accurately reflect a participant’s verbal ability. Third, a participant’s verbal ability greatly affects the result. Lastly, Galletta et al. do not take non-native speakers into consideration.
Very little work has been done on tools that help non-native speakers detect errors that language-checking tools cannot detect. However, Park et al. studied collocation errors, which are errors in the commonly used expressions, idioms, and word pairings of a language. To assist non-native speakers with these parts of the English language, they developed AwkChecker [6]. It is the first end-user tool that automatically flags collocation errors and suggests replacement expressions.
AwkChecker suggests corrections for four common types of collocation errors: insertion, deletion, substitution, and transposition errors. Insertion errors occur when an extra word is inserted into a phrase. Deletion errors occur when a word is deleted from a phrase. Substitution errors occur when a non-preferred word is used in place of a more commonly used word. Lastly, transposition errors occur when two words are swapped [6].
To the best of our knowledge, no study has considered whether language-checking tools improve the document quality of non-native speakers.
STUDY
To understand whether language-checking tools improve document quality for non-native speakers, we conducted a replication study based on the experiment of Galletta et al. [2].
Participants
We recruited eight participants using snowball sampling and e-mail. All participants use English as their second language.
Experimental Setup
We provided participants with two essays (denoted essay A and essay B). Both essays have the same style and were written by the same author. Each participant was asked to edit one essay with the language-checking tools and the other without them. The order of the essays was the same for all participants: everyone edited essay A first and essay B second. However, we alternated the use of the language-checking tools. Specifically, all odd-numbered participants (P1, P3, P5 and P7) edited essay A with the tools and essay B without them, while all even-numbered participants (P2, P4, P6 and P8) edited essay A without the tools and essay B with them. In this way, we reduced the confounding effect of differences between the two essays.
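The counterbalancing scheme can be sketched in a few lines of Python. This is only an illustration, assuming that the tool condition alternates between odd- and even-numbered participants while the essay order stays fixed; the function name `assign_conditions` is hypothetical and not part of the study materials.

```python
def assign_conditions(participant_number):
    """Return the tool condition for each essay, in editing order (A, then B)."""
    # Assumption: odd-numbered participants use the tools on essay A,
    # even-numbered participants use the tools on essay B.
    tools_on_a = participant_number % 2 == 1
    return [
        ("A", "with tools" if tools_on_a else "without tools"),
        ("B", "without tools" if tools_on_a else "with tools"),
    ]


# Build the schedule for the eight participants P1..P8.
schedule = {f"P{n}": assign_conditions(n) for n in range(1, 9)}

for participant, plan in schedule.items():
    print(participant, plan)
```

Under this assumption, each essay is edited with the tools by four participants and without the tools by the other four.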
Research Hypothesis
Our research hypothesis was that editing with the language-checking tools turned on would not improve the quality of the essays.
Results
We evaluated sixteen essays that were edited by non-native speakers. We measured each participant’s editing time and graded their essays. The scores and times of each participant are listed in Table 1.
| Participant | Score without tools | Score with tools | Time without tools (minutes) | Time with tools (minutes) |
| P1 | 5.5 | 6.5 | 15 | 7 |
| P2 | 5.5 | 7.5 | 5 | 7 |
| P3 | 7.5 | 7 | 12 | 8 |
| P4 | 5.5 | 9.5 | 9 | 12 |
| P5 | 8.5 | 9.5 | 25 | 20 |
| P6 | 6.5 | 6.5 | 22 | 14 |
| P7 | 6.5 | 6.5 | 24 | 15 |
| P8 | 8.5 | 8.5 | 16 | 14 |
Table 1: Participants’ scores and times
To evaluate the performance of our participants, we graded each essay on the following criteria:
- The readability of the essay
- Essay structure and content
- Spelling and grammar errors that have been correctly detected and modified.
- Spelling and grammar errors that have been correctly detected but not properly modified.
- False positives
- False negatives
- Four common types of collocation errors.
Each essay was graded with a numeric score from 1.0 to 10.0, with 10.0 being the highest possible score.
Before analyzing our data, we checked for statistical outliers by computing the highest and lowest acceptable values for each condition, which are listed in Table 2.
| | Score without tools | Score with tools | Time without tools (minutes) | Time with tools (minutes) |
| Highest Value | 10.59 | 11.61 | 37.75 | 25.86 |
| Lowest Value | 2.90 | 3.76 | -5.75 | -1.61 |
Table 2: Bounds used to check for statistical outliers
Since every score and time in Table 1 falls within these bounds, we confirm that there are no outliers in our sample.
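The paper does not state how the bounds in Table 2 were derived. The values appear consistent with the mean plus or minus three sample standard deviations, so the sketch below uses that rule as an assumption; the names `samples` and `outlier_bounds` are illustrative only, and the data are copied from Table 1.

```python
from statistics import mean, stdev

# Scores and editing times (minutes) copied from Table 1, in order P1..P8.
samples = {
    "score_without_tools": [5.5, 5.5, 7.5, 5.5, 8.5, 6.5, 6.5, 8.5],
    "score_with_tools": [6.5, 7.5, 7.0, 9.5, 9.5, 6.5, 6.5, 8.5],
    "time_without_tools": [15, 5, 12, 9, 25, 22, 24, 16],
    "time_with_tools": [7, 7, 8, 12, 20, 14, 15, 14],
}


def outlier_bounds(values, k=3.0):
    """Return (lowest, highest) acceptable values as mean -/+ k sample SDs."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return centre - k * spread, centre + k * spread


# Assumed rule: an observation is an outlier if it lies outside mean +/- 3 SD.
bounds = {name: outlier_bounds(values) for name, values in samples.items()}
outliers = {
    name: [v for v in values if not bounds[name][0] <= v <= bounds[name][1]]
    for name, values in samples.items()
}

for name, (low, high) in bounds.items():
    print(f"{name}: lowest={low:.2f}, highest={high:.2f}, outliers={outliers[name]}")
```

With this rule no observation is flagged, which agrees with the conclusion drawn from Table 2. The negative lower bounds for time simply mean that no editing time could be low enough to count as an outlier.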
To analyze the scores, we used a paired-difference t-test and performed the following steps:
Step 1: State the Hypotheses
We chose a two-tailed test using these hypotheses:
H0: Language-checking tools do not improve document quality.
H1: Language-checking tools do improve document quality.
Step 2: Specify the Decision Rule
Our test statistic follows a t distribution with d.f. = n – 1 = 8 – 1 = 7. With α = 0.05, the two-tailed critical value is t.025 = ±2.365.
The decision rule is:
Reject H0 if tcalc < –2.365 or if tcalc > +2.365
or
Reject H0 if p-value < α
Otherwise, do not reject H0
Step 3: Calculate the Test Statistic
We used Microsoft Excel 2007 to calculate the test statistic. The results are listed in Table 3.
| | Without tools | With tools |
| Mean | 6.75 | 7.68 |
| Variance | 1.64 | 1.70 |
| d.f. | 7 | |
| Test Statistic | -1.79 | |
| t Critical two-tail | ±2.36 | |
| P(T<=t) two-tail | 0.1151 | |
Table 3: Calculating the test statistic
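The paired-difference test can be reproduced from the scores in Table 1. The sketch below is not the authors' Excel procedure; it is a standard-library approximation in which the p-value is obtained by numerically integrating the Student's t density with Simpson's rule. The function names `t_pdf` and `two_tailed_p` are hypothetical, and the critical value 2.365 is taken from the decision rule in Step 2.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

# Scores copied from Table 1, in order P1..P8.
without_tools = [5.5, 5.5, 7.5, 5.5, 8.5, 6.5, 6.5, 8.5]
with_tools = [6.5, 7.5, 7.0, 9.5, 9.5, 6.5, 6.5, 8.5]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate the two-tailed p-value with Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


# Paired differences, one per participant (without tools minus with tools).
differences = [a - b for a, b in zip(without_tools, with_tools)]
n = len(differences)
degrees_of_freedom = n - 1

# t = mean(d) / (sd(d) / sqrt(n))
t_calc = mean(differences) / (stdev(differences) / sqrt(n))
p_value = two_tailed_p(t_calc, degrees_of_freedom)

T_CRITICAL = 2.365  # two-tailed critical value for alpha = 0.05, d.f. = 7
ALPHA = 0.05
reject_h0 = abs(t_calc) > T_CRITICAL or p_value < ALPHA

print(f"d.f. = {degrees_of_freedom}, t = {t_calc:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The sign of the test statistic depends only on the direction of subtraction; subtracting in the other order would give the same magnitude and the same decision.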
Step 4: Make the Decision
Since tcalc = -1.79 lies between the critical values (-2.36 and +2.36), we do not reject H0 at the 5% level of significance. Likewise, since the p-value of 0.1151 is greater than α = 0.05, we do not reject H0. The difference in scores is therefore not statistically significant at the 5% level. On the other hand, we found a positive correlation between the scores and the time spent editing the essays, as shown in Figures 1 and 2.
Figure 1: Correlation between time and scores when not using language-checking tools.
Figure 2: Correlation between time and scores when using language-checking tools.
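The relationship plotted in the two figures can be summarized with Pearson's correlation coefficient. The paper does not say which measure was used, so Pearson's r is an assumption here, and `pearson_r` is an illustrative helper written for this sketch. The data are copied from Table 1.

```python
from math import sqrt
from statistics import mean

# Scores and editing times (minutes) copied from Table 1, in order P1..P8.
score_without = [5.5, 5.5, 7.5, 5.5, 8.5, 6.5, 6.5, 8.5]
time_without = [15, 5, 12, 9, 25, 22, 24, 16]
score_with = [6.5, 7.5, 7.0, 9.5, 9.5, 6.5, 6.5, 8.5]
time_with = [7, 7, 8, 12, 20, 14, 15, 14]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


r_without = pearson_r(time_without, score_without)
r_with = pearson_r(time_with, score_with)

print(f"r (without tools) = {r_without:.2f}")
print(f"r (with tools)    = {r_with:.2f}")
```

Both coefficients are positive and moderate in size. With only eight participants, coefficients of this size should be read with caution.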
Discussion
Based on the statistical analysis in the previous section, we found that language-checking tools do not significantly improve the document quality of non-native speakers. This result differs slightly from that of Galletta et al. [2], who found that native speakers with strong language abilities make more mistakes when they use language-checking tools: when the tools are on, users tend to ignore unflagged errors, resulting in more false negatives. In our study, however, all participants were non-native speakers. Because of their limited language ability, participants already made a relatively large number of false negatives without the language-checking tools, so the number of false negatives does not necessarily increase when the tools are used. We therefore conclude that, for non-native speakers, language-checking tools neither significantly improve nor harm document quality.
As shown in Figures 1 and 2, participants generally finished their tasks faster with the language-checking tools, which suggests that the tools reduce task time. It may also suggest “lazy” behavior on the part of users: they tend to skip the parts that do not have flagged errors.
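The difference in task time can also be read directly from Table 1. The following sketch only summarizes those published times; it is not a significance test, and the variable names are illustrative.

```python
from statistics import mean

# Editing times in minutes copied from Table 1, in order P1..P8.
time_without_tools = [15, 5, 12, 9, 25, 22, 24, 16]
time_with_tools = [7, 7, 8, 12, 20, 14, 15, 14]

mean_without = mean(time_without_tools)
mean_with = mean(time_with_tools)

# Count the participants who were strictly faster with the tools turned on.
faster_with_tools = sum(
    1 for off, on in zip(time_without_tools, time_with_tools) if on < off
)

print(f"Mean time without tools: {mean_without} minutes")
print(f"Mean time with tools:    {mean_with} minutes")
print(f"Faster with tools: {faster_with_tools} of {len(time_with_tools)} participants")
```

Because every participant edited essay A first, part of this difference could also reflect practice or fatigue rather than the tools alone.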
Considering both quality and learning, several suggestions have been made on how to improve language-checking tools targeted at non-native speakers.
Knutsson et al. provided a series of guidelines for the design of language tools for non-native speakers [4,5]. They claimed that a good language tool should enable users to understand its capabilities and limitations, and should help users learn. From the previous sections, we know that language-checking tools cannot catch false negatives. If users rely only on language-checking tools to check and edit their essays, they will also miss these errors. As a result, users remain unaware of the undetected errors in their essays. They will therefore neither improve their language ability nor learn the limitations of the tools.
We suggest that a better way of using language tools is not to rely on them too much. Language-checking tools can be used to eliminate simple spelling and grammar mistakes. However, additional checking and editing by hand is absolutely necessary.
Our study has some limitations too. We did not measure the language abilities of our participants. However, we asked each participant to edit two essays and compared the two, which minimized the effect of language ability on our result. Another limitation is that the grading of the essays introduces a confounding variable: the essays were graded by one of the authors, who is a non-native speaker. To address this, the grader did not know which essays were edited with the language-checking tools on or off. Additionally, the grader has over twenty years of experience with the English language.
Interview Questions
In order to obtain deeper insight into how non-native speakers think about language-checking tools, we conducted a semi-structured interview after the essay editing task. The questions were as follows:
- Do you use language-checking tools? [yes/no] How often do you use them?
- Where do you usually apply language-checking tools, on emails, essays or IM messages?
- Do you trust language-checking tools?
- What do you think of language-checking tools?
- Do you do extra checking or modifications after you apply language-checking tools?
- Do you have any suggestions for further improvement on language-checking tools?
Frequency of Usage
As expected, all participants use language-checking tools very frequently. This is due to the high availability, good level of integration, ease of use, and the real-time feature of current language-checking tools. Most of the participants expressed that they would use language-checking tools as long as those tools exist in the editors they are using. Most of them gave positive comments on the accessibility of those tools. As quoted from P2:
“I almost use it on everything I write. I do not need to do copy and paste anyway. Today everything has a checker: Word has it, Hotmail has it, Gmail has it, oh, and Latex. So I just leave it open and it will do everything for me. All I need to do is my writing, and those things do not bother at all.”
Only P4 expressed that he did not always use language-checking tools:
“I only use them on papers and important emails. You do not really want to control C and control V all your writing in word, too much trouble.”
Trust
Surprisingly, all participants expressed different levels of trust in language-checking tools. P6 was the only person who showed complete trust in language-checking tools. Most of them considered language-checking tools as something “trustable, but cannot totally rely on it”. Others said that they trusted checkers a lot, but they still would double check their writing. For example, as quoted from P1:
“Yes, I did it on almost every paper I wrote, as well as emails for serious purpose.”
P4 also expressed the same idea:
“I almost do hand checking every time.”
Some participants mentioned the reason why they do not show complete trust.
P5 expressed the limitation of current tools:
“It is a good tool to improve regular words and basic grammatical structures. It is not always correct and does not catch up certain words within the context.”
Many participants expressed concerns about grammar checkers. They thought that spelling checkers perform better than grammar checkers. For example, as quoted from P3:
“Mostly, I think spelling check is very useful, grammar checkers do not do well, and they have problems with special words, for example, they always flag my name.”
P4 also expressed a similar opinion:
“I absolutely trust the spelling checkers. I trust the grammar checkers when they deal with singular/plural mistakes, but I think the grammar checkers can only address certain mistakes.”
Suggestions
We also asked our participants for suggestions on how to improve current language-checking tools. Most suggestions concern the algorithms for scanning and checking, e.g., improving accuracy and handling special words better. P3 provided an interesting suggestion from a non-native speaker’s perspective:
“It would be nice if they could provide the translations of the suggestions. Sometimes, I do not understand the words listed in the suggestions. If they could give translations, I can easier select the correct answer.”
CONCLUSIONS
In this paper, we first conducted a quantitative study to reveal the effect of language-checking tools on the document quality of non-native speakers. Our results suggest that such tools may lead users to focus only on the errors flagged by the language-checking tools. We found that language-checking tools did not significantly improve the quality of our participants’ documents. Moreover, in contrast to the study of Galletta et al. [2], we found that language-checking tools did not diminish the performance of our participants.
We also performed a qualitative study in order to obtain deeper insight into how non-native speakers use language-checking tools. We found that although language-checking tools are far from perfect, users trust them highly.
REFERENCES
1. Connors, R.J. and Lunsford, A.A. Frequency of Formal Errors in Current College Writing, or Ma and Pa Kettle Do Research. The St. Martin’s Guide to Teaching Writing, 2nd ed. Robert Connors and Cheryl Glenn, Eds. St. Martin’s, New York, NY, 1992.
2. Galletta, D.F., Durcikova, A., Everard, A., and Jones, B.M. Does Spell-checking Software Need a Warning Label? Communications of the ACM. Volume 48, Issue 7. July 2005.
3. Kies, D. Evaluating grammar checkers in modern English grammar. Available at: http://papyr.com/hypertextbooks/grammar/gramchek.htm.
4. Knutsson, O., Pargman, T., and Eklundh, K. Transforming grammar checking technology into a learning environment for second language writing. Proceedings of the HLT-NAACL 03 workshop on Building educational applications using natural language processing-Volume 2 (2003), 38–45.
5. Knutsson, O., Pargman, T., Eklundh, K., and Westlund, S. Designing and developing a language environment for second language writers. Computers & Education 49, 4 (2007), 1122–1146.
6. Park, T., Lank, E., Poupart, P., and Terry, M. Is the sky pure today? AwkChecker: an assistive tool for detecting and correcting collocation errors. UIST 2008, 121–130.