Project Description

Definition of Electronic Learner Database

If I say that learner database research is a new research field in ESL/EFL, many of you whose research is related to SLA may immediately wonder what is so new about it. After all, people interested in second language acquisition and foreign language learning have been collecting learner data ever since the discipline emerged. What I mean by a 'new' research field is that a learner database offers some 'added value'.

What, then, is the added value of a learner database? First, here is a brief definition of the electronic learner database by Granger.

Computer learner corpora are electronic collections of authentic FL/SL textual data assembled according to explicit design criteria for a particular SLA/FLT purpose. They are encoded in a standardized and homogeneous way and documented as to their origin and provenance. (Granger 1998)

The first important characteristic of the learner database in this research is 'authenticity'. Sinclair (1996) describes the default value for a learner database as being 'authentic'.

All the material is gathered from the genuine communications of people going about their normal business unlike data gathered in experimental conditions or in artificial conditions of various kinds. (Sinclair 1996)

So far, SLA research has tended to favor experimental data and to neglect data from natural language use. Applied to the foreign/second language field, Sinclair's criterion means that purely experimental data resulting from elicitation techniques do not qualify as learner database data. However, the concept of authenticity for a learner database should be distinguished from that for a native speaker database, in the sense that the second or foreign language teaching context usually involves some degree of 'artificiality', so learner data is rarely fully natural. Free compositions, for instance, are 'natural' in the sense that they represent 'free writing': learners are free to write what they like rather than having to produce the items the researcher is interested in. Therefore, if we specify the learner and task variables of the learner data, I think it is worthwhile to have a learner database as a good source of 'natural' learner output.

The second characteristic of the learner database in this research is the 'electronic format' of the data.

Electronic format: computerization of the data liberates language analysts from 'drudgery' (Rundell & Stock 1992).

Because a learner database is in machine-readable form, it is now possible to quantify learner language (morphemes, lexis, grammar, discourse). Although this quantitative aspect is not the be-all and end-all of learner database research, it is certainly an important part of it.
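As a minimal sketch of what this quantification can look like at the lexical level, the following counts word tokens across a set of texts. The tokenizer is a deliberately simple assumption (lowercased runs of letters and apostrophes), and the two sample sentences are invented stand-ins, not texts from the database.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Count how often each token occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

# Invented sentences standing in for learner texts (not from the database).
sample_texts = [
    "Most of apples were delicious.",
    "Some of oranges were rotten, but most were fine.",
]

freqs = word_frequencies(sample_texts)
total_tokens = sum(freqs.values())

# Print the five most frequent tokens with their counts.
for token, count in freqs.most_common(5):
    print(token, count)
```

A real study would likely need a more careful tokenizer and lemmatization, but the counting step itself stays this simple.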

Methodology of the Electronic Learner Database Research

1) Contrastive Interlanguage Analysis (CIA)

a. L1 vs. L2 - target of this project

Highlight the differences between English and Korean, not only in terms of misuse but also in terms of over- and under-use of words and structures.

b. L2 vs. L2 - future target of this project

Comparison between different learner groups, which may differ in age, gender, task, proficiency level, etc. If many learner and task variables are coded, it is possible to assess the impact of these variables on learner output.
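Over- and under-use can be made concrete by comparing normalized frequencies across two corpora. The sketch below is only an illustration: it uses the log-likelihood statistic, which is widely used in corpus comparison but is my assumption here, not a method stated for this project. The learner figures for 'most' (108 tokens in 56,964 words) come from the example further down this page; the reference corpus figures are invented.

```python
import math

def per_10k(count, corpus_size):
    """Relative frequency per 10,000 words."""
    return count / corpus_size * 10_000

def log_likelihood(count_a, count_b, size_a, size_b):
    """Log-likelihood for one word in two corpora (0 means no difference)."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    if count_a > 0:
        ll += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        ll += count_b * math.log(count_b / expected_b)
    return 2 * ll

# Learner figures for 'most' are taken from this page.
learner_count, learner_size = 108, 56_964
# Hypothetical reference corpus figures, invented for illustration only.
reference_count, reference_size = 150, 100_000

learner_rate = per_10k(learner_count, learner_size)
reference_rate = per_10k(reference_count, reference_size)
direction = "overuse" if learner_rate > reference_rate else "underuse"
score = log_likelihood(learner_count, reference_count, learner_size, reference_size)
print(direction, round(learner_rate, 2), round(reference_rate, 2), round(score, 2))
```

The same two functions apply unchanged to the L2 vs. L2 comparison: the second corpus is then another learner group rather than a reference corpus.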

2) Computer-aided Error Analysis (CEA)

Once a very popular enterprise, error analysis (EA) is now out of favor in most SLA/FLT circles. It has gone down in history as a fuzzy, unscientific and unreliable way of approaching learner language. I am still correcting my data manually, but computer-aided error analysis, which uses an error tagging system to retrieve lists of specific error types and error statistics, aims to address some of the methodological weaknesses of previous EA studies.
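To illustrate what retrieval from error-tagged data might look like, here is a small sketch. The inline tag format and the error codes (QOF, QN) are hypothetical, invented for this example; they are not the tagging system used in this project, and the sample sentences are made up.

```python
import re
from collections import Counter

# Hypothetical inline annotation: <ERR type="CODE">erroneous span</ERR>
ERROR_TAG = re.compile(r'<ERR type="([A-Z]+)">(.*?)</ERR>')

def collect_errors(tagged_text):
    """Return (error_type, span) pairs found in one tagged text."""
    return ERROR_TAG.findall(tagged_text)

def error_statistics(tagged_texts):
    """Count tagged errors by type across all texts."""
    counts = Counter()
    for text in tagged_texts:
        for error_type, _span in collect_errors(text):
            counts[error_type] += 1
    return counts

# Invented sample; QOF and QN are made-up codes for two quantifier error types.
tagged_sample = [
    'I bought apples. <ERR type="QOF">Most of apples</ERR> were delicious.',
    '<ERR type="QN">All student</ERR> like <ERR type="QOF">some of oranges</ERR>.',
]

stats = error_statistics(tagged_sample)

# Retrieve every span tagged with one specific error type.
qof_spans = [
    span
    for text in tagged_sample
    for error_type, span in collect_errors(text)
    if error_type == "QOF"
]
print(stats)
print(qof_spans)
```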

Pedagogical Implication

This research aims to improve pedagogical materials such as grammars. With in-depth analysis, it also has the potential to improve other pedagogical materials such as dictionaries and textbooks. Some concrete applications based on electronic learner databases already exist, and they are mainly web-based. Look at the websites on the links.

Example: Concordance Data Analysis and Error Sentences

1) Word count: 56,964

2) Number of document (text) files: 94

3) Subjects: students in ESL service courses (levels 114, 115, 400, 401 and 402) at the University of Illinois at Urbana-Champaign from 2000 to 2002 who speak Korean as their L1.

4) Two distinctive error patterns in Quantifier use

a. Quantifier + singular/ plural count noun(s)

e.g. all student / each people / few money

b. Quantifier + of + noun(s) (the target pattern of this analysis)

e.g. I bought apples and oranges. Most of apples were delicious, but some of oranges were rotten.
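Candidates for pattern (b) can be pulled out of raw text with a simple concordance-style search. The sketch below is a heuristic of my own, not the procedure used for this data: it flags a quantifier followed by 'of' and then any word that is not a determiner or pronoun. The list of allowed words is partial and assumed, so every hit still needs manual checking.

```python
import re

QUANTIFIERS = ["all", "both", "each", "few", "many", "most", "much", "one", "some"]

# Words that may legitimately follow "quantifier + of" (a partial, assumed list).
ALLOWED_AFTER_OF = [
    "the", "this", "that", "these", "those",
    "my", "your", "his", "her", "its", "our", "their",
    "us", "them", "you", "it", "which", "whom",
]

PATTERN = re.compile(
    r"\b(" + "|".join(QUANTIFIERS) + r")\s+of\s+"
    r"(?!(?:" + "|".join(ALLOWED_AFTER_OF) + r")\b)(\w+)",
    re.IGNORECASE,
)

def find_candidates(text, width=20):
    """Return (quantifier, following word, context line) for each candidate hit."""
    hits = []
    for match in PATTERN.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        context = left + "[" + match.group(0) + "]" + right
        hits.append((match.group(1).lower(), match.group(2), context))
    return hits

sample = (
    "I bought apples and oranges. Most of apples were delicious, "
    "but some of oranges were rotten. Most of the pears were fine."
)

candidates = find_candidates(sample)
for quantifier, noun, context in candidates:
    print(quantifier, noun, "|", context)
```

In the sample, 'Most of the pears' is skipped because 'the' is in the allowed list, while the two sentences from the example above are flagged.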

5) Quantifier Word List & Number of Errors Related

Quantifier    Total # of tokens    # of target errors    Ratio (%)
All           102                  5                     4.90
Both          38                   1                     2.63
Each          43                   1                     2.33
Few           25                   1                     4.00
Many          160                  5                     3.13
Most          108                  13                    12.04
Much          99                   1                     1.01
One           131                  12                    9.16
Some          197                  10                    5.08
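The ratio column is the number of target errors divided by the total number of tokens, expressed as a percentage. As a sketch, the figures can be recomputed and ranked as follows. The token and error counts are copied from the list above; rounding half up to two decimal places is my assumption about how the ratios were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# (total tokens, target errors) per quantifier, copied from the list above.
quantifier_counts = {
    "all": (102, 5),
    "both": (38, 1),
    "each": (43, 1),
    "few": (25, 1),
    "many": (160, 5),
    "most": (108, 13),
    "much": (99, 1),
    "one": (131, 12),
    "some": (197, 10),
}

def error_ratio(tokens, errors):
    """Target errors as a percentage of tokens, rounded half up to two decimals."""
    ratio = Decimal(errors) * 100 / Decimal(tokens)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

ratios = {q: error_ratio(t, e) for q, (t, e) in quantifier_counts.items()}

# Quantifiers ordered from the highest to the lowest error ratio.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for quantifier in ranked:
    print(quantifier, ratios[quantifier])
```

Decimal arithmetic is used instead of the built-in round() because round() on floats rounds exact halves to the nearest even digit, which would turn 3.125 into 3.12.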

 

6) Hypothesis based on the concordance error analysis

In English

Quantifier expression + plural noun: refers to the generic sense of a noun

Quantifier expression + of the + noun: refers to a specified noun

In Korean ESL students' writing

Error pattern for quantifier expressions:

Quantifier expression + of + noun: used for either a generic noun or a specified noun

Hypothesis: This is an example of a transfer error.

For generic reference, learners insert 'of', transferring 'uy', a genitive marker meaning 'out of' in Korean.

For specified reference, learners delete 'the', which is not always necessary in Korean.

Example:

Most of Americans like McDonald burgers. (learner sentence)

Tebubun-uy mikookin-tul-un McDonald Burger-lul coaha-n-ta. (Korean)

Most-Gen American-Plur-Subj McDonald burger-Acc like-Pres-Dec

Correct:

Most Americans like McDonald burgers.

 
