Mind the gap
The graph above compares the lexis used in learner abstracts written by postgraduate students taking the Advanced Thesis Writing Course at the Eastern Mediterranean University with the lexis used in completed abstracts published by learners studying in English-medium countries. For the purposes of this exercise, the ‘target corpus’ was reduced to the size of the ‘learner’ corpus.
Note from the graph the gap between the target corpus, which shows active use of 5000+ words, and our learners at the Eastern Mediterranean University, whose active vocabulary is 3000+ words.
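A comparison like this only makes sense once both corpora are the same size, because a larger corpus will naturally contain more distinct words. The sketch below illustrates the idea under several assumptions of our own: the corpora are plain text, a crude regular-expression tokeniser stands in for a real one, the target corpus is cut down simply by truncation (the original does not say how it was reduced), and distinct word forms are counted rather than the word families that vocabulary profiling tools usually work with. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a crude stand-in for a real tokeniser."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def truncate_to_size(tokens: list[str], size: int) -> list[str]:
    """Assumed reduction method: keep only the first `size` tokens."""
    return tokens[:size]


def active_vocabulary(tokens: list[str]) -> int:
    """Count distinct word forms (types) actually used in the corpus."""
    return len(set(tokens))


learner_tokens = tokenize("The results shows that the method is useful.")
target_tokens = tokenize(
    "The findings indicate that the proposed method is robust and widely applicable."
)

# Reduce the (larger) target corpus to the size of the learner corpus.
target_tokens = truncate_to_size(target_tokens, len(learner_tokens))

print(active_vocabulary(learner_tokens), active_vocabulary(target_tokens))
```

With real abstracts, the two numbers printed would correspond to the 3000+ and 5000+ figures in the graph, though the exact counts depend heavily on how tokens are grouped into families.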
Of course, as course designers and teachers, we might simply decide to use a target corpus alone and design a teaching and learning programme from the information it provides. However, a learner corpus gives not only instructors but also the learners themselves data about the distance they have to travel to ‘compete in the academic market-place’.
Furthermore, a comparative corpus study reveals not only the extent of the gap but also its composition. What are the ones that got away? Examining the corpora shows us which word families in the target corpus are rarely used, or not used at all, by learners. Examining the word families also shows which forms of words our learners may under-use in comparison with their peers in English-medium countries.
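One way to picture this analysis is as a frequency comparison between the two corpora. The sketch below is a minimal illustration, not the method behind the graph: it assumes both token lists are already size-matched (so raw counts are comparable; otherwise one would normalise, for example per 1,000 tokens), it treats each word form separately rather than grouping forms into families, and its thresholds `min_target` and `ratio` are illustrative values we chose, not established cut-offs. The function name `gap_report` is hypothetical.

```python
from collections import Counter


def gap_report(
    learner_tokens: list[str],
    target_tokens: list[str],
    min_target: int = 2,
    ratio: float = 0.5,
) -> tuple[list[str], list[tuple[str, int, int]]]:
    """Split frequent target-corpus words into 'absent' and 'under-used' by learners.

    Assumes size-matched corpora; thresholds are illustrative only.
    """
    learner = Counter(learner_tokens)
    target = Counter(target_tokens)
    absent: list[str] = []
    underused: list[tuple[str, int, int]] = []
    # Walk the target vocabulary from most to least frequent.
    for word, t_count in target.most_common():
        if t_count < min_target:
            continue  # too rare in the target corpus to be worth flagging
        l_count = learner[word]  # Counter returns 0 for unseen words
        if l_count == 0:
            absent.append(word)  # not used at all by learners
        elif l_count < ratio * t_count:
            underused.append((word, l_count, t_count))  # used, but much less often
    return absent, underused


learner = ["but", "but", "however", "the", "data", "show", "the", "data", "show", "that"]
target = ["however", "however", "however", "the", "data", "suggest", "the", "data", "indicate", "suggest"]

absent, underused = gap_report(learner, target)
print(absent)     # ['suggest']
print(underused)  # [('however', 1, 3)]
```

In this toy example the learners never use "suggest" and use "however" a third as often as the target writers, which are exactly the two kinds of gap the paragraph above describes.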
It doesn’t take too much imagination to see that a lexical syllabus is taking shape before us, and that this information can be analysed in further detail according to criteria such as functions and moves.
It is also clear that the basic principles at work here can be exploited in any kind of language learning scenario, by comparing a specified learner corpus with any specified type of target corpus. This yields the breathtakingly simple equation:
What you need to know minus what you already know equals what you need to learn.
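In set terms, this equation is a set difference: the vocabulary of the target corpus minus the vocabulary the learner already produces. The sketch below is illustrative only. It assumes word forms stand in for word families, it treats "already know" as "already uses productively", and its ordering of the result by target-corpus frequency (most frequent first, as a rough starting point for a syllabus) is our own addition. The function name is hypothetical.

```python
from collections import Counter


def what_you_need_to_learn(target_tokens: list[str], learner_tokens: list[str]) -> list[str]:
    need_to_know = Counter(target_tokens)   # vocabulary of the target corpus, with frequencies
    already_know = set(learner_tokens)      # vocabulary the learner already produces
    gap = set(need_to_know) - already_know  # need to know minus already know
    # Most frequent target words first; ties broken alphabetically.
    return sorted(gap, key=lambda w: (-need_to_know[w], w))


target = "we suggest that the data suggest a trend and indicate a shift".split()
learner = "we show that the data show a trend and a change".split()

print(what_you_need_to_learn(target, learner))  # ['suggest', 'indicate', 'shift']
```

Note that words the learner uses but the target writers do not (here "show" and "change") simply drop out; the equation only looks at what is missing from the learner's side.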
Moreover, vocabulary profiling tools have the potential to enable learners to do much of this work themselves.
An ongoing Lexitronics project is now encouraging learners as young as 12 to store all their written work in a specified file for editing, analysis, feedback, revision and ongoing profiling. We’ve had the iPod, iTunes, and much more. But here comes the I-corpus, the ultimate configuration of the ‘learner as researcher and language detective’, and a concept which recognises that, drifting within the statistical utopias of word frequency, are the million and one I-lexicons that together make up the totality, each reflecting the unique learning histories, interests and needs of an individual learner.
It is, then, a project that takes us beyond the more traditional territory of analysing corpora for receptive purposes and into a perhaps still more complex area, where we try to unravel the continuum that we all know exists between learning words for receptive use and learning them for productive use.
Watch this space for more on the ‘Mind the Gap’ project!


