Hacking Chinese

A better way of learning Mandarin

Mapping the terra incognita of Chinese vocabulary

How do you fill gaps in your Chinese vocabulary? How do you know which words you don’t know so that you can learn them?

The terra incognita of vocabulary

Learning Chinese is like exploring a foreign and exotic land. If you’ve just started, the explored territory is very small, but the more you study and the larger the area you have covered, the likelier it is that you have missed important things that are actually quite basic.

Just to give you an example, I had studied Chinese for three years before I learnt how to say “light bulb” (燈泡). I just hadn’t encountered it or needed to use it until then, but it was a bit embarrassing not to know such a basic word at an advanced level.

The illusion of advanced learning

It’s easy to fool yourself and believe that because you can read a text at a certain level, you have mastered everything below that level. This is never the case, not even for native speakers. Your map might reach far and wide, but there will always be hidden caves and valleys that you haven’t visited, sometimes really close to home.

In other words, dropping the metaphors for a while, you might know how to say “claustrophobia”, “recession” and “promulgate” in Chinese, but if you haven’t been exposed to enormous amounts of Chinese, it’s very likely that there are fairly easy words that you don’t know (“light bulb” for instance). The problem is that you don’t know that you don’t know them.

Finding and mapping unknown terrain

There are basically two ways to improve your map:

  • Expose yourself to huge volumes of Chinese
  • Use frequency dictionaries and compiled word lists

The first one is what native speakers do and what you should do as much as you can. However, saying that you should simply listen and read more Chinese is not very helpful (and it has already been said many times on Hacking Chinese), so in this case we’re going to focus on the second method.

Using word lists to plug gaps in vocabulary knowledge

I have touched upon this subject before when talking about using more than one textbook to diversify vocabulary, but using frequency lists is taking the same thinking one step further. I have also written about memorising dictionaries, which is related, too.

It’s fairly straightforward: take a frequency list or a list targeted at language learners (such as those for HSK or TOCFL) and go through it to see which words you don’t know. Make sure you choose lists below your current level (more about this soon).

Going through a list picking out words you don’t know doesn’t take that long. If you already have most of your words in a proper spaced repetition program, you should be able to automatically remove all the words that are already in it. Then manually go through the remaining words and remove any you already know or don’t actually need.
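The automatic filtering step can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually used: it assumes that both the word list and the export from your spaced repetition program are tab-separated text with the word in the first column, and the names `first_field` and `find_gaps` are made up for this example.

```python
def first_field(line):
    """Return the first tab-separated field of a line, without surrounding spaces."""
    return line.split("\t")[0].strip()


def find_gaps(list_lines, known_lines):
    """Return the words in list_lines that are missing from known_lines.

    Assumption: one entry per line, word in the first tab-separated column.
    The original order of the word list is kept and repeats are dropped.
    """
    known = {first_field(line) for line in known_lines}
    known.discard("")  # ignore blank lines in the export
    gaps = []
    seen = set()
    for line in list_lines:
        word = first_field(line)
        if word and word not in known and word not in seen:
            seen.add(word)
            gaps.append(word)
    return gaps


# Tiny made-up sample; in practice you would read the lines from two files.
word_list = [
    "燈泡\tdēngpào\tlight bulb",
    "學生\txuéshēng\tstudent",
    "電腦\tdiànnǎo\tcomputer",
]
known_export = [
    "學生\tstudent",
    "電腦 \tcomputer",  # trailing space is stripped before comparing
]

print(find_gaps(word_list, known_export))
```

With the sample data, only 燈泡 is left over. The comparison is still an exact match on the stripped word, so simplified and traditional forms, or variant characters, would count as different words unless you convert them first.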

Even if you don’t have a list of the words you already know, it doesn’t take that long to go through a list even if it’s a thousand words long, at least if it’s significantly below your current level. But what do I mean by that?

Learn words below your expected level

Note that I’m talking about learning words within the limits of the map you already have. I do not suggest that you use only word lists to expand your vocabulary in general. In other words, if you think that you are on an intermediate level, use this method to learn beginner-level words. If you’re advanced, don’t use this method to learn advanced words, but anything below that is cool. If you finished textbook one in a series, use this method to learn words from the first book in other series.

Some practical aspects and an example

I use Anki and some time ago I had around 15,000 cards in my Chinese deck. This means that I should know a significant number of words, but as we shall see, there were many I didn’t know when compared to HSK and TOCFL lists. I imported these lists into Anki, which of course rejected words already in my deck.

This is what I did and how it turned out.

  1. Adding the beginner words (800) gave me two words I didn’t know
  2. Adding the basic words (+1600) gave me roughly a dozen new words
  3. Adding the intermediate words (+3400) gave me a couple of hundred words
  4. Adding the advanced words (+2800) gave me over one thousand new words

Naturally, you should stop at a decent level. Adding a thousand new words is not a good idea, because a list that gives you that many new words is clearly not significantly below your current level. The point here is not to cram in more words, but to note that there were words on the easier levels I didn’t know. Not all of these were words I truly didn’t know; some of them just weren’t in my deck, but around half were words I actually didn’t know.
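If you want a rough idea of where to stop before importing anything, the level-by-level comparison above can be sketched as follows. Again, this is a hypothetical illustration rather than what I did: the function name `gaps_per_level`, the `stop_after` threshold and the sample words are all assumptions made for the example.

```python
def gaps_per_level(levels, known, stop_after=200):
    """Report the unknown words for each level, easiest level first.

    levels: list of (level_name, list_of_words), ordered from easy to hard.
    known: set of words you already have.
    stop_after: hypothetical threshold; the first level with more unknown
    words than this is treated as too close to your current level, and the
    report stops before it.
    """
    seen = set(known)
    report = []
    for name, words in levels:
        # dict.fromkeys removes repeats while keeping the list order
        missing = [w for w in dict.fromkeys(words) if w not in seen]
        if len(missing) > stop_after:
            break
        report.append((name, missing))
        seen.update(words)
    return report


# Made-up miniature lists, with a very low threshold to show the cut-off.
levels = [
    ("beginner", ["你好", "燈泡"]),
    ("basic", ["電腦", "冰箱", "窗簾"]),
    ("advanced", ["頒布", "幽閉恐懼症", "經濟衰退", "通貨膨脹"]),
]
known = {"你好", "電腦", "頒布"}

report = gaps_per_level(levels, known, stop_after=2)
for name, missing in report:
    print(name, len(missing), missing)
```

In this toy example the beginner and basic levels are reported, while the advanced level is skipped because it has three unknown words, which is more than the threshold of two. A sensible real threshold depends on how many new cards you are willing to add.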

Creating a more complete vocabulary map

My case is a good example of a map that is very spread out and has lots of blank areas. Considering that my deck contained more than twice as many words as the complete vocabulary list for the test, it goes without saying that I know a lot more words than required for the advanced level. However, there were still around a hundred words on the intermediate list that I didn’t know!

What does that mean? It means that there were a hundred words that someone responsible for preparing the word lists thought important, but that I hadn’t learnt yet. Whether I’m preparing for that test or am just thirsty for knowledge is irrelevant; adding these words is truly useful. Regarding the words I didn’t know in the beginner, basic and intermediate lists, let me just say that there were some words I was amazed that I actually didn’t know how to say in Chinese!

Learn advanced words in context

When learning words that are very common, I don’t think that spending time to find good example sentences is necessary, especially when we’re talking about nouns and verbs that are fairly straightforward to use. You will pick up how to use common words simply by listening and reading if you do it enough. Combine this with speaking and writing practice and you’ll be fine.

This is not true on an advanced level, though, because you’ll be much less likely to encounter the words you learn often. Thus, I strongly suggest that you learn any advanced words in context and with at least one clear example of how to use each of them. If you doubt the validity or correctness of your sentence, ask a friend or use Lang-8 (you can post several sentences at once and ask people questions about them).

Conclusion

Whether for test preparation or simply to enhance your vocabulary, using frequency lists can be really useful. It can be incredibly hard to find these missing words any other way, and if you don’t find them, the result can be embarrassing, bad or even catastrophic, depending on the situation. The outcome depends on why you want to learn Chinese in the first place, but I think that we can all agree that learning words that are actually below your average level is desirable regardless of how far you’ve come in your studies!





27 comments

  1. Alan says:

    Olle,

    You’re totally on point as usual.

    I think everyone has very unique strengths and weaknesses in their vocabulary. This is a result of their background and own preferences. So someone might be awesome at chengyu but really bad at casual talk.

    That’s why I really think personalized language learning is going to be the wave of the future.

  2. Olle Linge says:

    @Alan: I think it’s a balancing act. Yes, it’s absolutely essential that we are motivated and learn what we want to learn (to each his own), so in this way language learning has to be personalised. On the other hand, focusing only on our strengths while overlooking weaknesses isn’t good either, especially if we’re serious about learning Chinese (or anything else). I think people have a tendency to practise what they’re already good at while failing to practise what they actually need to practise the most.

  3. Chris says:

    Hey Olle,

    You are referring to the TOP word lists, is that right? Is there a shared Anki deck of these, or how did you manage to import them?

    Also, do you add example sentences to each flashcard?

    Site is looking great!

    Chris

    1. Olle Linge says:

      I converted the online lists myself, because as far as I know, there are no Anki decks (or there weren’t at that time anyway). It’s doable with some more advanced copy-paste-replace-fu. 🙂 If you want, I can check if I still have the converted lists somewhere on my hard drive, but I’m not sure.

  4. Olle Linge says:

    I just checked more carefully and there is a list of 8000 words in Anki, which should be everything. I haven’t checked the quality of the entries, but it will be a lot better than starting from the text files which is what I have. I integrated these words into my normal deck and so I only have the words I was previously lacking. Here’s info about the deck:

    標題 (Title): Test of Proficiency (TOP)
    標籤 (Tags): chinese china taiwan traditional characters hsk

  5. Sara K. says:

    Yes, I have the TOP Anki deck (I think it is based on an older version of the TOP … but I don’t think the vocabulary changes very much from version to version, and if you just want to round out your vocabulary, it doesn’t matter which version it’s based on).

    Currently, I am focusing on the vocabulary I need in order to understand 1) wuxia and 2) soap operas. There is quite a bit of crossover, considering that both include lots of scenes where people are describing their passionate feelings. I am focusing on wuxia and soap operas because I am focusing on quantity right now, and wuxia and soap operas are things I can take in large quantities because 1) there are lots of cliffhangers and 2) the stories are long (reading/watching a long work with the same characters is easier than reading/watching many short works – if I had to constantly figure out new contexts and adjust to different artistic styles, it would be a lot more frustrating). I am focusing on quantity because I want to increase my reading speed and my comprehension skills *other* than vocabulary.

    However, I am also very aware that I am missing out on a lot of other useful vocabulary – even though much of that vocabulary is “easier” than the vocabulary I am picking up now. I actually plan to use the TOP Anki deck to round out my vocabulary when I transfer my focus from quantity to variety (being able to read/listen to Chinese about many different topics).

  6. Billy Waters says:

    Do you use Mindmanager? I find it useful for mapping out areas that I do or don’t know.

    1. Olle Linge says:

      @Billy: Never heard of it, I’m afraid. Can you give us a short introduction? I only find mind mapping software when I search the name.

  7. Matt says:

    Hey Olle,

    This is kind of off-topic but you said “of course” Anki filtered out any words already in your list. My deck doesn’t seem to be doing that when I import tab separated text files from pleco. What options do I need to select to ensure there are no duplicates? Thanks!

    1. Olle Linge says:

      Hi Matt,

      You have to tell Anki to prevent duplicates in a certain field. If you go to Card Layout, you can select the various fields and check a box called “prevent duplicates”. I only have this checked for 漢字 because it doesn’t make sense for any other field. Thus, if I try to either import or manually enter a character (or word) that already exists, Anki won’t allow it. When I import lists, it won’t import duplicate expressions. Note that it only blocks exact duplicates, so if you have a blank space after a character, it won’t show up as a duplicate. Also, variants of the same character such as 為 and 爲 are treated as two different characters.

      Hope this helps!

  8. Matt says:

    Thanks, Olle, I’ve got those settings checked, and it does prevent me from manually creating duplicates, so there must be some differences in the expressions that I didn’t notice. I’ll keep playing around with it. Cheers!

    1. Olle Linge says:

      Yes, everything counts, including things that aren’t visible. So if you’ve coloured something black (as opposed to just having the default colour), it will still appear to be identical to another entry, but in fact, they are different.

  9. Steven says:

    Hi Olle,

    I just decided I wanted to learn Chinese. I’m really glad I found out about your website, it gives me a great framework to start with. Thanks for the good work!

    Steven

    1. Olle Linge says:

      Steven: Glad to hear you like the site! Feel free to ask questions if there is anything in particular you’d like to know. Otherwise, I’ll just wish you good luck! Enjoy!

  10. Matt says:

    Ok I’ve come across another problem now; about 75% of my deck is in simplified. I want to import the TOP vocab lists but of course Anki won’t catch any duplicates that exist only in simplified form. Anyone have an idea of how I could go about converting my whole deck? I suck at Anki!

    1. Olle Linge says:

      Matt: I’m not sure how to do this, but it would probably be easiest to export the deck, convert the characters (there are lots of tools that do that, even though it would be harder to go from simplified to traditional; I don’t know how reliable this kind of automatic conversion is) and then import the deck again. It should be possible to do it while retaining statistics.

      See this and this.

  11. Matt says:

    Yes, my concern was keeping the stats, that’s the hard part. I had a look at those threads but neither seems to address the issue of the stats. I feel like there must be a way to do it using the pinyin toolkit. Anyway, if I figure it out I’ll come back here and post about it. Thanks for your help Olle, you are truly the Don Corleone of Chinese SLA blogs =)

    1. Olle Linge says:

      It’s possible to export review statistics as well, so it’s at least theoretically possible to achieve what you want. I’ve never actually tried to do it, though. How’s it going?

  12. Ollie Lovell says:

    Hey guys.

    I’ve turned the character frequency list here: http://www.zein.se/patrick/3000en.html

    Into an Anki deck.

    You can download it here: https://ankiweb.net/shared/info/1418386664

    Ollie.
