ChatGPT is a chat bot developed by OpenAI and was launched in late 2022. It’s been all over the press as the fastest growing web platform of all time, so most of you have probably heard of it. Reports of it being able to pass university exams in a range of subjects have been all over both traditional and social media, which makes me wonder: Can ChatGPT pass the HSK?
I will write more about using ChatGPT for learning and teaching Chinese in future articles, so if you have ideas for how it can or cannot be used, please leave a comment below or send me an email! In this article, we will focus on the HSK specifically.
Tune in to the Hacking Chinese Podcast to listen to the related episode:
Available on Apple Podcasts, Google Podcasts, Overcast, Spotify, YouTube and many other platforms!
Can ChatGPT pass the HSK?
I considered letting ChatGPT write the article for me, but letting it write articles about itself has been done by so many others that it’s getting boring. Also, ChatGPT does much better with shorter texts and clear prompts. Let’s kill two birds with one stone and use ChatGPT to introduce itself, the HSK and the task at hand!
Note: I have introduced ChatGPT to many people recently, mostly in a professional context, and I have noticed that simply telling people about it doesn’t work, because they often dismiss it as just another mediocre chat bot. It is not. While I can show you my interactions with ChatGPT, I strongly encourage you to try it out yourself if you haven’t!
Here’s what we will cover in this article:
- Asking ChatGPT
- Can ChatGPT even take the HSK?
- Listening, speaking, reading and writing
- Using mock HSK exams to test ChatGPT’s reading ability
- ChatGPT vs. HSK 3
- ChatGPT vs. HSK 4-5
- ChatGPT vs. HSK 6
- So can ChatGPT pass the HSK?
- What does this mean?
- How to use ChatGPT to learn and teach Chinese
ChatGPT: What is ChatGPT?
ChatGPT: What is the HSK?
ChatGPT: Do you think you could pass the HSK?
Can ChatGPT even take the HSK?
We will see some more of ChatGPT’s capabilities later when we get to the actual exam, but let’s discuss some practical issues first, some of which were mentioned above by ChatGPT itself.
Obviously, ChatGPT can’t take the HSK. You have to be a real person to sign up and you need to be smarter than ChatGPT to understand how the exam works. However, with some handholding, it’s still possible to feed the exam questions to the bot to see how it would perform on various levels of the HSK.
Unfortunately, the two lowest levels on the exam rely on images, which makes them impossible to use (ChatGPT has no image recognition capabilities). Starting from HSK 3, all reading comprehension questions are pure text, however, and any images do nothing more than provide hints to the general topic of the text. Hence, we’ll focus on HSK 3-6. This will all be HSK 2.0 as HSK 3.0 has only started rolling out, and only with the most advanced levels.
Listening, speaking, reading and writing Chinese
To pass the exam, you also need to be able to listen, speak, read and write. For ChatGPT, listening would be the same as reading, plus an added speech-to-text layer. If it can pass reading questions, it should be able to deal with listening as well, provided that the transcribing goes well. The goal here is not to evaluate transcription software, so we’ll focus on reading.
Speaking and writing are hard to test, because I’m not a trained HSK examiner and don’t know how the speaking and writing parts are graded. The listening and reading parts contain only multiple-choice questions, which makes it easy to check the right answer. If there’s enough interest, I could return to the topic of writing (and maybe speaking) in a follow-up article and evaluate it anyway.
Using mock HSK exams to test ChatGPT’s reading ability
Thus, we are left with reading comprehension. This doesn’t sound like much, but it is a substantial part of the exam, and don’t forget that we expect it to be able to do roughly as well on the listening part, given that speech-to-text is quite good these days.
These combined make up 70 out of the 80 points on HSK 3, for example, so if ChatGPT does well enough, it might be able to pass even with 0 points on the writing part!
To test if ChatGPT can pass the HSK, I used the official mock exams from HSK Mock. If you want to check them yourself or see exactly which questions ChatGPT got wrong, just download the appropriate mock exam from the website. Here are direct links to the PDF documents:
Spending an hour helping ChatGPT finish HSK 3 in one minute
In this article, I’m not going to go through every single question of all the four HSK levels, as this would just be tedious. Most of the process is just me feeding texts and questions to ChatGPT, and it spitting out answers in a few seconds. To show you the process, I will go through HSK 3 in some detail.
ChatGPT vs. HSK 3 Reading: Part 1
Interestingly, there are no instructions on the exam paper for what to do in part one, but the example shown makes it clear that there are six responses (A-F) that each are supposed to be matched with one of six utterances. The example shown is:
你知道怎么去那儿吗?
当然。我们先坐公共汽车,然后换地铁。
Here, I needed to prompt ChatGPT so it had a reasonable chance of giving the correct answer. For example, I told it explicitly that each option could only be used once. In general, I kept tinkering with prompts until ChatGPT gave an answer according to the instructions. If it did something else, such as translating the questions, I tried again until it produced an answer. Obviously, I did not check if the answer it produced was correct until afterwards!
Here’s question 1 (note that I included the example as a real question to make it a bit harder):
Checking the answer key, we can indeed see that C, D, B, A, F is the correct answer (E is the example, so ignore that). Moving on to questions 46-50:
This is a good example of the hand-holding I mentioned earlier. ChatGPT ignored my instruction to only use each response once and used E twice and didn’t use B. Thus, I prompted it again:
Checking the answer key, we can see that the right answer is actually B, E, C, A, D. This means that my hand-holding here actually lost the bot points. This is a good opportunity to talk about one of the limitations of ChatGPT, namely that it doesn’t actually understand anything and it’s hard to know when it outputs nonsense and when it actually knows what it’s doing.
We know that it’s not just guessing, because it gets most questions right (although not in this case), but if I allowed it to output anything and just registered that as its answer, it would in general lose an awful lot of points (I have tried). This is probably because the question format confuses it, not because it doesn’t have the data to answer the questions correctly. Thus, I decided to always force it to answer according to the instruction; no parlay bets allowed.
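The "each response may only be used once" rule is easy to check mechanically before registering an answer. Here is a minimal sketch in Python, not the procedure actually used for this article: the function name and the invalid example reply are hypothetical (the article only says that E was used twice and B not at all), while the valid example is the answer key for questions 46-50 quoted above.

```python
from collections import Counter

def check_matching_answer(answer, options):
    """Return a list of problems with a matching-task answer (empty if valid)."""
    counts = Counter(answer)
    problems = []
    # Letters that are not among the allowed options at all
    for letter in sorted(set(answer) - set(options)):
        problems.append(f"{letter} is not one of the options")
    # Every option must be used exactly once
    for letter in options:
        if counts[letter] == 0:
            problems.append(f"{letter} was never used")
        elif counts[letter] > 1:
            problems.append(f"{letter} was used {counts[letter]} times")
    return problems

options = ["A", "B", "C", "D", "E"]

# Hypothetical reply that breaks the rule: E twice, B never
invalid_reply = ["E", "A", "C", "E", "D"]
print(check_matching_answer(invalid_reply, options))
# ['B was never used', 'E was used 2 times']

# The answer key for questions 46-50 uses every option exactly once
answer_key = ["B", "E", "C", "A", "D"]
print(check_matching_answer(answer_key, options))
# []
```

A non-empty result would simply mean that the bot needs to be prompted again; it says nothing about whether a valid answer is also correct.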
So far, ChatGPT has scored 7/10, so much better than mere guessing.
ChatGPT vs. HSK 3 Reading: Part 2
Moving on to the next part, where you are to insert the right word into the blanks. In this case, I omitted the example, which is why there is no option E:
Checking the key, we can see that this is spot on! D, B, F, A, C. As we shall see, ChatGPT almost never gets this type of cloze deletion wrong, which is not surprising considering that it can check vast amounts of data for what words are most likely to appear together in sentences like these. It gets the next five right too:
ChatGPT vs. HSK 3 Reading: Part 3
The final part of the reading section consists of short texts followed by reading comprehension questions. We can expect these to be harder, because it’s no longer just a matter of seeing which option is statistically the most likely based on a huge dataset of written Chinese. Here’s the example given on the mock exam:
您是来参加今天会议的吗?您来早了一点儿,现在才八点半。您先进来坐吧。
★ 会议最可能几点开始?
A 8 点
B 8 点半
C 9 点
Here, we might expect ChatGPT to do worse, because this is no longer just a matter of judging which resulting chunk of text is statistically the most likely. But no, it passes with flying colours:
B, B, A, A, C, B, C, B, C, C. All correct.
But wait, isn’t it just copying the answers?
When seeing ChatGPT in action the first time, many people think that it must be copying existing text from the internet. This is not the case, but it is a valid question, because the mock exams I’m using have been available online for years, so doesn’t ChatGPT know the answers based on that?
No. To begin with, the answers are only given as number-letter combinations at the end of the mock exam document, so the answers aren’t written out. Okay, but wait, there probably are people who have explained the correct answers to the mock exam questions online. Still no; it’s not copying anything.
You can easily check this yourself by changing the text in a way that changes which answer is the correct one. If you do, ChatGPT will still give you the correct answer based on the newly created and therefore unique text.
Obviously, I didn’t try modifying every single text in the exam, but I tried it enough to satisfy my curiosity and to be able to write this paragraph in good conscience. Also, if the correct answers were available, it’s hard to explain why it would get some questions wrong.
I have also tried using ChatGPT to answer similar reading questions on exams that I or my colleagues at university have created, and which are extremely unlikely to be part of the training data. ChatGPT performed on par with what it does here, so I have no reason to suspect that the results are due to the answers being part of the training data.
ChatGPT got 27/30 on HSK 3
Here’s the scorecard for HSK 3:
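The total can be cross-checked by tallying the per-part results reported in the sections above: 7/10 on part 1, all ten right on part 2 and all ten right on part 3. A minimal sketch in Python; the part labels and variable names are only illustrative.

```python
# Per-part results for HSK 3 reading as reported above: (correct, total)
results = {
    "Part 1 (matching)": (7, 10),
    "Part 2 (fill in the blanks)": (10, 10),
    "Part 3 (reading comprehension)": (10, 10),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())

for part, (c, t) in results.items():
    print(f"{part}: {c}/{t}")
print(f"Total: {correct}/{total} = {correct / total:.0%}")
# Total: 27/30 = 90%
```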
ChatGPT vs. HSK 4 and HSK 5 reading
Like I said earlier, I’m not going to go through the process of taking screenshots of my interactions with ChatGPT for all the questions on every level. It took many hours to just collect the data here and posting screenshots would have taken hours more, without adding any new insights. Instead, I will give you the results and discuss them. Remember, you can always download the mock exams to check which questions it got wrong if you’re curious! Here’s a link to my spreadsheet.
In general, ChatGPT excels at matching questions and simple cloze deletion, but it sometimes struggles with more complex questions. For example, it didn’t do very well on part 1 of HSK 5 reading, where it’s supposed to pick sets of three characters that fit into three gaps in a text. Here’s an example:
在高速行驶的火车上,有一位老人不小心把刚买的新鞋从窗口掉下去一只,周围的人都觉得很 46 。没想到老人把另一只鞋也从窗口扔了出去。他的行为让周围的人感到很吃惊。这时候,老人笑着 47 说:“剩下的那只鞋无论多么好,多么贵,多么适合我穿,可对我来说已经没有一点儿用处了。我把它扔了出去,就有人可能 48 到一双鞋子,说不定他还可以穿呢。”
46. A 浪费 B 伤心 C 可惜 D 痛苦
47. A 解释 B 理解 C 建议 D 思考
48. A 捡 B 选 C 买 D 换
It’s hard to prompt the bot to do the right thing here. One way of doing it would have been to present three versions of the text with each set inserted into the gaps, but this would have been very time-consuming to do for every question, so I didn’t do that. Even so, it only got two of these wrong.
Another thing worth repeating is that ChatGPT doesn’t think and can’t reason logically. This means that some complex reading questions are hard to answer, something that becomes more relevant the more advanced the reading questions become.
As you all know, reading on a basic level is just about extracting information that is clearly presented in the text, but more advanced reading is about inferring an answer from what’s said in the text, even if the answer isn’t explicitly presented in the text. This becomes very relevant on HSK 6, so let’s move on!
ChatGPT vs. HSK 6
As I mentioned, ChatGPT doesn’t actually think, even though it certainly feels like it does at times (for a related philosophical question, see Chinese Room on Wikipedia, or read Blindsight by Peter Watts). This limitation becomes very obvious on many HSK 6 questions, especially those where the test is constructed to trick you into thinking that something is the answer unless you really understand what it means.
Consequently, ChatGPT did not do as well on HSK 6, and it did particularly badly on parts 1 and 4. When you look at the data here, remember that most questions have a ¼ chance of being answered correctly by pure chance.
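To put the ¼ baseline in perspective, here is a rough sketch in Python of what pure guessing would be expected to achieve. It assumes a reading section of 50 four-option questions and uses 30 correct answers (60%) as an arbitrary illustrative threshold; neither figure comes from this article, and real sections mix question types, so treat it as a back-of-the-envelope model only.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k correct answers in n independent guesses,
    each with chance p of being right (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_questions = 50   # assumption: number of four-option questions
p_guess = 0.25     # chance of guessing one four-option question correctly
threshold = 30     # assumption: illustrative 60% threshold

# Expected number of correct answers from guessing alone
print(n_questions * p_guess)  # 12.5

# Chance of reaching the threshold by guessing alone (a very small number)
print(prob_at_least(threshold, n_questions, p_guess))
```

Under these assumptions, a score well above 12 or 13 correct answers cannot reasonably be explained by guessing, while a score close to that number on an individual part would be no better than chance.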
While this type of question isn’t common on the HSK, I tried to run several more complex reading comprehension questions through ChatGPT, and it did very badly in some cases (worse than guessing). One question we used on an exam featured an explanation of the Monty Hall problem, and then asked some basic questions about it, including a “choose an appropriate title for this text”. ChatGPT got every single question wrong. It’s clear that question type matters quite a lot, but it’s beyond the scope of this article to probe this further.
So can ChatGPT pass the HSK?
Well, if we mean whether it can give the correct answers to multiple-choice reading questions, the answer is a resounding “yes”, at least if the questions are straight up, honest reading questions not deliberately designed to trick you.
This is quite amazing! Going into this project, I expected it to do very well on matching and fill-in-the-blanks types of questions, which it did. I also expected it to struggle with trickier questions, which it also did. However, I expected it to struggle a lot more than it did!
I was also blown away by the accuracy of most of the reading comprehension questions. Sure, it’s one thing to be able to fill in the blanks or match words, but actually, content-based reading questions also went surprisingly well!
I should have predicted that, though, because ChatGPT also does pretty well on most take-home exam style questions, as shown in multiple YouTube videos (search for your area of expertise and ChatGPT and you’ll probably find something).
It should be pointed out that it would be premature to say that ChatGPT could pass the HSK, even ignoring the fact that it needs human guidance to format the questions properly, because you either pass or fail the HSK as a whole, not just the reading part. However, if we assume that the percentage of correct answers required for the whole exam were required for each part, ChatGPT would breeze through HSK 3-5. It would also pass HSK 6, although not with as safe a margin.
What does this mean?
First and foremost, it means that you shouldn’t ignore ChatGPT or its competitors. If you haven’t taken recent AI development seriously in spite of (or maybe because of) the hype in the news, it’s time that you do. If you haven’t tried ChatGPT yourself, you should.
It is already quite useful for many tasks, and AI will only get better with time. The Rubicon is crossed, the die is cast, and the machine rebellion is drawing nigh. Or something.
It also means that it’s become even easier to cheat on exams if you have access to the internet. To language teachers like me, this is not really a big thing, because it’s pretty much the same problem as we’ve had with Google Translate for more than a decade, just a bit wider in scope so it now affects all other subjects too.
Additionally, I’m mostly interested in how we can leverage tools like ChatGPT and Google Translate to our advantage as language learners and language teachers. If you want to cheat on an exam, that’s your problem, and whatever you’re doing, it doesn’t count as language learning and it’s outside the scope of Hacking Chinese.
If you want to get good grades and learn the language at the same time, you can check this article:
How to use ChatGPT to learn and teach Chinese
There are many ways of using ChatGPT to learn or teach Chinese. I have spent a lot of time experimenting with it in the past few months, but rather than posting something quick and dirty, I want to collect good ideas and leave some time for reflection before I write something.
This means that while this article ends here, I will write more about how you can use ChatGPT to learn and teach Chinese in the near future. To be able to do that well, I need your help! If you have used ChatGPT for learning or teaching Chinese, please leave a comment below or send me an email:
- What did you try?
- What worked well?
- What didn’t work so well?
To match the theme of AI, the images used in this article were created using Stable Diffusion with various prompts related to robots and reading. The “HSK” on the book cover on one of the images was added manually afterwards, as SD is very bad at text.
Further reading about HSK on Hacking Chinese
6 comments
One could use ChatGPT to generate mnemonics/sentences for learning Hanzi, where the generated mnemonic would contain the components and radicals of the Hanzi and one or several meanings. It would be easy to build a mental garden/castle this way so that all mnemonics relate to the same theme, e.g. a walk in a magic forest or any area of interest of the student. Together with a DALL-E generated image and Anki, this could be a powerful and customizable tool.
Came across this Reddit post: you can ask ChatGPT to write graded readers. The few tests I did gave me articles similar to what TCB are proposing (at least for HSK4):
https://www.reddit.com/r/ChineseLanguage/comments/116wpqb/found_out_today_that_chatgpt_can_be_used_to/?utm_source=share&utm_medium=ios_app&utm_name=iossmf
Yeah, I saw that one too! But thanks for recommending it, I spend less and less time on Reddit these days. I’ve tried similar things, always with pretty bad results. This is on my list of things to evaluate more properly and write about, for sure!
I’ve been using ChatGPT to create grammar exercises (for example, I might use the grammar points in a textbook chapter that we’re studying). I’ll prompt it to write me a summary lesson and then drill me with sentences to translate, critique my answers, and suggest any improvements. In each lesson I’ll give it a list of 6 or so grammar points to cover at a time. I’m using GPT4 and the Prompt Perfect add in.
The results have definitely helped me to practice – I especially like getting the immediate critique of my translations and the suggestions for improvement and I can ask for clarification. So it’s a lot more useful than just doing exercises from textbooks.
I do find that if I give it too much content to cover it tends to omit stuff from the exercises so 6 topics at a time seems about right. I create a new chat for each new batch of topics and go back and practice the old ones from my history from time to time.
Oh – one other shortcoming is that I haven’t found a way to restrict the vocabulary it uses in the exercises (I tried specifying the HSK level but to no avail). But I don’t find that too much of an issue.
Recently I’ve been playing twenty questions with ChatGPT – you can restrict the mystery item to practice more specific vocab (eg “must be an inanimate object you would find inside a house”).
I’ve also used it to generate mnemonics for hanzi (although it’s annoyingly puritanical).