Using Japanese open source LLMs for fun and profit
Dive into the world of Japanese LLMs and gain insights into the cost and quality of responses from four popular open source Japanese models.
Introduction
Large Language Models (LLMs) have become a major disruptive force in the world, so much so that some major companies refuse to hire programmers unless an LLM simply cannot do the job! Read on to learn about the major LLMs available for the Japanese language, what makes them unique, and how they perform on a simple instruct task.
This post will first introduce the major Japanese LLMs, and then share some insights on their architecture, usefulness, and benchmarks on an Nvidia H100 SXM GPU. By the end of this post, you will be able to make an informed decision about the LLM that best suits your needs. The code used for obtaining the benchmarks in this post is freely available on GitHub: https://github.com/v0dro/japanese-llm-inference
Overview of Japanese LLMs
The following table lists large language models that are either general purpose or targeted towards the Japanese language, with a short description of each.

| Model | Parameters | Base architecture | Weights |
| --- | --- | --- | --- |
| Fugaku-LLM-13B-instruct | 13B | GPT-2 | Open |
| Sarashina2.2-3b-instruct-v0.1 | 3B | Llama | Open |
| Llama-3.1-Swallow-8B-Instruct-v0.3 | 8B | Llama 3.1 | Open |
| LLM-jp-3-13b-instruct3 | 13B | Llama | Open |
| PLaMo-100B | 100B | PLaMo | Base open; instruct via API |
| Qwen | various | Qwen | Open |

The top four LLMs in the table are open source: their weights are openly published by the authors, and you can download and modify them freely from a service such as Hugging Face. PLaMo-100B also has open source weights for the base version, but the instruction-tuned model is only available as an API from Preferred Networks. Qwen is an open source model family from the Alibaba group that has mostly been trained on Chinese data, but it works well with Japanese too. Most of these models are based on the Llama architecture; Fugaku-LLM is the oldest of the lot and is based on GPT-2.
Using Japanese LLMs for conversations
Since the base models are not very useful for conversations, we deploy the instruction-tuned models on an H100 SXM GPU and obtain answers for a real conversation. To evaluate the response of each LLM, it is given the following query, which asks for the name of the lead character of the anime “Demon Slayer”:
以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。
### 指示: アニメ「鬼滅の刃」の主人公は誰ですか?
### 応答:

(English: “Below is an instruction that describes a task. Write a response that appropriately fulfills the request. ### Instruction: Who is the main character of the anime ‘Demon Slayer’? ### Response:”)
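Each model was queried along these lines. The sketch below is a minimal illustration rather than the exact benchmark harness: it assumes the Hugging Face transformers and accelerate libraries and a CUDA GPU, and uses Fugaku-LLM's published model id as the example (swap in the other model ids to reproduce the remaining runs). The actual code is in the GitHub repository linked above.

```python
# Minimal sketch: load an instruction-tuned Japanese LLM and answer the query.
# Assumes the Hugging Face `transformers` and `accelerate` libraries and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Fugaku-LLM/Fugaku-LLM-13B-instruct"  # swap in the other model ids to reproduce

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# The instruction template from above, with the question filled in.
prompt = (
    "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n"
    "### 指示: アニメ「鬼滅の刃」の主人公は誰ですか?\n"
    "### 応答:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```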
The table below shows the results of processing the above query: the specific model used, the time taken for generating a response (including tokenization and model processing time), and the number of replies generated in an hour on the Nvidia H100 SXM GPU. All tests were run on a pod from RunPod rented for about 425 JPY per hour (approx. 2.99 USD at the time of writing). The latency and throughput figures were collected roughly as sketched below.
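A rough sketch of the measurement loop, assuming the model, tokenizer, and prompt from the previous snippet; the helper name time_one_response is mine, not from the benchmark repository.

```python
import time

import torch

POD_COST_JPY_PER_HOUR = 425  # RunPod rate quoted above

def time_one_response(prompt: str) -> float:
    """Wall-clock seconds for one reply, including tokenization and generation."""
    start = time.perf_counter()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=256, do_sample=False)
    torch.cuda.synchronize()  # make sure all GPU work has finished
    return time.perf_counter() - start

time_one_response(prompt)            # warm-up run so one-time setup cost is excluded
seconds = time_one_response(prompt)  # measured run

replies_per_hour = 3600.0 / seconds
print(f"{seconds:.2f} s/reply, about {replies_per_hour:,.0f} replies/hour, "
      f"{POD_COST_JPY_PER_HOUR / replies_per_hour:.4f} JPY/reply")
```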
The benchmark table shows that the GPT-2-based Fugaku-LLM is the most efficient even though it has 13B parameters. This can be traced partly to the GPT-2 architecture, which requires matrix multiplications of a smaller size than the Llama architecture on which the rest of the models are based, and partly to Fugaku-LLM's terse replies (shown in the next section), which require far fewer decoding steps per response.
Responses returned by each model
The responses returned by each model for the query shown in the previous section are given below. The correct answer to the question is 竈門炭治郎 (Tanjiro Kamado). Every LLM knows the correct answer, but each has its own unique way of structuring its response.
Fugaku-LLM-13B-instruct
The response from Fugaku-LLM is extremely to the point. It also echoes the text of the prompt back as part of its output; a small sketch for stripping the echo follows the quoted response.
以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。
### 指示: アニメ「鬼滅の刃」の主人公は誰ですか?
### 応答:竈門炭治郎
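Since the template is echoed, the bare answer can be recovered by splitting the decoded text on the response marker. A minimal sketch, continuing from the generation snippet above:

```python
# Strip the echoed prompt template to keep only the model's answer.
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
answer = decoded.split("### 応答:")[-1].strip()
print(answer)  # 竈門炭治郎
```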
Sarashina2.2-3b-instruct-v0.1
The reply from Sarashina2.2-3B is very elaborate. It shows the hiragana reading of the lead character's name, along with some information about his background and personality.
アニメ「鬼滅の刃」の主人公は竈門炭治郎(かまど たんじろう)です。彼は家族を鬼に殺され、唯一残った妹である禰豆子(ねずこ)も鬼になってしまったという悲劇的な背景を持っていますが、それでも希望を失わず、仲間たちとともに鬼を倒し、家族を取り戻すために戦います。炭治郎は優しく、強い精神力と勇気を持ち合わせたキャラクターで、物語を通じて多くの読者や視聴者から愛されています。

(English: “The main character of the anime ‘Demon Slayer’ is Kamado Tanjiro. He has a tragic background: his family was killed by demons and his only surviving sister, Nezuko, was herself turned into a demon. Even so, he never loses hope, and fights alongside his companions to defeat demons and take his family back. Tanjiro is a kind character with strong willpower and courage, loved by many readers and viewers throughout the story.”)
Llama-3.1-Swallow-8B-Instruct-v0.3
The reply from Llama-3.1-Swallow-8B is as precise as that of Fugaku-LLM, but phrased in a longer form: it restates part of the question to give a self-contained answer. However, the details about the main character found in Sarashina's reply are missing from Swallow's response.
「鬼滅の刃」の主人公は **竈門炭治郎** です。

(English: “The main character of ‘Demon Slayer’ is **Kamado Tanjiro**.”)
LLM-jp-3-13b-instruct3
The response from LLM-jp-3-13B conforms to the prompt template in the same way as Fugaku-LLM's, but is also more descriptive in its answer.
### 指示: 以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。
### 指示: アニメ「鬼滅の刃」の主人公は誰ですか?
### 応答: アニメ「鬼滅の刃」の主人公は、竈門炭治郎(かまど たんじろう)です。彼は家族を鬼に殺され、唯一人鬼となった妹の竈門禰豆子(ねずこ)を人間に戻すため、そして家族を殺した鬼への復讐のために鬼狩りの旅に出ます。

(English: the echoed template, followed by “### Response: The main character of the anime ‘Demon Slayer’ is Kamado Tanjiro. His family was killed by demons, and he sets out on a demon-hunting journey to turn his sister Kamado Nezuko, the sole survivor who was turned into a demon, back into a human, and to take revenge on the demon that killed his family.”)
Conclusion
In this post, we looked at some important Japanese LLMs and the responses they return when run on an Nvidia H100 SXM GPU. All four LLMs tested return the correct answer to the query posed, but the number of responses per hour of GPU time, as well as the quality and descriptiveness of those responses, varies greatly from model to model.
Fugaku-LLM is the cheapest LLM to run, processing more than 40,000 queries per hour of GPU time; LLM-jp is the most expensive at only about 2,200 queries per hour. At the 425 JPY/hour pod rate, that works out to roughly 0.01 JPY per query for Fugaku-LLM versus roughly 0.19 JPY per query for LLM-jp. On the other hand, LLM-jp's replies precisely follow the format given in the prompt and, unlike Fugaku-LLM's, give more information than just what is asked for. Sarashina-3B gives the most descriptive answer, and is the second most expensive to run after LLM-jp.
Stay tuned for more posts!