Evaluating LLMs for Bahasa Indonesia: SEA-LIONv3 vs SahabatAI-v1

By: Devariah Christihapsari

Abstract

In Round 2 of our LLM evaluation, we compared Model A (SEA-LIONv3) and Model B (SahabatAI-v1) to assess their performance on Bahasa Indonesia tasks. Across 50 challenges covering language, domain knowledge, geography, and combined tasks. Model B took the lead with notable gains in linguistic and domain-specific accuracy. Yet, both models exposed areas for improvement, underscoring the complexities of refining LLMs for Indonesian contexts.

This evaluation follows our Round 1 analysis, where regional fine-tuning propelled SEA-LIONv3 ahead of GPT-4o-mini. Now, we push the envelope further—comparing two models from the same family, each optimized for Indonesian contexts, to uncover how far fine-tuning can truly elevate LLM performance.

Introduction

This second round of evaluations pits two contenders against each other: Model A (SEA-LIONv3), our reigning champion from Round 1, and the challenger, Model B (SahabatAIv1). Our objective was clear: Which model truly advances Bahasa Indonesia LLM capabilities?

We tested both models across 50 tasks spanning four categories:

  • Language: Assessing grammar, vocabulary, and the handling of both formal and informal Indonesian.
  • Domain: Evaluating expertise in Indonesian history, culture, economy, and environmental topics.
  • Geography: Testing knowledge of local cultural traditions and geographical specifics.
  • Combined: Integrating multiple knowledge areas into a single response.

The results crowned a new frontrunner while exposing shared weaknesses—insights critical for advancing LLM performance in Indonesian applications. Full evaluation data is available on HuggingFace for those seeking deeper insights.

Model Specifications

  • SEA-LIONv3: Built on the Gemma2-9B CPT architecture, this model has been fine-tuned with extensive regional datasets to capture Indonesian cultural and geographical contexts.
  • SahabatAI-v1: Also based on Gemma2-9B CPT, this model emphasizes domain-specific content and enhanced handling of informal language, addressing the nuances of Bahasa Indonesia.

Evaluation Process

We designed the evaluation around 50 targeted tasks, each crafted to test one or more of the following:

  • Relevance: How accurately the model answers the prompt.
  • Coherence: The clarity and logical structure of the response.
  • Factuality: The correctness of the information provided.
  • Cultural Awareness: The appropriateness of language and context reflecting Indonesian norms.

Each response was evaluated by Devariah Christihapsari, an expert well-versed in Indonesian linguistic and cultural contexts, ensuring that the scoring was informed by an in-depth understanding of local nuances.

Results Analysis

SahabatAI-v1 took the lead in language and domain-based tasks, whereas SEA-LIONv3 showed a slight advantage in geographical queries. Notably, both models struggled with integrated challenges that required blending linguistic accuracy with domain or cultural specificity.

Overall Performance

Table 1: Performance Comparison of Model A and Model B in Bahasa Indonesia Tasks

LLM Benchmarking - Round 2 Results: Sea-lionv3 vs Sahabat-AIv1

Figure 1: Stacked bar chat visualizing the performance of both models across categories.

Stacked Bar Chart for LLM Models

Analysis by Category

  • Language: SahabatAI-v1 demonstrated superior fluency and accuracy, particularly in recognizing regional slang and accurately interpreting idiomatic expressions. However, the high failure rate across tasks emphasizes the need for greater exposure to informal language and colloquial nuances.
  • Domain: With a richer performance in specialized topics, SahabatAI-v1 outpaced SEA-LIONv3 in areas such as cultural practices, economic trends, and historical contexts. Nonetheless, both models need expanded training on diverse subject matter to close the remaining gaps.
  • Geography: SEA-LIONv3 maintained an edge in geography, leveraging its broader training data on Indonesian regions. Yet, both models performed similarly on common landmarks, indicating a baseline proficiency that could be enhanced with data from lesser-known regions.
  • Combined: The inability of either model to effectively integrate multiple domains suggests a pressing need for training on more complex, multi-faceted prompts that bridge language, culture, and factual accuracy.

Table 2: Examples of Evaluation Prompts and Model Responses
For better visualization, check it out on Huggingface.

PromptSEA-LIONv3 (Model A)SahabatAI-v1 (Model B)Preferred Model
Identifikasi kata slang dan bahasa gaul dalam kalimat bahasa Indonesia ini dan jelaskan artinya: “Gue lagi gabut nih, mau nongkrong di warkop yuk!”Partially incorrect.

Here’s a breakdown of the slang and colloquialisms in the sentence “Gue lagi gabut nih, mau nongkrong di warkop yuk!”:
Gue: This is a casual contraction of “Saya” (I). It’s very common in everyday spoken Indonesian.
Gabut: Short for “gugup dan bete,” meaning “restless and bored.” It perfectly captures the feeling of aimlessness and ennui.
—> Correction: “Gabut” originates from “gaji buta”, which literally means “blind salary” and refers to being idle or unproductive while still getting paid. Over time, it has evolved to mean feeling bored or aimless in general, especially among young people.
Nongkrong: Means “hanging out” or “spending time socializing” in a relaxed manner, often at a casual spot.
Warkop: Short for “warung kopi,” which translates to “coffee shop.” In Indonesian slang, it usually refers to small, informal eateries where people gather to chat, play games, or simply relax over coffee and snacks.
Yuk!: An enthusiastic invitation or suggestion, similar to “Let’s!” or “Come on!” in English.

In essence, the sentence translates to: “I’m super bored right now, want to hang out at a coffee shop?”
Partially incorrect.

Here’s a breakdown of the slang words and colloquialisms in the sentence “Gue lagi gabut nih, mau nongkrong di warkop yuk!”:
“Gue” is a casual way to say “aku” (I), often used among friends or peers. It’s similar to the English informal pronoun “I”.
“Gabut” stands for “gampang bosan” (easily bored). This slang term describes feeling restless, aimless, and bored.
—> Correction: While “gabut” does describe boredom, it is not an abbreviation for “gampang bosan.” Instead, as noted above, it originates from “gaji buta.”
“Nongkrong” is a colloquialism meaning “hang out” or “chill”, usually in a casual setting like a cafe or warung.
“Warung kopi”, often shortened to “warkop”, refers to an Indonesian-style coffee shop or cafe, typically serving traditional Indonesian snacks and drinks.

So, the complete sentence translates to: “I’m feeling bored right now, let’s hang out at a coffee shop?” or “I’m bored, want to chill at a warung kopi?”.

This sentence is typical of informal spoken language among young Indonesians, conveying a sense of casualness and friendliness.
Equally bad.

Why Both Models Are Equally Bad:
Incorrect Origin of “Gabut”:
Both models fail to correctly explain the etymology and evolving meaning of “gabut.”
Misinterpreting slang origins diminishes the credibility of the explanation.

Missed Nuances:
Neither model fully captures the cultural and social significance of terms like “gue,” “nongkrong,” and “warkop,” particularly how they reflect urban Indonesian lifestyles and youth culture.

Overly Simplified Explanations:
Both models provide surface-level translations but fail to dive deeper into the context or cultural connotations of the terms.

Both models fail due to their inaccurate explanation of “gabut” and lack of cultural depth. An ideal response should provide correct etymology, cultural significance, and nuanced meaning for each term, as shown in the suggested ideal response.
What are some popular Indonesian foods from different regions of Indonesia?Partially incorrect.

Indonesia is a vast archipelago with diverse cultures and cuisines. Here are some popular Indonesian food examples from different regions:

Java:
Nasi Goreng: Fried rice with various ingredients like chicken, shrimp, vegetables, and spices. Popular nationwide but originated in Java.
—> Correction: While widely eaten in Java, Nasi Goreng is considered a national dish, not specifically Javanese.
Soto Betawi: Meat soup with a creamy coconut milk base, originating from Jakarta (former Batavia).
—> Correction: While popular in Jakarta, Soto Betawi is not as representative of Javanese cuisine as other options like Soto Ayam or Rawon.
Sate Kambing: Grilled goat meat skewer marinated in bumbu kacang, peanut sauce.

Sumatra:
Rendang: Slow-cooked beef curry with rich spices and coconut milk, originated in West Sumatra. Considered one of the most famous Indonesian dishes.
Pempek: Fish cake made from ground fish and tapioca flour, mixed with seasonings, deep-fried or boiled and served with cuka (vinegar sauce). Originated in Palembang, South Sumatra.

Bali:
Lawar: Salad dish made with grated coconut, chopped vegetables, meat, and spices.
—> Correction: Lawar often includes minced meat (pork or chicken) as a key ingredient, which is not mentioned in the description.
Sate Lilit: Minced fish skewer marinated with herbs, wrapped around sugarcane stalk before grilling.
Ayam Betutu: Whole chicken stuffed with spices and herbs, then steamed or grilled.

Sulawesi:
Coto Makassar: Beef soup with a thick broth made from coconut milk, ground peanuts, and spices. Originated in Makassar, South Sulawesi.
Kangkung Balado: Stir-fried water spinach (kangkung) with sambal balado (spicy chili paste).
Papeda: Sticky porridge made from sagu flour, often served with fish soup or stew.
—> Correction: While eaten in Maluku, Papeda is more strongly associated with Papuan cuisine.

Maluku (Moluccas):
– Ikan Bakar: Grilled fish marinated in spices and herbs.
Sate Ikan: Fish skewers marinated in bumbu kacang before grilling.
Gula Aren: Palm sugar syrup made from the sap of palm trees, often used as a sweetener or condiment.

This is just a small selection of Indonesian foods from different regions. Indonesia’s culinary diversity is vast and exciting to explore!
Correct.

Indonesia, with its vast archipelago and diverse cultures, boasts a vibrant culinary landscape. Here are some popular Indonesian foods from different regions:

Java:
Nasi Goreng (Fried Rice): A ubiquitous dish nationwide, Javanese Nasi Goreng often features prawns, chicken, or beef, seasoned with kecap manis (sweet soy sauce) and sambal
Soto Lamongan: A flavorful chicken noodle soup originating from Lamongan, known for its clear broth, tender meat, and a special blend of spices.
Gudeg: A unique sweet and savory jackfruit stew cooked in coconut milk and palm sugar, often served with rice and krecek (crispy beef skin).

Sumatra:
Rendang: A slow-cooked beef dish from West Sumatra, richly flavored with coconut milk, and spices like turmeric, ginger, and lemongrass, resulting in melt-in-your-mouth tenderness.
Sate Padang: Spicy grilled skewers of beef marinated in a rich and aromatic peanut-based sauce originating from Padang.
Mie Kocok: A hearty noodle soup from Medan, featuring chewy noodles in a savory broth with toppings like meatballs, beef slices, and fried shallots.

Sulawesi:
Coto Makassar: A flavorful beef soup from Makassar, Sulawesi Selatan, characterized by its rich broth infused with spices like cumin, coriander, and turmeric.
Konro Bakar: Grilled ribs marinated in a blend of spices and kecap manis, offering a smoky and savory taste.
Pindang: A sour and spicy fish soup cooked in tamarind broth, often served with vegetables and herbs.

Bali:
Sate Lilit: Minced fish or chicken skewers grilled over charcoal, flavored with lemongrass, ginger, and turmeric.
Lawar: A traditional Balinese dish made with minced meat (pork or chicken), vegetables, spices, and coconut grated, often served as an offering during ceremonies.
Nasi Jinggo: A small portion of rice served with various side dishes like sambal matah (chili relish), fried tempeh or tofu, and a protein like egg or grilled chicken, popular as a quick and affordable meal.

Papua:
Papeda: A staple food made from sago starch, cooked into a thick porridge-like consistency, often served with fish or meat gravy.
Ikan Bakar Rica-Rica: Grilled fish marinated in a fiery chili paste called rica-rica, known for its intense spiciness and aromatic flavors.
Tinutuan: A hearty soup made with various vegetables, meats (pork or chicken), and rice, often cooked in coconut milk.

This is just a glimpse into the diverse and delicious world of Indonesian cuisine. Each region offers unique flavors and specialities waiting to be explored.
Model B wins.

Wider Selection and Diversity: Model B includes a wider variety of dishes from more regions, including Papua, which is not represented in Model A.

More Detailed Descriptions: Model B provides more detailed and informative descriptions of each dish, including the ingredients, flavors, and sometimes even cultural significance. For example, it explains that Lawar is often served as an offering during ceremonies in Bali. Model A’s descriptions are briefer and less informative.  

Greater Accuracy: Model B is more accurate in its selection of representative dishes for each region. For example, it includes Gudeg for Java and Nasi Jinggo for Bali, which are iconic dishes from those regions but are missing in Model A.
How are the islands of Indonesia divided?Correct & comprehensive.

Indonesia’s islands are divided in several ways:

Administratively:
Provinces (Provinsi): Indonesia is divided into 38 provinces, each with its own governor and regional administration. These provinces encompass various island groups or individual major islands. For example, Bali is a province, as is West Papua which includes numerous islands.
Regencies/Cities
(Kabupaten/Kota):
Provinces are further subdivided into regencies (Kabupaten), which typically cover rural areas, and cities (Kota), which are urban centres.

Geographically:
Major Island Groups: Indonesia’s islands are often grouped into larger archipelagos:
Greater Sunda Islands: Sumatra, Java, Borneo (Kalimantan), and Sulawesi.
Lesser Sunda Islands: Bali, Lombok, Sumbawa, Flores, Timor, and others.
Moluccas (Maluku): Includes islands like Maluku, Seram, and Banda.
Sulawesi Archipelago: Comprises Sulawesi Island and surrounding smaller islands.
Nusa Tenggara: Encompasses the Lesser Sunda Islands and nearby smaller islands.
Papua Islands: Includes New Guinea (partially), Papua, West Papua, and numerous smaller islands.

Other Categorizations:
Cultural Regions: Indonesia’s diverse culture leads to divisions based on traditions, languages, and customs (e.g., Javanese, Sundanese, and Balinese cultures).
Economic Zones: Areas might be grouped based on economic activities like tourism, agriculture, or mining. Understanding these different divisions provides a comprehensive view of Indonesia’s rich archipelago landscape and its internal organization.
Partially inaccurate.

The islands of Indonesia are divided into 34 provinces, each with its own capital city and unique cultural identity. These provinces are further subdivided into regencies (kabupaten) and autonomous cities (kotamadya). Some of the larger islands, such as Sumatra, Java, Kalimantan (Borneo), Sulawesi, and Papua, have multiple provinces spread across their territorial boundaries. This division allows for greater administrative efficiency and local autonomy while maintaining national unity.
—> Correction: 38 provinces.
Model A wins.

Model A is the better and more accurate response due to its comprehensive categorization, accurate information, and well-organized presentation. It provides a more complete and nuanced understanding of how Indonesia’s islands are divided.
Write a news report in Indonesian about a recent volcanic eruption in Indonesia, including details about the location, impact on local communities, and government response. Can be improved.

Gunung Semeru di Jawa Timur Kembali Meletus

Surabaya, 07 Desember 2023 – Gunung Semeru yang berada di perbatasan Kabupaten Lumajang dan Malang, Jawa Timur kembali meletus pada hari ini pukul 15:00 WIB. Erupsi ini mengeluarkan awan panas sejauh 9 kilometer ke arah barat daya dari puncak gunung.

Kepala Pusat Vulkanologi dan Mitigasi Bencana Geologi (PVMBG), Hendrasto, mengatakan bahwa erupsi kali ini disertai dengan abu vulkanik yang tebal menutupi sejumlah wilayah di sekitar lereng Gunung Semeru. Abu vulkanik juga dilaporkan telah mencapai beberapa desa di Kecamatan Pronojiwo, Lumajang.

“Erupsi ini merupakan letusan eksplosif dengan tinggi kolom abu sekitar 10 kilometer,” ujar Hendrasto dalam konferensi pers di Surabaya. “Kami menghimbau masyarakat untuk tetap waspada dan menghindari aktivitas di radius tiga kilometer dari puncak gunung.”

Dampak erupsi Gunung Semeru dirasakan langsung oleh warga Desa Sumberwuluh, Kecamatan Pronojiwo. Beberapa rumah tertimbun material vulkanik, sementara jalanan utama menuju desa tersebut ditutup akibat abu vulkanik yang tebal. Warga dilaporkan telah mengungsi ke tempat-tempat yang lebih aman dan bantuan darurat sedang disiapkan oleh pemerintah daerah setempat.

Pemerintah Provinsi Jawa Timur sudah melakukan koordinasi dengan Badan Nasional Penanggulangan Bencana (BNPB) untuk penanganan dampak erupsi ini. Tim evakuasi dan logistik telah diterjunkan ke lokasi bencana untuk membantu warga terdampak. Gubernur Jawa Timur, Khofifah Indar Parawansa, juga meminta agar masyarakat tetap tenang dan mengikuti instruksi dari pihak berwenang.

“Kami menghimbau masyarakat yang berada di sekitar Gunung Semeru untuk segera mengungsi ke tempat-tempat yang aman,” ujar Khofifah dalam pernyataan resmi. “Pemerintah terus memantau situasi dan akan melakukan segala upaya untuk membantu warga terdampak.”
Flawed & lacks credibility.

Gunung Semeru Meletus Lagi, Warga Lumajang Diingatkan Waspada

Lumajang, Jawa Timur – Gunung Semeru, salah satu gunung berapi paling aktif di Indonesia, kembali erupsi pada pagi hari ini, [Tanggal Erupsi]. Erupsi yang terjadi sekitar pukul [Waktu Erupsi] WIB mengakibatkan kolom abu vulkanik mencapai ketinggian [Ketinggian Kolom Abu] meter di atas puncak gunung dan menyebar ke wilayah sekitarnya.

Warga di beberapa desa di kaki Gunung Semeru, terutama di Kecamatan Candipuro dan Pronojiwo, Lumajang, merasakan dampak langsung dari erupsi ini. Hujan abu vulkanik menutupi permukiman, jalan raya, dan lahan pertanian. Aktivitas warga terganggu, sementara jarak pandang berkurang akibat tebalnya asap vulkanik.

“Kami diminta untuk tetap waspada dan mengungsi ke tempat yang aman jika kondisi semakin memburuk,” ungkap Pak Budi, seorang warga Desa Curah Koboan, yang merasakan langsung dampak hujan abu. “Pertanian kami juga terancam karena lapisan abu menutupi tanaman.”

Pemerintah Kabupaten Lumajang segera merespon dengan mengeluarkan peringatan dini kepada masyarakat di zona merah dan zona kuning sekitar gunung. Tim BPBD dan TNI/Polri telah bersiaga untuk melakukan evakuasi jika diperlukan, serta menyediakan bantuan logistik bagi warga yang mengungsi.

Kepala Badan Penanggulangan Bencana Daerah (BPBD) Lumajang, [Nama Kepala BPBD], menegaskan bahwa pihaknya terus memantau aktivitas Gunung Semeru melalui pos pengamatan dan koordinasi dengan Pusat Vulkanologi dan Mitigasi Bencana Geologi (PVMBG). “Kami menghimbau warga untuk tetap tenang dan mengikuti instruksi dari pihak berwenang,” ujarnya. “Keamanan dan keselamatan masyarakat adalah prioritas utama.”

PVMBG menaikkan status Gunung Semeru menjadi Level III (Siaga) dan memperkirakan potensi erupsi lebih lanjut dalam waktu dekat. Warga diimbau untuk menghindari aktivitas di radius 5 kilometer dari kawah aktif dan tetap waspada terhadap informasi terbaru dari otoritas terkait.
Equally bad.

Both models have significant weaknesses:

Model A:
Inaccurate data: Using a past date makes the report outdated and unreliable.
Limited detail: Lacks specific information about the eruption’s scale and the government’s response.

Model B:
Missing crucial information: Omitting the date, time, and ash column height makes the report vague and less informative.
Reliance on placeholders: Using placeholders like “[Tanggal Erupsi]” indicates the model couldn’t generate this essential information, highlighting its limitations.

Key Drivers of SahabatAI-v1’s Edge

SahabatAI-v1 consistently outperformed SEA-LIONv3 in tasks demanding a nuanced understanding of Bahasa Indonesia—particularly in linguistic fluency and domain-specific content. The key differentiator? Targeted fine-tuning focused on local dialects, cultural idioms, and specialized content.

While detailed training datasets remain proprietary, available documentation indicates that SahabatAI-v1 was refined using a robust mix of instruction-completion pairs drawn from Indonesian contexts. This approach enriched the model’s capability to handle informal language, idiomatic expressions, and culturally specific knowledge. The evaluation confirms that such regional fine-tuning provides a competitive edge, enabling SahabatAI-v1 to deliver responses that are more accurate, relevant, and culturally resonant.

Implications for LLM Development

Our findings reinforce the critical importance of localized fine-tuning for LLMs. Generalized models may excel in broad multilingual tasks but often miss the cultural nuances vital for effective communication in Southeast Asia. Future developments should be prioritized:

  • Stronger cross-domain integration capabilities.
  • Enhanced representation of informal language (including slang, idioms, and dialects).
  • Deeper domain-specific knowledge spanning sectors like finance, healthcare, and governance.
  • Improved geographical and cultural understanding to capture the diversity of Indonesian contexts.

Limitations and Future Directions

Our study had several limitations:

  • Limited dataset: While 50 questions offer an initial benchmark, a larger dataset is needed for more conclusive insights.
  • Evaluator diversity: Involving multiple evaluators with varied backgrounds could yield a more comprehensive assessment.
  • Controlled environment: Real-world testing would better capture the dynamic use of Bahasa Indonesia in everyday interactions.

Future research should expand the dataset, incorporate multiple evaluators, and refine evaluation methodologies to enhance LLM performance for Southeast Asian applications.

Want your LLM to excel in Bahasa Indonesia? SUPA can help. Contact us at isaac@supa.so or visit our website.