Open LLM Fine-Tuning 2026: Synthetic Data, DPO Variants, Japanese-Specific Models — KGA Tech Blog

The Synthetic Data Era Has Arrived

The open-source LLM fine-tuning landscape of 2026 has shifted from the "human-annotation-centered" approach of 2023–2024 to a "teacher-model-driven synthetic data" approach. High-quality datasets distilled from frontier closed-source models like Claude Opus 4.7, GPT-5, and Gemini 2.5 Ultra are now publicly available, and 7B–13B base models can achieve instruction-following capabilities comparable to 70B models from 2024.

This article organizes April 2026 best practices across five axes: data generation, algorithms, Japanese-language specialization, reproducible recipes, and ethics.

Standard Teacher Model Distillation Pipeline

Microsoft's Phi series pioneered the "textbook-quality data" philosophy, and it has been refined further in 2026. Community datasets replicating the Phi-5/Phi-5-mini recipe have standardized on the following pipeline:

Seed data extraction: pull the top 5% by quality score from Common Crawl + GitHub + arXiv + Stack Exchange
Question generation via teacher model: prompt Claude Opus 4.7 to generate "10 questions a graduate student might ask about this document"
Answer generation with chain-of-thought: GPT-5 generates answers with reasoning traces, self-consistency checked
Difficulty balancing: mix easy/medium/hard at a 3:5:2 ratio, 200–4000 tokens in length
Rejection sampling: a separate teacher scores outputs, bottom 30% discarded
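
The last two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Phi recipe itself: the `Sample` fields, the score scale, and the function names are all assumptions made for the example, and the teacher scoring is taken as already done.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    difficulty: str  # "easy", "medium", or "hard" (hypothetical labels)
    score: float     # quality score from the second teacher (hypothetical scale)
    n_tokens: int


def balance_difficulty(samples, ratio=None, min_tokens=200, max_tokens=4000, seed=0):
    """Subsample so easy:medium:hard matches `ratio`, keeping only in-range lengths."""
    ratio = ratio or {"easy": 3, "medium": 5, "hard": 2}
    buckets = {name: [] for name in ratio}
    for s in samples:
        if s.difficulty in buckets and min_tokens <= s.n_tokens <= max_tokens:
            buckets[s.difficulty].append(s)
    # Largest number of whole "ratio units" that every bucket can supply.
    units = min(len(buckets[name]) // weight for name, weight in ratio.items())
    rng = random.Random(seed)
    mixed = []
    for name, weight in ratio.items():
        mixed.extend(rng.sample(buckets[name], units * weight))
    return mixed


def reject_bottom(samples, drop_fraction=0.30):
    """Discard the lowest-scoring fraction of samples."""
    ranked = sorted(samples, key=lambda s: s.score, reverse=True)
    n_drop = round(len(ranked) * drop_fraction)
    return ranked[: len(ranked) - n_drop]
```

Note that filtering after balancing can shift the 3:5:2 mix if scores correlate with difficulty; applying the rejection step per difficulty bucket is one way to avoid that.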

The MAP-Neo-v2 dataset published in March 2026 (2.1T tokens, CC-BY-4.0) is a Japanese-English-Chinese multilingual corpus built with this pipeline. The compute required for continued pretraining on a Llama 3 8B base was equivalent to roughly ¥3 billion, yet it is being distributed for free.

Choosing Between DPO, IPO, and KTO

Preference learning has moved past the RLHF era into computationally lighter offline methods. Here is when to use each:

DPO (Direct Preference Optimization): first choice when you have abundant pairwise preference data. Simple to implement, 1/5 the compute cost of PPO. Weaker reward-hacking resistance than PPO.
IPO (Identity Preference Optimization): theoretically addresses DPO's overfitting problem. Outperforms DPO especially on small datasets (under 10K pairs).
KTO (Kahneman-Tversky Optimization): no pairs required — learns from binary good/bad labels only. Can directly leverage user thumbs-up/thumbs-down logs, which is a major practical advantage.
SimPO: a DPO-style method that needs no reference model. About 40% less memory with performance maintained. Close to becoming the 2026 standard.
RLAIF (AI Feedback): replaces human labelers with Claude or GPT. 1/100 the cost, ~95% of human-annotation quality.
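
To make the KTO point concrete, here is a minimal sketch of turning thumbs-up/thumbs-down logs into unpaired training examples. The log schema (`prompt`, `response`, `feedback`) is hypothetical. The output keys (`prompt`, `completion`, `label`) follow the unpaired-preference layout commonly used by KTO trainers such as the one in TRL, but that mapping is an assumption to check against the trainer version in use.

```python
def logs_to_kto(records):
    """Map feedback logs to unpaired (prompt, completion, label) examples."""
    examples = []
    for r in records:
        vote = r.get("feedback")  # hypothetical field: "up", "down", or absent
        if vote not in ("up", "down"):
            continue  # skip turns with no explicit feedback
        examples.append(
            {
                "prompt": r["prompt"],
                "completion": r["response"],
                "label": vote == "up",  # True = desirable, False = undesirable
            }
        )
    return examples
```

No pairing step is needed: each rated response becomes one example, so a prompt that only ever received a thumbs-down is still usable.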

```yaml
# SimPO configuration in axolotl (Qwen 3 7B base)
base_model: Qwen/Qwen3-7B-Base
rl: simpo
simpo_gamma: 1.4
simpo_beta: 2.0
datasets:
  - path: argilla/ultrafeedback-binarized-preferences-cleaned
    type: chatml.ultra
learning_rate: 5.0e-7
num_epochs: 1
sample_packing: true
gradient_checkpointing: true
adapter: lora
lora_r: 64
lora_alpha: 128
```
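
For readers who want to see why SimPO can drop the reference model, the per-pair losses of DPO, IPO, and SimPO can be written out in plain Python. This is a sketch of the published objectives on scalar sequence log-probabilities, not the code any trainer actually runs. The function and argument names are invented for the example, and the SimPO defaults of `beta=2.0` and `gamma=1.4` simply mirror the config values, assuming they correspond to the scaling factor and target margin.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """DPO: logistic loss on the policy-vs-reference log-ratio margin."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def ipo_loss(pol_w, pol_l, ref_w, ref_l, tau=0.1):
    """IPO: squared loss pulling the same margin toward 1 / (2 * tau)."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return (margin - 1.0 / (2.0 * tau)) ** 2


def simpo_loss(pol_w, pol_l, len_w, len_l, beta=2.0, gamma=1.4):
    """SimPO: length-normalized log-probs with a target margin, no reference."""
    margin = beta * (pol_w / len_w - pol_l / len_l) - gamma
    return -log_sigmoid(margin)
```

Here `pol_*` and `ref_*` are summed token log-probabilities of the chosen (`w`) and rejected (`l`) responses under the policy and the reference model. Only `simpo_loss` takes no `ref_*` arguments, which is where the memory saving comes from: the reference model never has to be loaded.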

Japanese-Language Model Progress

By 2026, the route of continued training on top of foreign base models has decisively won for Japanese LLMs. Here's the current status of the three major lineages:

Swallow v3 (Institute of Science Tokyo, formerly Tokyo Institute of Technology): continued pretraining + instruction tuning on Llama 4 70B. 600B additional Japanese tokens, JMT-Bench 8.52, Jaster 77.4. Free for research; commercial use follows the Llama 4 Community License.

Rinna Nekomata-2 (rinna): Qwen 3 72B base, commercially usable under Apache 2.0. Outperforms Swallow in honorifics, formal register, and business document fluency; JMT-Bench 8.47.

Sarashina 2.5 (SB Intuitions): hybrid of scratch training and Llama 4 distillation. Two sizes: 405B and 70B. As the standard-bearer for domestically developed "sovereign AI," adoption in finance, healthcare, and municipal government is accelerating rapidly.

The key 2026 trend: Japanese-specific model development has decomposed into three stages — base model selection × Japanese synthetic data × lightweight preference learning — reproducible by anyone with a few hundred lines of axolotl YAML.

Reproducible Recipe: axolotl × unsloth

unsloth in its 2026 version has improved QLoRA memory efficiency by 4.2x, reaching the point where a 70B QLoRA run fits on a single RTX 4090. axolotl supports both distributed training and preference learning with high reproducibility in multi-node, multi-GPU setups.

A typical Japanese instruction-tuning recipe:

Choose base model (Qwen 3 7B Base)
Japanese synthetic data: 500K examples (Claude Opus 4.7 distillation, CC-BY-4.0)
unsloth + QLoRA r=128, 3 epochs, 18 hours on a single 3090
SimPO phase: 100K pairs from rinna/ultrafeedback-ja, 6 hours on a single 4090
Evaluation: JMT-Bench, Jaster, elyza-tasks-100
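
As a rough guide to what `r=128` means in practice, the number of trainable LoRA parameters can be estimated from the shapes of the adapted weight matrices: an adapter on a `d_out × d_in` matrix adds `r * (d_in + d_out)` parameters. The sketch below uses a hypothetical 32-layer model with four 4096×4096 attention projections per layer; these shapes are placeholders for illustration, not the actual dimensions of Qwen 3 7B.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def adapter_size(shapes, r: int, n_layers: int, bytes_per_param: int = 2):
    """Total LoRA parameters and bytes (default 2 bytes, i.e. 16-bit weights)."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes)
    total = per_layer * n_layers
    return total, total * bytes_per_param


# Hypothetical model: 32 layers, four 4096x4096 projections adapted per layer.
shapes = [(4096, 4096)] * 4
params, size_bytes = adapter_size(shapes, r=128, n_layers=32)
print(f"{params:,} trainable parameters, {size_bytes / 2**20:.0f} MiB")
```

The estimate covers only the adapter weights; the frozen 4-bit base model, optimizer state, and activations account for the rest of the memory footprint.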

Total cost: roughly $180 in cloud compute equivalents. At that price, building a Japanese model that outperforms 2024 commercial APIs is now within reach.

Ethics and Data Provenance

The most important point is data provenance. Even synthetic data carries the shadow of the teacher model's training data and its copyright implications. The EU AI Act entered into force in August 2024 and its obligations phase in through 2026 and beyond, so models intended for European deployment should be prepared to document:

License list of seed data (including robots.txt compliance status)
Teacher model ToS and derivative work clauses
PII removal methodology and filter accuracy
Bias evaluation (BBQ-ja, StereoSet-ja, etc.)
Right-to-erasure compliance procedures
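
One way to keep this checklist from being forgotten is to store it as a structured record next to the dataset and fail the build when a field is empty. The sketch below is an illustration only: the class, its field names, and the "empty means missing" rule are assumptions for the example, not a schema defined by the EU AI Act or by Hugging Face.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProvenanceRecord:
    seed_licenses: list                   # licenses of all seed data sources
    robots_txt_checked: bool              # whether robots.txt compliance was verified
    teacher_tos_notes: str                # teacher model ToS / derivative work clauses
    pii_method: str                       # how PII was removed
    pii_filter_accuracy: Optional[float]  # measured filter accuracy, if known
    bias_evals: dict                      # e.g. {"BBQ-ja": ..., "StereoSet-ja": ...}
    erasure_procedure: str                # right-to-erasure handling


def missing_fields(record: ProvenanceRecord) -> list:
    """Return the names of fields that are None or empty."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or value == "" or value == [] or value == {}:
            missing.append(f.name)
    return missing
```

A release script could call `missing_fields` and refuse to publish while the returned list is non-empty.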

Hugging Face made Dataset Cards v2 mandatory in March 2026; datasets lacking the above information are excluded from download statistics displays. If you're building for commercial use, provenance documentation is a high-ROI investment.

What to Watch in H2 2026

Self-improvement loops (self-play, self-reward) are moving from research to practical application. Successors to Meta's Self-Rewarding Language Models, public implementations of Anthropic's Constitutional AI, and a Japanese-language "Constitutional AI" developed domestically are all anticipated. Fine-tuning practitioners are now differentiated not by algorithm mastery but by their skill in data design and evaluation design.
