What IT services does KGA provide?

KGA provides comprehensive IT support services including software installation and setup, SaaS system maintenance, application configuration, technical support, digital consulting (including website development), security services, and data management & backup solutions.

What areas do you cover?

Based in Kosai, Shizuoka, we provide remote support nationwide across Japan. On-site support is available primarily in the Tokai region.

Can I consult before signing a contract?

Yes, initial consultation and estimates are completely free. We will listen to your IT challenges and propose the optimal solution.

Is emergency support available?

Yes, the Premium plan includes 24-hour emergency support. The Standard plan also provides priority response during business hours.

Can you set up international TV apps?

Yes, we support the installation and configuration of international TV applications and media players. We help set up environments for legal access to international content.

Do you offer multilingual support?

We support 9 languages: Japanese, English, Portuguese, Korean, Chinese, Malay, Filipino, Vietnamese, and Spanish.

Are there any setup or hidden fees?

No. All prices displayed are final and tax-included. There are no setup fees, hidden charges, or surprise invoices. What you see is exactly what you pay.

Can I change plans later?

Yes. You can upgrade, downgrade, or cancel at any time. Upgrades take effect immediately and we will prorate the difference. Downgrades take effect at the next renewal cycle.

Which payment methods do you accept?

We accept all major credit cards (Visa, Mastercard, JCB, American Express) through Komoju, as well as bank transfers and convenience store payments in Japan. Invoicing is available for Business IT Plan customers.

Do you offer refunds?

Yes. We offer a 14-day money-back guarantee on all annual plans — no questions asked. Monthly Business IT Plan subscriptions can be cancelled at any time with prorated refunds for unused service.

What is the difference between the annual plans and the Business IT Plan?

Annual plans cover app configuration and support for individuals and small teams. The Business IT Plan is a comprehensive monthly subscription for companies that require website development, system management, automation, security, and a dedicated account manager.

Do you provide support in English?

Yes. Our team provides full multilingual support in Japanese, English, Portuguese, Korean, Chinese, Malay, Filipino, Vietnamese, and Spanish — by email, chat, and scheduled video calls.

Embedding Models 2026 Landscape: text-embedding-3-large, Voyage-3, Cohere Embed v4, BGE-M3, Jina v3 — KGA Tech Blog

Why Embedding Models Have Become the Primary Battleground Again

In the early days of the 2023 RAG boom, embedding models were largely dismissed; "text-embedding-ada-002 is good enough" was the prevailing attitude. The situation in 2026 is entirely different. As LLM generation quality approaches saturation on some tasks, it has become widely understood that retrieval sets the ceiling for the entire RAG system: the generator cannot use what the retriever fails to find. Embedding model selection and tuning have regained their status as the highest-ROI investment available.

Hugging Face's MTEB (Massive Text Embedding Benchmark) leaderboard now lists 200+ models as of April 2026, but the practical options for Japanese-language products narrow down to around 10. This post compares OpenAI, Voyage, Cohere, BAAI, and Jina across MTEB, BEIR, and JMTEB scores alongside ease of implementation.

April 2026 Score Summary

OpenAI text-embedding-3-large: MTEB 64.6, JMTEB 75.2, 3072 dimensions, 8192-token context, $0.13/M tokens
OpenAI Embed-4 (released 2026/02): MTEB 68.9, JMTEB 79.8, 4096 dimensions, 32k-token context, $0.18/M tokens
Voyage-3-large: MTEB 67.8, BEIR 58.3, 1024/2048/4096 dimensions, 32k-token context, $0.18/M tokens
Voyage-3: MTEB 64.2, 1024 dimensions, 32k-token context, $0.06/M tokens (among the best value-for-cost)
Cohere Embed v4: MTEB 68.1, JMTEB 77.5, 1536 dimensions, 128k-token context, $0.12/M tokens, multimodal
BGE-M3 (BAAI): MTEB 59.4, JMTEB 73.1, 1024 dimensions, 8192-token context, OSS (MIT)
Jina Embeddings v3: MTEB 65.5, JMTEB 74.8, 1024 dimensions (Matryoshka), 8192-token context, open weights (CC BY-NC 4.0) + API

On raw MTEB scores, OpenAI Embed-4, Cohere Embed v4, and Voyage-3-large sit at the top. The production decision, however, depends on four axes: (1) fine-tuning feasibility for your domain, (2) latency and dimensionality, (3) multilingual and Japanese-language performance, and (4) data residency compliance.
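The per-million-token prices above can be turned into a rough budget. The sketch below computes the one-off cost of embedding a corpus. The 1-billion-token corpus is a hypothetical example, and the dictionary keys are informal labels rather than API model IDs.

```python
# USD per million tokens, copied from the April 2026 summary above
PRICE_PER_M_TOKENS = {
    "OpenAI text-embedding-3-large": 0.13,
    "OpenAI Embed-4": 0.18,
    "Voyage-3-large": 0.18,
    "Voyage-3": 0.06,
    "Cohere Embed v4": 0.12,
}

def embedding_cost_usd(total_tokens: int, price_per_m_tokens: float) -> float:
    """Cost of embedding `total_tokens` once at a given per-million-token price."""
    return total_tokens / 1_000_000 * price_per_m_tokens

# Hypothetical corpus: 2M chunks x 500 tokens = 1B tokens
corpus_tokens = 2_000_000 * 500
for name, price in sorted(PRICE_PER_M_TOKENS.items(), key=lambda kv: kv[1]):
    print(f"{name:<32} ${embedding_cost_usd(corpus_tokens, price):>8,.2f}")
```

The figure covers indexing only; query-time embedding is extra. Switching models means re-embedding the whole corpus, so the cost is paid again.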

Japanese Performance: Where JMTEB Matters

JMTEB, a Japanese-language counterpart to MTEB maintained by SB Intuitions, evaluates Retrieval, STS, Classification, Clustering, and Reranking and aggregates them into an overall score. Models that rank highly on English MTEB can shift significantly on JMTEB.

April 2026 JMTEB trends:

On the Japanese Retrieval subset: OpenAI Embed-4 > Voyage-3-large > Cohere Embed v4 > BGE-M3 > text-embedding-3-large
On STS (sentence similarity): Cohere Embed v4 leads, handling the wide range of keigo (honorific) forms and phrasing variations in Japanese particularly well.
On cross-lingual retrieval (Japanese-English semantic search): BGE-M3 performs surprisingly well, within close range of Voyage-3-large.

For domestic finance and public sector projects where data cannot leave the country, the OpenAI, Voyage, and Cohere APIs are off the table, and self-hosting BGE-M3 is the most practical path. In 2026, a GGUF/AWQ-quantized BGE-M3 served via llama.cpp or vLLM on a single H100 can handle 2,000 req/s. It has become the default embedding model for on-premises RAG deployments.

Matryoshka Representation: Layering Dimensions

Matryoshka Representation Learning (MRL) was introduced in 2022 and is now common to Voyage-3, Jina v3, and OpenAI's text-embedding-3 series. The model is trained with a loss applied to nested prefixes of the embedding, so that the first k dimensions of the resulting vector alone still carry sufficient semantic meaning.
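The nested-prefix objective can be illustrated in a few lines of NumPy. This is a minimal sketch, not any vendor's training code. It assumes an in-batch contrastive base loss with temperature 0.05. It also assumes the prefix sizes (64, 128, 256, 512) and the equal weights, which are illustrative choices. `contrastive_loss` and `matryoshka_loss` are hypothetical names.

```python
import numpy as np

def contrastive_loss(q: np.ndarray, d: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss: row i of q should be most similar to row i of d."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature                    # (batch, batch) scaled cosines
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())            # targets sit on the diagonal

def matryoshka_loss(q, d, nested_dims=(64, 128, 256, 512), weights=None) -> float:
    """MRL-style objective: apply the same loss to each prefix [:k] and sum the results."""
    weights = weights if weights is not None else [1.0] * len(nested_dims)
    return sum(w * contrastive_loss(q[:, :k], d[:, :k]) for w, k in zip(weights, nested_dims))

rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 512))
docs = queries + 0.3 * rng.normal(size=(8, 512))  # each doc is a noisy copy of its query
print(matryoshka_loss(queries, docs))
```

Because every prefix is penalized on its own, gradient pressure pushes the most useful information into the leading dimensions. That is why truncation degrades gracefully.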

Previously, the tradeoff was "3072 dimensions give high accuracy but are heavy; truncating to 256 dimensions makes accuracy collapse." MRL-enabled models make a two-stage approach possible. Store the full 3072-dimension vectors, run first-stage retrieval on their 256-dimension prefixes for speed, then re-rank the shortlist with all 3072 dimensions.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment
texts = ["Warranty terms for LCD panels", "Return policy for accessories"]  # example inputs

# First stage: request only the first 256 dimensions via MRL
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts,
    dimensions=256,
)
short_vecs = np.array([d.embedding for d in resp.data])

# Separately obtain the full 3072-dimension vectors for re-ranking
resp_full = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts,
)
full_vecs = np.array([d.embedding for d in resp_full.data])
```
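The two API calls can also be collapsed into one. OpenAI's embeddings guide describes manually shortening text-embedding-3 vectors by truncating and then L2-normalizing them. With that approach, you store only the full vectors and derive the 256-dimension prefixes locally. The sketch below does this and then runs the two-stage search.

A few assumptions apply:
- Random vectors stand in for real embeddings, so the example exercises the mechanics, not MRL's accuracy.
- Brute-force scoring replaces an HNSW index.
- `two_stage_search` and its `shortlist` parameter are hypothetical names.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_search(query_full, corpus_full, prefix_dim=256, shortlist=100, k=10):
    """Stage 1 scans prefix_dim-d prefixes; stage 2 re-scores the shortlist with all dims."""
    # Truncate to the MRL prefix, then re-normalize so the dot product is cosine similarity
    q_short = l2_normalize(query_full[:prefix_dim])
    c_short = l2_normalize(corpus_full[:, :prefix_dim])
    candidates = np.argsort(-(c_short @ q_short))[:shortlist]
    # Re-rank only the shortlist using the full-dimension vectors
    full_scores = l2_normalize(corpus_full[candidates]) @ l2_normalize(query_full)
    order = np.argsort(-full_scores)[:k]
    return candidates[order], full_scores[order]

rng = np.random.default_rng(42)
corpus = l2_normalize(rng.normal(size=(2_000, 3072)))  # stand-in for stored 3072-d vectors
query = corpus[123] + 0.005 * rng.normal(size=3072)     # a query close to document 123
ids, scores = two_stage_search(query, corpus)
print(ids[:3], scores[:3])
```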

Storing 100M vectors at 3072 dimensions in float32 requires about 1.2 TB of raw data. Trimming to 256 dimensions with MRL brings that down to about 100 GB, which makes building an HNSW index and keeping it in memory practical. Combined with the multi-vector capabilities in Qdrant and Weaviate, this becomes even more powerful.
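The storage figures follow from simple arithmetic: number of vectors × dimensions × bytes per value. The sketch assumes 4-byte float32 values and ignores index overhead such as HNSW graph links. `raw_vector_storage_bytes` is a hypothetical helper name.

```python
def raw_vector_storage_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense float matrix, excluding index overhead (e.g. HNSW graph links)."""
    return n_vectors * dims * bytes_per_value

N = 100_000_000
full = raw_vector_storage_bytes(N, 3072)   # float32, full vectors
short = raw_vector_storage_bytes(N, 256)   # float32, 256-d MRL prefix
print(f"3072-d: {full / 1e12:.2f} TB, 256-d: {short / 1e9:.1f} GB, ratio {full // short}x")
# 3072-d: 1.23 TB, 256-d: 102.4 GB, ratio 12x
```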

ColBERT and Late Interaction

In 2026, it has become more common for production RAG systems to implement Late Interaction (ColBERT-style retrieval) alongside or in place of single dense vectors. ColBERT does not compress a document into a single vector. Instead, it keeps one vector per token and scores a document with MaxSim: each query token vector is matched to its most similar document token vector, and these maximum similarities are summed.
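MaxSim scoring is short enough to write directly in NumPy. This sketch makes several assumptions:
- Token vectors are already L2-normalized.
- Each token vector has 128 dimensions, the size ColBERTv2 uses.
- Random vectors stand in for model output.

`maxsim` and `l2norm` are hypothetical helper names, not a library API.

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_tokens: (n_q, d), doc_tokens: (n_d, d), both L2-normalized per token.
    For each query token, take its best-matching document token, then sum over query tokens.
    """
    sim = query_tokens @ doc_tokens.T   # (n_q, n_d) cosine similarities
    return float(sim.max(axis=1).sum())

def l2norm(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(7)
doc_a = l2norm(rng.normal(size=(300, 128)))  # 300 token vectors, 128-d each
doc_b = l2norm(rng.normal(size=(300, 128)))
# A query whose 8 tokens are near-copies of tokens scattered through doc_a
picked = doc_a[[5, 40, 90, 150, 200, 240, 270, 299]]
query = l2norm(picked + 0.02 * rng.normal(size=(8, 128)))
print(maxsim(query, doc_a), maxsim(query, doc_b))
```

Each query token can find its match anywhere in the document. This is why late interaction keeps details that a single pooled vector averages away.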

Retains nuance in long documents far better than a single dense vector
Storage cost is 10–50× that of dense (depending on token count)
Qdrant 1.12, Vespa, and Weaviate 1.28 all support multi-vector natively
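The 10 to 50× storage range depends on how many token vectors each document keeps. The sketch below assumes uncompressed 128-d token vectors, a 1024-d dense vector, and the same dtype for both. `multivector_to_dense_ratio` is a hypothetical helper. In practice, ColBERTv2's residual compression and similar techniques shrink the real footprint.

```python
def multivector_to_dense_ratio(tokens_per_doc: int, token_dim: int = 128, dense_dim: int = 1024) -> float:
    """Storage of one multi-vector document relative to one dense vector (same dtype, no compression)."""
    return tokens_per_doc * token_dim / dense_dim

for tokens in (80, 200, 400):
    print(tokens, multivector_to_dense_ratio(tokens))
# 80 10.0
# 200 25.0
# 400 50.0
```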

Jina-ColBERT-v2 and ColBERTv2 (Stanford) deliver retrieval performance approaching top dense models on MTEB asymmetric tasks while remaining more robust to domain shift. They are particularly effective for long contracts, academic papers, and source code — content that cannot be adequately compressed into a single vector.

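```python
from ragatouille import RAGPretrainedModel

documents = ["<full text of contract 1>", "<full text of contract 2>"]  # placeholder corpus

rag = RAGPretrainedModel.from_pretrained("jinaai/jina-colbert-v2")
rag.index(
    collection=documents,
    index_name="contracts",
    max_document_length=512,  # long contracts are split into chunks of at most 512 tokens
    split_documents=True,
)
hits = rag.search("Warranty period clauses for LCD panels", k=20)
```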

Domain-Specific Fine-Tuning

General-purpose embedding models cover a broad range but typically lose to domain-specialized models in areas such as healthcare, law, or company-specific terminology. In 2026, contrastive fine-tuning toolchains have matured to the point where a run takes only a few hours.

sentence-transformers 3.x: `SentenceTransformerTrainer` with LoRA fine-tuning
Voyage Fine-tuning API: Generates a custom model from domain query/document pairs in around 2 hours
Cohere Custom Models: Domain learning for both Rerank and Embed
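Most of these toolchains train on (query, relevant document) pairs with an in-batch-negatives objective. For each query, the other documents in the batch act as negatives. The sketch below is in the spirit of sentence-transformers' `MultipleNegativesRankingLoss`, which uses cosine similarity scaled by 20 by default. It is written in plain NumPy rather than with that library, and `in_batch_negatives_loss` is a hypothetical name.

```python
import numpy as np

def in_batch_negatives_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities: pair i is the target for anchor i,
    and every other positive in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                       # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_softmax).mean())

rng = np.random.default_rng(1)
q = rng.normal(size=(32, 1024))                  # stand-ins for query embeddings
pos = q + 0.3 * rng.normal(size=(32, 1024))      # matching document embeddings
print(in_batch_negatives_loss(q, pos))                     # low: pairs line up
print(in_batch_negatives_loss(q, np.roll(pos, 1, 0)))      # high: pairs shuffled
```

Larger batches supply more negatives per query, which is one reason these runs favor large batch sizes.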

On a KGA healthcare project, fine-tuning BGE-M3 on 50,000 pairs of in-house Q&A produced an 8-point improvement in nDCG@10 on the JMTEB healthcare Retrieval subset and a 14-point improvement on the internal test set. Breaking the dependency on whichever general-purpose model tops the leaderboard pays off substantially.
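For reference, nDCG@10 compares the discounted gain of the top 10 results against the best possible ordering. The sketch uses the common linear-gain form, rel / log2(rank + 1). The document IDs and relevance judgments are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query. `qrels` maps doc id -> graded relevance (0 = not relevant)."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal if ideal > 0 else 0.0

qrels = {"faq-12": 1, "faq-40": 1}                                  # hypothetical judgments
print(ndcg_at_k(["faq-03", "faq-12", "faq-77", "faq-40"], qrels))  # ≈ 0.651
print(ndcg_at_k(["faq-12", "faq-40", "faq-03"], qrels))            # 1.0
```

Benchmark scores average this value over all queries. The "8-point" gain is presumably on the 0 to 100 scale, i.e. +0.08 in nDCG@10.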

Selection Rules

Broad domain, English-centric, API access acceptable — Voyage-3-large
Japanese-centric, 128k long-context required — Cohere Embed v4
On-premises required, MIT license — BGE-M3
Cost efficiency above all — Voyage-3 or text-embedding-3-small
Unified image + text retrieval — Cohere Embed v4 (multimodal)
Long-text nuance at highest priority — Jina-ColBERT-v2 (Late Interaction)

Embedding selection is not "pick one model and done." The 2026 production architecture is a layered design: first-stage dense + second-stage Late Interaction + domain reranker.
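The layered design can be sketched end to end in NumPy under some simplifying assumptions:
- Stage 1 is brute-force dense cosine search, where production would use an ANN index.
- Stage 2 is MaxSim over per-document token vectors, where production would use the multi-vector support in Qdrant, Vespa, or Weaviate.
- Stage 3 calls a `rerank` function standing in for a domain reranker such as a fine-tuned cross-encoder or Cohere Rerank.

`layered_search`, the stage sizes `k1`/`k2`/`k3`, and the toy reranker are all hypothetical.

```python
from typing import Callable, Sequence
import numpy as np

def l2norm(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def layered_search(
    q_dense: np.ndarray,               # (d,) query dense vector
    q_tokens: np.ndarray,              # (n_q, t) query token vectors
    doc_dense: np.ndarray,             # (N, d) document dense vectors
    doc_tokens: Sequence[np.ndarray],  # N arrays of shape (n_i, t)
    rerank: Callable[[int], float],    # hypothetical domain reranker: doc id -> score
    k1: int = 100, k2: int = 20, k3: int = 5,
) -> list:
    # Stage 1: cheap dense retrieval over the whole corpus
    stage1 = np.argsort(-(l2norm(doc_dense) @ l2norm(q_dense)))[:k1]
    # Stage 2: late-interaction (MaxSim) re-scoring of the shortlist
    qt = l2norm(q_tokens)
    late = {int(i): float((qt @ l2norm(doc_tokens[i]).T).max(axis=1).sum()) for i in stage1}
    stage2 = sorted(late, key=late.get, reverse=True)[:k2]
    # Stage 3: the expensive domain reranker only sees the few survivors
    return sorted(stage2, key=rerank, reverse=True)[:k3]

rng = np.random.default_rng(3)
N = 1000
doc_dense = rng.normal(size=(N, 256))
doc_tokens = [rng.normal(size=(50, 64)) for _ in range(N)]
q_dense = doc_dense[7] + 0.1 * rng.normal(size=256)                 # query aimed at doc 7
q_tokens = doc_tokens[7][:4] + 0.1 * rng.normal(size=(4, 64))
top = layered_search(q_dense, q_tokens, doc_dense, doc_tokens, rerank=lambda i: -abs(i - 7))
print(top)
```

Each stage narrows the candidate set so the next, more expensive scorer runs on fewer documents.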
