A French biomedical instruction dataset and model suite for studying how data provenance (native, synthetic, or translated) affects the instruction tuning of LLMs.
Total size: 571,436 instruction–response pairs
Components:
Tasks: