ENHANCING CODE GENERATION ACCURACY USING FINE-TUNING AND TASK-ADAPTIVE PRETRAINING WITH DOMAIN-SPECIFIC DATA AUGMENTATION

Authors

  • Thomas Lass Barna, Department of Software Engineering, Mewar International University, Abuja
  • Samson Isaac, Department of Computer Science, Kaduna State University, Kaduna
  • Amina Bala Jaafaru, Department of Computer Science, Kaduna State University, Kaduna
  • Hajara Idris, Department of Computer Science, Kaduna State University, Kaduna
  • Ramat Imam Abba, Department of Cyber Security, Air Force Institute of Technology, Kaduna

Abstract

Recent advances in deep learning, particularly Transformer architectures, have significantly improved code generation. However, current pre-trained language models still encounter limitations when applied to this task. The Improved RoBERTaMarian model, built upon the Marian neural machine translation framework, addresses these limitations by fine-tuning on natural language descriptions to generate code. The model was trained and tested on the Django and CoNaLa datasets. On the CoNaLa dataset, it achieved a BLEU score of 36.834, an Exact Match Accuracy of 15.300%, a SacreBLEU score of 34.215, and a ROUGE score of 49.827, reflecting its ability to generate accurate and semantically aligned code. On the Django dataset, the Improved RoBERTaMarian model outperformed the BERTMarian, ELECTRAMarian, LUKEMarian, MarianCG, and RoBERTaMarian models, with a BLEU score of 91.230, an Exact Match Accuracy of 83.676%, a SacreBLEU score of 75.984, and a ROUGE score of 95.210. These results indicate that the Improved RoBERTaMarian model performs well in both syntactic and semantic code generation, making it a robust option for applications that require precise, contextually relevant code. Its performance suggests significant potential for automated code synthesis and for language-model-based code assistants in software engineering.
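
For readers unfamiliar with the Exact Match Accuracy figure reported above, the following is a minimal sketch of how such a metric is commonly computed. It is an illustration only, not the authors' evaluation code: the function name exact_match_accuracy and the whitespace normalization step are assumptions, since the abstract does not state how predictions were compared with references.

```python
def exact_match_accuracy(predictions, references):
    """Return the percentage of predictions identical to their reference.

    Assumption: runs of whitespace are collapsed before comparison.
    The abstract does not specify the normalization actually used.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(snippet):
        # Collapse any run of whitespace into a single space and trim the ends.
        return " ".join(snippet.split())

    matches = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * matches / len(references)


if __name__ == "__main__":
    # Hypothetical generated snippets and their references.
    preds = ["x = 1", "y=2"]
    refs = ["x  =  1", "y = 2"]
    print(exact_match_accuracy(preds, refs))
```

Exact match is a strict criterion: a prediction that differs from the reference by a single token counts as a miss. This is one reason an exact match score can sit far below overlap-based scores such as BLEU and ROUGE on the same dataset.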

Published

2024-12-30

Section

ARTICLES