ENHANCING CODE GENERATION ACCURACY USING FINE-TUNING AND TASK-ADAPTIVE PRETRAINING WITH DOMAIN-SPECIFIC DATA AUGMENTATION
Abstract
Recent advancements in deep learning, particularly through Transformer architectures, have significantly improved code generation tasks. However, current pre-trained language models still encounter limitations when applied to code generation. The Improved RoBERTaMarian model, built upon the Marian neural machine translation framework, addresses these limitations by fine-tuning the model to generate code from natural language descriptions. The model was trained and tested on the Django and CoNaLa datasets. On the CoNaLa dataset, it achieved a BLEU score of 36.834, an Exact Match Accuracy of 15.300%, a SacreBLEU score of 34.215, and a ROUGE score of 49.827, reflecting its ability to generate accurate and semantically aligned code. Similarly, when evaluated on the Django dataset, the Improved RoBERTaMarian model outperformed the BERTMarian, ELECTRAMarian, LUKEMarian, MarianCG, and RoBERTaMarian models with a BLEU score of 91.230, an Exact Match Accuracy of 83.676%, a SacreBLEU score of 75.984, and a ROUGE score of 95.210. These results indicate that the Improved RoBERTaMarian model excels in both syntactic and semantic code generation, making it a robust solution for applications requiring precise, contextually relevant code generation. Its high performance suggests significant potential for use in automated code synthesis and language model-based code assistants in software engineering tasks.
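The abstract reports four automatic metrics: BLEU, Exact Match Accuracy, SacreBLEU, and ROUGE. As a rough illustration only, the sketch below shows one common way such scores are computed for generated code against reference snippets in Python. The choice of the sacrebleu, nltk, and rouge_score packages, the whitespace tokenization, and the example strings are assumptions of this sketch, not the paper's actual evaluation scripts.

```python
# Minimal sketch: scoring generated code against references with the four
# metrics named in the abstract. Library choices and tokenization here are
# assumptions, not the authors' exact evaluation pipeline.
import sacrebleu
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# Hypothetical model outputs and gold snippets (CoNaLa-style pairs).
predictions = ["sorted(d.items(), key=lambda x: x[1])",
               "open('f.txt').read()"]
references = ["sorted(d.items(), key=lambda x: x[1])",
              "with open('f.txt') as f: data = f.read()"]

# Exact Match Accuracy: fraction of predictions identical to their reference.
exact_match = sum(p == r for p, r in zip(predictions, references)) / len(predictions)

# Corpus BLEU (NLTK) over whitespace-tokenized code, smoothed for short snippets.
bleu = corpus_bleu([[r.split()] for r in references],
                   [p.split() for p in predictions],
                   smoothing_function=SmoothingFunction().method1)

# SacreBLEU works on detokenized strings directly.
sacre = sacrebleu.corpus_bleu(predictions, [references]).score

# ROUGE-L F-measure, averaged over the corpus.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
rouge_l = sum(scorer.score(r, p)["rougeL"].fmeasure
              for p, r in zip(predictions, references)) / len(predictions)

print(f"Exact Match: {exact_match:.3%}  BLEU: {bleu:.3f}  "
      f"SacreBLEU: {sacre:.3f}  ROUGE-L: {rouge_l:.3f}")
```

Note that the paper's reported numbers depend on its own preprocessing and tokenization choices, so scores from a sketch like this are not directly comparable to the figures in the abstract.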
License
Copyright (c) 2024 Science World Journal
This work is licensed under a Creative Commons Attribution 4.0 International License.