{"id":18435,"date":"2024-04-24T15:48:05","date_gmt":"2024-04-24T08:48:05","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=18435"},"modified":"2024-04-25T11:24:19","modified_gmt":"2024-04-25T04:24:19","slug":"introducing-llm-fine-tuning-and-evaluation-in-bigquery","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/kienthuc\/introducing-llm-fine-tuning-and-evaluation-in-bigquery\/","title":{"rendered":"Introducing LLM fine-tuning and evaluation in BigQuery"},"content":{"rendered":"<section class=\"wpb-content-wrapper\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<p><span style=\"font-weight: 400;\">BigQuery lets you analyze your data using a variety of large language models (LLMs) hosted in Vertex AI, including Gemini 1.0 Pro, Gemini 1.0 Pro Vision, and text-bison. These models work well for many tasks, such as text summarization and sentiment analysis, using only prompt engineering. 
However, in some situations additional customization via model fine-tuning is needed, such as when the expected behavior of the model is hard to describe concisely in a prompt, or when prompts do not produce the expected results consistently enough.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Fine-tuning also helps the model learn specific response styles (e.g., terse or verbose), new behaviors (e.g., answering as a specific persona), or update itself with new information.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Today, we are announcing support for customizing LLMs in BigQuery with supervised fine-tuning. Supervised fine-tuning via BigQuery uses a dataset that has examples of input text (the prompt) and the expected ideal output text (the label), and fine-tunes the model to mimic the behavior or task implied by these examples. Let\u2019s see how this works.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cac_tinh_nang_tieu_bieu\"><\/span><b>Feature walkthrough<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">To illustrate model fine-tuning, let\u2019s look at a classification problem using text data. We\u2019ll use a medical transcription dataset and ask our model to classify a given transcript into one of 17 categories, e.g. \u2018Allergy \/ Immunology\u2019, \u2018Dentistry\u2019, \u2018Cardiovascular \/ Pulmonary\u2019, etc.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Dataset\"><\/span><b>Dataset\u00a0<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Our dataset is available on<\/span><span style=\"font-weight: 400;\">\u00a0<\/span><span style=\"font-weight: 400;\">Kaggle<\/span><span style=\"font-weight: 400;\">. 
To fine-tune and evaluate our model, we first create an evaluation table and a training table in BigQuery using a subset of this data available in Cloud Storage as follows:<\/span><\/p>\n<p>-- Create an eval table<\/p>\n<p>LOAD DATA INTO<br \/>\nbqml_tutorial.medical_transcript_eval<br \/>\nFROM FILES( format='NEWLINE_DELIMITED_JSON',<br \/>\nuris = ['gs:\/\/cloud-samples-data\/vertex-ai\/model-evaluation\/peft_eval_sample.jsonl'] )<\/p>\n<p>-- Create a train table<\/p>\n<p>LOAD DATA INTO<br \/>\nbqml_tutorial.medical_transcript_train<br \/>\nFROM FILES( format='NEWLINE_DELIMITED_JSON',<br \/>\nuris = ['gs:\/\/cloud-samples-data\/vertex-ai\/model-evaluation\/peft_train_sample.jsonl'] )<\/p>\n<p><span style=\"font-weight: 400;\">The training and evaluation tables have an \u2018input_text\u2019 column that contains the transcript, and an \u2018output_text\u2019 column that contains the label, or ground truth.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-18429 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry.jpg\" alt=\"\" width=\"601\" height=\"388\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry.jpg 601w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-18x12.jpg 18w\" sizes=\"auto, (max-width: 601px) 100vw, 601px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Hieu_ve_hieu_suat_co_ban_cua_mo_hinh_text-bison\"><\/span><b>Baseline performance of text-bison model<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">First, let\u2019s establish a performance baseline for the text-bison model. You can create a remote text-bison model in BigQuery using a SQL statement like the one below.<\/span><\/p>\n<p>CREATE OR REPLACE MODEL<br \/>\n`bqml_tutorial.text_bison_001` REMOTE<br \/>\nWITH CONNECTION `LOCATION. 
ConnectionID`<br \/>\nOPTIONS (ENDPOINT = 'text-bison@001')<\/p>\n<p><span style=\"font-weight: 400;\">For inference on the model, we first construct a prompt by concatenating the task description for our model and the transcript from the tables we created. We then use the <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/bigqueryml-syntax-generate-text#text-bison\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">ML.GENERATE_TEXT<\/span><\/a><span style=\"font-weight: 400;\"> function to get the output. While the model gets many classifications correct out of the box, it classifies some transcripts erroneously. Here\u2019s a sample response where it classifies incorrectly.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Prompt<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Please assign a label for the given medical transcript from among these labels [Allergy \/ Immunology, Autopsy, Bariatrics, Cardiovascular \/ Pulmonary, Chiropractic, Consult &#8211; History and Phy., Cosmetic \/ Plastic Surgery, Dentistry, Dermatology, Diets and Nutritions, Discharge Summary, ENT &#8211; Otolaryngology, Emergency Room Reports, Endocrinology, Gastroenterology, General Medicine, Hematology &#8211; Oncology, Hospice &#8211; Palliative Care, IME-QME-Work Comp etc., Lab Medicine &#8211; Pathology, Letters, Nephrology, Neurology, Neurosurgery, Obstetrics \/ Gynecology, Office Notes, Ophthalmology, Orthopedic, Pain Management, Pediatrics &#8211; Neonatal, Physical Medicine &#8211; Rehab, Podiatry, Psychiatry \/ Psychology, Radiology, Rheumatology, SOAP \/ Chart \/ Progress Notes, Sleep Medicine, Speech &#8211; Language, Surgery, Urology]. TRANSCRIPT:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">INDICATIONS FOR PROCEDURE:, The patient has presented with atypical type right arm discomfort and neck discomfort. 
She had noninvasive vascular imaging demonstrating suspected right subclavian stenosis. Of note, there was bidirectional flow in the right vertebral artery, as well as 250 cm per second velocities in the right subclavian. Duplex ultrasound showed at least a 50% stenosis.,APPROACH:, Right common femoral artery.,ANESTHESIA:, IV sedation with cardiac catheterization protocol. Local infiltration with 1% Xylocaine.,COMPLICATIONS:, None.,ESTIMATED BLOOD LOSS:, Less than 10 ml.,ESTIMATED CONTRAST:, Less than 250 ml.,PROCEDURE PERFORMED:, Right brachiocephalic angiography, right subclavian angiography, selective catheterization of the right subclavian, selective aortic arch angiogram, right iliofemoral angiogram, 6 French Angio-Seal placement.,DESCRIPTION OF PROCEDURE:, The patient was brought to the cardiac catheterization lab in the usual fasting state. She was laid supine on the cardiac catheterization table, and the right groin was prepped and draped in the usual sterile fashion. 1% Xylocaine was infiltrated into the right femoral vessels. Next, a #6 French sheath was introduced into the right femoral artery via the modified Seldinger technique.,AORTIC ARCH ANGIOGRAM:, Next, a pigtail catheter was advanced to the aortic arch. Aortic arch angiogram was then performed with injection of 45 ml of contrast, rate of 20 ml per second, maximum pressure 750 PSI in the 4 degree LAO view.,SELECTIVE SUBCLAVIAN ANGIOGRAPHY:, Next, the right subclavian was selectively cannulated. It was injected in the standard AP, as well as the RAO view. Next pull back pressures were measured across the right subclavian stenosis. No significant gradient was measured.,ANGIOGRAPHIC DETAILS:, The right brachiocephalic artery was patent. The proximal portion of the right carotid was patent. The proximal portion of the right subclavian prior to the origin of the vertebral and the internal mammary showed 50% stenosis.,IMPRESSION:,1. Moderate grade stenosis in the right subclavian artery.,2. 
Patent proximal edge of the right carotid.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Response<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Radiology<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the above case, the correct classification should have been \u2018Cardiovascular \/ Pulmonary\u2019.<\/span><\/p>\n<h3><b>Metrics-based evaluation for base model<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">To perform a more robust evaluation of the model\u2019s performance, you can use BigQuery\u2019s <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/standard-sql\/bigqueryml-syntax-evaluate\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">ML.EVALUATE<\/span><\/a><span style=\"font-weight: 400;\"> function to compute metrics on how the model responses compare against the ideal responses from a test\/eval dataset. You can do so as follows:<\/span><\/p>\n<p>-- Evaluate base model<\/p>\n<p>SELECT<br \/>\n*<br \/>\nFROM<br \/>\nml.evaluate(MODEL bqml_tutorial.text_bison_001,<br \/>\n(<br \/>\nSELECT<br \/>\nCONCAT('Please assign a label for the given medical transcript from among these labels [Allergy \/ Immunology, Autopsy, Bariatrics, Cardiovascular \/ Pulmonary, Chiropractic, Consult - History and Phy., Cosmetic \/ Plastic Surgery, Dentistry, Dermatology, Diets and Nutritions, Discharge Summary, ENT - Otolaryngology, Emergency Room Reports, Endocrinology, Gastroenterology, General Medicine, Hematology - Oncology, Hospice - Palliative Care, IME-QME-Work Comp etc., Lab Medicine - Pathology, Letters, Nephrology, Neurology, Neurosurgery, Obstetrics \/ Gynecology, Office Notes, Ophthalmology, Orthopedic, Pain Management, Pediatrics - Neonatal, Physical Medicine - Rehab, Podiatry, Psychiatry \/ Psychology, Radiology, Rheumatology, SOAP \/ Chart \/ Progress Notes, Sleep Medicine, Speech - Language, Surgery, Urology]. ', input_text) AS input_text,<br \/>\noutput_text<br \/>\nFROM<br \/>\n`bqml_tutorial.medical_transcript_eval` ),<br \/>\nSTRUCT('classification' AS task_type))<\/p>\n<p><span style=\"font-weight: 400;\">In the above code we provided an evaluation table as input and chose \u2018classification\u2019 as the task type on which to evaluate the model. We left other <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/generate-text-tuning#generate_text\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">inference parameters<\/span><\/a><span style=\"font-weight: 400;\"> at their defaults, but they can be modified for the evaluation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The evaluation metrics that are returned are computed for each class (label). The results look like the following:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-18428 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-1.jpg\" alt=\"\" width=\"511\" height=\"585\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-1.jpg 511w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-1-10x12.jpg 10w\" sizes=\"auto, (max-width: 511px) 100vw, 511px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Focusing on the <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/F-score\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">F1 Score<\/span><\/a><span style=\"font-weight: 400;\"> (harmonic mean of precision and recall), you can see that the model performance varies between classes. 
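As a rough sketch of what a classification evaluation like this computes, here is a minimal Python version of per-class precision, recall, and F1, plus the "macro" F1 (a simple average of the per-class F1 scores). The toy labels below are illustrative only, not drawn from the actual dataset:

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Per-class precision, recall, and F1, plus macro F1
    (simple average of per-class F1) -- a sketch of the kind of
    classification metrics ML.EVALUATE reports per label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1           # correct prediction for this class
        else:
            fp[p] += 1           # predicted class p, but truth was t
            fn[t] += 1           # missed an instance of class t
    scores = {}
    for label in labels:
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(labels)
    return scores, macro_f1

# Toy ground-truth labels vs. model outputs (illustrative, not real data).
truth = ["Radiology", "Surgery", "Surgery", "Radiology"]
preds = ["Radiology", "Surgery", "Radiology", "Radiology"]
scores, macro = per_class_f1(truth, preds)
```

Averaging per-class F1 equally (rather than weighting by class frequency) is what makes the macro score sensitive to the rare classes a model handles poorly.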
For example, the baseline model performs well for \u2018Autopsy\u2019, \u2018Diets and Nutritions\u2019, and \u2018Dentistry\u2019, but performs poorly for the \u2018Consult - History and Phy.\u2019, \u2018Chiropractic\u2019, and \u2018Cardiovascular \/ Pulmonary\u2019 classes.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Now let\u2019s fine-tune our model and see if we can improve on this baseline performance.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Tao_mot_mo_hinh_duoc_tinh_chinh\"><\/span><b>Creating a fine-tuned model<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Creating a fine-tuned model in BigQuery is simple. You perform fine-tuning by specifying training data that contains \u2018prompt\u2019 and \u2018label\u2019 columns in the CREATE MODEL statement. We use the same prompt for fine-tuning that we used in the evaluation earlier. Create a fine-tuned model as follows:<\/span><\/p>\n<p>-- Fine-tune a text-bison model<\/p>\n<p>CREATE OR REPLACE MODEL<br \/>\n`bqml_tutorial.text_bison_001_medical_transcript_finetuned` REMOTE<br \/>\nWITH CONNECTION `LOCATION. 
ConnectionID`<br \/>\nOPTIONS (endpoint='text-bison@001',<br \/>\nmax_iterations=300,<br \/>\ndata_split_method='no_split') AS<br \/>\nSELECT<br \/>\nCONCAT('Please assign a label for the given medical transcript from among these labels [Allergy \/ Immunology, Autopsy, Bariatrics, Cardiovascular \/ Pulmonary, Chiropractic, Consult - History and Phy., Cosmetic \/ Plastic Surgery, Dentistry, Dermatology, Diets and Nutritions, Discharge Summary, ENT - Otolaryngology, Emergency Room Reports, Endocrinology, Gastroenterology, General Medicine, Hematology - Oncology, Hospice - Palliative Care, IME-QME-Work Comp etc., Lab Medicine - Pathology, Letters, Nephrology, Neurology, Neurosurgery, Obstetrics \/ Gynecology, Office Notes, Ophthalmology, Orthopedic, Pain Management, Pediatrics - Neonatal, Physical Medicine - Rehab, Podiatry, Psychiatry \/ Psychology, Radiology, Rheumatology, SOAP \/ Chart \/ Progress Notes, Sleep Medicine, Speech - Language, Surgery, Urology]. ', input_text) AS prompt,<br \/>\noutput_text AS label<br \/>\nFROM<br \/>\n`bqml_tutorial.medical_transcript_train`<\/p>\n<p><span style=\"font-weight: 400;\">The CONNECTION you use to create the fine-tuned model should have (a) Storage Object User and (b) Vertex AI Service Agent roles attached. In addition, your Compute Engine (GCE) default service account should have <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/generate-text-tuning#gce-service-account-access\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">editor<\/span><\/a><span style=\"font-weight: 400;\"> access to the project. 
Refer to the <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/generate-text-tutorial#create_a_connection\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">documentation<\/span><\/a><span style=\"font-weight: 400;\"> for guidance on working with BigQuery connections.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">BigQuery performs model fine-tuning using a technique known as Low-Rank Adaptation (LoRA). LoRA tuning is a parameter-efficient tuning (PET) method that freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture to <\/span><a href=\"https:\/\/arxiv.org\/pdf\/2106.09685.pdf\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">reduce the number of trainable parameters<\/span><\/a><span style=\"font-weight: 400;\">. The model fine-tuning itself happens on Vertex AI compute, and you have the option to choose GPUs or TPUs as accelerators. You are billed by BigQuery for the data scanned or slots used, as well as by Vertex AI for the Vertex AI resources consumed. The fine-tuning job creates a new model endpoint that represents the learned weights. The Vertex AI inference charges you incur when querying the fine-tuned model are the same as for the baseline model.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This fine-tuning job may take a couple of hours to complete, varying based on training options such as \u2018max_iterations\u2019. 
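The LoRA idea above can be illustrated with a back-of-envelope parameter count: instead of training a full d x k weight matrix, only two low-rank factors are trained, whose product approximates the weight update. The dimensions below are hypothetical, chosen only to show the scale of the reduction:

```python
# Back-of-envelope LoRA parameter count. LoRA freezes the pretrained
# d x k weight matrix W and trains two low-rank factors B (d x r) and
# A (r x k); their product B @ A plays the role of the weight update.
# The shapes here are illustrative, not the real model's dimensions.
d, k = 4096, 4096   # hypothetical weight-matrix shape for one layer
r = 8               # LoRA rank, with r much smaller than min(d, k)

full_params = d * k            # trainable parameters without LoRA
lora_params = d * r + r * k    # trainable parameters with LoRA

reduction = full_params / lora_params
print(f"full: {full_params:,}  lora: {lora_params:,}  ~{reduction:.0f}x fewer")
```

Because only the two small factors are trainable, the per-layer trainable parameter count drops from d*k to r*(d+k), a large saving whenever r is much smaller than the matrix dimensions.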
Once completed, you can find the details of your fine-tuned model in the BigQuery UI, where you will see a different remote endpoint for the fine-tuned model.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-18427 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-2.jpg\" alt=\"\" width=\"601\" height=\"243\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-2.jpg 601w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-2-18x7.jpg 18w\" sizes=\"auto, (max-width: 601px) 100vw, 601px\" \/><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Currently, BigQuery supports fine-tuning of the text-bison-001 and text-bison-002 models.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Danh_gia_hieu_xuat_tinh_chinh_model\"><\/span><b>Evaluating performance of the fine-tuned model<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">You can now generate predictions from the fine-tuned model using code such as the following:\u00a0<\/span><\/p>\n<p>SELECT<br \/>\nml_generate_text_llm_result,<br \/>\nlabel,<br \/>\nprompt<br \/>\nFROM<br \/>\nml.generate_text(MODEL bqml_tutorial.text_bison_001_medical_transcript_finetuned,<br \/>\n(<br \/>\nSELECT<br \/>\nCONCAT('Please assign a label for the given medical transcript from among these labels [Allergy \/ Immunology, Autopsy, Bariatrics, Cardiovascular \/ Pulmonary, Chiropractic, Consult - History and Phy., Cosmetic \/ Plastic Surgery, Dentistry, Dermatology, Diets and Nutritions, Discharge Summary, ENT - Otolaryngology, Emergency Room Reports, Endocrinology, Gastroenterology, General Medicine, Hematology - Oncology, Hospice - Palliative Care, IME-QME-Work Comp etc., Lab Medicine - Pathology, Letters, Nephrology, Neurology, Neurosurgery, Obstetrics \/ Gynecology, Office Notes, Ophthalmology, Orthopedic, Pain Management, Pediatrics - Neonatal, Physical Medicine - Rehab, Podiatry, Psychiatry \/ Psychology, Radiology, Rheumatology, SOAP \/ Chart \/ Progress Notes, Sleep Medicine, Speech - Language, Surgery, Urology]. ', input_text) AS prompt,<br \/>\noutput_text AS label<br \/>\nFROM<br \/>\n`bqml_tutorial.medical_transcript_eval`<br \/>\n),<br \/>\nSTRUCT(TRUE AS flatten_json_output))<\/p>\n<p><span style=\"font-weight: 400;\">Let us look at the response to the sample prompt we evaluated earlier. Using the same prompt, the model now classifies the transcript as \u2018Cardiovascular \/ Pulmonary\u2019, the correct response.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Danh_gia_dua_tren_chi_so_cho_mo_hinh_duoc_tinh_chinh\"><\/span><b>Metrics-based evaluation for fine-tuned model<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Now, we will compute metrics on the fine-tuned model using the same evaluation data and the same prompt we previously used for evaluating the base model.<\/span><\/p>\n<p>-- Evaluate fine-tuned model<\/p>\n<p>SELECT<br \/>\n*<br \/>\nFROM<br \/>\nml.evaluate(MODEL bqml_tutorial.text_bison_001_medical_transcript_finetuned,<br \/>\n(<br \/>\nSELECT<br \/>\nCONCAT('Please assign a label for the given medical transcript from among these labels [Allergy \/ Immunology, Autopsy, Bariatrics, Cardiovascular \/ Pulmonary, Chiropractic, Consult - History and Phy., Cosmetic \/ Plastic Surgery, Dentistry, Dermatology, Diets and Nutritions, Discharge Summary, ENT - Otolaryngology, Emergency Room Reports, Endocrinology, Gastroenterology, General Medicine, Hematology - Oncology, Hospice - Palliative Care, IME-QME-Work Comp etc., Lab Medicine - Pathology, Letters, Nephrology, Neurology, Neurosurgery, Obstetrics \/ Gynecology, Office Notes, Ophthalmology, Orthopedic, Pain Management, Pediatrics - Neonatal, Physical Medicine - Rehab, Podiatry, Psychiatry \/ Psychology, Radiology, Rheumatology, SOAP \/ Chart \/ Progress Notes, Sleep Medicine, Speech - Language, Surgery, Urology]. ', input_text) AS prompt,<br \/>\noutput_text AS label<br \/>\nFROM<br \/>\n`bqml_tutorial.medical_transcript_eval`), STRUCT('classification' AS task_type))<\/p>\n<p><span style=\"font-weight: 400;\">The metrics from the fine-tuned model are below. Even though the fine-tuning (training) dataset we used for this blog contained only 519 examples, we already see a marked improvement in performance. F1 scores on the labels where the model had performed poorly earlier have improved, with the \u201cmacro\u201d F1 score (a simple average of the F1 score across all labels) jumping from 0.54 to 0.66.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-18426 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-3.jpg\" alt=\"\" width=\"521\" height=\"602\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-3.jpg 521w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/04\/llm-bigquerry-3-10x12.jpg 10w\" sizes=\"auto, (max-width: 521px) 100vw, 521px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"San_sang_cho_suy_luan\"><\/span><b>Ready for inference\u00a0<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The fine-tuned model can now be used for inference using the ML.GENERATE_TEXT function, which we used in the previous steps to get the sample responses. 
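To give a sense of what 'flatten_json_output' does in the queries above, here is a minimal Python sketch: without flattening, each row's generation comes back as a nested JSON result, and flattening pulls the generated text out into a plain string column such as 'ml_generate_text_llm_result'. The response shape used below is a simplified assumption for illustration, not the exact ML.GENERATE_TEXT schema:

```python
import json

# Illustrative only: a simplified stand-in for the nested JSON that a
# text-generation call returns per row. The real ML.GENERATE_TEXT
# response schema may differ; this just sketches what "flattening"
# into a plain text column means.
raw_result = json.dumps({
    "predictions": [{"content": "Cardiovascular / Pulmonary"}]  # assumed shape
})

def flatten_llm_result(raw: str) -> str:
    """Extract the generated text from the (assumed) nested response."""
    return json.loads(raw)["predictions"][0]["content"].strip()

label = flatten_llm_result(raw_result)
```

With flattening enabled, downstream SQL can compare the generated label directly against the ground-truth column instead of parsing JSON per row.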
You don\u2019t need to manage any additional infrastructure for your fine-tuned model, and you are charged the same inference price as for the base model.<\/span><\/p>\n\n\t\t<\/div>\n\t<\/div>\n<div class=\"templatera_shortcode\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><div class=\"vc_message_box vc_message_box-standard vc_message_box-rounded vc_color-blue\" ><div class=\"vc_message_box-icon\"><i class=\"vc-mono vc-mono-technorati\"><\/i><\/div><p><a href=\"https:\/\/gcloudvn.com\/en\/main-logo-1\/\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-664\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1.png\" alt=\"\" width=\"221\" height=\"72\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1.png 214w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1-18x6.png 18w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1-183x60.png 183w\" sizes=\"auto, (max-width: 221px) 100vw, 221px\" \/><\/a>As a senior partner of Google in Vietnam, Gimasys has 10+ years of experience consulting on digital transformation implementations for 2,000+ domestic corporations. 
Typical customers include Jetstar, Dien Quan Media, Heineken, Jollibee, Vietnam Airlines, HSC, SSI...<\/p>\n<p>Gimasys is currently a strategic partner of major technology companies around the world such as Salesforce, Oracle NetSuite, Tableau, and MuleSoft.<\/p>\n<p>Contact Gimasys - Google Cloud Premier Partner - for advice on strategic solutions suited to the specific needs of your business:<\/p>\n<ul>\n<li>Email: gcp@gimasys.com<\/li>\n<li>Hotline: 0974 417 099<\/li>\n<\/ul>\n<\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div>\n<\/section>","protected":false},"excerpt":{"rendered":"BigQuery lets you analyze your data using a variety of large language models (LLMs) hosted in Vertex AI, including Gemini 1.0 Pro, Gemini 1.0 Pro Vision, and text-bison.&hellip;","protected":false},"author":2,"featured_media":18425,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-18435","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kienthuc","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/18435","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=18435"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/18435\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/
wp\/v2\/media\/18425"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=18435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=18435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/tags?post=18435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}