AI Optimization (Beta)

AI Recommended Specs Tab

Overview: As AI usage grows, so does the need to optimize it. This feature recommends appropriate settings based on your AI usage.

This feature was released on 2025-09-22.

  • The feature currently covers Azure OpenAI optimization; AWS and GCP AI features will be added in the future.

1. RightSizing screen

1.1 Full screen

image-20250925-045946.png
  1. Tab Summary Information

    1. Up: Number of resources whose cost would increase under the recommendation

    2. Down: Number of resources whose cost would decrease under the recommendation

  2. Full Summary

    1. Shows a summary of all recommendations

  3. Summary information card by recommendation type

    1. Shows the amount saved and the number of resources for each recommendation type

    2. Recommendation method

      1. 3-1. Model update

        1. Replace your EOS or Legacy model with the latest GA version.

        2. Depending on the model, costs may increase or decrease.

      2. 3-2. Version Update

        1. Update the model to its latest version.

        2. The newer LLM version has more recent data available.

        3. The update may also reduce costs.

      3. 3-3. Model Type Optimization

        1. Recommended exclusively for Azure.

        2. Switch from Standard type to Provisioned or Batch.

        3. Provisioned

          1. Reserve PTUs (provisioned throughput units) in advance and use them.

          2. The number of PTUs you need is recommended based on your usage.

          3. CloudXper calculates the cost based on the 1-month reservation purchase price.

          4. Cost savings only occur when using a large number of tokens.

          5. The Standard and Provisioned usage costs are compared over the measurement period, and Provisioned is recommended only if its cost is lower.

          6. IMPORTANT: Purchase according to your usage patterns and purposes.

        4. Batch

          1. Responses are returned within 24 hours, not in real time.

          2. Costs can be reduced by up to 1/2 compared to Standard.

          3. IMPORTANT: Batch is not real-time, so make sure it suits your usage patterns and purposes.

      4. 3-4. Model position optimization

        1. Recommended exclusively for Azure.

        2. Switch from Datazone/Regional to Global.

        3. Cost savings of up to 1/2 are possible.

        4. IMPORTANT: It is difficult to predict which region a request will be routed to, so consider the purpose and security before making any changes.

      5. 3-5. Consider using cache

        1. If the input prompt hits the cache, it can save up to 1/2 the cost.

        2. CloudXper calculates costs assuming 20% cache utilization.

        3. Tune your prompts to increase the likelihood of a cache hit.

      6. 3-6. Normal

        1. A resource that does not meet any recommendation criteria is classified as Normal.

      7. 3-7. No data

        1. If performance metrics have not been collected for more than 3 days, No data is recommended.

        2. If there have been no requests for more than 14 days, No data - Termination is recommended.

  4. Search conditions

    1. The search conditions are synchronized with the summary information cards by recommendation type.

    2. Only the selected recommendation types will be displayed in the Search Results.

  5. Search Results

    1. Shows recommendation information for all resources

    2. For each resource, only the recommendation type with the highest priority is shown.

    3. Main Column

      1. 5-1: Recommendation, Detail, Savings($)

        1. Displays the recommendation type and its details.

        2. Savings:

          1. The difference between the current cost and the cost after changing to the recommended type.

          2. Costs are calculated from usage during the From and To period.

      2. 5-2: Request, Input Token, Output Token

        1. Shows the total usage for the From and To periods.

      3. 5-3: Cache Match Rate (Avg)

        1. Shows the average cache hit rate over the From and To periods.

        2. Because this value is an average, it may differ from the actual cache hit rate.

      4. 5-4: From, To

        1. The period for which usage is aggregated.

        2. The period can span up to 3 months.
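
As a rough illustration of the cost logic described in this section, the Python sketch below estimates Savings($), the cache saving, and the Provisioned-versus-Standard decision. The function names, the exact formulas, and the sample dollar amounts are assumptions made for illustration. Only the 20% cache utilization, the up-to-1/2 cache discount, and the rule that Provisioned is recommended when it is cheaper come from the text above; CloudXper's actual calculation may differ.

```python
# Values stated in the text above.
ASSUMED_CACHE_UTILIZATION = 0.20  # CloudXper assumes 20% cache utilization
MAX_CACHE_DISCOUNT = 0.5          # a cache hit saves up to 1/2 of the cost


def savings(current_cost: float, recommended_cost: float) -> float:
    """Savings($): current cost minus the cost under the recommended type.

    A negative result means the recommendation increases cost ("Up").
    """
    return current_cost - recommended_cost


def estimated_cache_savings(input_cost: float) -> float:
    """Upper-bound saving if 20% of the input cost hits the cache.

    Assumption: the discount applies only to the cached share of the
    input cost, and the full 1/2 discount is achieved.
    """
    return input_cost * ASSUMED_CACHE_UTILIZATION * MAX_CACHE_DISCOUNT


def recommend_provisioned(standard_cost: float, provisioned_cost: float) -> bool:
    """Provisioned is recommended only when it is cheaper over the period."""
    return provisioned_cost < standard_cost


if __name__ == "__main__":
    # Hypothetical costs for one measurement period (From .. To).
    print(savings(1200.0, 900.0))                 # positive: counted as "Down"
    print(estimated_cache_savings(100.0))         # upper-bound cache saving
    print(recommend_provisioned(1200.0, 900.0))   # True: Provisioned is cheaper
```

Because the discounts are described as "up to 1/2", the cache figure here is an upper bound rather than a guaranteed saving.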

1.2 Optimization Planner screen: slides in when a row is clicked

image-20250925-051639.png
  1. Current resource information

  2. Recommendation types based on recommendation priority

    1. Recommendation priority

      1. Recommended for deletion (if no requests have been made for 14 days or more)

      2. Model update 

      3. Latest version 

      4. Model Type Optimization (Provisioned Recommended) 

      5. Model Position Optimization (Global Recommendation) 

      6. Consider using cache (20%)

      7. Model Type Optimization (Batch Recommendation) 

      8. Normal
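
The priority order above can be sketched as a simple lookup: given the recommendation types that apply to a resource, pick the one that appears first in the list. This is a minimal sketch, not CloudXper's implementation. The `PRIORITY` list and the `top_recommendation` function are hypothetical names, and the labels are copied from the list above.

```python
from typing import Iterable

# Priority order as listed above, highest priority first.
PRIORITY = [
    "Recommended for deletion",
    "Model update",
    "Latest version",
    "Model Type Optimization (Provisioned Recommended)",
    "Model Position Optimization (Global Recommendation)",
    "Consider using cache (20%)",
    "Model Type Optimization (Batch Recommendation)",
    "Normal",
]


def top_recommendation(applicable: Iterable[str]) -> str:
    """Return the highest-priority recommendation among those that apply."""
    candidates = set(applicable)
    for name in PRIORITY:
        if name in candidates:
            return name
    # Assumption: a resource with no applicable recommendation is Normal.
    return "Normal"


if __name__ == "__main__":
    # Both types apply, but "Model update" ranks higher in the list.
    print(top_recommendation(["Consider using cache (20%)", "Model update"]))
```

This matches the Search Results behavior described in 1.1, where only the highest-priority recommendation type is shown for each resource.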