AI Optimization (Beta)
AI Recommended Specs Tab
Overview: As AI usage grows, so does the need to optimize it. This feature recommends appropriate settings based on your AI usage.
This feature was released on 2025-09-22.
It currently covers Azure OpenAI optimization; support for AWS and GCP AI services will be added in the future.
1. RightSizing screen
1.1 Full screen
Tab Summary Information
Up: Number of resources whose cost increases under the recommendation
Down: Number of resources whose cost decreases under the recommendation
Full Summary
Shows a summary of all recommendations
Summary information card by recommendation type
Shows the savings amount and the number of recommendations for each recommendation type
Recommendation method
3-1. Model update
Replace your EOS or Legacy model with the latest GA version.
Depending on the model, costs may increase or decrease.
3-2. Version Update
Update the model to its latest version.
The latest version of an LLM has the most recent data available.
The update may also reduce costs.
3-3. Model Type Optimization
Recommended exclusively for Azure.
Switch from Standard type to Provisioned or Batch.
Provisioned
Reserve PTUs in advance and use them.
The number of PTUs you need is recommended based on your usage.
CloudXper calculates the cost based on the 1-month reservation purchase price.
Cost savings only occur when using a large number of tokens.
Provisioned is recommended only when the Provisioned usage cost for the measurement period is lower than the Standard usage cost for the same period.
IMPORTANT: Purchase according to your usage patterns and purposes.
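The Standard versus Provisioned comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not CloudXper's actual calculation: the function name, the per-million-token prices, the PTU count, and the monthly PTU price are all hypothetical placeholders.

```python
def compare_standard_vs_provisioned(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,   # hypothetical Standard price
    output_price_per_million: float,  # hypothetical Standard price
    ptu_count: int,                   # hypothetical recommended PTU count
    ptu_monthly_price: float,         # hypothetical 1-month reservation price per PTU
    months: int = 1,
) -> dict:
    """Compare Standard (pay-per-token) and Provisioned (reserved PTU) costs."""
    # Standard: billed for the tokens actually consumed in the period.
    standard_cost = (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )
    # Provisioned: billed per reserved PTU in 1-month units, independent of usage.
    provisioned_cost = ptu_count * ptu_monthly_price * months
    return {
        "standard_cost": standard_cost,
        "provisioned_cost": provisioned_cost,
        "recommend_provisioned": provisioned_cost < standard_cost,
    }


# Placeholder numbers only: heavy usage makes the reservation cheaper...
heavy = compare_standard_vs_provisioned(2_000_000_000, 500_000_000, 2.5, 10.0, 15, 260.0)
# ...while light usage does not cover the fixed reservation cost.
light = compare_standard_vs_provisioned(1_000_000, 0, 2.5, 10.0, 15, 260.0)
```

With these placeholder values, the heavy-usage case favors Provisioned and the light-usage case does not, which matches the note that savings only occur at high token volumes.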
Batch
Responses are returned within 24 hours rather than in real time.
Costs up to 1/2 less than Standard.
IMPORTANT: Batch is not real-time, so make sure it suits your usage patterns and purposes.
3-4. Model Position Optimization
Recommended exclusively for Azure.
Switch from Datazone/Regional to Global.
Cost savings of up to 1/2 are possible.
IMPORTANT: It is difficult to predict which region a request will be routed to, so consider the purpose and security before making any changes.
3-5. Consider using cache
If the input prompt hits the cache, it can save up to 1/2 the cost.
CloudXper calculates costs assuming 20% cache utilization.
Tune your prompts to increase the likelihood of a cache hit.
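The cache estimate can be illustrated with a small sketch. The 20% cache utilization comes from the text above; the assumption that cached input costs exactly 1/2 of the normal price is the best case ("up to 1/2"), and the function name and sample cost are hypothetical.

```python
def estimate_cost_with_cache(
    input_cost: float,
    cache_utilization: float = 0.20,  # from the text: 20% cache utilization
    cached_discount: float = 0.5,     # assumption: cached input costs 1/2 (best case)
) -> float:
    """Estimate input cost if a share of input tokens is served from cache."""
    if not 0.0 <= cache_utilization <= 1.0:
        raise ValueError("cache_utilization must be between 0 and 1")
    # The cached share is billed at the discounted rate.
    cached_cost = input_cost * cache_utilization * (1.0 - cached_discount)
    # The remaining share is billed at the normal rate.
    uncached_cost = input_cost * (1.0 - cache_utilization)
    return cached_cost + uncached_cost


current = 100.0  # hypothetical current input cost in dollars
estimated = estimate_cost_with_cache(current)
savings = current - estimated  # roughly 10% of the input cost under these assumptions
```

Under these assumptions the estimated saving is about 10% of the input cost, so the actual saving depends heavily on how often prompts really hit the cache.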
3-6. Normal
A resource that does not meet any of the recommendation criteria is classified as Normal.
3-7. No data
If performance metrics have not been collected for more than 3 days, No data is recommended.
If there have been no requests for more than 14 days, No data - Termination is recommended.
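The two No data rules can be sketched as a simple classification. The thresholds are taken from the text; the function name, the parameter names, and the choice to check the termination rule first are assumptions made for illustration.

```python
from typing import Optional


def classify_no_data(days_without_metrics: int, days_without_requests: int) -> Optional[str]:
    """Return the No data recommendation for a resource, or None if neither rule applies."""
    # Assumption: the termination rule is checked first.
    if days_without_requests > 14:
        return "No data - Termination"
    if days_without_metrics > 3:
        return "No data"
    return None
```

For example, `classify_no_data(4, 0)` returns `"No data"`, while `classify_no_data(0, 15)` returns `"No data - Termination"`.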
Search conditions
The search conditions are synchronized with the summary information cards by recommendation type.
Only the selected recommendation types will be displayed in the Search Results.
Search Results
Shows recommendation information for all resources
For each resource, only the recommendation type with the highest priority is shown.
Main Column
5-1: Recommendation, Detail, Savings($)
Displays the recommendation type and recommendation details.
Savings:
Difference between the current cost and the cost after changing to the recommended type
Cost calculation is based on usage during the From and To periods.
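The Savings value and the Up/Down counts in the tab summary can be related with a short sketch. The sign convention (positive savings means the cost goes down) and the sample costs are assumptions for illustration only.

```python
def savings(current_cost: float, recommended_cost: float) -> float:
    """Savings = current cost minus cost under the recommended type."""
    return current_cost - recommended_cost


def count_up_down(resources: list) -> tuple:
    """Count resources whose cost goes up or down; input is (current, recommended) pairs."""
    values = [savings(current, recommended) for current, recommended in resources]
    up = sum(1 for s in values if s < 0)    # negative savings: cost increases
    down = sum(1 for s in values if s > 0)  # positive savings: cost decreases
    return up, down


# Hypothetical (current_cost, recommended_cost) pairs for the From/To period.
rows = [(120.0, 80.0), (50.0, 65.0), (30.0, 30.0)]
up, down = count_up_down(rows)
```

Here one resource becomes cheaper, one becomes more expensive, and one is unchanged, so it counts toward neither total.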
5-2: Request, Input Token, Output Token
Shows the total usage for the From and To periods.
5-3: Cache Match Rate (Avg)
Shows the average cache hit rate over the From and To periods.
Because this value is an average, it may differ from the actual cache hit rate.
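One way an averaged rate can differ from the actual rate is shown below. How CloudXper computes the average is not specified here; this sketch assumes an unweighted mean of daily hit rates and compares it with the rate over all tokens in the period. The function names and the sample data are hypothetical.

```python
def mean_of_daily_rates(daily: list) -> float:
    """Unweighted mean of per-day hit rates; input is (cached_tokens, input_tokens) pairs."""
    rates = [cached / total for cached, total in daily if total > 0]
    return sum(rates) / len(rates) if rates else 0.0


def overall_rate(daily: list) -> float:
    """Hit rate over all tokens in the period (weighted by daily volume)."""
    total_cached = sum(cached for cached, _ in daily)
    total_input = sum(total for _, total in daily)
    return total_cached / total_input if total_input else 0.0


# Hypothetical data: a quiet day with a high hit rate, a busy day with a low one.
daily = [(90, 100), (100, 10_000)]
average = mean_of_daily_rates(daily)
actual = overall_rate(daily)
```

In this example the unweighted average is far higher than the rate over all tokens, because the quiet day counts as much as the busy one.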
5-4: From, To
The period for which usage is aggregated.
Up to 3 months
1.2 Optimization Planner screen: a panel that slides in when you click a row
Current resource information
Recommendation types based on recommendation priority
Recommendation priority
Recommended for deletion (if no requests have been made for 14 days or more)
Model update
Version update (latest version)
Model Type Optimization (Provisioned Recommended)
Model Position Optimization (Global Recommendation)
Consider using cache (20%)
Model Type Optimization (Batch Recommendation)
Normal
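Selecting the single recommendation shown for a resource can be sketched as follows, assuming the priority list above is ordered from highest to lowest. The short keys and the function name are hypothetical identifiers introduced for this example, not CloudXper names.

```python
# Hypothetical keys for the recommendation types, in the order listed above.
PRIORITY = [
    "deletion",        # no requests for 14 days or more
    "model_update",
    "version_update",
    "provisioned",     # Model Type Optimization (Provisioned)
    "global",          # Model Position Optimization (Global)
    "cache",           # Consider using cache (20%)
    "batch",           # Model Type Optimization (Batch)
]


def top_recommendation(applicable: set) -> str:
    """Return the highest-priority recommendation that applies, or "normal" if none do."""
    for recommendation in PRIORITY:
        if recommendation in applicable:
            return recommendation
    return "normal"
```

For example, a resource that qualifies for both `"cache"` and `"provisioned"` would show `"provisioned"`, and a resource with no applicable recommendation would show `"normal"`.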