Wednesday 4 March 2026
How Sawt Helped Floward Deliver 2 Million Flowers in a Single Day
On February 14th - the highest-pressure day in Floward's history - Sawt's system handled thousands of calls, 91.5% of them without any human intervention. Equivalent in output to a team of 120 agents. Customer satisfaction exceeded 93%.
"During our most demanding season, our partnership with Sawt allowed us to handle an unprecedented volume of orders with full operational stability - while maintaining the quality of experience Floward is known for."
- Mohammed Al-Arifi, COO, Floward
Floward is the leading flowers and gifts platform in the Middle East. Founded in 2017, the company set out to make sending a gift as enjoyable as receiving one. Today it operates across three continents.
What appears to the recipient as a well-arranged gift at their door is the product of a dense network of coordinated operations, all working toward one commitment: every order arrives on time, at the quality it promises.
On a normal day, this runs smoothly. On February 14th, the load is an order of magnitude higher.
The Day Everything Scales
Every year, the same problem. Demand multiplies in a single day with no tolerance for delays. Quality can't slip. Costs can't balloon. And at the center of the operational puzzle is a detail that's easy to miss: the person placing the order is almost never the person receiving it. No order can move forward until the recipient's delivery address is confirmed.
The recipient gets a notification asking for their location. If they don't share it before the cutoff window closes, the order is cancelled. That affects three people: the sender who planned the surprise, the recipient who expected it, and Floward, which absorbs the cost of an unfulfilled order.
Sawt: An Operational Platform, Not a Bot
Floward decided to stop solving the same problem every year. They partnered with Sawt, an AI platform for customer communication management, to run a full slice of their operations through a system that understands their business and acts on that understanding. The brief wasn't to build a bot that makes calls. It was to deploy something that operates with the same judgment a well-trained team would.
The voice agent that speaks with customers is one output of that platform. It runs inside it. It is not the platform itself.
Building the Floward Agent: From Operational Understanding to First Call
Before the agent made a single call, the Sawt team spent time inside Floward's operations. How does an order get created? When does the recipient notification go out? What happens when someone doesn't respond? What situations come up daily that the customer service team has learned to handle?
That understanding gets translated inside the platform into a structure called Journeys. Each Journey is a complete operational scenario with five components: a Procedure that defines exactly what the agent does at each step, Policies that set hard limits on what it must never do, Data preloaded before the call begins, Tools it can use mid-conversation (like sending a WhatsApp location link or querying the order system), and Handoffs that govern when and how the agent escalates to a human.
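To make that structure concrete, here is a minimal sketch of what a Journey could look like expressed as configuration. It is written in Python for illustration; the class and field names mirror the five components described above but are hypothetical, not Sawt's actual SDK.

```python
from dataclasses import dataclass

# Hypothetical sketch: these names mirror the five Journey components
# described in the text; they are not Sawt's actual SDK.

@dataclass
class Journey:
    name: str
    procedure: list[str]       # exactly what the agent does at each step
    policies: list[str]        # hard limits it must never cross
    preloaded_data: list[str]  # data loaded before the call begins
    tools: list[str]           # actions available mid-conversation
    handoff: str               # when and how to escalate to a human

address_confirmation = Journey(
    name="confirm_delivery_address",
    procedure=[
        "greet the recipient and explain why Floward is calling",
        "ask for the delivery address, offering a WhatsApp location link",
        "read the confirmed address back before closing",
    ],
    policies=[
        "never disclose the sender of a surprise gift",
        "never promise delivery outside the order's cutoff window",
    ],
    preloaded_data=["order_id", "recipient_name", "cutoff_time"],
    tools=["send_whatsapp_location_link", "query_order_system"],
    handoff="transfer to a human if the recipient insists after two attempts",
)
```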
Above the Journeys sits a layer called Shared Instructions: stacked system-level directives that shape the agent's behavior across all scenarios. For Floward, that meant three layers, each scoped to a different level of specificity:
Layer 1 - Universal rules: be concise, be clear, do not interrupt
Layer 2 - Regional rules: Saudi Arabic dialect, local terminology, regional conversational conventions
Layer 3 - Floward-specific rules: warm tone, no sender disclosure for surprise gifts
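One plausible reading of the stack: each layer is appended to the agent's system instructions, broadest first, so the more specific layers refine the general ones. A minimal sketch under that assumption (the actual mechanics are Sawt-internal):

```python
# Assumed mechanics: layers are stacked broadest-first so that more
# specific layers can refine or override the general ones.

LAYER_1_UNIVERSAL = [
    "Be concise.",
    "Be clear.",
    "Do not interrupt the caller.",
]
LAYER_2_REGIONAL = [
    "Speak Saudi Arabic dialect with local terminology.",
    "Follow regional conversational conventions.",
]
LAYER_3_FLOWARD = [
    "Keep a warm tone.",
    "Never disclose the sender of a surprise gift.",
]

def build_shared_instructions(*layers: list[str]) -> str:
    """Concatenate instruction layers into one system directive."""
    return "\n".join(rule for layer in layers for rule in layer)

system_instructions = build_shared_instructions(
    LAYER_1_UNIVERSAL, LAYER_2_REGIONAL, LAYER_3_FLOWARD
)
```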
Before each call begins, the agent's context is preloaded from three sources: the Order DB, the CRM, and Genesys. The agent starts every call knowing who it's calling and why. It doesn't ask questions it can already answer. It goes straight to the one thing it needs: the address.
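A sketch of that preloading step, with stubs standing in for the real Order DB, CRM, and Genesys integrations (their actual APIs are not described in this piece, so every function and field name here is an assumption):

```python
# Stubs stand in for the real systems; all field names are assumptions.

def fetch_order(order_id: str) -> dict:
    """Stub for the Order DB lookup."""
    return {"recipient_id": "r-1001", "cutoff_time": "2026-02-14T18:00"}

def fetch_customer(recipient_id: str) -> dict:
    """Stub for the CRM lookup."""
    return {"name": "Sara", "phone": "+9665XXXXXXX"}

def fetch_routing(order_id: str) -> dict:
    """Stub for the Genesys routing lookup."""
    return {"queue": "floward-peak", "callback": "+9665XXXXXXX"}

def preload_agent_context(order_id: str) -> dict:
    """Assemble everything the agent should know before dialing,
    so the only open question left on the call is the address."""
    order = fetch_order(order_id)
    customer = fetch_customer(order["recipient_id"])
    routing = fetch_routing(order_id)
    return {
        "order_id": order_id,
        "recipient_name": customer["name"],
        "cutoff_time": order["cutoff_time"],
        "routing": routing,
    }
```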
Going from operational understanding to a live, running agent took weeks. The platform is built so that deep knowledge of how a business works translates directly into how the agent behaves.
How the Agent Learned to Pronounce the Neighborhood Correctly Before Ever Calling a Customer
Sawt's platform has a simulation engine built in. Before the agent talks to a real customer, it runs through dozens of test scenarios. The process starts with Testing Goals: what should the agent do when someone refuses to share their location? When they ask who sent the gift? When they push for a human? Each scenario is defined as a goal the agent either passes or fails.
Scenarios can be written manually or generated automatically. The platform's Generate with AI feature analyzes past conversation data and produces a list of scenarios covering common patterns and edge cases. The system then runs hundreds of synthetic conversations and scores each one against a Pass Rate per goal.
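As a sketch of how such pass/fail goals might be scored, consider the following. The goal wording comes from the scenarios above; run_one_simulation is a hypothetical stand-in for the platform's synthetic-conversation engine.

```python
import random

TESTING_GOALS = [
    "recipient refuses to share their location",
    "recipient asks who sent the gift",
    "recipient pushes for a human agent",
]

def run_one_simulation(goal: str) -> bool:
    """Hypothetical stand-in for one synthetic conversation; the real
    engine plays out the scenario and judges whether the goal was met."""
    return random.random() > 0.1  # placeholder pass/fail outcome

def pass_rate(goal: str, runs: int = 100) -> float:
    """Fraction of synthetic conversations that meet the goal."""
    return sum(run_one_simulation(goal) for _ in range(runs)) / runs

for goal in TESTING_GOALS:
    print(f"{goal}: {pass_rate(goal):.0%}")
```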
During Floward's simulation, a problem surfaced that would have taken hundreds of live calls to notice: the agent mispronounced several Saudi neighborhood names. On a call whose entire purpose is confirming a delivery address, that is the kind of error that costs trust and misroutes gifts. It was caught and corrected before a single real customer heard it.
The loop does not stop at launch.
In production, improvement is ongoing. Each call adds data. Each unexpected situation becomes a training scenario. By February 14th, the agent had already seen and trained on most of what it would encounter. Call 4,125 on Valentine's Day ran on everything the system had learned since call one.
10 Out of Thousands: How That Number Holds
Of the thousands of calls on February 14th, 10 required a human. That is not a rounding error. It is the result of an observability system that processes every call without exception, through a pipeline that starts the moment a conversation ends.
The system runs on two parallel tracks. The first measures platform-level quality. The second is built on Floward's own business logic and measures how closely the agent adheres to it. Together they produce a weighted score for every call:
Non-deterministic (AI-evaluated):
- LLM Hallucination
- WER (word error rate)
- Tool Call Accuracy
- Transition Accuracy
- Sentiment

Deterministic (direct measurement):
- e2e Latency (P95)
- Transfer Success
- Interruptions
- Tool Latency
- Overlapping Speech
For Floward, the team weighted each metric according to what their operations actually depend on. Correct pronunciation of neighborhood names accounts for 30% of the score, because a wrong district name means a gift arrives at the wrong address. Every call is evaluated as a weighted composite:
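A minimal sketch of that weighted composite. Only the 30% weight on neighborhood-name pronunciation is stated in this piece; the other metrics and weights below are illustrative placeholders chosen to sum to 1.0.

```python
# Only the pronunciation weight (0.30) comes from the text; the rest
# are assumed placeholders. Each metric is normalized to [0, 1].

WEIGHTS = {
    "neighborhood_pronunciation": 0.30,  # stated in the text
    "tool_call_accuracy": 0.25,          # assumed
    "llm_hallucination_free": 0.25,      # assumed (1 - hallucination rate)
    "e2e_latency_p95": 0.20,             # assumed (normalized)
}

def composite_score(metrics: dict[str, float]) -> float:
    """Weighted sum of normalized per-call metric scores."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())

call_metrics = {
    "neighborhood_pronunciation": 1.00,
    "tool_call_accuracy": 1.00,
    "llm_hallucination_free": 0.95,
    "e2e_latency_p95": 0.80,
}
print(f"composite: {composite_score(call_metrics):.2f}")  # composite: 0.95
```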
Every call passes through full post-call analysis. Performance is scored, failures are logged, and any call that deviates from the expected path is transferred immediately to a human agent with the full context attached. Those calls then go back into the simulation engine as new training scenarios.
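Sketching that pipeline end to end: the threshold and function boundaries below are assumptions, and where the real system transfers a deviating call live, mid-conversation, this sketch compresses everything into one post-call function for illustration.

```python
ESCALATION_THRESHOLD = 0.85  # assumed cutoff for a "deviating" call

def log_result(call_id: str, score: float) -> None:
    print(f"[log] {call_id} scored {score:.2f}")

def transfer_to_human(call_id: str, transcript: str) -> None:
    print(f"[handoff] {call_id} escalated with full context attached")

def queue_as_training_scenario(transcript: str) -> None:
    print("[simulation] transcript queued as a new training scenario")

def post_call_pipeline(call_id: str, score: float, transcript: str) -> None:
    """Runs on every call, without exception, the moment it ends."""
    log_result(call_id, score)
    if score < ESCALATION_THRESHOLD:
        transfer_to_human(call_id, transcript)
        queue_as_training_scenario(transcript)

post_call_pipeline("call-4125", score=0.95, transcript="...")
```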
The numbers tell the story: thousands of calls in a single day, 91.5% handled without human intervention, customer satisfaction above 93%, output equivalent to a 120-agent team.
One use case. Demonstrated value. Built to expand.
What began as address confirmation became a reliable operational capability - available year-round, at any volume. Peak season no longer requires temporary fixes or headcount that needs to be rebuilt each year. Floward's team could focus on what automation doesn't touch: the parts of gifting that require human judgment and care.
At Sawt, we build a platform for AI-managed customer communication. We start by understanding how your operations actually work, demonstrate measurable impact within weeks, and build from there.
Not a one-time result. A repeatable model, across seasons, channels, and scale.