Weever.ai: Evaluating trust and usability in an AI shopping assistant

Uncovering what makes AI feel human and trustworthy

Role: UX Researcher

Industry: AI

Duration: 2 months


Overview

Weever.ai is an AI-powered shopping assistant that recommends products based on user prompts. My role was to evaluate how people trust and perceive usability in this type of conversational AI.

Through a controlled usability study, we explored how search context, whether the user is running a general or a specific search, affects trust and usability, uncovering what makes an AI interface feel more intuitive and trustworthy.

The challenge

The Weever.ai team wanted to understand how users interact with the platform’s AI-powered search and identify which factors most influence the platform's bounce rate and conversions.

Our goal

Our goal was to uncover these factors and identify how Weever.ai could build more transparent and human-like interactions that boost conversions.

Accordingly, we proposed to evaluate usability and trust across two common search scenarios:

a specific search, where users already know what they want, and a general search, where they explore the platform to get ideas.

Research objectives

Compare the level of satisfaction between general search and specific search. 

Compare how effectively users achieve goals between general search and specific search. 

Compare the effort between general search and specific search. 

Compare users’ trust in the AI assistant across general and specific search, relative to their trust before using the product.


Setting up the study


To understand how users experience trust and usability in Weever.ai, we conducted a within-subject usability study with 12 participants. Each participant completed two tasks, one general and one specific search, while thinking aloud so we could capture their reasoning. Accordingly, they were given two scenarios:



The session combined quantitative measures and qualitative observation to capture both behavioral patterns and the perceived dimensions of the experience.

Measure | Dimension
Customer Satisfaction Score (CSAT) | Satisfaction
Customer Effort Score (CES) | Effort
Task Success | Effectiveness
Human-Computer Trust Scale | Trust
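
To illustrate how these measures feed the comparison between the two search conditions, here is a minimal analysis sketch. The participant records and score values are hypothetical, not the study data.

```python
from statistics import mean

# Hypothetical post-task responses; the structure mirrors the measures above,
# but the participant IDs and scores are illustrative, not the study data.
responses = [
    {"pid": "P01", "condition": "general",  "csat": 5, "ces": 4, "success": 1, "trust": 3.8},
    {"pid": "P01", "condition": "specific", "csat": 6, "ces": 4, "success": 1, "trust": 4.1},
    {"pid": "P02", "condition": "general",  "csat": 3, "ces": 2, "success": 0, "trust": 2.9},
    {"pid": "P02", "condition": "specific", "csat": 5, "ces": 3, "success": 1, "trust": 3.5},
]

def summarize(condition: str) -> dict:
    """Average each measure for one search condition (within-subject design)."""
    rows = [r for r in responses if r["condition"] == condition]
    return {
        "CSAT (satisfaction)": mean(r["csat"] for r in rows),
        "CES (effort)": mean(r["ces"] for r in rows),
        "Task success rate": mean(r["success"] for r in rows),
        "Trust (HCT average)": mean(r["trust"] for r in rows),
    }

for condition in ("general", "specific"):
    print(condition, summarize(condition))
```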


At the beginning of the study, participants completed a short task to capture their first impressions of the platform. At the end, these perceptions were compared with their responses in a post-test survey reflecting the overall experience.


After completing each search task, participants filled out a post-task survey to capture their perceptions. The order of tasks was randomized to counterbalance results and minimize potential learning bias.
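
As an illustration of the counterbalancing step, the sketch below randomly splits participants so each task order is used by half of the sample; the participant IDs and assignment logic are illustrative rather than the exact procedure used in the study.

```python
import random

# Hypothetical participant IDs; the study had 12 participants.
participants = [f"P{i:02d}" for i in range(1, 13)]
random.shuffle(participants)  # random assignment into the two order groups

# Alternate the two possible task orders so each order is used by half the sample.
task_orders = {
    pid: ("general", "specific") if i % 2 == 0 else ("specific", "general")
    for i, pid in enumerate(participants)
}

for pid, order in task_orders.items():
    print(f"{pid}: {order[0]} search first, then {order[1]} search")
```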


At the end of the test, we conducted post-test interviews to explore why users interacted with the technology the way they did, uncovering key pain points and opportunities for improvement.

Participants

To ensure relevant insights, we recruited participants who met specific inclusion criteria. The following diagram provides a summary of the participant profile.

Our findings and recommendations

The study revealed how search context, system performance, and transparency directly shaped the user experience. Below are the main findings that highlight where Weever.ai succeeded and where the experience fell short.

First impressions vs Overall impressions

First, users had slightly positive first impressions, describing Weever.ai as modern and smart. Most expected it to act as a product recommendation or review tool, building strong initial trust based on appearance and branding.

"It will give me product recommendations as well as unbiased reviews." Tester 1A


Compared to their initial excitement, users’ overall impressions declined notably after interacting with Weever.ai. Pain points like irrelevant results, slow loading, and limited transparency had a negative effect on users’ perceptions.


“I searched for basketball gifts, but it gave me a Fisher-Price toy. That’s not what I meant.” Tester 2B

“I don’t get it...What is the reasons of recommending this [product], and not another” Tester 4A

"If it takes more than 2 seconds I would probably just exit" Tester 3A


Findings summary

The following table summarizes all pain points identified in the study, organized by priority. Each includes a corresponding design recommendation to address the issue.

Pain point | Priority | Design recommendation
Irrelevant or inconsistent search results | 🔴 High | Display clearer logic behind AI recommendations (e.g., “Why this result?”).
Slow loading time | 🔴 High | Optimize performance speed; add a progress indicator or loading feedback to manage expectations.
Lack of transparency about how AI works | 🔴 High | Add transparency cues: brief explanations of data sources, tooltips, or messages showing how recommendations are generated.
Limited number of results and missing pricing info | 🟠 Medium | Display a consistent number of results with complete info; if unavailable, provide contextual reasons (“Price unavailable for this source”).
No filtering or customization options | 🟠 Medium | Introduce filtering tools (e.g., by price, brand, or source) and a “Load more results” option to increase autonomy.

Conclusion

The study revealed that trust in AI systems depends not only on accuracy, but also on transparency, feedback, and user control. By addressing these factors, Weever.ai can rebuild user confidence and transform its experience from one of uncertainty to one that feels trustworthy and human-centered, potentially boosting conversions.


My takeaway

This project reinforced my belief that humanizing AI starts with trust. Through behavioral research and usability testing, I learned how small design cues, like transparency and clear feedback, can make a complex system feel more understandable and human.


Our main findings

After analyzing the data, we uncovered multiple insights. In this section, the insights are grouped by task.

Informational tasks

Task: Find employee ranking and quota

Metric | Ranking Task (Avg) | Quota Task (Avg)
Effectiveness | 10 of 12 completed it successfully | Only 2 of 12 completed it successfully
Efficiency (Time) | Avg time: 79.9 sec | Avg time: 113.9 sec (42% longer)
Satisfaction | CSAT: 4.42 / 7 | CSAT: 2.89 / 7
Efficiency (Perceived effort) | CES: 3.25 / 5 | CES: 2.25 / 5


How did the users feel about the pre-bidding tasks?

”I think that could have [been] made a little bit more obvious where I can find it. (...) Because it was obviously right in my face, but I didn’t know it was the quota.” – P10

“It was hard to find... the rank. It was just a little icon that didn't tell me much.” – P02

“I actually have no idea where to find that (quota).” – P05

Identifying the pain points

  1. Quota visibility is a major usability issue

Although the quota is a core element in vacation bidding, most participants failed to locate it. Only 2 out of 12 succeeded, indicating poor discoverability. The quota task also took 42% longer on average than the ranking task, suggesting not only a lack of clarity but also a more time-consuming process that hurt both efficiency and satisfaction (2.89 out of 7).

Quota is highlighted in green.
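
As a quick check, the relative-time figures in the metric tables follow directly from the reported averages; the snippet below reproduces the 42% difference above and the 53% difference reported later for the bidding tasks.

```python
def percent_longer(slower_avg_sec: float, faster_avg_sec: float) -> float:
    """Relative increase in average completion time, as a percentage."""
    return (slower_avg_sec - faster_avg_sec) / faster_avg_sec * 100

# Quota task vs. ranking task (averages from the table above).
print(f"{percent_longer(113.9, 79.9):.1f}%")  # ~42.6%, quoted as 42% longer
# 2-week bid vs. 1-week bid (averages from the bidding table below).
print(f"{percent_longer(74.7, 48.7):.1f}%")   # ~53.4%, quoted as 53% longer
```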

  2. High success in the ranking task does not mean clarity

Although 10 participants completed the ranking task, several still expressed confusion, especially around the icon used to access that information. Participants could not recognize key interface elements, like the ranking icon, which added unnecessary steps and confusion.

Ranking highlighted in green.

Bidding tasks

Tasks: Submit a 1-week bid and a 2-week bid

Metric | 1-week Bid | 2-week Bid
Effectiveness | 12 of 12 completed successfully | 10 of 12 completed successfully
Efficiency (Time) | Avg time: 48.7 sec | Avg time: 74.7 sec (53% longer)
Satisfaction | CSAT: 4.83 / 7 | CSAT: 4.25 / 7
Efficiency (Perceived effort) | CES: 2.58 / 5 | CES: 2.00 / 5


How did users feel about completing the bids?

“I think the two week bid, I struggled because it wouldn’t let me select two weeks total, which I didn’t understand if it was something that I did or if it was a system blockage because it didn’t say.” - P11

“Some parts are easy to use, but some parts are very confusing and I have no clue how to proceed.” – P05

"I wanted to select the whole period of two weeks, but I had to do this action twice because it didn't allow me to do that." - P11

"Too many clicks. And I didn't understand why I had to do all these clicks." - P01


Identifying the pain points

  1. Two-week bidding flow creates unnecessary complexity

While the one-week bid task was completed successfully by all participants, the two-week bid led to confusion and inefficiencies. Only 10 out of 12 users succeeded, and the average completion time increased by 26 seconds. Satisfaction scores also dropped.

These findings suggest that the interface lacked clear guidance for multi-week bidding. Users were unsure how to perform the task in one action, and several believed they had to repeat the process or ask for help. This impacted both efficiency and confidence, highlighting an opportunity to simplify the bid interaction and reduce friction.

Only one week could be selected at a time.

  2. Too many steps to add a bid impacted flow and satisfaction

Several participants noted that the process of adding a bid involved too many clicks and redundant steps. After selecting the days, users were required to validate, then click “Add Bid,” and finally submit, a sequence that felt unnecessarily long and repetitive. Even users who completed the task successfully expressed frustration with the interaction flow, describing it as time-consuming and unintuitive.

Too many additional steps were required to complete a bid.


Recommendations

To guide the product team in prioritizing usability improvements, we classified each issue based on Nielsen’s severity ratings for usability problems. These ratings help distinguish between minor concerns and those that significantly affect the user experience.
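
For reference, Nielsen’s scale runs from 0 (not a problem) to 4 (usability catastrophe). The sketch below shows how the issues from this study map onto it; the data structure is only an illustration of the classification, not a project artifact.

```python
# Nielsen's severity ratings for usability problems (0-4).
SEVERITY_SCALE = {
    0: "Not a usability problem",
    1: "Cosmetic problem only",
    2: "Minor usability problem (low fix priority)",
    3: "Major usability problem (high fix priority)",
    4: "Usability catastrophe (must fix before release)",
}

# How the issues found in this study were classified (taken from the tables below).
classified_issues = [
    ("Lack of visibility and clarity around the Quota feature", 3),
    ("Lack of system guidance on selecting a bid for 2 weeks", 3),
    ("Weak hierarchy and inconsistent icon for the ranking information", 2),
    ("Excess steps for submitting a bid", 2),
]

for issue, severity in classified_issues:
    print(f"Severity {severity} ({SEVERITY_SCALE[severity]}): {issue}")
```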

Severity 3: Major Usability Problems

Problem: Lack of visibility and clarity around the “Quota” feature. Most users could not find the quota feature due to low visibility and lack of contextual clarity. This caused delays and task failure, especially during critical pre-bidding actions.

Recommendation (higher dev effort): Add a brief tutorial at the beginning of the experience to explain key features like quota and ranking. This would require more development effort but can significantly improve onboarding and confidence.

Recommendation (lower dev effort): Improve visual clarity of the quota by changing its color to a brighter option or applying more visual hierarchy. This is a quicker fix that can increase discoverability with minimal development impact.

Tutorial recommendation.

Problem: Lack of system guidance on selecting a bid for 2 weeks. Users lacked guidance on how to submit a two-week vacation bid. Many did not realize they had to repeat the process twice, which caused delays and confusion.

Recommendation (higher dev effort): Let users select two weeks at once using the date picker.

Recommendation (lower dev effort): Add a tooltip that clearly instructs the user to select a week, submit, then repeat.

Selecting two weeks at once.


Severity 2: Minor Usability Problems

Problem: Lack of hierarchy or visual emphasis of the ranking information and an inconsistent icon. Users had trouble locating their employee ranking due to weak visual hierarchy and an unrecognizable icon. Many did not realize what the symbol represented or where to find the ranking, leading to confusion during the task.

Recommendation: Enhance the visibility of the ranking section by using a clearer icon, adding a descriptive label, and applying bold styling. Pairing icon and text will help users recognize the ranking information at a glance and reduce hesitation.


Problem: Excess steps for submitting a bid. There were too many steps involved in submitting a bid; the additional “Add Bid” button after selecting dates added unnecessary friction.

Recommendation: Automatically add selected dates to the bid list after clicking on “Validate”, eliminating the need for an “Add Bid” button.


Outcomes

-22% HR Support tickets requested

+30% Efficiency in bidding

After the usability recommendations were implemented, the company reported significant improvements in both user experience and bidding efficiency:

  • 22% decrease in HR support tickets
    Users encountered fewer issues and uncertainties, reducing the load on the support team.

  • 30% increase in bidding efficiency
    Employees were able to complete vacation bids more quickly and confidently.

These results highlighted the business value of user research and how targeted design improvements can drive measurable impact in SaaS platforms.

Takeaways

From this project, I learned that combining usability metrics with qualitative insights is essential to truly understand where and why users struggle. Observing task completion alone wasn’t enough; post-test interviews revealed the underlying causes of hesitation, like unclear icons and missing guidance.


Copyright 2025 by Nicolas Peyre
