How to build an AI product that doesn’t suck

In this article, Archana Kumari, Senior Product Manager at Microsoft, shares insights from her experience in spam detection, covering the evolution from traditional methods to the role of large language models (LLMs).

Chances are, if you work in tech, you are already researching or using AI to enhance your product. AI is becoming essential in nearly every industry, powering everything from healthcare diagnostics to financial predictions and customer support. Yet the number of AI products that suck is surprisingly high. Why? Most AI products fail to align with user expectations, lack real-world adaptability, or are simply a poor use case for AI. So how do you build a successful AI product?

Let's learn by diving into the world of spam detection. I am going to share what I have learned so far, starting with the basics of spam detection and then exploring how large language models (LLMs) can help. I am still learning, so bear with me as I walk through some common pitfalls and how to build an AI product that users actually want to use.

1. First step: Clearly define the problem

In my example, we need to define what “spam” actually means.

The first reason many AI products fail is the lack of clear problem definition. Without this, even the most sophisticated AI models won’t deliver value because they won’t align with users' needs.

In the case of spam detection, it’s crucial to define what “spam” means. Is spam any unsolicited message? What about harmless promotional messages from a company you once bought from?

The first step in building an AI product is to define your goal clearly and ensure that it aligns with the user’s expectations. For our spam detector, you’ll need to gather input from both users and stakeholders about what should qualify as spam.

2. How is this problem solved traditionally?

Let's trace the evolution of spam detection: from rules to AI.

The old way: Rule-based systems

At first, spam detection was all about setting up rules. If a message contained words like "free" or "click here," it would be marked as spam. It was simple and worked—for a while. But spammers got smarter, tweaking their messages to bypass the filters. So yeah, rules-based systems were cool and effective... until they weren’t.

Example: Imagine you’re building a system to filter spam messages. A simple rule might be: if a message contains the word "urgent," mark it as spam. So, a message like "Urgent! Your account needs attention" would get flagged. But spammers could trick the system by using alternatives like "urgnt," symbols like "ur^gent," or different wording altogether like “hurry up!!”, easily bypassing the rules.
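To make that concrete, here is a minimal sketch of a rule-based filter in Python (the pattern list is my own illustration, not a real production rule set):

```python
import re

# A handful of hand-written rules, in the spirit of early spam filters.
# The pattern list is illustrative only.
SPAM_PATTERNS = [
    r"\bfree\b",
    r"\bclick here\b",
    r"\burgent\b",
]

def is_spam(message: str) -> bool:
    """Flag a message as spam if it matches any hard-coded pattern."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in SPAM_PATTERNS)

print(is_spam("Urgent! Your account needs attention"))  # True
print(is_spam("Urgnt! Your acc0unt needs attention"))    # False: a trivial misspelling slips through
```

Every new spammer trick means another hand-written rule, which is exactly why this approach stops scaling.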

Enter AI: The rise of machine learning - adaptive filtering

Then came machine learning (ML), which was a huge step up. Instead of relying on strict rules, ML models like Naive Bayes could look at large datasets of emails and learn what features (like certain words or patterns) were common in spam. This made the system more adaptable, which was awesome because spammers were always evolving. But even this approach had its limitations, which is where more advanced AI comes in.

With machine learning, the system adapts by learning from large sets of data. It might recognize that messages using phrases like "limited time offer" or "click to claim" frequently appear in spam. So, a message that says "Claim your exclusive prize, only today!" would be flagged based on patterns the model has learned, without relying on explicit rules.
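As an illustration, here is a minimal sketch of that idea using a Naive Bayes classifier from scikit-learn (the four training messages are made up; a real model would learn from thousands of labeled examples):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: a real system needs far more labeled data.
messages = [
    "Limited time offer, click to claim your prize",
    "Claim your exclusive reward today only",
    "Are we still on for lunch tomorrow?",
    "Here are the meeting notes from this morning",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features + Naive Bayes: the classic ML baseline for spam filtering.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["Claim your exclusive prize, only today!"]))  # ['spam'] on this toy data
```

The model learns which words are evidence of spam from the data itself, so there is no rule to rewrite when spammers change their wording; you retrain on fresh examples instead.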

3. Jumping to LLMs: Why they’re awesome, and why they can suck

The game changer: Large language models (LLMs)

Fast-forward to today, and we’ve got large language models (LLMs) like GPT-4, which can understand context and language better than anything before. LLMs can recognize nuanced patterns in spam that older systems would miss. I’m still in awe of how they can detect intent, like when a message is trying to trick someone without using obvious spammy words.

But—full transparency here—LLMs aren’t perfect. I’ve been learning that while they’re super powerful, they can also be super unpredictable. So, how do you make sure you’re using these tools in the right way?

An LLM is great at understanding context. For instance, a message that says, "Hi, we noticed some activity in your account. Please verify it immediately." may not seem overtly suspicious, but the LLM can pick up on the subtle pressure to take urgent action, identifying it as a potential phishing attempt based on the tone and content.
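As a rough sketch of what that looks like in practice, here is how you could hand the judgment call to an LLM using the OpenAI Python client (the model name and prompt are just my illustration, and a production system would want something more robust than a one-word reply):

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

def classify_with_llm(message: str) -> str:
    """Ask an LLM to judge a message on intent and tone, not just keywords."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name; use whatever you have access to
        messages=[
            {"role": "system",
             "content": "You are a spam filter. Reply with exactly one word: SPAM or NOT_SPAM."},
            {"role": "user",
             "content": f"Classify this message:\n\n{message}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify_with_llm("Hi, we noticed some activity in your account. Please verify it immediately."))
```

Notice that the system prompt carries your definition of spam from step 1, which is one more reason that definition matters.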

Why LLMs are awesome

What’s cool about LLMs is that they go beyond just spotting keywords. They actually understand the tone and context of a message, so they can catch more subtle spam. For example, they might flag something that doesn’t use any typical spam words but still feels shady because of its manipulative tone. They also handle multiple languages really well, which is something older models struggled with.

Why LLMs can suck

That said, I’ve learned the hard way that LLMs aren’t perfect. They rely on the data they’re trained on, and if that data is biased or limited, they’ll make mistakes. Also, they can be really hard to debug. If an LLM misclassifies something, it’s not easy to figure out why, which makes it tricky to improve the system. Oh, and let’s not forget—they’re expensive to run and may be overkill for some problems. You don’t always need a hammer to swat a fly, right?

4. The blueprint for building a spam detection system that doesn’t suck

4.1 Start simple, then scale

One of the key pitfalls of AI products is trying to do too much from the outset. According to best practices, you should start with a Minimum Viable Product (MVP), where your AI solves a basic but meaningful problem well.

For our spam detection system, you don’t need to create a highly sophisticated AI that can detect all kinds of spam from the get-go. Start by identifying a small set of criteria, like specific keywords and common spam patterns. You can later expand the model's capabilities based on user feedback and evolving patterns of spam.

Lesson: Focus on solving a core problem first, then gradually build up complexity. When you start simple, you ensure the AI remains functional and easy to improve.

4.2 Know when to use LLMs

LLMs are great, but I’ve learned that they’re not always the answer. They shine when the problem is complex—like if spam is written in a way that’s designed to look super convincing. They’re also good at detecting context-heavy or foreign language spam, but for simpler problems, they might be overkill. I’m still figuring out when to pull the trigger on LLMs and when to stick with a simpler approach.
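One pattern that helps with this decision is a tiered setup: let a cheap classifier handle the obvious cases and reserve the LLM for the ambiguous middle. Here is a minimal sketch, reusing the hypothetical `model` (Naive Bayes) and `classify_with_llm` functions from the earlier examples, with thresholds I picked arbitrarily:

```python
def classify_message(message: str) -> str:
    """Route obvious cases to the cheap model and only ambiguous ones to the LLM."""
    probabilities = model.predict_proba([message])[0]
    spam_probability = dict(zip(model.classes_, probabilities))["spam"]

    if spam_probability >= 0.9:
        return "spam"       # confidently spam: no LLM call needed
    if spam_probability <= 0.1:
        return "not_spam"   # confidently clean: no LLM call needed

    # Only the uncertain middle pays for an expensive LLM call.
    return "spam" if classify_with_llm(message) == "SPAM" else "not_spam"
```

This keeps the cost and latency of the LLM limited to the small slice of traffic where it actually adds value.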

4.3 Use real-world data for training

Many AI models fail because they are trained on overly sanitized or narrow datasets that don’t reflect the complexity of the real world. For spam detection, this can lead to embarrassing results—legitimate messages get flagged while obvious spam slips through.

The best way to avoid this is to train your AI on diverse, real-world data that includes a wide variety of message types. Spam messages can take many forms, and your AI needs to learn to adapt to different writing styles, industries, and contexts.

Training on a wide variety of real-world data is crucial. In the case of spam detection, this means collecting datasets from multiple sources (e.g., emails, SMS, social media) and updating them frequently.
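Here is a small sketch of what that can look like, assuming hypothetical CSV exports from each channel with `text` and `label` columns (the file names are placeholders):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder file names: the point is mixing sources, not any specific dataset.
sources = ["email_spam.csv", "sms_spam.csv", "social_dm_spam.csv"]
data = pd.concat([pd.read_csv(path) for path in sources], ignore_index=True)

# Stratify so the spam/ham balance is consistent between the train and test splits.
train_text, test_text, train_labels, test_labels = train_test_split(
    data["text"], data["label"], test_size=0.2, stratify=data["label"], random_state=42
)

model.fit(train_text, train_labels)  # `model` is the Naive Bayes pipeline from earlier
print(classification_report(test_labels, model.predict(test_text)))
```

The classification report will quickly expose whether the model only works on one channel’s style of spam.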

4.4 Iterate with user feedback

AI is never finished. One major reason AI products flop is the lack of iteration. As with all software, AI models should continuously evolve based on user feedback and new data. If your spam detector misses certain spam messages or flags legitimate ones, users will get frustrated. But with the right feedback loop in place, you can continuously refine the product to better meet their needs.

Here’s how this can be done for spam detection:

- Give users a one-click “Report spam” / “Not spam” control so misclassifications are easy to flag.
- Feed those corrections back into the labeled training data and retrain on a regular cadence.
- Track false positives and false negatives over time so you can tell whether the model is actually improving.
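On the collection side, the important thing is that every user correction ends up somewhere the model can learn from. Here is a minimal sketch, using a CSV file as a stand-in for a real feedback store:

```python
import csv
from datetime import datetime, timezone

FEEDBACK_FILE = "feedback.csv"  # stand-in for a real database of user corrections

def record_feedback(message: str, predicted_label: str, user_label: str) -> None:
    """Log every 'Report spam' / 'Not spam' click so corrections become training data."""
    with open(FEEDBACK_FILE, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), message, predicted_label, user_label]
        )

def retrain_from_feedback(model, base_texts: list, base_labels: list):
    """Periodically fold user corrections back into the training set and refit."""
    with open(FEEDBACK_FILE, newline="") as f:
        rows = list(csv.reader(f))
    texts = base_texts + [row[1] for row in rows]
    labels = base_labels + [row[3] for row in rows]
    model.fit(texts, labels)
    return model
```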

Lesson: AI should be treated as a living product, constantly learning and evolving based on user interaction and feedback.

4.5 Transparency is key

There have been several ideas discussed in the literature for making spam detection systems more transparent. LLMs can feel like black boxes, which makes it hard for users (and me!) to trust them. That’s why I’m trying out tools that explain how the model makes its decisions. It’s still a work in progress, but I’ve found that when people understand why their message was marked as spam, they’re more likely to trust the system.
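This is much easier with simpler models. As a sketch of what an explanation can look like, here is how you could surface the words that pushed a message toward “spam” in the Naive Bayes pipeline from the earlier example (assuming that same `model` object):

```python
def explain_flag(model, message: str, top_n: int = 3):
    """List the words that contributed most toward the 'spam' decision."""
    vectorizer = model.named_steps["countvectorizer"]
    classifier = model.named_steps["multinomialnb"]

    spam_index = list(classifier.classes_).index("spam")
    ham_index = 1 - spam_index
    # Per-word evidence: how much more likely the word is under 'spam' than 'ham'.
    evidence = classifier.feature_log_prob_[spam_index] - classifier.feature_log_prob_[ham_index]

    words = vectorizer.build_analyzer()(message)
    vocabulary = vectorizer.vocabulary_
    scored = [(word, float(evidence[vocabulary[word]])) for word in words if word in vocabulary]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

print(explain_flag(model, "Claim your exclusive prize, only today!"))
```

For an LLM-based system the equivalent is harder, but even asking the model to return a one-sentence reason alongside its verdict goes a long way toward building trust.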

4.6 Continuous monitoring and updating

Spam is always changing, so your AI needs to keep up. I’ve learned that even the best models can become outdated quickly if you don’t continuously monitor and retrain them. Right now, I’m experimenting with setting up systems that automatically update the model with new data. It’s definitely a process, but it’s one of the most important things I’m working on.
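One lightweight place to start is watching simple health metrics, like the share of incoming messages being flagged, and raising an alert when they drift. A minimal sketch, with made-up numbers and an arbitrary threshold:

```python
def check_flag_rate_drift(recent_daily_rates, baseline_rate, tolerance=0.05):
    """Alert when the share of messages flagged as spam drifts away from its baseline."""
    recent_average = sum(recent_daily_rates) / len(recent_daily_rates)
    drift = abs(recent_average - baseline_rate)
    if drift > tolerance:
        print(f"Flag rate drifted by {drift:.1%}: time to review recent traffic and retrain.")
    return drift

# Example: the flag rate has been creeping up over the last four days.
check_flag_rate_drift([0.08, 0.12, 0.15, 0.18], baseline_rate=0.05)
```

Metric drift doesn’t tell you whether the model or the spammers changed, but it does tell you when to look.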

4.7 Identity validation and detecting AI-generated content

Instead of just focusing on message content, it’s sometimes better to verify the sender’s identity. For example, deepfakes or AI-generated messages might be part of a more sophisticated spam attempt. By using AI to validate the sender—checking if they’re known or if their content shows signs of being artificially generated—systems can detect spam or deepfakes even when the messages are well-disguised.

4.8 Ethical considerations

Ethical considerations are crucial when developing and deploying AI-based solutions; they help ensure the technology is used responsibly, equitably, and with respect for human rights. Here are the key ones for a system like a spam detector:

Ensure privacy by collecting only necessary data and securing it against breaches. Obtain clear user consent and be transparent about how data is used. Bias mitigation is essential—avoid disproportionately flagging messages based on characteristics like language, race, or gender to maintain fairness.

Make spam detection decisions explainable so users understand why their messages were flagged. Define clear accountability for system errors, like false positives or missed spam, and include human oversight in reviewing critical or ambiguous cases to ensure fairness.

Ensure the system is secure from attacks and adaptable to new spam tactics. Comply with relevant laws and regulations (e.g., GDPR), and consider the long-term societal impact and sustainability of your AI solutions.

5. Final thoughts: Build AI products that people can trust

At the end of the day, I’m still figuring out how to create AI products that people will love. But what I know so far is that it’s a balancing act between using advanced AI tools like LLMs and keeping things simple and transparent. While LLMs are powerful, they’re not a magic bullet, and I’m learning that sometimes the simplest solutions are the best. The goal is to build something that’s reliable, adaptable, and easy to understand.

So, as I keep experimenting, that’s the key lesson I’m taking away: AI products don’t have to be overly complicated to be good. They just need to solve real problems in a way that people can rely on—and that’s how I’m trying to create AI products that don’t suck.