Good Fake Data is Hard To Find


Author: Jonathan Friedman

From Hollywood to Silicon Valley: The story of fake data

Have you ever noticed how phone numbers work in movies and on TV? Picture this: a dashing hero manages to charm a mysterious yet witty femme fatale, and just before he leaves, he spots a short message on the back of a napkin: “Call me. Jessica. 555-3140”. But why do movie numbers look like that?

The primary reason is to avoid using real phone numbers, which enthusiastic viewers might call out of curiosity or fandom. But the number can't be blatantly fake either, like “111-1111”, because that would take audiences out of the story. The prefix “555” strikes a balance: the number looks plausible without intruding on anyone's privacy.

In product demonstrations, similar fake data is often necessary to tell a story. But producing fake data that convincingly illustrates your product's capabilities is hard work. Let's look at why this is the case and explore possible solutions.

The challenge of creating convincing fake data

The data showcased in your product needs to represent an ideal use-case scenario, not an average day. An empty account is like an empty house: if you want to sell the concept, you need to stage it convincingly. For example, if your product is a CRM, that could mean creating fake customers, fake orders, and even fake employees. However, generating this data manually is labor-intensive, and the data has to stay both consistent (every order belongs to a customer who exists) and fresh, which is especially hard for temporal data such as analytics or activity history.

Exploring Solutions: Manual, automated, and hybrid approaches

1. Manual Data Generation: Simple but slow

Manually using your product to create fake data is straightforward, but it requires significant effort. Furthermore, maintaining the consistency and freshness of this data is challenging. For instance, temporal data like "Recent Purchases" might become stale over time, necessitating regular updates and refreshes.
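
If the hand-made records can be exported and re-imported, one way to fight staleness is to shift every timestamp by the same amount so the newest record always looks recent. The snippet below is a minimal sketch of that idea, not a recipe for any particular product: the record layout, the `purchased_at` field, and the `refresh_timestamps` helper are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def refresh_timestamps(records, now, key="purchased_at", lag=timedelta(hours=1)):
    """Return copies of the records with every timestamp shifted by the same
    amount, so that the newest one lands `lag` before `now`.
    The gaps between records are preserved."""
    if not records:
        return []
    newest = max(record[key] for record in records)
    shift = (now - lag) - newest
    return [{**record, key: record[key] + shift} for record in records]


# Hypothetical demo records that were created by hand a long time ago.
purchases = [
    {"item": "Starter plan", "purchased_at": datetime(2023, 1, 10, 9, 0)},
    {"item": "Pro upgrade", "purchased_at": datetime(2023, 1, 12, 15, 30)},
]

# In a real refresh job, `now` would be datetime.now(); it is fixed here
# so the example is reproducible.
fresh = refresh_timestamps(purchases, now=datetime(2024, 6, 1, 12, 0))
for purchase in fresh:
    print(purchase["item"], purchase["purchased_at"])
```

Shifting everything by one offset keeps the story intact (the upgrade still happens about two days after the first purchase) while "Recent Purchases" stays recent. Someone still has to run the refresh on a schedule.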

2. Automated Data Generation: Quick but tricky

Seeking the help of your R&D team to write code that generates fake data can provide a quick solution. This automated approach can generate large volumes of data and allow customization for different scenarios. However, it presents its own challenges—securing time with the R&D team, ensuring ongoing support, and developing code that convincingly simulates human behavior.
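
To make the automated approach concrete, here is a rough sketch of what such a generator might look like for the CRM example. Everything in it is an assumption for illustration: the field names, the name lists, and the helper functions are hypothetical and not taken from any real product. It uses a seeded random generator so the same demo can be rebuilt on demand, ties every order to an existing customer for consistency, and places orders during working hours in the last 30 days as a crude stand-in for human behavior.

```python
import random
from datetime import datetime, timedelta

# Hypothetical name pools; a real generator would want far more variety.
FIRST_NAMES = ["Jessica", "Omar", "Priya", "Lucas", "Mei"]
LAST_NAMES = ["Nguyen", "Garcia", "Cohen", "Okafor", "Lindqvist"]


def make_customers(rng, count):
    customers = []
    for i in range(count):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        customers.append({
            "id": i + 1,
            "name": f"{first} {last}",
            # 555-0100 to 555-0199 is the range set aside for fictional use.
            "phone": f"555-01{rng.randint(0, 99):02d}",
            # example.com is a domain reserved for documentation and examples.
            "email": f"{first}.{last}@example.com".lower(),
        })
    return customers


def make_orders(rng, customers, count, now, days_back=30):
    drafts = []
    for _ in range(count):
        customer = rng.choice(customers)  # every order has a real owner
        day = now.date() - timedelta(days=rng.randint(1, days_back))
        # Keep orders between 09:00 and 17:59 so activity looks human.
        placed_at = datetime(day.year, day.month, day.day,
                             rng.randint(9, 17), rng.randint(0, 59))
        drafts.append((placed_at, customer["id"], round(rng.uniform(20, 500), 2)))
    drafts.sort()  # oldest first, so order ids increase over time
    return [
        {"id": n, "customer_id": cid, "placed_at": ts, "total": total}
        for n, (ts, cid, total) in enumerate(drafts, start=1)
    ]


def build_demo_dataset(seed, now):
    rng = random.Random(seed)  # same seed, same dataset
    customers = make_customers(rng, 5)
    orders = make_orders(rng, customers, 20, now)
    return customers, orders


customers, orders = build_demo_dataset(seed=42, now=datetime(2024, 6, 1, 12, 0))
print(len(customers), "customers,", len(orders), "orders")
```

Because timestamps are computed relative to `now`, rerunning the generator also takes care of freshness. What it does not solve is the hard part named above: uniform random choices rarely look like real usage, so expect to hand-tune the distributions.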

3. Redaction of Real Data: Authentic but risky

An alternative is to copy real customer data into an empty account and redact all identifying information. This approach yields authentic data, but it carries both risks and technical challenges. The redaction must be thorough enough to protect customer information, and even then the customer's identity could leak through context clues, especially if the customer is well known.
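
The sketch below shows one simple form this redaction might take, and also why it is risky. It is an illustration under stated assumptions, not a vetted anonymization method: the record fields, the `Pseudonymizer` class, and the email pattern are all hypothetical. Structured fields are replaced with stable placeholders, so the same customer always maps to the same fake name, and email addresses are stripped from free text.

```python
import re

# A deliberately simple email pattern; it is an assumption, not a full validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Pseudonymizer:
    """Maps each distinct real value to a stable placeholder such as 'Company 1'."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.mapping = {}

    def replace(self, value):
        if value not in self.mapping:
            self.mapping[value] = f"{self.prefix} {len(self.mapping) + 1}"
        return self.mapping[value]


def redact_record(record, companies, people):
    return {
        **record,
        "company": companies.replace(record["company"]),
        "contact": people.replace(record["contact"]),
        "note": EMAIL_RE.sub("[redacted email]", record["note"]),
    }


# Made-up records standing in for a real customer export.
records = [
    {"company": "Acme", "contact": "Dana Reyes",
     "note": "Renewal discussed with dana@acme.example; Acme wants quarterly invoices."},
    {"company": "Acme", "contact": "Sam Patel",
     "note": "Asked for a second seat."},
]

companies, people = Pseudonymizer("Company"), Pseudonymizer("Contact")
redacted = [redact_record(r, companies, people) for r in records]

for row in redacted:
    print(row)

# The structured fields are clean, but the free-text note still names the customer.
print("Leak in note:", "Acme" in redacted[0]["note"])
```

The last line is the point of the example: field-level redaction says nothing about names buried in notes, file names, or screenshots, which is exactly how a well-known customer can be identified from context.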

Concluding Thoughts: Navigating the maze of fake data generation

The demo data problem is more intricate than it appears. Each approach, while presenting certain advantages, also brings its own challenges. Crafting the perfect fake data for your demo is akin to creating a masterpiece—it requires creativity, precision, and patience.

Here are some guiding principles to navigate this process:

  • Begin with the story you want to tell.
  • Understand what data you need to tell that story convincingly.
  • Identify how to create this data within your product.
  • Consider how this data will be updated and maintained over time.
  • Exercise utmost caution when using customer data, even if it’s been anonymized or redacted.

Though difficult, the journey of crafting perfect fake data is a rewarding one, breathing life into your product and making it shine in the eyes of potential users.
