May 13 2022
1. Synthetic data is becoming pervasive in AI
- So far this year, there’s been $170M+ in venture funding for synthetic data startups – already an annual high with more than half the year still left. (The prior 5 years saw a total of $210M in funding.) This recent uptick in funding for synthetic data startups has come in parallel with the growth of AI.
- Traditionally, training AI has required vast volumes of high-quality data. Synthetic data – algorithmically generated data that mirrors the statistical characteristics of real-world data – can reduce the real-world data needed to train AI models by as much as 70-90%. According to Gartner, by 2024, 60% of data used to develop AI models will be synthetically generated. By 2027, the market for synthetic data is projected to reach $1.2B (up from $110M in 2021).
- Synthetic data presents a faster, less expensive, and more scalable alternative to collecting, labeling, and cleansing real-world data. Companies with AI ambitions but inadequate data can supplement in-house data with 3rd-party synthetic data. Synthetic data can also reduce the compliance risk associated with the growing number of consumer privacy laws. This has risen in importance as regulators start demanding that companies destroy algorithms built with ill-gotten data. Synthetic data can also mask corporate data, allowing industry rivals to pool data without sharing competitively sensitive information.
- There are at least 76 startups working on synthetic data (as well as an array of open-source tools like the Synthetic Data Vault libraries). This year alone, we’ve seen funding rounds for Synthesis AI ($17M), Datagen ($50M), Synthetaic ($13M), Mindtech ($3.7M), MDClone ($63M), and MOSTLY AI ($25M). Many are focusing on use cases in computer vision, such as facial recognition and satellite imagery analysis.
- Synthetic data can be structured (e.g. tables, spreadsheets) or unstructured (e.g. faces, 3D environments, audio, text). Some startups – particularly those in Europe – offer privacy-preserving datasets, while others are generating test data where privacy is less important.
- Synthetic data is increasingly being generated via AI rather than SQL and common programming languages. Using AI techniques like generative adversarial networks (GANs), one startup can generate 10M+ labeled images. GANs pit two unsupervised neural networks against each other in a feedback loop. The generator creates synthetic data from real-world inputs, and the discriminator tries to figure out what is fake. The two learn over thousands of iterations until the generator’s output is convincingly real. Not all synthetic data is developed from real-world data – e.g. Rendered.ai generates data from physics equations (e.g. how light interacts with matter).
- Established tech companies – such as Google, Microsoft, Amazon, Meta (incl. its AI.Reverie acquisition), NVIDIA, and IBM – have also been working with synthetic data. Industry watchers are speculating that some of the larger data warehouse players such as Snowflake and Databricks may follow, offering synthetic-data generation as a service.
- There is an expansive set of use cases for synthetic data. In Europe, regulated industries with privacy needs – such as large banks, insurers, telecom firms, and healthcare – have driven the demand for synthetic data. In the US, demand has come from innovation use cases such as predictive algorithms, fraud detection, and pricing. Israel/US-based startup Datagen is lately reporting a pickup in metaverse, automotive (incl. in-cabin simulation), smart videoconferencing, and home security applications.
- In healthcare and life sciences, providers are using synthetic data based on real patients’ medical records to train AI to diagnose and recommend treatment. (Synthetic patient populations can also be created from scratch.) Synthetic patient data can accelerate clinical trials where study subjects may be hard to find. (85% of clinical trials experience recruitment delays, and 30% are terminated after failing to recruit enough patients.) Synthetic data can also allow a broader circle of researchers to collaborate while preserving privacy – which can be advantageous in situations like a global pandemic.
- In autonomous driving and mobile robotics, synthetic data can help model complex interactions in 3D environments. Researchers can randomize elements like object positions, lighting, and shadows to train AI in more diverse and dynamic environments. Tesla, for instance, is using synthetic data to train its Autopilot. Advanced gaming engines can be used in conjunction to create realistic imagery for outlier situations difficult to capture in real life (e.g. accidents, fires, 100 bicyclists crossing an intersection). Synthetic data is also being used to develop driver safety systems, detecting motions like taking hands off the wheel.
- Similarly, with the rise of the metaverse, synthetic data can be used to train the algorithms that will power the environments and interactions in the future metaverse – the primary rationale for Meta’s AI.Reverie acquisition. Synthesis AI has generated 100K “synthetic people” that can be used to develop more emotive and realistic avatars. “Emotion AI” – using AI on facial micromovements and body language to detect people’s emotions – is a particularly active and controversial use case.
- In banking, payments and finance, fraud and anomaly detection is one of the most common use cases. Companies like Wells Fargo and American Express are using patterns in synthetic data to train AI to detect fraudulent activity. Synthetic data can also be used for customer acquisition, credit decisions, and churn reduction, as well as other use cases.
- Other industries/arenas using synthetic data include retail (e.g. cashierless shopping), customer service (e.g. training chatbots), meetings (e.g. smart conference rooms, videoconferencing), consumer tech (e.g. photo enhancement), insurance (e.g. underwriting), telecom (e.g. customer analytics), marketing (e.g. campaign simulation), smart home (e.g. smart speakers), industrial manufacturing (e.g. digital twins), and agriculture (e.g. crop detection). The use cases for synthetic data are as diverse as AI itself – it can be used to navigate difficult terrains, track animals and identify poachers, assess coastline erosion, analyze human-trafficking patterns, spot different weapons, detect cheating in virtual exams, and alert staff to safety issues in transportation hubs (e.g. people at risk of falling onto train tracks).
- Synthetic data can carry with it all the same biases as the real-world data it draws on – or even amplify those biases. The data can be adjusted, however, to meet defined standards of fairness (e.g. to include more women for job-hiring algorithms), though there may be tradeoffs with accuracy.
- Despite the buzz, the reality remains that synthetic data is still not as effective as real-world data in training high-accuracy AI systems. Datasets that are highly representative and deeply anonymized are hard to generate. The quality of the data is highly dependent on the quality of the process/AI creating it. Poor synthetic data can result in algorithms that make low-quality and even unsafe recommendations.
- One study found that 92% of AI models trained on synthetic data had lower accuracy than those trained on real data. Models trained on synthetic data had a 6-19% deviation in accuracy compared to models trained on real data. There’s also the risk that the synthetic data misses key patterns, insights and opportunities – or turns up patterns not in the original data. For some companies, these downsides may be “manageable,” while for others they may be unacceptable.
- One startup CEO believes 2022 will be “the year where synthetic data will take off.” The greater speed and reduced cost of training associated with synthetic data suggest the potential to “democratize” AI – or at least make it more accessible – to more researchers and smaller companies with limited resources. As larger firms push into the space, we should see a pickup in both funding and M&A – subject, of course, to the volatility of interest-rate increases and the broader economy.
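The idea of synthetic data that "mirrors the statistical characteristics of real-world data" can be illustrated with a toy sketch. This is not how any of the startups named above work (GAN-based generators are far more involved). It assumes a hypothetical two-column table (age and spend, both invented here as a stand-in for real records), fits the means, standard deviations, and correlation, and then samples brand-new rows from a bivariate normal distribution with those statistics.

```python
import math
import random
import statistics


def fit_stats(rows):
    """Estimate per-column mean/std and the correlation between two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample_synthetic(stats, n, rng):
    """Draw n synthetic rows from a bivariate normal with the fitted statistics."""
    rho = stats["rho"]
    # Clamp guards against tiny negative values from floating-point rounding.
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = stats["mx"] + stats["sx"] * z1
        y = stats["my"] + stats["sy"] * (rho * z1 + resid * z2)
        rows.append((x, y))
    return rows


# Hypothetical stand-in for a "real" dataset: (age, annual spend) pairs.
rng = random.Random(0)
real = []
for _ in range(2000):
    age = rng.gauss(40, 10)
    spend = 50 + 2.0 * age + rng.gauss(0, 15)
    real.append((age, spend))

real_stats = fit_stats(real)
synthetic = sample_synthetic(real_stats, 2000, rng)
synth_stats = fit_stats(synthetic)

# The synthetic rows are new records, but their summary statistics
# should land close to those of the original table.
for key in ("mx", "my", "sx", "sy", "rho"):
    print(key, round(real_stats[key], 2), round(synth_stats[key], 2))
```

A sketch like this also hints at the limitation raised above: anything the fitted model does not capture (here, everything beyond means, spreads, and one linear correlation) is absent from the synthetic rows.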
Related Content:
- Apr 15 2022 (3 Shifts): Google’s Transformer-based Pathways Language Model is breaking new ground
- Apr 8 2022 (3 Shifts): Regulators want to destroy ill-gotten AI models and data when businesses break rules
2. Shein’s ultra-fast fashion and its $100B valuation
- Last month, Chinese ultra-fast fashion giant Shein raised $1B-2B in a funding round valuing the company at $100B. With that, Shein became the 3rd-most valuable privately held startup, behind only ByteDance and SpaceX.
- The $100B valuation – more than the combined worth of established fast-fashion players H&M and Zara – is reflective of Shein’s growth to become a dominant apparel player. Founded in 2008 and re-branded in 2015, Shein has only risen to global prominence over the past few years. In 2016, its revenue was just $600M+. By 2019, Shein had reached $3.2B in revenue and $5B in valuation. Just two years later, its revenue had quintupled to $16B in 2021. This year, despite recent slowing growth, it’s projected to reach an eye-watering $20B in revenue.
- In May 2021, it reached a milestone in becoming the most downloaded US shopping app for the month on Android and iOS – ending the full year as the #2 most downloaded shopping app in the US. In every month from Feb-Apr 2022, downloads of Shein’s app surpassed those of both Amazon’s and Shopify’s apps. In Q1 2022, Shein made up nearly 1/3 of US fast-fashion sales, more than H&M and Zara combined. According to Morgan Stanley, Shein could become the world’s 4th-largest apparel retailer by revenue this year.
- Shein’s success comes from piggybacking on – and in some cases turbocharging – popular retail trends in near real-time. Its apparel selection is powered by AI, which identifies trends and analyzes demand patterns to inform new styles. Shein takes fast fashion to the next level, updating its website with an average of 6K new items every day – an endless torrent of current designs. For context, over a recent 12-month period, Gap listed 12K different items on its site, H&M had 25K, and Zara had 35K. Shein, in contrast, had 1.3M items. Its items are also fresher – 40-70% of its assortment is under 3 months old.
- Shein is able to bring new styles to market extremely quickly. Located near one of China’s apparel manufacturing hubs in Guangzhou, Shein can closely manage its production chain from design to manufacturing, with high digitization and integration between each step. This means it can design a collection in as little as 3 days, and go from concept to production in under 2 weeks – and in as few as 5-7 days.
- Shein works with a large network of independent Chinese suppliers within a 5-hour drive of its headquarters (300+ core vendors, plus 1,000+ subcontractors), which provide both design and production. It keeps their loyalty by paying them on time (30 days vs. 45-90 days) and seeks to fully automate supplier interactions, providing them with supply-chain management systems and loans, if needed, for expansion.
- As a mostly-online, direct-to-consumer brand, Shein runs lean with limited overhead, initially placing small orders of just 30-200 pieces for new items. Its custom software automatically orders more if an item sells well or terminates production if not. Shein sends items directly from its Chinese factories to customers globally. In the US – Shein’s largest market – it takes advantage of a regulatory loophole that lets it avoid import tariffs on shipments worth less than $800. (The exemption was raised from $200 to $800 in 2016.)
- Because of the way it operates, Shein’s assortment is substantially cheaper than its rivals’. The median price for a dress on Shein is $13, compared with $50 for Zara and $30 for H&M. Some categories routinely sell for $5 or less. While its revenue has been growing, its profit margins are relatively thin – reportedly just 5% as of last year.
- Wildly popular among the Gen Z Instagram/TikTok crowd, Shein deploys a marketing strategy centered on social commerce. It enrolls a large number of influencers (including lesser-known ones with medium-sized followings) who churn out content to help sell styles, providing them with 10-20% commission, free clothes, and follower discount codes. It also uses search engine optimization, livestream events, gamified interactions and a point-based incentive system. Shein is the most talked-about brand on Instagram, YouTube, and TikTok – on TikTok, #sheinhaul videos have amassed 5.2B+ views to date.
- Shein is controversial among many industry watchers. It has been criticized for the inconsistent quality of its products, copying of styles from independent designers, IP disputes with major apparel brands, 75-hour work weeks of its suppliers’ employees, reliance on high-carbon synthetic textiles, and promotion of fashion waste (the volume of textiles in US landfills has doubled over the past 20 years). 83% of Shein’s suppliers are operating with “major risks” such as long working hours and poor emergency preparedness.
- It also faces regulatory headwinds. The US Congress is currently considering legislation that, if passed, would eliminate the de minimis tax exemption for items imported from China worth less than $800. In Europe, a proposed law could set standards for apparel durability/reusability and require sustainability labels.
- Shein is taking steps to mitigate the risks to its business model. It has built a team of 100+ staff to review designs as well as a large in-house design team, since many of its copycat and IP issues stem from its independent-supplier ecosystem. It also recently opened a new distribution center in Indiana, which could offer some operational flexibility if the de minimis exemption is eliminated.
- For now, Shein still has room to grow – though perhaps more slowly than in prior years. It has only 7M+ monthly active users in the US, out of 230M+ online-shopping consumers. Operationally, relatively few rivals can do what Shein can do – at least right now, though that will change. It has some time, especially given its most recent round of funding, to respond to its critics and adapt to the changing environment. In the meantime, its success with the “TikTokification” of retail is likely to influence rival retailers’ strategies, while its “fast fashion 2.0” is likely to inspire analogous “fast manufacturing” in other domains.
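The small-batch approach described above (initial orders of 30-200 pieces, with software that reorders strong sellers and ends production of weak ones) can be sketched as a simple decision rule. Shein’s actual software is not public, so everything below is an assumption made for illustration: the sell-through metric, the thresholds, the reorder multiple, and the names `Item`, `sell_through`, and `next_action` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    sku: str
    units_ordered: int  # size of the initial test batch (the brief cites 30-200 pieces)
    units_sold: int     # units sold during the test window


def sell_through(item: Item) -> float:
    """Share of the test batch that sold; 0.0 if nothing was ordered."""
    if item.units_ordered == 0:
        return 0.0
    return item.units_sold / item.units_ordered


def next_action(item: Item, reorder_at=0.7, stop_below=0.2, reorder_multiple=3):
    """Hypothetical rule: scale up winners, cut losers, keep watching the rest."""
    rate = sell_through(item)
    if rate >= reorder_at:
        return ("reorder", item.units_ordered * reorder_multiple)
    if rate < stop_below:
        return ("stop", 0)
    return ("hold", 0)


print(next_action(Item("dress-a", 100, 85)))  # → ('reorder', 300)
print(next_action(Item("top-b", 100, 10)))    # → ('stop', 0)
print(next_action(Item("skirt-c", 150, 60)))  # → ('hold', 0)
```

The point of the sketch is the economics rather than the specific numbers: a failed style costs only one small batch, while a hit is scaled up after demand has been observed.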
Related Content:
- Sep 24 2021 (3 Shifts): Livestream shopping – worth hundreds of billions globally – is coming to America
- Oct 1 2021 (3 Shifts): Warby Parker & other online DTC brands are seeing the value of physical retail
3. Shopify’s push into fulfillment and digital advertising – head-to-head against Amazon
- Last week, Shopify’s Q1 2022 earnings report fell short of expectations. Despite quarterly revenue growing to $1.2B (up 22% year-over-year), Shopify’s results missed analyst expectations for both revenue and earnings. Gross merchandise volume grew 16% year-over-year to $43B, short of the expected $46.5B. Perhaps in an effort to counter the disappointing report, Shopify coupled its earnings with major announcements in fulfillment and digital advertising.
- As part of its ongoing investment in its Shopify Fulfillment Network, Shopify announced the acquisition of ecommerce fulfillment player Deliverr for $2.1B – its largest-ever acquisition. Shopify has been investing heavily in its fulfillment network, which was originally announced in 2019 to offer two-day/next-day delivery and other services to its 2M merchants. Deliverr fulfills 1M+ orders per month through an asset-light, software-based platform and network of warehouse, carrier, and last-mile partners. It serves merchants selling through dozens of ecommerce channels, including Amazon, Walmart, and Shopify itself. Deliverr will continue to support existing customers and channels post-acquisition, while being integrated into Shopify’s fulfillment network and more than doubling the size of its fulfillment team.
- Shopify followed with this week’s announcement officially launching Shopify Audiences (Shopify Plus merchants got early access in late 2021) – marking its push into digital advertising. Shopify Audiences offers tools and data to help merchants target high-purchase-intent audiences on social platforms. The offering capitalizes on Shopify’s access to an enormous pool of data (e.g. conversions) aggregated across 2M merchants, which it can use to develop high-performing targeted audiences (with a lower cost per conversion) for advertising. It will first sync with Facebook and Instagram before expanding to TikTok, Snap, Pinterest, Microsoft Advertising, and Criteo – potentially diverting ad dollars away from Amazon.
- The two announcements put Shopify head-to-head against Amazon, which has substantial businesses in both fulfillment services and advertising. In Apr 2022, Amazon unveiled its Buy with Prime program – called “Amazon as a service” by some. The program extends the “fast, free delivery” of Prime delivery and convenience of Amazon-enabled checkout to Prime members shopping on merchants’ own websites. The program will eventually be rolled out even to merchants not selling on Amazon.com or using its fulfillment services. As with Amazon’s AWS, sellers’ data is protected and sellers pay only for what they use (e.g. fulfillment, storage, payment processing).
- Viewed as a direct shot against Shopify in going after direct-to-consumer brands, Buy with Prime ignited speculation that Shopify might opt to coexist with the program in its merchants’ best interests. Even if that turns out to be the case, the Deliverr announcement suggests that Shopify doesn’t plan on standing down its fulfillment ambitions – despite being up against the Amazon logistics juggernaut.
- With regard to advertising, Shopify had long avoided digital advertising, believing it might be viewed as pitting merchants against each other or requiring them to pay to get products in front of end-customers. (Some merchants on Amazon, for instance, have become resentful at its increasingly large cut.) Shopify has sought to discourage speculation that it would launch its own ad network. Its general stance is that it is on the side of its merchants – for instance, pushing back on Meta’s requests to turn on Facebook’s Conversions API by default.
- But advertising tools from Shopify could be good for its merchants, depending on how they are implemented. Shopify merchants already spend 10-30% of their revenue on Facebook and Instagram ads – collectively representing tens of billions of dollars (and the largest segment of ecommerce ad dollars going to Meta). Many merchants are looking for alternate venues for their ad spend, especially given the negative impact of Apple’s App Tracking Transparency on Facebook ad performance. While right now Shopify is just offering a value-added data exchange, there’s a pathway for it to build upon that foothold with an expanded suite of advertising tools and analytics.
- All the ecommerce players will see headwinds this year from inflation and interest-rate increases, supply chain issues, and consumers returning to stores. Even Amazon has been affected, reporting its slowest growth in 12 years during its Q1 2022 earnings. As in other sectors (e.g. video streaming), ecommerce players are under pressure to squeeze value from their existing customers. It remains to be seen whether Shopify – which wants to be a “100-year company” – will hold fast to its mission of putting merchants first.
Related Content:
- Jan 21 2022 (3 Shifts): Shopify partners with JD.com to help small businesses reach Chinese consumers
- Feb 26 2021 (3 Shifts): Walmart & Amazon team up with SaaS ecommerce platforms
Disclosure: Contributors have investment interests in Meta, Microsoft, Apple, and Alphabet. Amazon and Google are vendors of 6Pages.
Have a comment about this brief or a topic you'd like to see us cover? Send us a note at tips@6pages.com.