Making food detection systems inclusive of global audiences.

Context

Being able to detect food in images with artificial intelligence opens up significant potential value for food delivery services and social networks. However, the enormous variety of foods across countries and cultures makes it difficult for data scientists to identify what users of these services actually find valuable in specific contexts.

We set out to gain insights into where value can be created when tailoring foundation models to a multi-ethnic production context — covering food from 13 countries across four continents.

Adapting to circumstances of pandemic proportions

This work took place during the COVID-19 pandemic, which made it impossible to meet participants in person. Instead, we invited people from 13 different countries to share a digital meal with us over video chat, asking each of them to prepare a meal they would normally make at home.

During each meal, participants described their dish, shared images of it, and told us about the broader food culture of their country. Through secondary research, we gathered additional context on each country’s cuisine. We then ran all collected images through four widely used image recognition services (Microsoft Azure, Google Vision, Amazon Rekognition, and IBM Watson) and analysed the results.

How data science teams can incorporate cultural nuances

1. Address sensitive predictions

Alcohol and weapons are two sensitive topics in multi-ethnic production contexts. The systems we tested often performed poorly on items containing alcohol, such as a bottle of wine. Separately, dinner knives were repeatedly labelled as weapons, an edge case data scientists should actively explore and address.
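One way to explore the knife-as-weapon edge case is a post-processing rule that holds back a sensitive label when the rest of the predictions point to a dining scene. The following is a minimal sketch, assuming predictions arrive as label and confidence pairs. The label sets, the `Prediction` type, and the `review_sensitive` function are hypothetical names chosen for illustration and are not part of any of the services tested.

```python
from dataclasses import dataclass

# Hypothetical label sets; a real system would derive these from its own taxonomy.
SENSITIVE_LABELS = {"weapon"}
DINING_CONTEXT_LABELS = {"food", "plate", "dish", "meal", "fork", "spoon", "tableware"}


@dataclass(frozen=True)
class Prediction:
    label: str
    confidence: float


def review_sensitive(predictions, min_context=2):
    """Split predictions into (kept, flagged).

    A sensitive label is flagged for human review, rather than shown,
    when at least `min_context` dining-related labels appear alongside it.
    """
    labels = {p.label.lower() for p in predictions}
    context_hits = len(labels & DINING_CONTEXT_LABELS)
    kept, flagged = [], []
    for p in predictions:
        if p.label.lower() in SENSITIVE_LABELS and context_hits >= min_context:
            flagged.append(p)
        else:
            kept.append(p)
    return kept, flagged


# Usage with made-up predictions for a photo of a dinner table.
example = [Prediction("Food", 0.98), Prediction("Plate", 0.91), Prediction("Weapon", 0.72)]
kept, flagged = review_sensitive(example)
print([p.label for p in kept], [p.label for p in flagged])
```

A rule like this only reroutes suspicious labels. It does not make the underlying model any better at telling a dinner knife from a weapon.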

2. Create more specific and relevant labels

A lot of value can be added by providing more relevant labels for specific production contexts. The highest impact would come from training models to correctly identify items such as spring rolls, bagels, bottles of wine, Cuy, Kaydos, and crab cakes, which appeared frequently across participants but were consistently misidentified.

3. Address cultural misrepresentations

Some dishes are visually similar but come from culturally distinct contexts. Negative user experiences arise when predictions fail to acknowledge these differences. For example: a Thali should not skew Yemeni food results towards Indian food; a roti should be distinguishable from a tortilla or pita; and chopsticks should be as recognisable as forks and spoons.

4. Address visual misrepresentations

Some dishes are visually so similar that they confuse current recognition systems, despite being completely different foods. Spring rolls are misidentified as sausages; rice, Kaydos, and crab cakes as ice cream; sliced melon as bananas. Improving performance on these visually similar pairs would deliver significant user value.
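To decide which visually similar pairs to tackle first, a team could tally how often each true dish is confused with each wrong label. The following is a minimal sketch, assuming each evaluation record is a pair of true label and predicted label. The `confusion_pairs` function is a hypothetical helper, and the records are made-up illustrations of the confusions described above, not the study's data.

```python
from collections import Counter


def confusion_pairs(records):
    """Count (true_label, predicted_label) pairs where the prediction was wrong."""
    return Counter(
        (true, predicted)
        for true, predicted in records
        if true != predicted
    )


# Illustrative records only; the counts carry no empirical meaning.
records = [
    ("spring roll", "sausage"),
    ("spring roll", "sausage"),
    ("rice", "ice cream"),
    ("sliced melon", "banana"),
    ("bagel", "bagel"),
]

# List the most frequent confusions first.
for (true, predicted), count in confusion_pairs(records).most_common():
    print(f"{true} -> {predicted}: {count}")
```

Ranking confusions this way gives a simple priority list for collecting additional training images.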

5. Do not harm people through faulty meat labels

Meat is a sensitive topic across many contexts, for religious, medical, and moral reasons. Presenting faulty meat predictions to people who avoid some or all meat for these reasons can be seriously detrimental. Among the most pressing issues: Cuy being recognised as duck or chicken; carrot cake being recognised as steak; and mock meat being labelled as real meat.
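One cautious mitigation is to require much higher confidence before a meat label is shown to users than for other labels. The following is a minimal sketch, assuming predictions are pairs of label and confidence. The `MEAT_LABELS` set, the `displayable_labels` function, and both threshold values are hypothetical and would need tuning for a real production context.

```python
# Hypothetical set of labels treated as meat claims.
MEAT_LABELS = {"meat", "steak", "chicken", "duck", "beef", "pork"}


def displayable_labels(predictions, default_threshold=0.5, meat_threshold=0.9):
    """Return the labels confident enough to show.

    Meat labels must clear the stricter `meat_threshold`;
    all other labels only need `default_threshold`.
    """
    shown = []
    for label, confidence in predictions:
        is_meat = label.lower() in MEAT_LABELS
        threshold = meat_threshold if is_meat else default_threshold
        if confidence >= threshold:
            shown.append(label)
    return shown


# Usage with made-up predictions for a slice of carrot cake.
print(displayable_labels([("Steak", 0.70), ("Cake", 0.70)]))
```

A stricter threshold reduces how often a faulty meat label reaches users, but it cannot catch predictions that are both wrong and highly confident, so it complements rather than replaces better training data.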

6. Recognise the limits of image recognition systems

Image recognition is powerful, but not unlimited. Data scientists building for production should be honest about what a system can and cannot reliably infer from an image alone. Can a system truly detect whether a waffle has Kaya and margarine on it? Can it determine whether a meal is gluten-free or sugar-free? Acknowledging these limits is itself a form of responsible design.

Food recognition that takes every culture and cuisine into account.