Preface

Sometimes you’re bored on a Friday night. Sometimes you think of data visualization on a Friday night. Sometimes you think of (or, better yet, eat) dumplings on a Friday night. And sometimes, you do all of the above.

Dumplings

I am a fan of dumplings. Bread-type foods are awesome. Filled foods are awesome. Filled bread is awesome (if you’ve never had a Georgian lobiani, you’re missing out!). A dumpling, though not necessarily having to be filled, combines a lot of this. In my world, a dumpling is normally cooked in some liquid though – that is, boiled, simmered, steamed, or deep-fried, rather than, say, baked. We’ll come back to this.

I work as a researcher in linguistics, and I’m interested in comparing different languages. How are they similar and how are they different? Sometimes, I use multidimensional scaling (MDS) as a way of comparing such similarities or differences. Anyway, on this particular Friday night (August 23rd, 2019), I saw a Twitter post mentioning a “dumpling party” (omg, that in itself = love), at which one person had brought Italian arancini, which got a “mixed reception” from the other guests. So I started thinking: what defines a “dumpling”, and can you group dumplings based on various features? Turns out Emily Bender et al. had already established a definition of dumpling at their website promoting the International Day of Dumplings – I did not know of this as of this particular Friday night.

At this point, I thought: what if I just look up some dumplings on Wikipedia and code some specific features for each one? This could allow me to plot them in two-dimensional space using MDS methods. I was intrigued by the possible outcome, so I started reading through the Wikipedia article “Dumpling”.

Sampling

My method wasn’t very sophisticated. I started by scrolling through the Wikipedia article “Dumpling”, took each item that had its own article (I may have missed a few), transferred it to a spreadsheet, and started adding columns for particular features. As I added more items, the number of features grew, and I have to be honest: they are far from perfect and are just things that I thought of on the spot. A much better analysis could easily be done with a bit more sophistication, time, and systematicity. I mean, this was after all just a random Friday night and I was already past my bedtime…

In the end, I collected data from 61 different dumplings. Note that when I say different, this categorization is not at all well-defined. I tried to exclude duplicates for which even the names themselves were cognates. Chinese jiǎozi (餃子) is related to Japanese gyōza (ギョーザ) and thus only the former was included. However, in the case of German Knödel and Czech knedlík, I decided to include both simply based on the fact that whereas the former was listed as potentially being served with a sweet filling, I did not observe this with the latter – NB: I promise this wasn’t a biased decision (see the Bonus section).

Beyond the dumplings sampled according to the above criteria, I also included a few items not traditionally seen as dumplings, because I wanted to see how they patterned with the dumplings proper. These items include calzone, bitterballen, and – of course – arancini. The resulting sample of 61 items can be seen below.

Dumpling sample

##  [1] "Banku"         "Tihlo"         "American"      "Manti"        
##  [5] "Joshpara"      "Jiaozi"        "Guotie"        "Wonton"       
##  [9] "Gujia"         "Samosa"        "Kozhukkatta"   "Modak"        
## [13] "Kachori"       "Momo"          "Pitha"         "Nevryo"       
## [17] "Siomay"        "Dango"         "Baozi"         "Buuz"         
## [21] "Khuushuur"     "Yomari"        "Empanada"      "Pastei"       
## [25] "Coxinha"       "Pantruca"      "Bunuelo"       "Tamale"       
## [29] "Pastel"        "Paime"         "British"       "Cotswold"     
## [33] "Clootie"       "Knödel"        "Maultasche"    "Halusky"      
## [37] "Shlishkes"     "Knedlik"       "Khinkali"      "Pierogi"      
## [41] "Tortellini"    "Arancini"      "Bitterbal"     "Kreplach"     
## [45] "Pelmeni"       "Kluski"        "Kalduny"       "Cepelinai"    
## [49] "Borek"         "Ravioli"       "Gnocchi"       "Calzone"      
## [53] "Pastizz"       "Kroppkaka"     "Palt"          "Asida"        
## [57] "Qatayef"       "Kibbeh"        "Mataz"         "MatzahBall"   
## [61] "Knish"

I ended up settling on 15 features. Some are related but not mutually exclusive. They mainly concern content, form, and preparation. Some dumplings can be prepared in several ways (e.g., either boiled or steamed) and may be either sweet or savory; others seem to be stricter with regard to some of these features.

Dumpling features

  1. Flour: Is flour (any type) used in the dough?
  2. Bread: Are bread bits or breadcrumbs added?
  3. Battered: Is the dough battered (or itself a batter) before being prepared?
  4. Starch: Is any starchy produce added to the dough (e.g., potatoes)?
  5. Plain: Does it occur without filling?
  6. Filled: Does it occur with filling?
  7. Thick: Is the doughy coating “thick”? (Yes, this is quite subjective.)
  8. Boiled: Can it be boiled?
  9. Steamed: Can it be steamed?
  10. DeepFried: Can it be deep-fried?
  11. Fried: Can it be fried (e.g., in a pan or a grill)?
  12. Baked: Can it be baked (e.g., in an oven)?
  13. Raw: Can it be served basically as is?
  14. Sweet: Is it sweet?
  15. Savory: Is it savory?

Each dumpling was assigned a binary value (0 or 1) per feature, resulting in the table below. Again, note that this was not very serious work: it was a Friday night and I was tired, so errors are to be expected (but feel free to re-work this yourself if you feel like it!).

Coded features
Name Flour Bread Battered Starch Plain Filled Thick Boiled Steamed DeepFried Fried Baked Raw Sweet Savory
Banku 1 0 0 1 1 0 1 1 0 0 0 0 0 0 1
Tihlo 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1
American 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1
Manti 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1
Joshpara 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1
Jiaozi 1 0 0 0 0 1 0 1 1 0 0 0 0 0 1
Guotie 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Wonton 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1
Gujia 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0
Samosa 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1
Kozhukkatta 1 0 0 0 0 1 1 1 0 0 0 0 0 1 1
Modak 1 0 0 0 0 1 0 1 1 0 0 0 0 1 0
Kachori 1 0 0 0 0 1 1 0 0 1 0 0 0 1 1
Momo 1 0 0 0 0 1 0 0 1 0 1 0 0 1 1
Pitha 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1
Nevryo 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0
Siomay 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1
Dango 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0
Baozi 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1
Buuz 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1
Khuushuur 1 0 0 0 0 1 1 0 0 0 1 0 0 0 1
Yomari 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0
Empanada 1 0 0 0 0 1 1 0 0 1 1 1 0 0 1
Pastei 1 0 0 0 0 1 1 0 0 1 0 0 0 1 1
Coxinha 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1
Pantruca 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1
Bunuelo 1 1 0 0 1 1 1 0 0 1 0 0 0 1 1
Tamale 1 0 0 1 0 1 0 0 1 0 0 0 0 1 1
Pastel 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1
Paime 1 0 0 0 1 0 1 0 1 0 0 0 0 0 1
British 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1
Cotswold 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1
Clootie 1 1 0 0 0 1 1 1 0 0 0 0 0 1 0
Knodel 1 1 0 1 1 1 1 1 0 0 0 0 0 1 1
Maultasche 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1
Halusky 1 0 1 1 1 0 1 1 0 0 0 0 0 0 1
Shlishkes 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1
Knedlik 1 1 0 1 1 1 1 1 0 0 0 0 0 0 1
Khinkali 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1
Pierogi 1 0 0 0 0 1 0 1 0 0 1 0 0 1 1
Tortellini 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1
Arancini 1 1 1 0 0 1 1 0 0 1 0 0 0 0 1
Bitterbal 1 1 1 0 0 0 1 0 0 1 0 0 0 0 1
Kreplach 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1
Pelmeni 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1
Kluski 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1
Kalduny 1 0 0 0 0 1 0 1 0 0 1 1 0 0 1
Cepelinai 1 0 0 1 0 1 1 1 0 0 0 0 0 0 1
Burek 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1
Ravioli 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1
Gnocchi 1 1 0 1 1 0 1 1 0 0 0 0 0 0 1
Calzone 1 0 0 0 0 1 1 0 0 0 0 1 0 0 1
Pastizz 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1
Kroppkaka 1 0 0 1 0 1 1 1 0 0 0 0 0 0 1
Palt 1 0 0 1 0 1 1 1 0 0 0 0 0 0 1
Asida 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
Qatayef 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0
Kibbeh 0 0 0 0 0 1 1 1 0 0 1 1 0 0 1
Mataz 1 0 0 0 0 1 0 1 1 0 0 0 0 0 1
MatzahBall 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1
Knish 1 0 0 0 0 1 1 0 0 1 1 1 0 0 1
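
If you do want to re-work the coding, a table in this whitespace-separated layout is easy to read into a program. The following is only a minimal sketch in Python, not the code used for this post (the post does not show any); the helper name `parse_rows` and the choice of a nested dictionary are assumptions made for illustration, and only four rows are copied in to keep it short.

```python
# Feature names in the same order as the table's header row.
FEATURES = ["Flour", "Bread", "Battered", "Starch", "Plain", "Filled", "Thick",
            "Boiled", "Steamed", "DeepFried", "Fried", "Baked", "Raw",
            "Sweet", "Savory"]

# Four rows copied verbatim from the coded-features table.
RAW_ROWS = """\
Banku 1 0 0 1 1 0 1 1 0 0 0 0 0 0 1
Tihlo 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1
Jiaozi 1 0 0 0 0 1 0 1 1 0 0 0 0 0 1
Arancini 1 1 1 0 0 1 1 0 0 1 0 0 0 0 1
"""


def parse_rows(text):
    """Return {name: {feature: 0 or 1}} (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        name, *values = line.split()
        # Guard against rows with a missing or extra column.
        if len(values) != len(FEATURES):
            raise ValueError(f"{name}: expected {len(FEATURES)} values")
        table[name] = dict(zip(FEATURES, map(int, values)))
    return table


coded = parse_rows(RAW_ROWS)

# List the preparation methods coded as possible for arancini.
methods = ["Boiled", "Steamed", "DeepFried", "Fried", "Baked", "Raw"]
print([m for m in methods if coded["Arancini"][m]])
```

Keeping the feature names next to the values makes it easier to spot coding errors than working with bare rows of zeros and ones.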

With this feature coding, I made a distance matrix with all the dumplings compared pairwise, giving each pair a distance score equal to the number of features on which the two differ (so 0 means identical coding, and higher means less similar). The resulting matrix is shown under “Distance matrix” below.
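
Counting mismatched features between two binary vectors is the Hamming distance. As a hedged illustration of that pairwise comparison (again a Python sketch, not the original code; the names `feature_distance` and `distance_matrix` are made up here), the snippet below uses the first four rows of the coded-features table:

```python
from itertools import combinations

# First four rows of the coded-features table, in the table's column order.
CODED = {
    "Banku":    [1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    "Tihlo":    [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    "American": [1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    "Manti":    [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
}


def feature_distance(a, b):
    """Number of features on which two dumplings differ (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(coded):
    """Symmetric {name: {name: distance}} table with zeros on the diagonal."""
    names = list(coded)
    dist = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = feature_distance(coded[a], coded[b])
        dist[a][b] = dist[b][a] = d
    return dist


dist = distance_matrix(CODED)
for name, row in dist.items():
    print(name, *row.values())
# Banku 0 3 1 6
# Tihlo 3 0 2 5
# American 1 2 0 5
# Manti 6 5 5 0
```

These values match the first four columns of the first four rows of the distance matrix. Since the measure is symmetric, each pair only needs to be computed once.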

Distance matrix
X Banku Tihlo American Manti Joshpara Jiaozi Guotie Wonton Gujia Samosa Kozhukkatta Modak Kachori Momo Pitha Nevryo Siomay Dango Baozi Buuz Khuushuur Yomari Empanada Pastei Coxinha Pantruca Bunuelo Tamale Pastel Paime British Cotswold Clootie Knodel Maultasche Halusky Shlishkes Knedlik Khinkali Pierogi Tortellini Arancini Bitterbal Kreplach Pelmeni Kluski Kalduny Cepelinai Burek Ravioli Gnocchi Calzone Pastizz Kroppkaka Palt Asida Qatayef Kibbeh Mataz MatzahBall Knish
Banku 0 3 1 6 4 5 6 6 7 6 4 7 6 8 8 7 6 7 6 5 5 7 7 6 6 1 6 6 6 3 1 6 6 3 4 1 3 2 5 6 4 7 6 5 5 1 6 2 7 4 1 5 6 2 2 2 8 6 5 3 7
Tihlo 3 0 2 5 5 6 5 5 6 5 5 8 5 7 7 6 5 8 5 4 4 6 6 5 5 2 5 7 5 2 2 5 7 6 5 4 6 5 6 7 5 6 5 6 6 2 7 5 6 5 4 4 5 5 5 3 7 7 6 4 6
American 1 2 0 5 3 4 5 5 6 5 3 6 5 7 7 6 5 6 5 4 4 6 6 5 5 0 5 7 5 2 0 5 5 4 3 2 4 3 4 5 3 6 5 4 4 0 5 3 6 3 2 4 5 3 3 1 7 5 4 2 6
Manti 6 5 5 0 2 1 2 2 5 2 4 3 4 2 6 5 0 5 2 1 3 3 5 4 4 5 6 2 2 3 5 4 6 7 2 7 9 6 3 4 2 5 6 3 3 5 4 4 3 2 7 3 2 4 4 6 4 6 1 3 5
Joshpara 4 5 3 2 0 1 2 2 5 2 2 3 4 4 8 5 2 3 4 3 3 5 5 4 4 3 6 4 4 5 3 4 4 5 0 5 7 4 1 2 0 5 6 1 1 3 2 2 3 0 5 3 2 2 2 4 4 4 1 1 5
Jiaozi 5 6 4 1 1 0 3 3 6 3 3 2 5 3 7 6 1 4 3 2 4 4 6 5 5 4 7 3 3 4 4 5 5 6 1 6 8 5 2 3 1 6 7 2 2 4 3 3 4 1 6 4 3 3 3 5 5 5 0 2 6
Guotie 6 5 5 2 2 3 0 0 5 2 4 5 4 2 6 5 2 3 4 3 1 5 3 4 4 5 6 4 4 5 5 4 6 7 2 7 9 6 1 2 2 5 6 1 1 5 2 4 3 2 7 3 2 4 4 6 2 4 3 3 3
Wonton 6 5 5 2 2 3 0 0 5 2 4 5 4 2 6 5 2 3 4 3 1 5 3 4 4 5 6 4 4 5 5 4 6 7 2 7 9 6 1 2 2 5 6 1 1 5 2 4 3 2 7 3 2 4 4 6 2 4 3 3 3
Gujia 7 6 6 5 5 6 5 5 0 3 3 4 1 5 5 0 5 4 3 4 4 2 4 1 3 6 3 5 3 6 6 3 3 6 5 8 8 7 6 5 5 4 5 6 6 6 7 5 4 5 8 4 5 5 5 5 3 7 6 4 4
Samosa 6 5 5 2 2 3 2 2 3 0 4 5 2 4 6 3 2 5 4 3 3 5 3 2 2 5 4 4 4 5 5 2 6 7 2 7 9 6 3 4 2 3 4 3 3 5 4 4 3 2 7 3 2 4 4 6 4 6 3 3 3
Kozhukkatta 4 5 3 4 2 3 4 4 3 4 0 3 2 4 6 3 4 3 2 3 3 3 5 2 4 3 4 4 2 5 3 4 2 3 2 5 5 4 3 2 2 5 6 3 3 3 4 2 3 2 5 3 4 2 2 2 4 4 3 1 5
Modak 7 8 6 3 3 2 5 5 4 5 3 0 5 3 7 4 3 2 3 4 6 2 8 5 7 6 7 3 3 6 6 7 3 6 3 8 8 7 4 3 3 8 9 4 4 6 5 5 4 3 8 6 5 5 5 5 3 7 2 4 8
Kachori 6 5 5 4 4 5 4 4 1 2 2 5 0 4 4 1 4 5 2 3 3 3 3 0 2 5 2 4 2 5 5 2 4 5 4 7 7 6 5 4 4 3 4 5 5 5 6 4 3 4 7 3 4 4 4 4 4 6 5 3 3
Momo 8 7 7 2 4 3 2 2 5 4 4 3 4 0 4 5 2 3 2 3 3 3 5 4 6 7 6 2 2 5 7 6 6 7 4 9 9 8 3 2 4 7 8 3 3 7 4 6 3 4 9 5 4 6 6 6 2 6 3 5 5
Pitha 8 7 7 6 8 7 6 6 5 6 6 7 4 4 0 5 6 7 4 5 5 5 3 4 6 7 4 6 4 5 7 6 8 7 8 9 9 8 7 6 8 7 8 7 7 7 6 8 5 8 9 5 6 8 8 6 6 6 7 7 3
Nevryo 7 6 6 5 5 6 5 5 0 3 3 4 1 5 5 0 5 4 3 4 4 2 4 1 3 6 3 5 3 6 6 3 3 6 5 8 8 7 6 5 5 4 5 6 6 6 7 5 4 5 8 4 5 5 5 5 3 7 6 4 4
Siomay 6 5 5 0 2 1 2 2 5 2 4 3 4 2 6 5 0 5 2 1 3 3 5 4 4 5 6 2 2 3 5 4 6 7 2 7 9 6 3 4 2 5 6 3 3 5 4 4 3 2 7 3 2 4 4 6 4 6 1 3 5
Dango 7 8 6 5 3 4 3 3 4 5 3 2 5 3 7 4 5 0 5 6 4 4 6 5 7 6 7 5 5 8 6 7 3 6 3 8 8 7 2 1 3 8 9 2 2 6 3 5 4 3 8 6 5 5 5 5 1 5 4 4 6
Baozi 6 5 5 2 4 3 4 4 3 4 2 3 2 2 4 3 2 5 0 1 3 1 5 2 4 5 4 2 0 3 5 4 4 5 4 7 7 6 5 4 4 5 6 5 5 5 6 4 3 4 7 3 4 4 4 4 4 6 3 3 5
Buuz 5 4 4 1 3 2 3 3 4 3 3 4 3 3 5 4 1 6 1 0 2 2 4 3 3 4 5 3 1 2 4 3 5 6 3 6 8 5 4 5 3 4 5 4 4 4 5 3 4 3 6 2 3 3 3 5 5 5 2 2 4
Khuushuur 5 4 4 3 3 4 1 1 4 3 3 6 3 3 5 4 3 4 3 2 0 4 2 3 3 4 5 5 3 4 4 3 5 6 3 6 8 5 2 3 3 4 5 2 2 4 3 3 4 3 6 2 3 3 3 5 3 3 4 2 2
Yomari 7 6 6 3 5 4 5 5 2 5 3 2 3 3 5 2 3 4 1 2 4 0 6 3 5 6 5 3 1 4 6 5 3 6 5 8 8 7 6 5 5 6 7 6 6 6 7 5 4 5 8 4 5 5 5 5 3 7 4 4 6
Empanada 7 6 6 5 5 6 3 3 4 3 5 8 3 5 3 4 5 6 5 4 2 6 0 3 3 6 5 7 5 6 6 3 7 8 5 8 10 7 4 5 5 4 5 4 4 6 3 5 4 5 8 2 3 5 5 7 5 3 6 4 0
Pastei 6 5 5 4 4 5 4 4 1 2 2 5 0 4 4 1 4 5 2 3 3 3 3 0 2 5 2 4 2 5 5 2 4 5 4 7 7 6 5 4 4 3 4 5 5 5 6 4 3 4 7 3 4 4 4 4 4 6 5 3 3
Coxinha 6 5 5 4 4 5 4 4 3 2 4 7 2 6 6 3 4 7 4 3 3 5 3 2 0 5 4 6 4 5 5 2 6 7 4 5 7 6 5 6 4 1 2 5 5 5 6 4 5 4 7 3 4 4 4 6 6 6 5 3 3
Pantruca 1 2 0 5 3 4 5 5 6 5 3 6 5 7 7 6 5 6 5 4 4 6 6 5 5 0 5 7 5 2 0 5 5 4 3 2 4 3 4 5 3 6 5 4 4 0 5 3 6 3 2 4 5 3 3 1 7 5 4 2 6
Bunuelo 6 5 5 6 6 7 6 6 3 4 4 7 2 6 4 3 6 7 4 5 5 5 5 2 4 5 0 6 4 5 5 2 4 3 6 7 5 4 7 6 6 3 4 7 7 5 8 6 5 6 5 5 6 6 6 4 6 8 7 5 5
Tamale 6 7 7 2 4 3 4 4 5 4 4 3 4 2 6 5 2 5 2 3 5 3 7 4 6 7 6 0 2 5 7 6 6 5 4 7 7 6 5 4 4 7 8 5 5 7 6 4 3 4 7 5 4 4 4 6 4 8 3 5 7
Pastel 6 5 5 2 4 3 4 4 3 4 2 3 2 2 4 3 2 5 0 1 3 1 5 2 4 5 4 2 0 3 5 4 4 5 4 7 7 6 5 4 4 5 6 5 5 5 6 4 3 4 7 3 4 4 4 4 4 6 3 3 5
Paime 3 2 2 3 5 4 5 5 6 5 5 6 5 5 5 6 3 8 3 2 4 4 6 5 5 2 5 5 3 0 2 5 7 6 5 4 6 5 6 7 5 6 5 6 6 2 7 5 6 5 4 4 5 5 5 3 7 7 4 4 6
British 1 2 0 5 3 4 5 5 6 5 3 6 5 7 7 6 5 6 5 4 4 6 6 5 5 0 5 7 5 2 0 5 5 4 3 2 4 3 4 5 3 6 5 4 4 0 5 3 6 3 2 4 5 3 3 1 7 5 4 2 6
Cotswold 6 5 5 4 4 5 4 4 3 2 4 7 2 6 6 3 4 7 4 3 3 5 3 2 2 5 2 6 4 5 5 0 4 5 4 7 7 4 5 6 4 1 2 5 5 5 6 4 5 4 5 3 4 4 4 6 6 6 5 3 3
Clootie 6 7 5 6 4 5 6 6 3 6 2 3 4 6 8 3 6 3 4 5 5 3 7 4 6 5 4 6 4 7 5 4 0 3 4 7 5 4 5 4 4 5 6 5 5 5 6 4 5 4 5 5 6 4 4 4 4 6 5 3 7
Knodel 3 6 4 7 5 6 7 7 6 7 3 6 5 7 7 6 7 6 5 6 6 6 8 5 7 4 3 5 5 6 4 5 3 0 5 4 2 1 6 5 5 6 7 6 6 4 7 3 6 5 2 6 7 3 3 3 7 7 6 4 8
Maultasche 4 5 3 2 0 1 2 2 5 2 2 3 4 4 8 5 2 3 4 3 3 5 5 4 4 3 6 4 4 5 3 4 4 5 0 5 7 4 1 2 0 5 6 1 1 3 2 2 3 0 5 3 2 2 2 4 4 4 1 1 5
Halusky 1 4 2 7 5 6 7 7 8 7 5 8 7 9 9 8 7 8 7 6 6 8 8 7 5 2 7 7 7 4 2 7 7 4 5 0 2 3 6 7 5 6 5 6 6 2 7 3 8 5 2 6 7 3 3 3 9 7 6 4 8
Shlishkes 3 6 4 9 7 8 9 9 8 9 5 8 7 9 9 8 9 8 7 8 8 8 10 7 7 4 5 7 7 6 4 7 5 2 7 2 0 3 8 7 7 6 5 8 8 4 9 5 8 7 2 8 9 5 5 3 9 9 8 6 10
Knedlik 2 5 3 6 4 5 6 6 7 6 4 7 6 8 8 7 6 7 6 5 5 7 7 6 6 3 4 6 6 5 3 4 4 1 4 3 3 0 5 6 4 5 6 5 5 3 6 2 7 4 1 5 6 2 2 4 8 6 5 3 7
Khinkali 5 6 4 3 1 2 1 1 6 3 3 4 5 3 7 6 3 2 5 4 2 6 4 5 5 4 7 5 5 6 4 5 5 6 1 6 8 5 0 1 1 6 7 0 0 4 1 3 4 1 6 4 3 3 3 5 3 3 2 2 4
Pierogi 6 7 5 4 2 3 2 2 5 4 2 3 4 2 6 5 4 1 4 5 3 5 5 4 6 5 6 4 4 7 5 6 4 5 2 7 7 6 1 0 2 7 8 1 1 5 2 4 3 2 7 5 4 4 4 4 2 4 3 3 5
Tortellini 4 5 3 2 0 1 2 2 5 2 2 3 4 4 8 5 2 3 4 3 3 5 5 4 4 3 6 4 4 5 3 4 4 5 0 5 7 4