Computer vision in situ

A ‘video-based contextual inquiry’ with blind people shopping using smart glasses

Authors

  • Brian L. Due, University of Copenhagen

DOI:

https://doi.org/10.1558/jircd.27885

Keywords:

gesture interface, conversation analysis, ethnomethodology, smart glasses, vision impairment, shopping, socio-materiality, post-praxiology, computer vision

Abstract

Background: This article examines a visually impaired person (VIP) trying to locate products while shopping with Orcam, a commercially available computer vision device.

Method: Based on ethnomethodological conversation analysis and perspectives on materiality and agency from a post-praxiological position, the study shows, through detailed analysis of transcribed video excerpts, the observable sense-making practices related to this technology.

Results: The study shows the VIP and researcher-participants exploring what the device can and cannot do, and how the local body–object–environment relations need to be organized to make the device scan. It demonstrates the value of applying a post-praxiological approach to understanding socio-material practices, and it introduces the ‘video-based contextual inquiry’ method as a form of researcher engagement in producing the situation and the data collection.

Discussion/conclusion: The article provides two novel contributions: (1) to the field of ethnomethodology and conversation analysis research, with a critical reflection on semi-experimental data collection and the role of the researcher, the materials, and the distribution of agency; and (2) to the field of impairment and disability studies, with insights on locally organized body–object–environment relations and the design of artifacts for computer vision recognition technology.

Author Biography

  • Brian L. Due, University of Copenhagen

    Brian L. Due is an associate professor of communication and social interaction at the Department of Nordic Studies and Linguistics, University of Copenhagen. Due’s research is grounded in video ethnography, ethnomethodology, and multimodal conversation analysis. He does research on social interaction, space, materials, objects, and technologies focusing on action and perception as distributed achievements. His research has lately focused on socio-material organizations. He is co-editor of Social Interaction. Video-Based Studies of Human Sociality and has published in journals such as the Journal of Pragmatics, Semiotica, Discourse Studies, Space and Culture, and Mobilities. Due is editor of the book volume The practical accomplishment of everyday activities without sight (2023).

Published

2024-10-07

How to Cite

Due, B. L. (2024). Computer vision in situ: A ‘video-based contextual inquiry’ with blind people shopping using smart glasses. Journal of Interactional Research in Communication Disorders. https://doi.org/10.1558/jircd.27885