Computer vision in situ
A ‘video-based contextual inquiry’ with blind people shopping using smart glasses
DOI:
https://doi.org/10.1558/jircd.27885
Keywords:
gesture interface, conversation analysis, ethnomethodology, smart glasses, vision impairment, shopping, socio-materiality, post-praxiology, computer vision
Abstract
Background: This article shows a visually impaired person (VIP) trying to locate products while shopping using the commercially available computer vision device Orcam.
Method: Based on ethnomethodological conversation analysis and perspectives on materiality and agency from a post-praxiological position, the study shows, through detailed analysis of transcribed video excerpts, the observable sense-making practices related to this technology.
Results: The study shows the VIP and researcher-participants exploring what the device can and cannot do, and how the local body–object–environment relations need to be organized to make the device scan. The study shows the value of applying a post-praxiological approach to understanding socio-material practices. It also introduces the ‘video-based contextual inquiry’ method as a form of researcher engagement in producing the situation and the data collection.
Discussion/conclusion: The article provides two novel contributions: (1) to the field of ethnomethodology and conversation analysis research, with a critical reflection on semi-experimental data collection and the role of the researcher, the materials, and the distribution of agency; and (2) to the field of impairment and disability studies, with insights on locally organized body–object–environment relations and the design of artifacts for computer vision recognition technology.