Human-Computer Interaction in machine learning and deep learning


  • Author Amirarshia Janighorban
  • Published June 3, 2024
  • Word count 3,969

Revolutionizing Human-Computer Interaction: Using the Power of Machine Learning and Deep Learning

Amir Arshia Janighorban,

Collegian of Computer Science and Engineering, Mohajer Technical University of Isfahan,

Islamic Republic Of Iran

*amirarshiajanighorban@gmail.com

*Janighorban.com

Supervisor:

Dr. Moslem YariRenani

*moslem.yaari@gmail.com

Dr. MohammadReza Mojtabeai

*Mrmojtabaei@yahoo.com

Abstract—Human-Computer Interaction (HCI) is a multidisciplinary field that focuses on the design, evaluation, and improvement of the interaction between humans and computer systems. It encompasses the study of how users interact with technology, with the aim of creating intuitive and effective interfaces. It involves understanding human capabilities, limitations, and preferences in order to design interfaces that are user-friendly and enhance the user experience, considering factors such as cognitive processes, ergonomics, accessibility, and usability.

The goal of HCI is to enable seamless communication and collaboration between humans and computers, allowing users to perform tasks efficiently and effectively. It involves designing interfaces that are intuitive, responsive, and tailored to user needs. HCI encompasses various aspects, including input and output devices, graphical user interfaces, voice and gesture recognition, virtual and augmented reality, and mobile and wearable technologies.

I. INTRODUCTION

Human-Computer Interaction (HCI) has come a long way since the early days of punch cards and command-line interfaces. With the advent of machine learning (ML) and deep learning (DL), a new era of HCI has emerged, revolutionizing the way humans interact with computers. ML and DL have enabled computers to understand and respond to human input in more intuitive and intelligent ways, making technology an extension of ourselves. In this article, we explore the transformative impact of ML and DL on HCI and delve into some remarkable applications that have reshaped the digital landscape.

The historical overview of Human-Computer Interaction (HCI) provides a crucial backdrop for understanding the transformative impact of Machine Learning (ML) on this field. HCI traces its roots back to the early days of computing, when interactions were primarily text-based and command-driven. These early interfaces had limitations in terms of accessibility and user-friendliness.

The evolution of HCI has been marked by a shift towards more intuitive and user-centric design, catalyzed by advancements in ML and AI. Over the years, HCI has moved from command-line interfaces to graphical user interfaces (GUIs), touchscreens, and voice-activated systems, all of which have been greatly influenced by ML algorithms. A further challenge is gesture recognition, which depends largely on the accuracy of detection and tracking. Last but not least is the challenge of making the whole system user-friendly; the solution to this lies mostly in solving the other challenges mentioned above.

Obtaining data from HCI enables us to learn more effectively and build smarter systems. Machine learning is an important branch of artificial intelligence. It has made great progress in many fields and has demonstrated strong research and development (R&D) potential. With the application of machine learning technology in HCI, machines are becoming more intelligent.

The aim of this Special Issue is to bring together original research and review articles discussing the latest developments in the field of human-machine interaction based on machine learning.

II. Natural Language Processing (NLP) and Voice Recognition

Like many other AI problems we’ve seen, automatic speech recognition can be implemented by gathering a large pool of labeled data, training a model on that data, and then deploying the trained model to accurately label new data. The twist is that speech is structured in time and has a lot of variability.

We’ll identify the specific challenges we face when decoding spoken words and sentences into text. To understand how these challenges can be met, we’ll take a deeper dive into the sound signal itself as well as various speech models. The sound signal is our data. We’ll get into the signal analysis, phonetics, and how to extract features to represent speech data.
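The signal-analysis step described above can be sketched in a few lines. The following is a minimal, illustrative feature-extraction pass: the signal is split into overlapping frames and a short-time energy feature is computed per frame. The frame length, hop size, and toy waveform are illustrative choices, not values from this article; real pipelines would compute richer features such as MFCCs.

```python
# Minimal sketch: framing a speech signal and extracting a per-frame feature.

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames

def short_time_energy(frame):
    """Energy of one frame: sum of squared samples."""
    return sum(s * s for s in frame)

signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5] * 4  # toy periodic waveform
frames = frame_signal(signal, frame_len=8, hop=4)
energies = [short_time_energy(f) for f in frames]
```

Because speech is structured in time, almost every downstream model consumes such frame-level features rather than raw samples.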

Natural Language Processing (NLP) and Voice Recognition are two interconnected fields that focus on enabling computers to understand and process human language, both written and spoken. They play a crucial role in revolutionizing human-computer interaction by allowing users to communicate with machines in a more natural and intuitive manner. Let's delve deeper into each of these fields:

NLP is a subset of artificial intelligence (AI) that deals with the interaction between computers and human language. It involves the development of algorithms and models that enable machines to understand, interpret, and generate natural language. NLP encompasses several tasks, including:

A. Text Understanding:

NLP algorithms analyze and extract meaning from textual data. This involves tasks such as parsing, part-of-speech tagging, named entity recognition, and semantic analysis. It allows computers to comprehend the structure and semantics of sentences and documents.
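A deliberately tiny sketch of two of the text-understanding steps named above, tokenization and named entity recognition, can make the idea concrete. The "entity" heuristic here (non-sentence-initial capitalized tokens) and the example sentence are purely illustrative assumptions; real systems use trained models such as those in spaCy or NLTK.

```python
import re

def tokenize(text):
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)

def naive_entities(text):
    """Toy NER: capitalized tokens that are not sentence-initial."""
    tokens = tokenize(text)
    return [tok for i, tok in enumerate(tokens) if i > 0 and tok[0].isupper()]

sentence = "The interface was designed by Ada at Mohajer Technical University."
entities = naive_entities(sentence)
```

Even this crude heuristic shows the kind of structure a text-understanding pipeline recovers before deeper semantic analysis.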

B. Sentiment Analysis:

NLP techniques enable computers to identify and understand the sentiment expressed in text, whether it is positive, negative, or neutral. Sentiment analysis finds applications in social media monitoring, customer feedback analysis, and brand reputation management.
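The simplest form of the sentiment analysis described above is a lexicon-based scorer: count positive and negative words and compare. The word lists below are illustrative assumptions; production systems use trained classifiers or large curated lexicons.

```python
# Minimal lexicon-based sentiment scorer (illustrative word lists).

POSITIVE = {"good", "great", "excellent", "love", "intuitive"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "confusing"}

def sentiment(text):
    """Classify text as positive/negative/neutral by lexicon word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For social-media monitoring, the same scoring loop would simply run over a stream of posts.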

C. Language Translation:

NLP powers machine translation systems like Google Translate, which can automatically translate text from one language to another. These systems utilize statistical models or neural networks to understand the meaning of sentences in one language and generate corresponding translations.

D. Question Answering:

NLP algorithms enable computers to comprehend questions and provide relevant answers. This includes tasks like information retrieval, document summarization, and question understanding. Question answering systems find applications in chatbots, virtual assistants, and search engines.
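The retrieval step of a question-answering system can be sketched by scoring candidate passages against the question and returning the best match. Plain word overlap is used here to keep the idea visible; real systems would use TF-IDF weighting or neural encoders, and the passages below are invented for illustration.

```python
# Sketch of QA retrieval: pick the passage with the most words in common
# with the question.

def best_passage(question, passages):
    q_words = set(question.lower().split())
    def overlap(p):
        return len(q_words & set(p.lower().split()))
    return max(passages, key=overlap)

passages = [
    "HCI studies the design of user interfaces.",
    "Machine translation converts text between languages.",
    "Voice assistants answer spoken questions.",
]
answer = best_passage("What does HCI design", passages)
```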

Voice recognition and NLP are closely intertwined, since voice input is often converted into text form for further processing using NLP techniques. Together, they enable more natural and conversational interactions between humans and computers, making technology more accessible and user-friendly. NLP and voice recognition technologies have transformed human-computer interaction by enabling machines to understand and process human language, both in written and spoken forms. These advancements have given rise to voice assistants, language translation systems, sentiment analysis tools, and more, making technology more intuitive and accessible to a wider range of users.

III. Data Visualization

Data visualization is the graphical representation of data and information using visual elements such as charts, graphs, maps, and diagrams. It involves transforming raw data into visual representations that are easier to understand, analyse, and communicate. The primary goal of data visualization is to present complex data patterns, relationships, and trends in a visual format that can be quickly and intuitively interpreted by users. Here are some key aspects and principles of data visualization:

  1. Data Representation: Data visualization starts with the selection and representation of the relevant data. This involves identifying the variables, dimensions, and metrics that need to be visualized. The data can be numerical, categorical, temporal, or spatial in nature, and the choice of appropriate visual representations depends on the characteristics of the data.

  2. Visual Encoding: Visual encoding involves mapping the data attributes to visual properties such as position, length, angle, color, shape, and size. For example, a bar chart uses the length of bars to represent quantities, while a scatter plot uses the position of points to represent the relationship between two variables. Choosing appropriate visual encodings is crucial for accurately representing the data and conveying the intended message.

  3. Visual Perception: Data visualization leverages human visual perception to facilitate understanding. It takes advantage of pre-attentive attributes, such as color, size, and position, which can be quickly perceived without conscious effort. Understanding the principles of visual perception helps designers create visualizations that draw attention to important patterns, facilitate comparisons, and aid in data interpretation.

Fig. 1. System analysis by HCI proposed method

  4. Data Abstraction: Data visualization often involves summarizing and aggregating large or complex data sets to present a clear and concise representation. Aggregation techniques such as averaging, grouping, or filtering are applied to reduce the complexity while preserving the key insights and patterns. The level of abstraction should strike a balance between providing a comprehensive overview and maintaining the necessary detail.

  5. Interactive Exploration: Interactive elements enhance data visualization by allowing users to explore and interact with the visual representations. Interactive features such as zooming, filtering, sorting, and linking can be incorporated to enable users to investigate specific data points, switch between different visualizations, or drill down into details. Interactivity promotes engagement and empowers users to extract deeper insights from the data.

(1)

where RV(x,y) is the response value at point (x,y) of the input image and the mask size is m × n, with a = (m−1)/2 and b = (n−1)/2, and w1 and w2 are weight coefficients. In our implementation we use w1 = w2 = 1.

E. Sliding Window Mechanism

The sliding window mechanism is a method used in various fields such as computer science, data analysis, and signal processing to manage and analyze a subset of data from a larger dataset. This technique involves moving a "window" of fixed size over the data to perform operations on just the portion of data within the window at each step. Here’s how it generally works and its applications:

Fig. 2. Window slid from left to right (panels (a) and (b))

• Left to Right

In the context of a sliding window mechanism, the term “left to right” refers to the direction in which the window moves across the data set. This is a common approach in many applications where data is processed sequentially from the beginning to the end. Here’s how it typically functions:

Process Overview

• Initialization: The window is initially placed at the leftmost part of the data set, covering the first segment of data points up to the defined window size.

• Sliding: The window then moves one step to the right at a time. In each step, the leftmost data point is excluded from the window, and a new data point from the right is included.

• Continuous Operation: At each new position of the window, operations are performed on the data within the window. These operations might include calculations like averages, sums, or applying specific algorithms for pattern detection or filtering.


● Applications:

• Text Analysis: In natural language processing, a sliding window might move left to right across a text to analyze phrases or patterns in the context of surrounding words.

• Time Series Analysis: For financial or meteorological data, the window slides through time-stamped data entries to compute moving averages or other statistical measures to identify trends.

• Image Processing: In tasks like object detection, a sliding window moves left to right (and often top to bottom, in a nested manner) across an image to apply detection algorithms in different regions of the image.

● Advantages of left-to-right sliding:

• Natural Ordering: For many types of data, especially time-series data, processing from left to right follows the natural chronological order of the data.

• Real-time Processing: This direction supports real-time or streaming data processing, where new data continuously comes in from the right.

• Ease of Implementation: It often simplifies the implementation of algorithms, as it aligns with the typical structure of data storage and retrieval systems, which are designed to handle data sequentially.

● Considerations:

• Edge Cases: Special care is needed at the edges of the data, particularly at the start and end. At the start, the window may not be fully filled unless padding or special handling is applied.

• Backtracking: If the analysis requires revisiting earlier data points, moving strictly from left to right might limit the ability to incorporate such feedback without additional mechanisms.

In summary, moving the sliding window from left to right is a straightforward and effective method to process data in many fields, aligning well with the natural flow of many data types and the structure of most data-handling systems.
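The left-to-right mechanism described above can be made concrete with a moving average: at each step the leftmost point drops out of the window and one new point enters from the right, so each position is computed in constant time.

```python
# Left-to-right sliding window applied to a moving average.

def moving_average(data, window):
    out = []
    total = sum(data[:window])          # initialize on the leftmost window
    out.append(total / window)
    for i in range(window, len(data)):  # slide one step right at a time:
        total += data[i] - data[i - window]  # new point in, leftmost point out
        out.append(total / window)
    return out

averages = moving_average([1, 2, 3, 4, 5, 6], window=3)
```

Note the edge case mentioned above: no value is emitted until the window is first filled, which is why a series of length 6 with window 3 yields only 4 averages.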

F. Storytelling and Narratives:

Effective data visualization often incorporates storytelling and narratives to convey a clear message and guide the user’s interpretation. A well-designed visualization can tell a story by using a logical flow and annotations to guide the user through the data, highlighting important trends and drawing meaningful conclusions.

Fig. 3. Classification of feature learning

G. Feature Learning:

Feature learning in HCI (Human-Computer Interaction) refers to the process of automatically identifying and extracting useful and relevant features from data collected in user interactions with technology. This can include textual data, sensor data, behavioural data, and other forms of input.

Feature learning techniques in HCI can help improve user experience by providing more personalized and context-aware interactions. By identifying patterns in user data, machine learning algorithms can adapt to user preferences, behavior, and needs, thereby enhancing the efficiency and effectiveness of the interaction.

Some common methods of feature learning in HCI include:

  1. Clustering and classification algorithms: These techniques group similar user behaviours or input patterns together, enabling the system to learn and adapt to user preferences.

  2. Dimensionality reduction: By reducing the complexity of the data, dimensionality reduction techniques can help identify the most important features that influence user interactions with technology.

  3. Neural networks: Deep learning neural networks can automatically learn complex features from data, enabling the system to make more accurate predictions and recommendations.

Overall, feature learning in HCI plays a crucial role in enabling intelligent and adaptive user interfaces that can anticipate user needs and provide personalized experiences.
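The clustering idea listed above can be sketched with a tiny k-means pass over one-dimensional interaction measurements (say, session lengths), grouping similar users so the system can adapt per group. The data, initial centroids, and feature choice are illustrative assumptions; real use would standardize features, use more dimensions, and choose k carefully.

```python
# Minimal 1-D k-means sketch for grouping similar user measurements.

def kmeans_1d(points, centroids, iters=10):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # recompute each centroid as the mean of its assigned points
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0])
```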

Classification is a process of categorizing items or data into groups based on certain attributes or properties. The main goal of classification is to organize and simplify complex information in order to make it easier to understand, analyse, and make decisions based on it.

In machine learning, classification is a type of supervised learning where the algorithm learns from labeled training data to predict the class or category of new, unseen data. There are different types of classification algorithms, such as decision trees, support vector machines, logistic regression, and neural networks, each with their own strengths and weaknesses.

Classification is widely used in various fields such as business, marketing, finance, healthcare, and image recognition, among others. It is an essential tool for tasks such as spam detection, sentiment analysis, customer segmentation, medical diagnosis, and fraud detection.
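One of the simplest supervised classifiers consistent with the description above is a nearest-centroid classifier: each class is summarized by the mean of its training points, and a new point takes the label of the closest mean. The spam/ham training points below are invented purely for illustration.

```python
import math

# Nearest-centroid classification sketch on invented 2-D feature points.

def fit_centroids(samples):
    """samples: list of ((x, y), label) pairs -> {label: centroid}."""
    sums = {}
    for (x, y), label in samples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Label of the centroid nearest to the point (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

train = [((1, 1), "spam"), ((2, 1), "spam"), ((8, 9), "ham"), ((9, 8), "ham")]
centroids = fit_centroids(train)
```

Decision trees, SVMs, logistic regression, and neural networks all refine this same fit/predict pattern with richer decision boundaries.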

Overall, classification plays a crucial role in organizing and interpreting data, making it a valuable tool for decision-making and problem-solving in many different domains.

H. The landscape

Fig. 4. Landscape of HCI in ML and DL

The brain-shaped landscape of Human-Computer Interaction (HCI) in Machine Learning (ML) and Deep Learning (DL) refers to the interdisciplinary field that focuses on how humans interact with systems powered by ML and DL algorithms. This includes studying both the design of these systems to better suit human cognitive capabilities and understanding human behaviour when interacting with them.

In this landscape, researchers seek to create interfaces that are intuitive, user-friendly, and responsive to human needs and preferences. This involves designing interfaces that can adapt to users' varying levels of expertise and cognitive abilities, as well as integrating features that support human decision-making and problem-solving processes.

Moreover, the brain-shaped landscape of HCI in ML and DL also examines the ethical implications of using these technologies in human-computer interactions. This includes considerations such as data privacy, algorithmic bias, and the implications of automation for human labor and decision-making.

Finally, the brain-shaped landscape of HCI in ML and DL seeks to optimize the synergy between human cognition and machine intelligence, with the goal of creating more efficient, usable, and ethically sound systems for human-computer interaction.

I. Custom fully tested hand gestures

Custom hand gestures in HCI refer to specific movements or gestures made by a user with their hands to interact with a computer or device. These gestures can be used to control, navigate, or manipulate digital content or interfaces in a more intuitive and natural way. Some examples of custom hand gestures in HCI include:

  1. Pinching: Bringing two fingers together on a touchscreen to zoom in or out on a map or image.

  2. Swiping: Moving a finger across a touchscreen to scroll up or down a webpage or switch between pages.

  3. Rotating: Twisting two fingers on a touchscreen to rotate an object or image.

  4. Tapping: Quickly touching a screen with a finger to select or activate an item.

  5. Three-finger swipe: Using three fingers on a trackpad to switch between open applications or desktop spaces.

  6. Peace sign: Holding up two fingers to trigger a specific action, like taking a screenshot or opening a new tab.

  7. Fist bump: Making a fist and tapping it on a surface to confirm a selection or make a payment.

These custom hand gestures can vary depending on the device or application being used, and can be programmed or customized by users to suit their preferences and needs. By incorporating these gestures into the design of interfaces and interactions, designers can enhance the user experience and make interactions more efficient and engaging.
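The wiring of recognized gestures to actions, the "programmed or customized by users" part above, is often just a dispatch table mapping gesture names to handlers. The gesture names and actions below are illustrative, not from any specific toolkit.

```python
# Sketch of a gesture-to-action dispatch table (illustrative names).

actions = []  # record of triggered actions, stands in for real UI effects

GESTURE_HANDLERS = {
    "pinch_in":  lambda: actions.append("zoom_out"),
    "pinch_out": lambda: actions.append("zoom_in"),
    "swipe_up":  lambda: actions.append("scroll_down"),
    "tap":       lambda: actions.append("select"),
}

def handle_gesture(name):
    """Run the handler for a recognized gesture; ignore unknown gestures."""
    handler = GESTURE_HANDLERS.get(name)
    if handler is None:
        return False  # unrecognized gesture: fail gracefully
    handler()
    return True

for g in ["tap", "pinch_out", "wave"]:
    handle_gesture(g)
```

Letting users edit this table is exactly what makes the gestures "customizable" without changing recognition code.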

J. Gesture recognition (Software & Hardware)

Gesture recognition is a technology that allows computers to interpret human gestures and movements as commands or input. It uses sensors, cameras, or devices to capture and analyse hand movements, body postures, facial expressions, and other gestures in real-time.

Fig. 5. Functioned hand gestures

Fig. 6. Gesture Recognition Using Surface Electromyography and Deep Learning for Prostheses Hand

There are two main types of gesture recognition: vision-based and sensor-based. Vision-based gesture recognition uses cameras to capture gestures, while sensor-based gesture recognition uses devices like gloves, motion sensors, or accelerometers to detect movements.

Gesture recognition is commonly used in interactive systems such as video games, virtual reality applications, smart TVs, and smartphones. It can also be used in healthcare, robotics, and security systems. Some common applications of gesture recognition include controlling devices without physical contact, interacting with virtual objects in 3D space, and providing more natural and intuitive user interfaces.

However, there are also some challenges with gesture recognition, such as ensuring accuracy and robustness in different environments and recognizing a wide range of gestures accurately.
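Once a vision-based system has tracked the hand, the recognition step can be as simple as a rule over the trajectory. The following toy recognizer classifies a sequence of (x, y) positions as a left/right/up/down swipe; the threshold and image-style coordinates (y growing downward) are illustrative assumptions.

```python
# Rule-based swipe recognition over tracked (x, y) hand positions.

def classify_swipe(points, min_move=30):
    """Classify net motion between first and last tracked point."""
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    if max(abs(dx), abs(dy)) < min_move:
        return "none"                     # too little motion to count
    if abs(dx) >= abs(dy):                # horizontal motion dominates
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"  # y grows downward in images

track = [(100, 200), (140, 205), (190, 210), (240, 208)]
```

Deep-learning recognizers replace this hand-written rule with a learned classifier over the same kind of trajectory (or raw video) input.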

IV. Linear Analytics

Linear analytics is a method of data analysis that involves examining relationships between different variables in a linear fashion. In other words, it focuses on understanding how changes in one variable are related to changes in another variable, assuming a linear relationship between the two. This can be done through techniques such as linear regression, which is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

Linear analytics is commonly used in various fields such as business, finance, economics, and social sciences to make predictions, forecast trends, and identify patterns in data. By analyzing the linear relationships between variables, researchers and analysts can gain valuable insights into the underlying structure of the data and make informed decisions based on these findings.

One of the key benefits of linear analytics is its simplicity and ease of interpretation. It provides a clear and straightforward way to analyze data and communicate the results to stakeholders. Additionally, linear analytics allows for the identification of outliers and anomalies in the data, which can help enhance the accuracy and reliability of the analysis.

Overall, linear analytics is a powerful tool for understanding the relationships between variables and extracting valuable insights from data, making it an essential part of the analytical toolkit for many professionals and organizations.
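The core of the linear regression technique described above is an ordinary least-squares fit of y = a·x + b, which has a closed form: the slope is the covariance of x and y divided by the variance of x. The data set below is made up so the fit is exact.

```python
# Ordinary least-squares fit of y = a*x + b.

def linear_fit(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx   # the fitted line passes through the mean point
    return a, b

a, b = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 2x + 1
```

Outliers show up as points far from the fitted line, which is the basis of the anomaly detection mentioned above.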

Fig. 7. Predictive Analytics Solutions Available to Organizations

Possible Errors with Respect to Velocity, and Their Detection Using Analytics

TABLE I

ERROR RATE BETWEEN EXPERIMENTAL DETECTION AND ACTUAL POSITION

frame no | experimental X | experimental Y | Real X | Real Y | Error (pixel) | σ of error
   1     |      476       |       13       |  469   |   46   |     33.73     |
   2     |      470       |       25       |  462   |   65   |     40.79     |
   3     |      261       |      416       |  231   |  436   |     36.05     |    8.27
   4     |      224       |      451       |  194   |  449   |     30.06     |
   5     |      295       |      310       |  320   |  350   |     32.02     |

TABLE II

LOSING MARKER VS VELOCITY

time (s) | X (pixel) | Y (pixel) | Velocity (pixel/s)
  4.915  |    304    |    277    |
  5.036  |    220    |    236    |  772.495
  5.506  |    117    |    264    |
  5.752  |    359    |    248    |  985.888
  8.523  |    195    |    212    |
  8.744  |    198    |    450    | 1077.01
  9.810  |    304    |    277    |
 10.026  |    220    |    450    |  890.347
 11.812  |    423    |    256    |  793.206
 12.011  |    363    |    110    |
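The velocity column of Table II can be reproduced directly from consecutive marker positions: the pixel distance between two detections divided by the elapsed time. The check below uses the first pair of rows.

```python
import math

def pixel_velocity(t0, p0, t1, p1):
    """Speed in pixels/second between detections (t0, p0) and (t1, p1)."""
    return math.dist(p0, p1) / (t1 - t0)

# First pair of rows in Table II:
v = pixel_velocity(4.915, (304, 277), 5.036, (220, 236))
```

The result matches the tabulated 772.495 pixel/s, confirming how the column was derived.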

TABLE III

TIME REQUIRED TO DETECT AFTER LOSING

frame no | without boundary box (s) | with boundary box (s)
   1     |          0.124           |         0.083
   2     |          0.097           |         0.096
   3     |          0.1             |         0.11
   4     |          0.095           |         0.08
   5     |          0.117           |         0.082
   6     |          0.089           |         0.085
   7     |          0.1             |         0.087
   8     |          0.127           |         0.081
   9     |          0.08            |         0.09
  10     |          0.1             |         0.086
 Avg     |          0.1029          |         0.0884

F. User-Friendliness

To analyse user-friendliness, we asked five users to use our system and perform the actions cursor move, left click, right click, zoom in, zoom out, forward, and backward. Each user was asked to perform each command 30 times. In the first 10 attempts the overall accuracy was 59.38%, as Table IV shows. Table IV also gives the average accuracy of each command in its bottom row, and shows that the average accuracy across all commands is 64.2%, 61.4%, 55.7%, 62.8%, and 52.8% for users 1, 2, 3, 4, and 5 respectively. Tables V and VI present the same information for the second and third sets of 10 attempts. From Tables IV, V, and VI we can see that the average performance on each command increases as the user repeats it. After the 30th attempt the overall average accuracy is 79.4%. The system's performance is observed to increase as the user becomes more accustomed to it.
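The first-ten-attempt figures reported above are internally consistent: the overall accuracy of 59.38% is exactly the mean of the five per-user averages.

```python
# Consistency check on the reported per-user averages for the first 10 attempts.

per_user = [64.2, 61.4, 55.7, 62.8, 52.8]   # users 1-5
overall = sum(per_user) / len(per_user)      # mean of per-user averages
```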

Inference figure of the HCI input/output totality

V. CONCLUSION

The derivation regarding the role of Deep Learning (DL) and Machine Learning (ML) in Human-Computer Interaction (HCI) is that they offer powerful tools and techniques that can enhance HCI in various ways. DL and ML enable HCI researchers and designers to develop intelligent systems that can understand and respond to user input, provide personalized experiences, and offer advanced capabilities such as natural language processing, computer vision, and predictive analytics. By leveraging DL and ML, HCI can benefit from improved usability, user-centered design, explainability, and ethical considerations, ultimately leading to more effective and user-friendly interactions between humans and intelligent systems.


About the author: I'm 20 and a computer programmer, currently learning Python, HTML, CSS, JavaScript, Network+, and C++.

Article source: https://articlebiz.com