Because Pyodide does not support the librosa Python library, this project works exclusively with WAV files sampled at 44100 Hz. No other audio formats or sample rates are accepted.
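
As a minimal sketch of how this constraint could be checked before any processing, the snippet below uses Python's standard wave module to confirm that a file is a readable WAV recording at 44100 Hz. The function name check_wav and the demo file are illustrative assumptions, not part of this project's code.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 44100  # sample rate this project expects, in Hz


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return (channels, sample_width_bytes, n_frames) or raise ValueError."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            info = (wav.getnchannels(), wav.getsampwidth(), wav.getnframes())
    except wave.Error as exc:
        raise ValueError(f"{path} is not a readable PCM WAV file: {exc}") from exc
    if rate != expected_rate:
        raise ValueError(
            f"{path} has sample rate {rate} Hz, expected {expected_rate} Hz"
        )
    return info


# Demo: write one second of 16-bit mono silence to a temp file and validate it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_44100.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(EXPECTED_RATE)
    out.writeframes(b"\x00\x00" * EXPECTED_RATE)
print(check_wav(demo_path))
```

A file at any other rate raises ValueError, so a caller can ask the user to convert the file first.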

Independent Component Analysis (ICA) Algorithm (primarily derived from the book "Independent Component Analysis: A Tutorial Introduction" by James V. Stone)

(A) Introduction to the Fundamentals of ICA: Distinguishing It from PCA

Independent Component Analysis (ICA) is a computational technique for separating a multivariate signal into additive, statistically independent, non-Gaussian components. It is particularly effective for analyzing mixtures that originate from multiple sources. Principal Component Analysis (PCA), by contrast, seeks orthogonal components that successively maximize variance. PCA relies only on second-order statistics (the covariance matrix), which describe the data completely only when the data are Gaussian. ICA does not rank components by variance. Instead, it looks for components that are statistically independent, meaning that the amplitude of one component at a given time carries no information about the amplitude of any other component at that time, as expected when the components originate from distinct physical processes.

PCA reduces the dimensionality of data by retaining the few eigenvectors of the covariance matrix that capture the most variance. The first principal component (PC) points along the direction of maximum variance, and every later component must be orthogonal to the earlier ones, so the whole decomposition depends on the orientation of that first eigenvector. In contrast, ICA's approach to signal separation is not constrained by the variance or order of the components. It seeks independent sources within the signal mixtures, guided by the higher-order statistical properties of the signals rather than by their variance. This fundamental difference underlies ICA's utility in separating mixed signals into components that represent the original, independent sources, free from the variance-based ranking imposed by PCA.

(B) Mathematical Foundations and Properties of Signal Mixtures

ICA rests on mathematical concepts such as entropy maximization, maximum likelihood estimation, and the use of non-linear functions, notably the hyperbolic tangent (tanh), to approximate the cumulative distribution functions (CDFs) of the source signals. A hallmark of successful separation is that the joint distribution of the extracted signals equals the product of their marginal distributions, which is the definition of statistical independence. Signal mixtures have three key characteristics: (1) source signals are independent, whereas their mixtures are not; (2) each source signal has a non-Gaussian histogram, whereas the histograms of mixtures are closer to Gaussian, as the central limit theorem predicts; and (3) the simplest source signal is less complex than any mixture that contains it. These principles form the theoretical and practical basis of ICA and distinguish it from other signal processing methods through their emphasis on the non-Gaussian nature and statistical independence of source signals.
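
Properties (1) and (2) can be illustrated numerically. The sketch below is not taken from the project: it assumes two synthetic Laplace-distributed sources and a hypothetical equal-weight mixing matrix A, measures non-Gaussianity with excess kurtosis, and measures dependence with the correlation between squared amplitudes.

```python
import numpy as np


def excess_kurtosis(x):
    """Excess kurtosis: about 0 for Gaussian data, positive for heavy-tailed data."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    return float(np.mean(x ** 4) - 3.0)


rng = np.random.default_rng(0)
n_samples = 100_000

# Two independent, unit-variance, non-Gaussian (Laplace) sources (synthetic).
sources = rng.laplace(scale=1 / np.sqrt(2), size=(2, n_samples))

# Hypothetical mixing matrix: each mixture is an equal-weight blend of both sources.
A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
mixtures = A @ sources

kurt_sources = [excess_kurtosis(s) for s in sources]
kurt_mixtures = [excess_kurtosis(m) for m in mixtures]

# Correlation between squared amplitudes: close to 0 for independent signals.
dep_sources = float(np.corrcoef(sources[0] ** 2, sources[1] ** 2)[0, 1])
dep_mixtures = float(np.corrcoef(mixtures[0] ** 2, mixtures[1] ** 2)[0, 1])

print("excess kurtosis of sources :", kurt_sources)
print("excess kurtosis of mixtures:", kurt_mixtures)
print("dependence of sources      :", dep_sources)
print("dependence of mixtures     :", dep_mixtures)
```

With this particular A the mixtures are uncorrelated by construction, yet their squared amplitudes still co-vary. Decorrelation alone therefore does not make signals independent, which is the gap ICA addresses.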

(C) ICA Applications and Its Spatial-Temporal Versatility

Independent Component Analysis (ICA) boasts a wide array of applications, from speech processing and brain imaging using functional Magnetic Resonance Imaging (fMRI) to the analysis of electrical brain signals via Electroencephalography (EEG). Its core advantage lies in the ability to discern and isolate source signals from their mixtures based on unique properties — particularly their statistical independence, non-Gaussian distribution, and comparative simplicity. This distinction is crucial in fields requiring precise signal separation, such as distinguishing individual voices in a noisy environment or isolating specific brain activities from complex imaging data. Furthermore, ICA's adaptability is showcased through its spatial and temporal variants: Spatial ICA (sICA) excels in analyzing data points like image pixels to segregate different visual sources, making it indispensable in image processing, while Temporal ICA (tICA) is pivotal for unraveling sequences over time, such as isolating distinct audio tracks or analyzing temporal brain signal patterns. This dual capability highlights ICA's comprehensive approach to addressing analytical challenges across a spectrum of domains, enhancing its utility in both spatially and temporally oriented data analysis.

(D) Algorithmic Steps of ICA

Preliminary Steps: The preprocessing of signals is critical before optimizing the unmixing matrix for signal separation.

Centering: Each signal within the mixtures is centered by subtracting its mean. Centering gives every signal zero mean, which the whitening step and the ICA model both assume, just as PCA does.
Whitening: The process starts with the computation of the covariance matrix of the centered signals, followed by its eigenvalue decomposition. The subsequent transformation, built from the eigenvectors and eigenvalues, renders the signals uncorrelated with unit variance. Whitening is a pivotal step because it removes all second-order dependence, so the remaining unmixing matrix only needs to be a rotation. The computation mirrors PCA, but here it serves as preparation for the search for independent components rather than as the final result.
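
The two preliminary steps can be sketched with NumPy as follows. This is a minimal illustration under the assumption that the signals are stored as rows of an array and that their covariance matrix has full rank. The function names center and whiten are illustrative and are not taken from the project's source.

```python
import numpy as np


def center(X):
    """Subtract each signal's mean. X has shape (n_signals, n_samples)."""
    mean = X.mean(axis=1, keepdims=True)
    return X - mean, mean


def whiten(Xc):
    """Whiten centered signals using the eigendecomposition of their covariance."""
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # V = D^{-1/2} E^T: rotate onto the eigenvectors, then scale to unit variance.
    V = (eigvecs / np.sqrt(eigvals)).T
    return V @ Xc, V


# Demo on two correlated synthetic signals with non-zero means.
rng = np.random.default_rng(0)
raw = np.array([[1.0, 0.5], [0.3, 2.0]]) @ rng.standard_normal((2, 10_000))
raw = raw + np.array([[3.0], [-1.0]])

centered, means = center(raw)
white, V = whiten(centered)
print("covariance after whitening:")
print(np.round(white @ white.T / white.shape[1], 6))
```

The printed covariance should be the identity matrix up to rounding, confirming that the whitened signals are uncorrelated with unit variance.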

Optimizing the Unmixing Matrix: This is the iterative core of ICA, where the unmixing matrix is refined to separate the mixed signals.

Estimating Independent Components: At each iteration, the whitened signals are transformed by the unmixing matrix to estimate the independent components. This is based on the ICA premise that signals are linear combinations of independent components, and the unmixing matrix is the key to reversing their mixture.
Non-linearity Application: The estimates are then passed through the non-linear function tanh. This step makes the update sensitive to higher-order statistics, and therefore to non-Gaussianity, which is what allows ICA to distinguish the independent components from their mixtures. PCA, by contrast, uses only second-order statistics (variance and covariance), so it can decorrelate signals but cannot make them independent.
Updating the Unmixing Matrix: The update rule combines the output of the non-linear function with its derivative. It follows a gradient ascent strategy that aims to maximize the non-Gaussianity, and consequently the independence, of the estimated components. The strategy rests on the principle that independent components can be found by maximizing the entropy of the transformed outputs or, closely related, by minimizing the mutual information between them.

Convergence and Extraction: The final steps involve confirming the independence of the components and extracting them.

Convergence Check: The iterative process persists until the adjustments in the unmixing matrix diminish below a pre-set threshold, signifying that the components' independence will not significantly improve with further matrix alterations. This step is crucial to prevent an endless loop and to ascertain that the components are as independent as practically attainable.
Extracting Independent Components: Once the unmixing matrix has been optimized, it is applied to the whitened signals to obtain the final independent components. This is the culmination of the ICA process, in which the mixed signals are separated into their independent elements.
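
The steps above can be sketched end to end as shown below. This is a hedged illustration, not the project's implementation: it uses the natural-gradient form of the maximum-likelihood (infomax) update with tanh, which assumes super-Gaussian sources, and it may differ from the update rule used in the project. The function name, learning rate, tolerance, synthetic sources, and mixing matrix are all assumptions.

```python
import numpy as np


def ica_gradient_ascent(X, lr=0.1, tol=1e-6, max_iter=2000):
    """Separate mixtures X of shape (n_signals, n_samples); return (components, W)."""
    n, m = X.shape

    # Preliminary steps: centering, then whitening via eigendecomposition.
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / m)
    Z = (eigvecs / np.sqrt(eigvals)).T @ Xc

    W = np.eye(n)  # initial unmixing matrix
    for _ in range(max_iter):
        Y = W @ Z          # current estimates of the independent components
        G = np.tanh(Y)     # non-linearity applied to the estimates
        # Natural-gradient ascent on the log-likelihood of a source model
        # whose CDF is (1 + tanh(y)) / 2.
        delta = lr * (np.eye(n) - 2.0 * (G @ Y.T) / m) @ W
        W = W + delta
        # Convergence check: stop once the update becomes negligible.
        if np.max(np.abs(delta)) < tol:
            break

    # Extraction: apply the optimized unmixing matrix to the whitened signals.
    return W @ Z, W


# Demo with two synthetic Laplace sources and a hypothetical mixing matrix.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 20_000))
A = np.array([[1.0, 0.3], [0.2, 0.5]])
Y, W = ica_gradient_ascent(A @ S)

# Absolute correlation of each true source (rows) with each component (columns).
print(np.round(np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:]), 3))
```

ICA recovers sources only up to order, sign, and scale, so the comparison uses absolute correlations instead of expecting the components to match the sources row by row.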

(E) Characteristics of the FastICA Algorithm (Hyvärinen, A. & Oja, E. (2000). Neural Networks, 13(4-5), 411-430.)

FastICA, while aligned with the fundamental objective of the ICA to separate mixed signals into independent components, introduces specific nuances in its approach, theoretical focus, and practical execution, making it a variant of the broader family of ICA algorithms.

Distinctly, FastICA is tailored for rapid convergence: it optimizes an approximation of negentropy as its measure of non-Gaussianity, and its fixed-point iteration scheme for updating the unmixing matrix stands out for its efficiency. Implementations may differ in the non-linear function used to approximate negentropy (e.g., logcosh or exp), but the underlying principle of non-Gaussianity maximization remains intact. Other salient features include:

"Fixed-Point Iteration Scheme": Uses specific non-linear functions and their derivatives to maximize non-Gaussianity without requiring a learning rate.
"Symmetric Decorrelation": Applied within each iteration to keep the rows of the unmixing matrix orthonormal, so that the estimated components remain mutually uncorrelated and do not converge to the same source.
Efficient "Convergence" Criterion: Based on the change in the unmixing matrix between iterations, it stops the algorithm promptly once the independent components have been adequately extracted.
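
The three features listed above can be sketched as follows. This is a compact illustration of symmetric FastICA with the tanh non-linearity (the derivative of logcosh), following the fixed-point update described by Hyvärinen and Oja. It is not the project's in-browser implementation, and the function names, tolerance, random initialization, and demo data are assumptions.

```python
import numpy as np


def whiten(X):
    """Center and whiten X of shape (n_signals, n_samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return (eigvecs / np.sqrt(eigvals)).T @ Xc


def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^{-1/2} W, giving orthonormal rows."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ W


def fastica_symmetric(Z, tol=1e-6, max_iter=200, seed=0):
    """Run symmetric FastICA on whitened data Z; return (components, W)."""
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(max_iter):
        Y = W @ Z
        G = np.tanh(Y)            # g(y)
        G_prime = 1.0 - G ** 2    # g'(y)
        # Fixed-point update for every row w: E[z g(w^T z)] - E[g'(w^T z)] w.
        W_new = (G @ Z.T) / m - G_prime.mean(axis=1, keepdims=True) * W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is parallel to the old one (up to sign).
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ Z, W


# Demo with two synthetic Laplace sources and a hypothetical mixing matrix.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 20_000))
A = np.array([[1.0, 0.3], [0.2, 0.5]])
Y, W = fastica_symmetric(whiten(A @ S))
print(np.round(np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:]), 3))
```

Because the update has no learning rate and the decorrelation treats all rows at once, no component is privileged over the others, in contrast to deflation schemes that extract components one at a time.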

(F) Innovative Uses of ICA in Enhancing Alexa's Sound Recognition and Interaction

Amazon's Alexa is equipped with an innovative feature through its LED ring atop the device, which plays a crucial role beyond mere aesthetic appeal. When activated with the wake word "Alexa," this LED ring serves as a directional indicator, illuminating to point towards the source of the voice. This functionality not only adds an element of interaction by mimicking human-like responsiveness to sound sources but also signifies the sophisticated audio processing capabilities of the device. The directional illumination of the LED ring is a direct outcome of the device's ability to analyze and locate the origin of the sound, a process rooted deeply in the principles of independent component analysis (ICA).

ICA is pivotal for two primary functions within Alexa's operational framework. Firstly, it aids in pinpointing the direction from which the voice command is issued. By separating the mixed audio signals received by Alexa's microphones into their independent components, the system can identify the specific direction of the sound source. This process allows Alexa to "focus" on the speaker, enhancing user interaction by providing visual feedback through the LED ring.

Secondly, ICA plays a vital role in background noise reduction, a critical factor in far-field speech recognition. Voice commands captured from a distance inherently carry more noise and reverberation compared to those spoken directly into a device, such as a smartphone. These disturbances, primarily caused by sound waves reflecting off surfaces like walls or windows, can significantly hinder speech recognition accuracy. Through ICA, Alexa can effectively isolate the voice command from the background noise and reverberation. This isolation not only improves the clarity of the signal being processed but also enhances Alexa's ability to understand and respond to the user accurately, even in acoustically challenging environments.

(G) Reference

Original FastICA academic paper: Hyvärinen, A., & Oja, E. (2000). Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5), 411-430. Neural Networks Research Centre, Helsinki University of Technology.
ICA Introduction book: Stone, J.V. (2004). Independent Component Analysis: A Tutorial Introduction. Bradford Books. ISBN 978-0262693158.
scikit-learn. (n.d.). FastICA. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
CRAN. (n.d.). fastICA. Retrieved from https://cran.r-project.org/web/packages/fastICA/index.html

Convert MP3 to WAV on Mac and Linux

Because PyScript cannot use the librosa or pydub libraries in the browser, convert MP3 files to WAV format before running the browser-based PyScript code if WAV files are not already available.

(A) Using `ffmpeg`

Mac: Install using Homebrew:

brew install ffmpeg

Linux: Install using apt for Ubuntu or Debian:

sudo apt-get install ffmpeg

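Once ffmpeg is installed, you can convert an MP3 file to WAV format with the following command. The -ar 44100 option resamples the output to the 44100 Hz rate that this project expects:

ffmpeg -i input.mp3 -ar 44100 output.wav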

(B) Using `sox`

Mac: Install using Homebrew:

brew install sox

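Linux: Install using apt for Ubuntu or Debian. MP3 input also requires the libsox-fmt-mp3 package:

sudo apt-get install sox libsox-fmt-mp3

With sox installed, convert an MP3 file to WAV format with the following command. The -r 44100 option sets the output sample rate to 44100 Hz:

sox input.mp3 -r 44100 output.wav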

FastICA Python Source Code with Detailed Comments

Prototype Python Script: Heart and Lung Sound Separation with Scikit-learn's FastICA

This prototype applies our Independent Component Analysis (ICA) tool, built with medical professionals and researchers in mind, to audio processing. The software separates complex audio signals, with a focus on heart and lung sounds.

Audio Signal Processing: The tool processes audio files, particularly heart and lung sounds, to analyze and separate these complex signals.
Display Waveforms: It visualizes the waveforms of these sounds, providing a clear, graphical representation of audio data.
Merge Audio for Analysis: The software can merge two different audio signals for detailed analysis using ICA, ensuring comprehensive examination of overlapping sounds.
Independent Component Analysis: Utilizing ICA, the tool separates mixed audio signals into their individual components, allowing for in-depth analysis of each signal.
Audio Saving Capabilities: It offers functionality to save the separated audio components in both MP3 and WAV formats for further use and examination.

(A) Individual Heart and Lung Sounds

(B) Mixed Heart and Lung Sounds

The attached Python script first generates the mixed waveforms of the heart and lung sounds. You can view and listen to this mixed sound:

(C) ICA Iteration Outputs, Separating Heart and Lung Sounds from the Mixed Sounds

Each iteration of the ICA algorithm generates two waveform visualization files and two sound files:

First Round of Iteration

Second Round of Iteration

Third Round of Iteration

Fourth Round of Iteration

Fifth Round of Iteration

Sixth Round of Iteration

Direct Integration of R Scripts in Python: Executing FastICA.R for Signal Separation

In the pursuit of advancing and streamlining signal processing workflows, two new Python modules have been developed: nGene_rpy2 and nGene_Waveform. These modules are designed to facilitate the integration of R scripts into Python applications and to efficiently manage waveform data. The following sections provide an overview of these modules and demonstrate their practical applications.

(A) nGene_rpy2: Bridging R and Python

nGene_rpy2 is a Python class that utilizes the rpy2 library to seamlessly integrate R scripts and packages into Python applications. This class enables the loading of R code from files or strings, the importation of R packages, and the invocation of R functions directly from Python. Such integration is instrumental in leveraging R's advanced statistical and signal processing capabilities within a Python environment.

Load R Scripts: Facilitates the loading of R code from files or strings, creating callable modules within Python.
Import R Packages: Manages the importation of R packages, including automatic installation if packages are not already present.
Call R Functions: Executes R functions from loaded scripts or packages with automatic data conversion between R and Python data structures.
Execute R Commands: Allows the execution of raw R commands directly from Python, providing flexibility in scripting.

The following example demonstrates the utilization of nGene_rpy2 to perform Independent Component Analysis (ICA) using R's fastICA function:

nGene_Waveform is a Python class dedicated to the processing and visualization of waveform data. This class simplifies tasks such as reading audio files, normalizing signals, saving audio data, and plotting waveforms. It is essential for applications involving audio signal processing, particularly within the context of biomedical signals like heart and lung sounds.

Read and Normalize Audio Data: Facilitates the loading of .wav files and normalization of audio signals for processing.
Save Audio Signals: Enables the saving of processed or separated signals as .wav files.
Plot Waveforms: Provides visualization of audio signals using matplotlib for analysis and presentation purposes.
Handle Multiple Signals: Efficiently manages and processes multiple audio signals or components.
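
For readers without access to the class, the read, normalize, and save operations listed above can be approximated with the standard wave module and NumPy. The sketch below is not the nGene_Waveform API. The function names are hypothetical, only 16-bit PCM input is handled, the signal is assumed to be non-empty, and plotting is omitted.

```python
import os
import tempfile
import wave

import numpy as np


def read_wav_normalized(path):
    """Read a 16-bit PCM WAV file; return (sample_rate, mono signal in [-1, 1])."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    data = np.frombuffer(raw, dtype="<i2").astype(np.float64)
    if channels > 1:
        data = data.reshape(-1, channels).mean(axis=1)  # down-mix to mono
    peak = np.max(np.abs(data))
    if peak > 0:
        data = data / peak  # peak normalization
    return rate, data


def save_wav(path, signal, rate):
    """Peak-normalize a float signal and write it as 16-bit mono PCM."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    if peak > 0:
        signal = signal / peak
    pcm = np.round(signal * 32767).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())


# Demo: round-trip a synthetic one-second tone through a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
t = np.arange(44100) / 44100
save_wav(demo_path, 0.5 * np.sin(2 * np.pi * 5 * t), 44100)
rate, signal = read_wav_normalized(demo_path)
print(rate, signal.shape, float(np.max(np.abs(signal))))
```

Peak normalization is used here because ICA is insensitive to the overall scale of its inputs, and separated components usually need rescaling before they can be written as audio.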

The example below illustrates how to utilize nGene_Waveform to read audio files and plot their waveforms:

main.py serves as an example script demonstrating the application of the nGene_rpy2 and nGene_Waveform classes to perform Independent Component Analysis (ICA) on mixed audio signals, such as heart and lung sounds. Users may rename and adapt this script to suit their own requirements.

Initialize Handlers: Establishes instances of nGene_rpy2 and nGene_Waveform.
Load and Normalize Audio Data: Reads heart and lung sound files and normalizes the signals.
Mix Signals: Combines the original signals using a predefined mixing matrix.
Perform ICA: Utilizes R's fastICA function to separate the mixed signals into independent components.
Plot and Save Results: Visualizes the original, mixed, and separated signals, and saves the separated components as .wav files.

The complete script is available in main.py. Users may adapt this script as needed:

Initialize Handlers:
- Import and instantiate nGene_rpy2 and nGene_Waveform.
- Load the R script for fastICA using nGene_rpy2.
Load and Normalize Audio Data:
- Use nGene_Waveform to read and normalize heart and lung sound files.
- Ensure both audio files have matching sample rates.
Mix Signals:
- Combine the original signals using a predefined mixing matrix to create mixed signals.
Perform ICA:
- Employ nGene_rpy2 to execute the fastICA function from R on the mixed signals.
- Extract the separated independent components from the ICA result.
Plot and Save Results:
- Visualize the original, mixed, and separated signals using nGene_Waveform.
- Save the separated components as .wav files for further analysis or playback.
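
The "Mix Signals" step can be sketched with NumPy as follows. The mixing matrix values, the synthetic stand-in signals, and the function name mix_signals are assumptions for illustration only, since the matrix predefined in main.py is not shown here.

```python
import numpy as np


def mix_signals(signals, mixing_matrix):
    """Stack signals (truncated to the shortest) and mix them linearly.

    Returns (mixtures, sources), each with one signal per row.
    """
    n = min(len(s) for s in signals)
    S = np.vstack([np.asarray(s, dtype=float)[:n] for s in signals])
    A = np.asarray(mixing_matrix, dtype=float)
    if A.shape[1] != S.shape[0]:
        raise ValueError("mixing matrix needs one column per source signal")
    return A @ S, S


# Synthetic stand-ins for normalized heart and lung recordings at 44100 Hz.
rate = 44100
t = np.arange(rate) / rate
heart_like = np.sin(2 * np.pi * 1.2 * t) ** 9       # sharp periodic pulses
lung_like = np.random.default_rng(0).laplace(size=rate) * np.sin(np.pi * t)

# Hypothetical mixing matrix: each row describes one simulated recording.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
mixed, originals = mix_signals([heart_like, lung_like], A)
print(mixed.shape)

# When A is known, inverting it recovers the sources exactly. ICA has to
# estimate this unmixing without knowing A.
recovered = np.linalg.inv(A) @ mixed
print(np.allclose(recovered, originals))
```

Mixing known sources this way yields a ground truth against which the components returned by fastICA can be compared.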

Upon executing main.py, the script generates both console output and waveform plots. These outputs demonstrate the successful integration of R scripts within Python and the effective separation of audio signals using Independent Component Analysis (ICA).

D-1) Console Output

The console output provides detailed logs of the script's execution process, including information about loading scripts, processing audio data, performing ICA, and saving the results. You can download the complete console output for further inspection:

/Users/frank/nGeneDL20241116/pythonProject/.venv/bin/python /Users/frank/nGeneDL20241116/pythonProject/main.py 
Using R code from file: fastICA.R
2024-11-17 06:50:35,274 - nGene_rpy2 - INFO - Detected R installation: R version 4.4.1 (2024-06-14)
2024-11-17 06:50:35,274 - nGene_rpy2 - DEBUG - Activated numpy2ri and pandas2ri for automatic data conversion.
2024-11-17 06:50:35,274 - nGene_rpy2 - DEBUG - Initialized internal dictionaries for packages and scripts.
2024-11-17 06:50:35,276 - nGene_rpy2 - INFO - Loaded R script into package 'DefaultPackage'.
2024-11-17 06:50:35,276 - nGene_rpy2 - INFO - Loaded R script from file 'fastICA.R' into package 'DefaultPackage'.
2024-11-17 06:50:35,276 - nGene_Waveform - DEBUG - Initialized nGene_Waveform instance.
2024-11-17 06:50:35,278 - nGene_Waveform - INFO - Fetched and normalized audio data from 'heart_sound.wav'.
2024-11-17 06:50:35,278 - nGene_Waveform - INFO - Fetched and normalized audio data from 'lung_sound.wav'.
2024-11-17 06:50:35,279 - nGene_rpy2 - DEBUG - Retrieved fastICA package successfully.
2024-11-17 06:50:35,281 - nGene_rpy2 - DEBUG - Converted NumPy array to R matrix.
2024-11-17 06:50:35,281 - nGene_rpy2 - DEBUG - Retrieved R function 'fastICA' from package.
2024-11-17 06:50:35,319 - nGene_rpy2 - INFO - Called R function 'fastICA' successfully.
2024-11-17 06:50:35,319 - nGene_rpy2 - INFO - Executed fastICA function successfully.
2024-11-17 06:51:13,560 - nGene_Waveform - INFO - Plotted waveforms successfully.
2024-11-17 06:51:13,562 - nGene_Waveform - INFO - Saved separated signal 1 to 'separated_signal_1.wav'.
2024-11-17 06:51:13,564 - nGene_Waveform - INFO - Saved separated signal 2 to 'separated_signal_2.wav'.

D-2) Waveform Plot

The waveform plot visualizes the original, mixed, and separated audio signals, providing a clear representation of the ICA process. You can view the waveform plot below or download the image for your records:

Waveform Analysis

Independent Component Analysis (ICA), for Separating Heart and Lung Sounds

Custom-Built Lightweight Open-Source FastICA Algorithm for In-Browser Use

Custom-Built FastICA on PyScript Methodology, to Separate Mixed Heart and Lung Sounds

Log Console

ICA PyScript Example