| Topic | Details |
|---|---|
| Purpose | Two-column HTML5 studio for audio/video playback, live signal visualization, lightweight tempo analysis, and simple source–mixture experiments. Pure vanilla JS; SVG-only waveforms; no frameworks or Canvas. New in the v 3.3.5 line (3.3.1–3.3.5): Trim (loop-based clip creation with auto-download), matrix-based Mix of the last two items, and ICA separation of stereo mixes into two mono sources. |
| Layout | Left column: Player (seek/loop, playhead cursor, volume, speed, transport controls, playlist, uploads). Right column: Trim · Mix · ICA toolbar, Tempo details panel, and Signal views (Overview, Mid, Micro, and band rows: Low/Mid/High). |
| File locations | Place nws.html anywhere. Primary playlist: ./playlist.json (same folder as nws.html). Legacy/fallback playlist and tempo meta: optional sibling folder /media/ containing playlist.json and tempo_meta.json. Files should be world-readable (e.g., chmod 644 *). |
| Playlist | On load, the player first attempts ./playlist.json (array of media entries, order preserved); if unavailable, a legacy /media/playlist.json is attempted. Absent JSON → starts empty and awaits uploads (drag-&-drop or picker). Uploaded files are referenced via blob-URLs only (no disk writes). |
| Playlist ordering | Each row contains a dedicated ↓ button that sends that item directly to the bottom of the playlist while preserving the order of all others. The currently selected row remains highlighted; index bookkeeping is adjusted so that the audible selection is preserved when possible. |
| Trim | Trim cuts the current loop range of the selected item into a new media item and appends it to the playlist, then immediately plays it. Audio items: decoded into an AudioBuffer, sliced in the loop interval, given short fade-in/fade-out ramps, encoded as 16-bit PCM WAV, and added as a new playlist entry. Video items: preferred path uses MediaRecorder on a captureStream() of the element over the loop range, targeting MP4 when supported and falling back to WebM; a pure audio WAV fallback is used when capturing A/V is not possible. New in v 3.3.4–3.3.5: the trimmed clip is auto-downloaded using the same filename shown in the playlist (WAV or MP4/WebM), immediately after creation. |
| Mix (matrix A) | Mix combines the last two playlist entries into a stereo mixture using a fixed 2×2 mixing matrix: A = [[1, 1], [0.5, 2]], where rows index output channels (L,R) and columns index sources (S1,S2). Processing: each source is downmixed to mono, linearly resampled to a common sample rate, then mixed by A with automatic peak-based scaling to avoid clipping. Output: stereo WAV blob (L = mixture#1, R = mixture#2), auto-named as MixA_S1+S2_YYYYMMDDhhmmss.wav, appended to the playlist, and auto-selected for playback. Tempo metadata and overview are computed for the mix and stored under its filename. |
| ICA separation | ICA operates on the currently selected stereo item (e.g., a Mix result). Internals: 2×N mixtures are centered, whitened via a 2×2 symmetric eigendecomposition, then separated with a 2-component FastICA (tanh nonlinearity, symmetric decorrelation between components, Frobenius-norm convergence). Output: two mono WAV signals (ICA_A_of_* and ICA_B_of_*), normalized with modest headroom and short fades, appended to the playlist as independent entries with their own tempo and overview metadata. |
| Decoding & fallback | Primary decoding path: decodeAudioData on fetched/uploaded bytes. For playlist URLs, fetch is attempted first. Fallback: full-length or range-limited capture via MediaElementSource → AudioWorklet (preferred) or ScriptProcessor, routed through a zero-gain node to keep the capture path inaudible. The muted property is never used in logic. |
| Tempo metadata | If present, /media/tempo_meta.json (keyed by filename) provides BPM and auxiliary fields (confidence, beat period, half/double suggestions, textual tempo class), which are reflected both in the playlist badge and the Tempo details panel. Otherwise, an internal estimator runs on decoded buffers or short capture segments, yielding approximate BPM and beat-period values sufficient for exploratory work. |
| Uploads | Accessible uploader with ➕ Upload button and drag-&-drop support; the uploader itself is keyboard-focusable. Typical formats: MP3, M4A, FLAC, WAV, OGG, AAC, and common video containers such as MP4, MOV, WebM, MKV, and AVI. |
| First-30-second cue | Uploader border and hint text gently pulse every 2 s for the first 30 s after load, encouraging an initial user gesture that reliably resumes the AudioContext on modern browsers. |
| A–B Looping | Seek bar shows cerulean A (“[”) and B (“]”) handles plus a thin ultramarine loop fill, always constrained within the gray full-track bar. ✖ Clear restores full-length playback. During playback, when the playhead reaches B, it wraps to A (with a small tolerance) as long as the loop is active. |
| Playhead | Current time is indicated by a vertical “I”; the center of that stroke corresponds to the true position. The playhead is draggable and is clamped within the current loop range. |
| Click-to-toggle video | Single-click on the video element toggles play/pause; double-click toggles fullscreen. The central ⏸︎/▶︎ transport button remains synchronized with element state. |
| Autoplay | The first playlist item may start automatically depending on browser autoplay policy. The AudioContext resumes on the first user interaction (click, drag, drop, or keyboard action) to ensure consistent audio routing. |
| Repeat Mode | Repeat cycles between One (🔁 with “1”), All (🔁), and Off (⛔). With an A–B loop active, playback wraps within the loop regardless of repeat mode. When the loop is cleared, Repeat = All advances across playlist items; Repeat = One replays the same item. |
| Controls | ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • 🔁/⛔ Repeat • ✖ Loop-Clear • ⛶ Fullscreen (video). |
| Seek & Time | Smooth range input with live “elapsed / total” time label, draggable A–B handles, thin loop fill, and a precise “I”-shaped cursor. Loop bounds constrain both seeking and continuous playback; a small, duration-dependent epsilon avoids stickiness at the upper boundary during wrap. |
| Volume | 0–500 % via WebAudio GainNode (primary route, single audible path). If WebAudio is unavailable, a graceful fallback uses native element volume (0–100 %). The design avoids double-routing and unintended parallel audio paths. |
| Speed | 0.05× – 2.00× with − / + step buttons (0.01 increments) and a 1× reset button. The same playback rate is applied to both audio and video media elements. |
| Tempo details | Tempo panel presents BPM (with confidence), beat period (ms), half/double candidates, tempo class (Slow/Moderate/Fast), and effective BPM at the current playback speed (BPM × rate). The panel is visible whenever either file-based metadata or the internal estimator provides data for the selected item. |
| Overview (playlist.json-aware) | Overview is a whole-file SVG representation built from min/max envelopes over fixed buckets. In v 3.3.5, an internal helper ensures that an Overview is generated for the currently selected item even when it comes from playlist.json loaded at startup (audio or video). Once constructed, the same Overview supports both the main Overview view and the centered Micro view around the playhead. |
| Signal views | Overview (entire file, absolute timebase, interactive loop brackets and cursor), Mid (live trailing window, default 8 s), and Micro (centered ±3 s around the playhead; falls back to trailing when no Overview is available). Band rows (Low ≤~200 Hz, Mid ~200–2000 Hz, High ≥~2 kHz) use a simple one-pole filter bank per band and share the same trailing length as the Mid window, with distinct color-coded strokes for quick visual discrimination. |
| Live tap | AudioWorklet-based collector (preferred) or ScriptProcessor fallback receives data from the shared MediaElementSource nodes via an inaudible zero-gain branch. Envelope rings are filled at an effective rate of ~2 kHz and decimated to maintain responsiveness while limiting CPU load. Tap operations do not alter the audible signal. |
| Resizable wrapper | Outer .wrapper uses resize:both; the default width is governed by --w (980 px), suitable for dual-column layouts on desktop screens. The playlist panel is vertically resizable, allowing adaptation to longer track lists or small windows. |
| Accent colour | Changing --accent (default #1e90ff) rebrands key UI elements, including buttons, sliders, pulse highlights, and active playlist rows, while preserving structural CSS. |
| Fullscreen | The ⛶ button and the F key toggle fullscreen for video items only; audio items retain the compact layout. The output route is re-applied on fullscreen changes to maintain consistent gain behaviour. |
| Source-code reveal | Embedded “Full Source Code” accordion shows the entire page’s HTML/JS/CSS, syntax-highlighted via Highlight.js, allowing inspection, copy-paste, and regression testing from a single file. |
| Namespace | All logic resides inside a single IIFE; public surface is limited to instantiation of the WaveformStudio class against the #box container. CSS is scoped by class names to minimize interaction with surrounding pages or frameworks. |
| Notes & caveats | Decoding and cross-origin fetching depend on server CORS configuration; when direct decoding fails, the capture-based fallback is used instead. Some exotic codecs or DRM-protected streams may remain unsupported. Mixed, trimmed, and ICA-derived outputs are held as in-memory blobs and appear as playlist entries; only Trim explicitly triggers a download by default in v 3.3.5. |
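
The matrix-based Mix described in the table can be sketched in a few lines. This is only an illustration, not the code in nws.html: the function name `mixWithMatrix`, the `targetPeak` value of 0.98, and the zero-padding of the shorter source are assumptions, and both inputs are assumed to be already downmixed to mono and resampled to a common rate.

```javascript
// Illustrative sketch only: names and the 0.98 target peak are assumptions.
// Rows of A index output channels (L, R); columns index sources (S1, S2).
const A = [[1, 1], [0.5, 2]];

function mixWithMatrix(s1, s2, matrix = A, targetPeak = 0.98) {
  const n = Math.max(s1.length, s2.length); // shorter source is zero-padded (assumption)
  const left = new Float32Array(n);
  const right = new Float32Array(n);
  let peak = 0;
  for (let i = 0; i < n; i++) {
    const a = i < s1.length ? s1[i] : 0;
    const b = i < s2.length ? s2[i] : 0;
    left[i] = matrix[0][0] * a + matrix[0][1] * b;
    right[i] = matrix[1][0] * a + matrix[1][1] * b;
    peak = Math.max(peak, Math.abs(left[i]), Math.abs(right[i]));
  }
  // Peak-based scaling: attenuate only when the mixture would clip.
  const scale = peak > targetPeak ? targetPeak / peak : 1;
  if (scale !== 1) {
    for (let i = 0; i < n; i++) {
      left[i] *= scale;
      right[i] *= scale;
    }
  }
  return { left, right, scale };
}
```

Applying one common scale factor to both channels keeps the ratio between the two mixtures intact, which matters if the result is later fed to ICA.
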

| Topic | Details |
|---|---|
| Purpose | Two-column HTML5 studio for audio/video playback, live signal visualization, and lightweight tempo analysis. Pure vanilla JS; SVG-only waveforms; no frameworks or Canvas. New in v 3.1.0: Mix button (right column) that combines the last two playlist items into a headroom-safe WAV and appends it to the playlist for immediate playback. |
| Layout | Left column: Player (seek/loop, volume, speed, transport, playlist, uploads). Right column: Mix toolbar, Tempo details panel, and Signal views (Overview, Mid, Micro, and band rows). |
| File locations | Place nws.html anywhere. Optional sibling folder /media/ for playlist.json and tempo_meta.json. Ensure readable permissions (e.g., chmod 644 *). |
| Playlist | Optional /media/playlist.json: array of media paths (order preserved). Absent JSON → starts empty and awaits uploads (drag-&-drop or picker). Uploaded files are referenced via blob-URLs (no disk writes). |
| Mix (new) | Click Mix to combine the last two playlist entries (audio or the audio track of video). Processing: OfflineAudioContext offline render; per-track gain = 0.5 for headroom; linear sum; length = max(duration). Output: in-memory WAV blob, auto-named as Mix - A + B.wav, appended to the playlist, and auto-played. Status text reports progress or errors (e.g., CORS/decoding). |
| Decoding & fallback | Primary: decodeAudioData on fetched/uploaded bytes. Fallback: full-length capture via MediaElementSource → Worklet/ScriptProcessor (kept inaudible through a zero-gain node; no muted property used). |
| Tempo metadata | If available, /media/tempo_meta.json (keyed by filename) populates BPM and related fields in the list and Tempo panel. When absent, a quick internal estimator computes approximate BPM/beat period from short decoded segments or short captures. |
| Uploads | ➕ Upload button and drag-&-drop; keyboard focusable uploader. Uploaded audio/video formats commonly supported: MP3/M4A/FLAC/WAV and MP4/MOV/WEBM/MKV/AVI. |
| First-30-second cue | Uploader border and hint gently pulse every 2 s for the first 30 s after load to encourage interaction (resumes AudioContext reliably). |
| A–B Looping | Seek bar shows cerulean A (“[”) and B (“]”) handles and a thin ultramarine loop fill, always inside the gray full-track bar. ✖ Clear restores full-length playback instantly. |
| Playhead | Current time indicated by a vertical “I”; the center of the line is the true position. Draggable, clamped within the loop. |
| Click-to-toggle video | Single-click on video toggles play/pause; double-click toggles fullscreen. The ⏸︎/▶︎ control remains synchronized. |
| Autoplay | First item may start automatically (per browser policy). AudioContext resumes on first user gesture (click, drag, drop) for consistent sound. |
| Repeat Mode | Cycles: One (🔁 with “1”) → All (🔁) → Off (⛔). With a loop active, playback wraps to loop start. After clearing loop and with Repeat = All, playback advances to the next track. |
| Controls | ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • 🔁/⛔ Repeat • ✖ Loop-Clear • ⛶ Fullscreen (video). |
| Seek & Time | Smooth range input with live “elapsed / total”, draggable A–B handles, thin loop fill, and precise “I” cursor. Loop bounds clamp seeking and playback, with edge-aware wrap to loop start. |
| Volume | 0–500 % via WebAudio GainNode (primary route). Graceful fallback uses element volume (0–100 %) if WebAudio is unavailable. Single audible route is always maintained. |
| Speed | 0.05× – 2.00× with − / + step buttons (0.01) and 1× reset. Applies to audio and video uniformly. |
| Tempo details | BPM (with confidence), beat period (ms), half/double suggestions, tempo class (Slow/Moderate/Fast), and effective BPM at current speed. Panel appears when data are available (from metadata file or internal estimator). |
| Signal views | Overview (whole file; absolute “now” marker), Mid (live trailing window, default 8 s), Micro (centered ±3 s around playhead; falls back to trailing if overview not ready), and Band rows (Low ≤~200 Hz, Mid ~200–2000 Hz, High ≥~2 kHz) with color-coded strokes. Window lengths selectable; ✖ clears live buffers. |
| Live tap | AudioWorklet collector (preferred) or ScriptProcessor fallback feeds envelope rings at ~2 kHz sampling for responsive SVG paths. Capture remains inaudible through a zero-gain branch; no reliance on muted. |
| Resizable wrapper | Outer .wrapper uses resize:both; default width from --w (980 px for two columns). Track list is vertically resizable. |
| Accent colour | Adjust --accent (default #1e90ff) to rebrand buttons, sliders, uploader, and active highlights. |
| Fullscreen | Dedicated ⛶ button and keyboard F toggle fullscreen for video items. |
| Source-code reveal | Built-in “Full Source Code” accordion displays the whole page, syntax-highlighted via Highlight.js, for sharing and tests. |
| Namespace | All logic is encapsulated in an IIFE; CSS classes are locally scoped. Safe to embed alongside other pages and scripts. |
| Notes & caveats | Decoding and cross-origin fetching depend on server CORS policies; when decoding fails, the inaudible capture fallback is attempted. Mixed output is stored as an in-memory blob (download prompt is not issued automatically). |
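
The v 3.1.0 Mix row describes a per-track gain of 0.5, a linear sum, and an output length equal to the longer input. The table says the player renders this through an OfflineAudioContext; the sketch below shows the same arithmetic on plain sample arrays so the headroom argument is easy to check. The function name `mixHalfGain` is hypothetical, and both inputs are assumed to share one sample rate.

```javascript
// Illustrative sketch only: plain-array equivalent of "gain 0.5 per track, linear sum".
// If every input sample lies in [-1, 1], then 0.5*x + 0.5*y also lies in [-1, 1],
// so the sum cannot clip.
function mixHalfGain(a, b, gain = 0.5) {
  const n = Math.max(a.length, b.length); // length = max(duration)
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const x = i < a.length ? a[i] : 0; // past the end of a track, contribute silence
    const y = i < b.length ? b[i] : 0;
    out[i] = gain * x + gain * y;
  }
  return out;
}
```

The trade-off of a fixed 0.5 gain is that a section where only one track is playing comes out at half its original level.
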

| Topic | Details |
|---|---|
| Purpose | Self-contained, resizable HTML5 media player for audio (MP3/M4A/FLAC/WAV) and video (MP4/MOV/WEBM/MKV/AVI). Pure vanilla JS; no frameworks. New since v 2.6: vertical “I” playhead (center = true position), refined A–B loop visuals, hardened uploads/drag-&-drop, reliable play/pause with AudioContext resume. |
| File locations | Place nmp.html anywhere. Media files live in sibling /media/. Ensure readable permissions, e.g., chmod 644 *. |
| Playlist | Optional /media/playlist.json: array of media paths (order preserved). If absent, player starts empty and waits for user uploads. |
| Tempo metadata | Player reads tempo_meta.json (keyed by filename) to show integer-rounded BPM beside each track and in the title line (e.g., “128 BPM”). |
| Uploads | ➕ Upload button and drag-&-drop. Files are played via blob-URLs (no disk writes). The dashed uploader box is clickable and keyboard-focusable. |
| First-30-second attention cue | Uploader border and hint softly pulse/glow every 2 s for the first 30 s after load. |
| A–B Looping | Seek bar shows two cerulean brackets: • A handle “[”: loop start. • B handle “]”: loop end. Ultramarine blue loop bar (thinner) fills the loop region and is always fully inside the gray full-length bar (entire track). ✖ Clear resets loop to full-length instantly. |
| Playhead | Current position is a vertical “I” line; its center is the true time point. It can be dragged, and is always clamped inside the blue loop bar. |
| Click-to-toggle video | Click anywhere on the visible video to play/pause; ⏸︎/▶︎ stays in sync. Double-click toggles fullscreen. |
| Autoplay | First item starts automatically (subject to browser policy). AudioContext is resumed on first user gesture (e.g., button, drag, drop) for reliable playback. |
| Repeat Mode | Cycles: 🔂 One → 🔁 All → ⛔ Off. With a loop active, playback wraps to the loop start. After the loop is cleared with ✖ and Repeat = All, the player advances to the next track at the end (instead of replaying the same track). |
| Controls | ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • 🔂/🔁/⛔ Repeat • ✖ Loop-Clear • ⛶ Fullscreen (video). |
| Seek & Time | Sleek seek bar with live “elapsed / total” timer, A–B handles, thin blue loop bar, and draggable “I” playhead. |
| Volume | 0–200 % gain via WebAudio (gain node). Default is 33 %. If WebAudio is unavailable, falls back to element volume (0–100 %). |
| Speed | 0.05× – 2.00× slider with − / + step buttons (0.01) and 1× reset. Applies to both audio and video. |
| Resizable wrapper | Outer .wrapper uses resize:both; default width from --w (360 px). Track-list is vertically resizable. |
| Accent colour | Edit --accent (default #1e90ff) to rebrand buttons, slider thumbs, uploader, and active track highlight. |
| Fullscreen | Dedicated ⛶ button and keyboard F toggle fullscreen for video items. |
| Source-code reveal | Built-in “Full Source Code” accordion shows the entire page, syntax-highlighted via Highlight.js (for easy sharing/tests). |
| Namespace | All logic wrapped in an IIFE; CSS uses scoped class names. Safe to embed alongside other scripts and styles. |

| Topic | Details |
|---|---|
| Purpose | Self-contained, resizable HTML5 player for audio (MP3/M4A) and video (MP4/MOV/WEBM). Pure vanilla JS; no frameworks required. New since v 1.8: tempo-aware track-list showing BPM (integer-rounded), auto-loading from tempo_meta.json; initial volume defaults to 17 % at page-load. |
| File locations | Place nmp.html anywhere. Media files live in a sibling /media/ folder. Ensure readable permissions with chmod 644 *. |
| Playlist | Optional /media/playlist.json—an array of paths (order preserved). If absent, the player simply waits for user uploads. |
| Tempo metadata | Run extract_meta_from_media.py v 2.4 to generate tempo_meta.json (single integer-rounded bpm). Player displays it beside each track and in the title-bar as “### BPM”. |
| Uploads | ➕ Upload button and drag-&-drop. Files become blob-URLs, so nothing is written to disk. |
| First-30-second attention cue | Uploader border, hint-text and container gently pulse, glow and scale every 2 s for the first 30 s after page-load. |
| A-B Looping | Seek-bar sports two cerulean “brackets”: • A handle “[” — left edge marks loop-start. • B handle “]” — right edge marks loop-end. Drag to set; ultramarine bar fills the loop range. ✖ Clear button instantly resets the loop. |
| Click-to-toggle video | Click anywhere on the visible video to play/pause; the ⏸︎/▶︎ button stays synchronised. |
| Autoplay | The first track auto-starts; subsequent behaviour follows Repeat Mode. |
| Repeat Mode | Begins at 🔂 One (loop current). Button cycles: 🔂 One → 🔁 All → 🔁 Off. |
| Controls | ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • Repeat — plus ✖ Loop-Clear beside the seek-bar. |
| Seek & Time | Sleek seek-bar with live “elapsed / total” timer, integrated A-B loop handles and ultramarine fill. |
| Volume | Smooth 0–100 % slider with live percentage label; initial default 17 % (0.17). |
| Speed | 0.70× – 2.00× slider with − / + step buttons and 1× reset. Applies to audio & video. |
| Resizable wrapper | Outer .wrapper uses resize:both; default width governed by --w (360 px). Track-list is vertically resizable. |
| Accent colour | Edit --accent (default #1e90ff) to rebrand buttons, slider thumbs, active-track row and uploader pulse. |
| Source-code reveal | Built-in “Full Source Code” accordion shows the entire page, syntax-highlighted via Highlight.js. |
| Namespace | All logic wrapped in an IIFE; CSS uses local class names—safe to embed anywhere. |

| Topic | Details |
|---|---|
| Purpose | Self‑contained, resizable HTML5 player for audio (MP3/M4A) and video (MP4/MOV/WEBM). Pure vanilla JS—no frameworks. New since v 1.6 (c): draggable cerulean‑blue “bracket” handles for precise A‑B looping, ultramarine loop‑fill, and click‑to‑toggle playback directly on the video surface. |
| File locations | Place nmp.html anywhere. Media files live in a sibling /media/ folder. Ensure readable permissions with chmod 644 *. |
| Playlist | Optional /media/playlist.json—an array of paths (order preserved). If absent, the player simply waits for user uploads. |
| Uploads | ➕ Upload button and drag‑&‑drop. Files become blob‑URLs, so nothing is written to disk. |
| First‑30‑second attention cue | Uploader border, hint‑text and container gently pulse, glow and scale every 2 s for the first 30 s after page‑load. |
| A‑B Looping (1.8 series) | Seek‑bar sports two cerulean “brackets”: • A handle “[” — left edge marks loop‑start. • B handle “]” — right edge marks loop‑end. Drag to set; ultramarine bar fills the loop range. ✖ Clear button instantly resets the loop. |
| Click‑to‑toggle video | Click anywhere on the visible video to play/pause; the ⏸︎/▶︎ button stays synchronised. |
| Autoplay | The first track auto‑starts; subsequent behaviour follows Repeat Mode. |
| Repeat Mode (default) | Begins at 🔂 One (loop current). Button cycles: 🔂 One → 🔁 All → 🔁 Off. |
| Controls | ⏮︎ Prev • ⏸︎/▶︎ Toggle • ⏭︎ Next • Repeat — plus ✖ Loop‑Clear beside the seek‑bar. |
| Seek & Time | Sleek seek‑bar with live “elapsed / total” timer. Integrates A‑B loop handles and ultramarine fill described above. |
| Volume | Smooth 0–100 % slider with live percentage label. |
| Resizable wrapper | Outer .wrapper uses resize:both; default width governed by --w (360 px). Track‑list is vertically resizable. |
| Accent colour | Edit --accent (default #1e90ff) to rebrand buttons, slider thumbs, active‑track row and uploader pulse. |
| Source‑code reveal | Built‑in “Full Source Code” accordion shows the entire page, syntax‑highlighted via Highlight.js. |
| Namespace | All logic wrapped in an IIFE; CSS uses local class names—safe to embed anywhere. |
Modern media players should support a variety of audio and video file formats. Below is an overview of commonly used formats, including their typical use cases, compatibility considerations, licensing issues, technical notes, and recommendations for use. Emphasis is placed on desktop and HTML5/JavaScript environments.
`<audio>` element. Likewise, almost every media player and mobile device supports MP3 out-of-the-box. This wide compatibility makes MP3 a safe choice for any web-based player.
`<audio>` in Chrome, Firefox, Safari, Edge, etc. (Firefox historically relied on OS codecs for AAC but on modern systems this is seamless). Virtually all smartphones and tablets support AAC playback (it's the default for iOS devices). In summary, AAC in MP4/M4A has near-universal support similar to MP3, except old browsers or very old devices may lack it.
`<audio>` element (.ogg files). Safari historically did not support Ogg Vorbis until recently; as of Safari 15 (on macOS Monterey and iOS 15), Safari added support for WebM and also for Opus in WebM, but it still does not natively play .ogg Vorbis files unless additional components are installed. Therefore, Vorbis support is almost universal on desktop except older Safari versions. Opus is supported in Chrome, Firefox, Opera, and Edge; Safari added Opus support when contained in WebM (Safari 15+). However, Safari (even latest) may not play a standalone .opus file or Ogg Opus file, as its Opus support is tied to the WebM container. On desktop, most third-party audio players support Vorbis, and many now support Opus as it gains popularity. In summary, for HTML5: Vorbis is widely supported except older Apple browsers; Opus is supported by all major browsers except older Safari (though Safari is catching up via WebM).
`<audio>` (Chrome has since version 56, Firefox since 51). Safari added FLAC support in version 11 (around macOS High Sierra). This means modern versions of all major browsers can play .flac files directly. However, older browsers or old mobile devices might not support it. Outside the browser, FLAC is supported by many desktop music players (e.g., VLC, foobar2000, etc.) and even by some car audio systems and high-end portable music players. On Windows and macOS, FLAC can be played with native or easily available codecs (Windows 10 added native support for FLAC in its media player). One caveat: some browsers may only support FLAC in certain container forms (usually the .flac extension with the FLAC codec; FLAC-in-Ogg might have a different support matrix). In general, .flac files (the official container/extension) are recognized by modern browsers.
`<audio>` element will progressively download it), but users will experience delays if the connection is not fast, due to file size. Seeking in FLAC is typically good because FLAC frames contain markers that allow jumping to approximate positions, and most players build a seek table. For our context (desktop web player), if a user opens a local FLAC file, it should play smoothly. If a FLAC is hosted online, the browser will download a large amount but can start playing once a bit is buffered. There’s no technical issue playing partial FLAC data aside from the bandwidth concern.
`<video>` element. This was a cornerstone of HTML5 video adoption: while initially there was debate over open formats, all browser vendors converged on supporting H.264 in MP4 by around the mid-2010s (with the lone holdout Firefox eventually relying on OS decoders to avoid licensing fees). In practical terms, any user on a modern browser (whether on Windows, Mac, Linux, or mobile) can play MP4 video. Additionally, all desktop and mobile operating systems have native support: e.g., Windows’ Movies & TV app, macOS QuickTime, iOS, Android, Smart TVs, etc., all handle MP4/H.264. This ubiquity is unmatched by any other video format currently.
`<video>` element, one is reliant on browser support. If the browser supports it, the media player just needs to be ready to supply an AV1 source. If not, the player might need to fall back to a different format. Detecting support might also be necessary (using `canPlayType()` or similar). For local files: if a user opens an .mkv or .webm with AV1 content in nGene Media Player, Chrome or Firefox would likely play it, but in a non-supporting environment playback would fail. Handling that case gracefully (for example, with an error such as “this format is not supported on your system”) could therefore be needed.
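
As a rough illustration of the detection step, the sketch below walks a list of candidate sources and asks the element which it can play. `canPlayType()` is a standard HTMLMediaElement method that returns `""`, `"maybe"`, or `"probably"`; the helper name `pickPlayableSource`, the file names, and the candidate list are hypothetical and not part of nGene Media Player.

```javascript
// Illustrative sketch only: helper name, file names, and candidate list are assumptions.
// Returns the first candidate rated "probably", else the first rated "maybe", else null.
function pickPlayableSource(mediaEl, candidates) {
  let fallback = null;
  for (const c of candidates) {
    const verdict = mediaEl.canPlayType(c.type); // "", "maybe", or "probably"
    if (verdict === 'probably') return c;
    if (verdict === 'maybe' && !fallback) fallback = c;
  }
  return fallback; // null means nothing in the list is playable here
}

// Candidates ordered from most to least preferred.
const candidates = [
  { src: 'clip.av1.webm', type: 'video/webm; codecs="av01.0.05M.08"' },
  { src: 'clip.vp9.webm', type: 'video/webm; codecs="vp9"' },
  { src: 'clip.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
];
```

In a page this would be called as `pickPlayableSource(document.createElement('video'), candidates)`; a `null` result is the point at which an "unsupported format" message could be shown.
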
`<video>`. Edge (Chromium) similarly. Safari does not support MKV. Therefore, one generally cannot rely on dragging an MKV file into a browser and having it play, unless it's in a special case where the MKV contains exactly the same streams as a WebM (VP9/Opus), and even then it might fail due to container recognition. That said, some users have reported Chrome can play certain MKV files, but this is not officially documented. On desktop platforms, MKV is well supported by third-party players (VLC, etc.), but not by default OS players (Windows Media Player doesn’t natively play MKV without codec packs; older QuickTime on Mac didn’t either). The newer Windows 10/11 Movies & TV app does support MKV to an extent (since Microsoft added MKV support in 2015 to their player). This means a Windows user might double-click an MKV and it could play in the Movies & TV app if the codecs inside are supported by the OS (H.264, etc.). Overall, for a web app, MKV is not a safe format to rely on without conversion or using a custom player library that can demux MKV in JavaScript.
`<video>` tag. There’s virtually no push to include AVI support in browsers because it’s outdated and the codecs inside might be unsupported (e.g., MPEG-4 ASP, which browsers don’t decode, or various obscure codecs). So an AVI file will not play in an HTML5 player without conversion. On desktop, Windows has native support for AVI (since it was a native format for a long time): if the proper codec is installed, Windows Media Player can play an AVI. By default, Windows can play AVIs that use older standard codecs (like Cinepak, or uncompressed, or DV). For DivX/Xvid AVIs, users often needed to install codec packs or use third-party players like VLC. macOS never supported AVI natively in QuickTime without plugins; again, third-party players are used. In summary, a web app cannot directly play .avi, and users themselves often rely on software like VLC to play their AVI files.
Recommended Default Formats: Considering the above, for broadest compatibility and ease of use in a web-based desktop player, the recommended default formats are MP3 for audio and MP4 (H.264/AAC) for video. These two cover nearly all browsers and platforms with no special setup. In practice, this means the player should primarily handle MP3 for music and MP4 for video. However, to make nGene Media Player more robust and appealing, it should also support the common alternatives: including AAC (M4A) ensures high-quality audio support, Ogg Vorbis/Opus provides open-format options, and FLAC allows for lossless audio playback. On the video side, adding support for WebM (VP8/VP9) is advisable for modern browsers, and being mindful of AV1 will keep the player up-to-date with emerging standards. Less common or legacy formats like MKV, AVI, and MOV can be acknowledged, but the strategy should be to handle them via conversion or not at all, rather than as primary supported formats. By focusing on MP3 and MP4 as the core, and supplementing with the next tier of formats, the player will cater to most use cases while maintaining reliability.
Written on March 9, 2025
A media player like nGene Media Player not only plays audio and video but often also presents information about the media to the user. This includes basic details (duration, title) and possibly more advanced metadata (like album name, video resolution, etc.). Below, we outline what metadata can be obtained from media files and discuss methods to extract this information using web technologies (JavaScript in the browser) and Python (which could be used server-side or via PyScript in-browser). We also provide guidance on when to use client-side vs. server-side (or local) analysis based on the depth of metadata required.
Most of the above metadata can be accessed or computed with the right tools. The next sections describe how to retrieve these details using JavaScript in the browser and using Python, respectively.
In a purely browser-based environment (vanilla JavaScript), one can extract a subset of the above information. The HTML5 media elements and additional libraries are the primary means to do so:
`<audio>` and `<video>` elements provide some basic metadata once a media file is loaded. For example, after setting `audio.src = URL.createObjectURL(file)` (for a File object) and waiting for it to load metadata, the property `audio.duration` gives the length in seconds. For video, `video.videoWidth` and `video.videoHeight` provide the pixel dimensions, and `video.duration` the length. There’s also the `video.poster` attribute (for an assigned poster image), but nothing for embedded thumbnails. The `readyState` and `networkState` properties can tell if metadata is loaded. Additionally, the `textTracks`, `audioTracks`, and `videoTracks` properties can list tracks (like subtitle tracks or multiple audio tracks) if the format/container supports it (for instance, an MP4 with subtitles might expose textTracks). However, the HTMLMediaElement does *not* expose detailed codec info (no direct way to get “this is MP3” or “this is H.264” from the element) and does not give access to content tags like title or artist. It is limited to playback-related info. So, while this API easily gives duration, resolution, and allows for retrieving current playback time (for sync or manual analysis), it won’t retrieve, say, ID3 tags.
`mediainfo.js` is the MediaInfo library compiled to WASM. Using tools like this, a web app can get extremely detailed information. For example, MediaInfo will return data like: video codec profile (Main@L4.1 for H.264), bit depth (8-bit vs 10-bit video), chroma subsampling, exact frame rate (e.g., 23.976), encoder library name, audio channel layout (5.1, 2.0), etc., along with tags like title and chapters if present. The output can be JSON or text. The trade-off is that loading these WASM libraries (which might be 1-2 MB or more) adds overhead, and running them is somewhat heavy (parsing a large file in WASM takes a bit of time and CPU). But they are very powerful. For instance, if nGene Media Player wants to display a “Media Info” panel similar to what VLC or Media Player Classic shows, using MediaInfo.js would be ideal. You’d feed it the file (File object or ArrayBuffer) and get a structured report.
`ffprobe.wasm` could similarly be invoked with arguments to show stream and format info. A practical approach is to load such a tool on demand (e.g., only if the user opens a “Details” pane) to avoid unnecessary performance cost during normal playback.
With an `AudioContext`, one can take an audio file (via fetch or FileReader) and call `decodeAudioData` to get an AudioBuffer containing raw PCM samples. This is limited by file size (very large files might be too much to hold in memory at once), but for moderate files it’s fine. Once the AudioBuffer is obtained, the script can analyze it: e.g., compute a waveform array (by sampling the amplitude periodically or calculating RMS levels for segments), which is great for drawing waveforms. It could also do an FFT to get frequency data for visualization or even attempt auto BPM detection (by looking for periodic peaks in the time domain or using autocorrelation techniques). The Web Audio API can also be used in real time: connecting the media element to an AnalyserNode allows you to get real-time frequency data for visualization (good for showing a live EQ or bars that jump with the music). However, this is more about visuals; it’s not a robust way to get static metadata like “this song’s BPM is 120” (that would require a bit more algorithmic work in JS or a library). Still, it's client-side and leverages the browser’s audio decoding capabilities. Note that for protected content or some streaming formats, decodeAudioData might not work due to CORS or codec restrictions. But for local files and common codecs, it should. Overall, the Web Audio API complements metadata parsing by providing the means to derive new data (waveforms, loudness, etc.) from the raw audio.
Using the above methods, a web-based media player can gather a wealth of information without leaving the browser. For instance, on loading a file, the player could immediately display the duration via the duration property, show the title/artist by parsing tags with music-metadata, show the resolution via videoWidth/videoHeight, and perhaps generate a waveform preview using Web Audio, all done client-side. The main constraints are performance (very large files or very detailed analysis can be slow) and the necessity to include libraries or WASM modules (increasing app size). When extremely detailed info or heavy computation is needed, one might then consider Python or server-side tools, as described next.
Python has a rich ecosystem for media processing, and it can be used in two ways: on a backend server (or a local machine, outside the browser) to preprocess or analyze media, or via PyScript/WebAssembly to run Python code in the browser. Here we outline how Python libraries can extract metadata and do deeper analysis, and how that might fit into the architecture of the media player.
Running `ffprobe` on a media file can output details in a structured format (e.g., JSON or XML) that includes essentially everything about the file. This includes: container format, file size, duration, bitrates, codec names and profiles, frame rate, resolution, pixel format, audio sample rate, channel layout, and even the contents of metadata tags (like title, artist, etc., if present). For example, `ffprobe -v quiet -print_format json -show_format -show_streams file.mp4` will produce a JSON with a “format” section (with tags and duration/size/bitrate) and a “streams” array (each stream having codec type, codec name, width, height, channel count, language, etc.). In a Python context, one could invoke ffprobe via `subprocess` and parse this JSON. There are also wrapper libraries (like `ffmpeg-python`, or `pymediainfo` for MediaInfo) that can retrieve similar info. If nGene Media Player has access to a local Python environment or a server, using ffprobe is one of the most straightforward ways to get a comprehensive metadata dump. The output can then be filtered to display relevant info in the UI. For instance, one could show “Video: H.264, 1080p, 30fps, ~5 Mbps; Audio: AAC, 2 channels, 128 kbps; File Size: 700MB; Duration: 01:30:00”. FFmpeg can also extract thumbnails (e.g., generate an image at a certain timestamp) or even waveforms (generate a waveform image), which can be part of metadata enrichment (though those outputs are more content-derived). If running Python server-side, the player could send the file (or its path) to the server to analyze with ffprobe and return results. If using PyScript, one could compile ffprobe or use a Python binding (though likely you’d just call a JavaScript ffprobe as mentioned earlier to avoid double overhead).
from mutagen.mp3 import MP3
audio = MP3("song.mp3")
print(audio.info.length, audio.info.bitrate)  # duration in seconds, bitrate in bps
print(audio.tags.get("TIT2"), audio.tags.get("TPE1"))  # title and artist ID3 frames
Mutagen would read the ID3 frames and allow access by frame identifiers or via a common interface (mutagen also has `mutagen.File()`, which auto-detects the format and gives a generic object). Similarly, for FLAC:
import mutagen.flac
audio = mutagen.flac.FLAC("file.flac")
print(audio.info.length)
print(audio.tags["artist"], audio.tags["title"])
This will give the Vorbis comment tags. Mutagen also handles pictures in tags (it can extract the image bytes). The library is lightweight and fast for tag reading. Another library, `eyeD3`, is specialized for MP3 and focuses on ID3 v2. It provides a slightly higher-level interface for MP3 metadata and can also do things like calculate BPM (if a plugin is used) or manage cover art. eyeD3 could tell you, for instance, if an ID3 tag has a certain encoding or if there are multiple tag versions. However, for most use cases, mutagen suffices and works across formats. In context, Python with mutagen could be used to scan a library of songs and build a database of metadata that the JS player then uses. Or if PyScript is considered, one could load a smaller subset (maybe just mutagen’s logic for ID3) to parse a file in-browser. But that might be redundant if JS libraries exist. Mutagen truly shines server-side or in batch processing scenarios.
import librosa
y, sr = librosa.load("song.mp3")  # decodes audio to a waveform (requires an ffmpeg or audioread backend)
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print("Estimated tempo:", tempo, "BPM")
This will output an estimated BPM for the track. Librosa might mis-estimate if the track has variable tempo or unclear beats, but it’s generally good for reasonably rhythmic music. To get the key, one approach is to compute the chroma (which gives an energy for each pitch class over time) and then use a heuristic or a simple algorithm to guess the key from the aggregate chroma. Librosa doesn’t directly give “key = C# minor” in one call, but it provides tools to derive it. Essentia (next point) does have a built-in for key.
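One common heuristic for the chroma-to-key step is template matching: correlate the time-averaged chroma vector with a major and a minor key profile at all 12 rotations and keep the best match. The sketch below uses the Krumhansl-Kessler profiles and only NumPy; the function name `estimate_key` is an assumption introduced here, and the input is assumed to be a 12-element vector ordered C, C#, ..., B (for example the result of `librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)`).

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles, tonic at index 0.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma_mean):
    """Return (tonic, scale, correlation) for a 12-bin averaged chroma vector."""
    chroma_mean = np.asarray(chroma_mean, dtype=float)
    best_score, best_tonic, best_scale = -np.inf, None, None
    for tonic in range(12):
        for scale, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so that its tonic weight sits at index `tonic`.
            score = np.corrcoef(chroma_mean, np.roll(profile, tonic))[0, 1]
            if score > best_score:
                best_score, best_tonic, best_scale = score, PITCH_CLASSES[tonic], scale
    return best_tonic, best_scale, best_score

# Sanity check with a synthetic chroma vector shaped like D major.
print(estimate_key(np.roll(MAJOR_PROFILE, 2))[:2])
```

Real recordings give much lower correlations than this synthetic check, and relative major/minor keys are easily confused, so the returned score is best treated as a rough confidence value.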
import essentia.standard as es
loader = es.MonoLoader(filename='song.wav')
audio = loader()
rhythm_extractor = es.RhythmExtractor2013(method="multifeature")
bpm, beats, beats_confidence, _, _ = rhythm_extractor(audio)
key_extractor = es.KeyExtractor()
key, scale, key_strength = key_extractor(audio)
print("BPM:", bpm, "Key:", key, scale)
This might print something like “BPM: 127.9 Key: G minor”, for example. Essentia is very powerful but also heavy; running it in real-time in a browser via PyScript would be challenging. It’s more suited to offline analysis or backend processing.
When implementing metadata extraction in nGene Media Player, it’s important to choose the right tool for the job to provide a good user experience without unnecessary overhead. Here are some guidelines on when to use client-side JS vs. Python/back-end solutions:
Basic details should be read in the browser as soon as the file loads (e.g., `audio.duration`). This keeps the interface responsive. Also, showing waveforms or simple visualizations using Web Audio can be done progressively (e.g., decode small chunks or downsampled audio) so that the UI remains interactive.
In conclusion, the strategy for metadata should match the needs of the user base and the resources available. For a relatively small-scale or personal project, sticking to client-side solutions keeps things simple and respects user privacy. For a larger-scale application with many users and files, investing in backend services for richer metadata could greatly enhance the user experience. nGene Media Player can start by extracting what’s easy (duration, basic tags via JS) and progressively incorporate more advanced metadata features using Python tools as needed, ensuring that the architecture remains flexible for such upgrades.
Written on March 9, 2025
With the functionality in place, attention turns to improving the user interface and experience of nGene Media Player. A desktop-focused web media player should leverage the larger screen and input options (mouse, keyboard) to provide an engaging and efficient experience. Below are suggestions for design and UX enhancements, organized into layout/visual improvements, interaction improvements, and the use of modern libraries to add polish. The tone of these suggestions is to enhance usability and aesthetics in a professional, subtle way without overwhelming the user.
(Thumbnail previews are possible, for example with a `canvas` element drawing video frames during load, but can be complex and heavy.) Even without thumbnails, a tooltip with the timestamp at the cursor is very useful (e.g., hover at a point on the timeline and see “1:23:45” so the user knows where they will jump if clicked). If chapters or track markers are known (say an album is playing and each track in a DJ mix has a timestamp, or a video has chapter metadata), the timeline could incorporate small tick marks or icons to indicate these. Clicking on those could jump to that chapter. Another idea is “speed scrubbing”: if the user drags the seek handle, moving the cursor farther up or down while dragging could change the seek speed (some video editors do this, but it might be too advanced for a simple player). At the very least, ensuring that clicking on the progress bar is easy and precise (perhaps making the clickable area tall enough, etc.) is important. Additionally, for long audio tracks, one could allow direct input of a time to jump to (e.g., clicking the elapsed time display could turn it into an editable field where the user types a timestamp like “5:00” to jump to 5 minutes). Such features cater to advanced usage scenarios.
A `<media-player>` main component might contain sub-components like `<media-controls>` (play, pause, volume, timeline), `<media-playlist>`, and `<media-metadata-display>`. Each of those could be a Lit component with its own styles and reactive properties. The advantage is modularity: the code for the playlist doesn’t directly interfere with the code for controls, and each component can be developed and tested somewhat in isolation. Lit makes it straightforward to reflect properties to the DOM and update when data changes (for example, if the track title changes, the `media-metadata-display` component will automatically re-render that part). Web Components also ensure that if the player is integrated into a larger page or reused, it has a self-contained scope (shadow DOM can prevent styles from leaking in or out). Since nGene Media Player is desktop-focused, you might not need to worry about other page content, but the developer ergonomics of Lit still apply. It’s a humble framework in the sense that it doesn’t impose heavy structures; it just helps create elements. Another benefit is theming: with Web Components, one can define CSS custom properties for theming that users of the component (or a global theme) can set. For instance, the player could expose `--player-accent-color`, which would propagate to the play button, progress bar, etc., to easily change the color scheme. In summary, adopting Lit can future-proof the codebase as the UI grows and ensure performance (Lit updates are efficient) and maintainability.
`<sl-button>` – can be used for play/pause or other buttons (it supports variants, toggling, icons, etc.). For example, a play button could be `<sl-button pill icon="play-fill"></sl-button>` (using an icon pack integration) giving a nice circular icon button with hover effects built-in.
`<sl-slider>` – a stylized slider which could serve as the volume or progress bar. Shoelace sliders are themable and accessible, and they can have tooltips showing the value on hover if enabled.
`<sl-range>` – similar to slider, might be used for volume with a min-max display of value or for brightness if needed (for video).
`<sl-dialog>` – could be used if you implement a “Preferences” or “About” dialog, or a confirmation (like “Are you sure you want to clear the playlist?”). It provides a responsive, accessible modal out of the box.
`<sl-menu>` and `<sl-menu-item>` – can help build a context menu or dropdown menus for settings.
`<sl-icon>` – Shoelace comes with an icon library (or you can plug in your own SVGs) for consistent icons.
`<sl-slider>` already works with keyboard arrow keys and is screen-reader friendly, whereas a custom range input might need additional handling to reach the same level. The end result is a more polished UI with less effort. One just has to be mindful to include the Shoelace script and define the custom elements; after that, it’s plug-and-play.
A CSS custom property such as `--accent-color` could be used for progress bar fill, button highlights, etc. For desktop, often a dark theme is preferred for media apps (think of VLC, Spotify, etc.), but it should be a tasteful dark: dark gray backgrounds with light text, using accent colors for highlights (like play button when hovering or progress). Provide enough contrast for readability. Additionally, ensure the design scales for High-DPI screens (using SVG icons or font icons so they don’t blur). All text should use a clear font (the default system font is usually fine, or a clean sans-serif). If wanting a high-end feel, subtle shadows and blurs can be used (for example, a slight shadow behind the control bar to ensure it’s visible over a video). The goal is a UI that feels cohesive; every element’s style should appear part of the same family. Modern libraries like Shoelace already follow a coherent design system, which helps. If custom-building, one can draw inspiration from material design or fluent design systems but adapt them lightly to avoid a generic look. Testing the UI on different screens and lighting conditions (monitor vs laptop, bright room vs dark room) can inform tweaks in contrast or sizing to ensure usability.
By implementing these design and UX improvements, nGene Media Player will not only be functionally robust but also user-friendly and visually appealing. It will feel like a modern desktop application, with responsive controls, rich visuals like waveforms, and thoughtful details (like shortcuts and drag-drop) that desktop users appreciate. The use of web technologies and libraries means the player can achieve a high level of polish comparable to native apps, while remaining customizable and lightweight. As always, incremental enhancement is wise: features can be added step by step, gathering user feedback to refine the UX. Over time, these improvements can significantly elevate the user’s enjoyment and efficiency when using the media player, fulfilling the goal of a comprehensive and professional media playback experience.
Written on May 9, 2025
The Fourier Transform is a fundamental tool that converts a time-domain signal into a frequency-domain representation. In essence, it decomposes a waveform into a sum of sinusoidal components of various frequencies. Mathematically, for a continuous signal \(x(t)\), the Fourier transform \(X(f)\) is defined by an integral that sums \(x(t)\) against complex exponentials \(e^{-j 2\pi f t}\) across time. This operation produces a complex function \(X(f)\) indicating the amplitude and phase of each frequency component present in the original signal. In the context of digital audio (with discrete samples), one uses the discrete Fourier transform (DFT), which similarly expresses a finite sequence as a combination of sinusoidal basis functions.
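Written out, the two definitions described above take the following standard forms. The symbols \(N\) (number of samples), \(n\) (sample index), \(k\) (frequency-bin index), and \(f_s\) (sampling rate) are notation introduced here for illustration.

\[ X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt \]

\[ X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1 \]

In the discrete case, bin \(k\) corresponds to the frequency \(k f_s / N\), so a longer analysis window (larger \(N\)) gives finer frequency resolution.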
By revealing the frequency content of a waveform, the Fourier transform provides insights that are difficult to obtain from raw time-domain data. In audio analysis scripts, applying a Fourier transform enables spectral visualization, for example generating a frequency spectrum or spectrogram that shows how energy is distributed across frequencies (and over time, in the case of a spectrogram). The frequency-domain view makes it easy to identify prominent frequency components: one can readily spot the dominant pitch (fundamental frequency) of a sound and its harmonics, or recognize different sound sources by their distinct spectral patterns.
Fourier analysis also aids in segmentation and feature extraction. Different sections of an audio signal (such as phonemes in speech or notes in music) often exhibit distinct frequency profiles; thus, a script can detect transitions or segment the waveform by looking for changes in the spectrum. Moreover, many audio features and processing techniques are based on the Fourier transform. For instance, one can filter out unwanted noise by zeroing out specific frequency bands in the spectrum, or compute descriptive metrics like the spectral centroid (the “center of mass” of the spectrum) and spectral bandwidth. In summary, the Fourier transform is a cornerstone of waveform analysis, transforming complex time-domain data into a form that is more amenable to visualization, measurement, and algorithmic manipulation.
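Two of the operations mentioned above, the spectral centroid and band removal by zeroing spectrum bins, can be sketched with NumPy's real FFT. The function names and the test signal are assumptions made for this example. Note that abruptly zeroing bins is a crude "brick-wall" filter that can introduce ringing, so it serves here only to illustrate the idea.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (the spectrum's 'center of mass') in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    return float((freqs * magnitudes).sum() / total) if total > 0 else 0.0

def zero_band(signal, sample_rate, low_hz, high_hz):
    """Remove all content between low_hz and high_hz by zeroing FFT bins."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs >= low_hz) & (freqs <= high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))

# Example: a 200 Hz tone plus a weaker 1000 Hz tone, then the upper tone removed.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
mix = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
filtered = zero_band(mix, sample_rate, 900, 1100)
print(spectral_centroid(mix, sample_rate), spectral_centroid(filtered, sample_rate))
```

The centroid drops toward 200 Hz once the 1000 Hz component is removed, which shows how a single number can summarize a change in spectral balance.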
While the term Fourier Transform refers broadly to the mathematical conversion between time-domain and frequency-domain representations, the Fast Fourier Transform (FFT) is a specific efficient algorithm for computing the Fourier transform (particularly the DFT) in practice. The FFT leverages symmetries in the calculation to greatly speed up the transformation. The comparison below highlights key differences and roles of each:
| Aspect | Fourier Transform (FT) | Fast Fourier Transform (FFT) |
|---|---|---|
| Definition | A general mathematical transform mapping a signal from the time domain to the frequency domain. Can be formulated as an integral (continuous case) or a summation (DFT for discrete signals). | An algorithm (family of algorithms) to compute the discrete Fourier transform rapidly. It gives the same result as the DFT but far more efficiently. |
| Computation | Conceptually involves integrating or summing over all time samples with complex exponentials. Direct computation of an N-point DFT has complexity on the order of \(O(N^2)\). | Uses a divide-and-conquer approach (e.g. the Cooley-Tukey algorithm) to reduce computational workload. Achieves roughly \(O(N \log N)\) complexity, which is substantially faster for large N. |
| Usage | Provides the theoretical foundation for frequency analysis; used in analytical derivations and definitions (e.g. defining the spectrum of a signal). | Used for practical computation in software and scripts. In almost all real applications (audio analysis, signal processing), one calls an FFT routine to obtain the frequency spectrum of a dataset. |
Practical note: In scripting and signal processing work, the FFT is the de facto method to perform Fourier analysis on data. One rarely computes a Fourier transform “by hand” except for theoretical work; instead, built-in FFT functions efficiently yield the frequency-domain data. Both FT and FFT produce the same kind of output (frequency-domain representation), but the FFT makes it feasible to analyze long signals and even to do real-time spectral processing thanks to its speed.
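The claim that the DFT and the FFT produce the same output can be checked directly. The sketch below builds the full matrix of complex exponentials, which is the quadratic-cost direct computation, and compares it with NumPy's FFT routine; the function name `naive_dft` is an assumption for this example.

```python
import numpy as np

def naive_dft(x):
    """Direct DFT: multiply by the full N x N matrix of complex exponentials."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    k = n.reshape(-1, 1)
    basis = np.exp(-2j * np.pi * k * n / len(x))  # N^2 entries
    return basis @ x

rng = np.random.default_rng(0)
samples = rng.standard_normal(64)
same = np.allclose(naive_dft(samples), np.fft.fft(samples))
print("naive DFT matches FFT:", same)
```

For 64 samples both are effectively instantaneous, but the matrix grows quadratically with the signal length, which is why the direct form is impractical for whole audio files.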
Sound waves have several measurable properties that correspond to how we perceive sound. A simple sinusoidal waveform can be expressed as \(x(t) = A \sin(2\pi f t + \phi)\), where \(A\) is the amplitude, \(f\) is the frequency, and \(\phi\) is the phase. These physical parameters relate directly to key auditory attributes: amplitude corresponds to perceived loudness, frequency corresponds to perceived pitch, and phase influences the waveform’s alignment (which can affect how waves interfere or combine). Real-world sounds are usually not single pure tones, but combinations of many frequency components; this gives rise to additional characteristics like timbre (the quality of sound that distinguishes different sources or instruments) and the amplitude envelope (how a sound’s loudness changes over time). Below, several fundamental waveform attributes are described:
Beyond the basic Fourier transform and the attributes of waveforms, there are several advanced techniques that can further assist in analyzing and processing audio signals. These methods either provide more detailed time-frequency information or apply statistical decomposition to extract meaningful components from complex data. Key techniques include the following:
Each of the above techniques offers unique benefits for audio processing. Time-frequency methods like STFT and wavelet transforms allow detailed examination of when certain frequencies occur, addressing limitations of a plain Fourier transform for non-stationary signals. Statistical methods like PCA and ICA enable the extraction of patterns or sources from multivariate data, which is valuable when dealing with complex mixtures or reducing data dimensionality. Other specialized analyses such as cepstral processing and NMF target specific types of structure (periodicity in spectrum, or additive parts of a mixture) that are not immediately apparent from a basic FFT. By combining these approaches – Fourier-based transforms for spectral content, wavelets for multi-scale timing, and component analysis for pattern separation – an audio analysis script can be significantly enhanced, yielding richer insights and more powerful processing capabilities.
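As a concrete illustration of the time-frequency idea, a short-time Fourier transform can be sketched by slicing the signal into overlapping frames, windowing each frame, and taking an FFT per frame. This is a minimal sketch under stated assumptions: the function name `stft_magnitude`, the Hann window, and the default frame and hop sizes are choices made for this example, not values taken from the player.

```python
import numpy as np

def stft_magnitude(signal, frame_size=256, hop=128):
    """Return a (num_frames, frame_size // 2 + 1) array of spectral magnitudes."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_size:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_size)
    starts = range(0, len(signal) - frame_size + 1, hop)
    frames = np.array([signal[s:s + frame_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: 500 Hz for half a second, then 2000 Hz for half a second.
sample_rate = 8000
t = np.arange(4000) / sample_rate
two_tones = np.concatenate([np.sin(2 * np.pi * 500 * t),
                            np.sin(2 * np.pi * 2000 * t)])
spectrogram = stft_magnitude(two_tones)
bin_hz = sample_rate / 256
print(spectrogram[0].argmax() * bin_hz, spectrogram[-1].argmax() * bin_hz)
```

A single FFT over the whole signal would show both tones but not when each occurs; the per-frame peaks recover that timing, at the cost of coarser frequency resolution per frame.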
Written on November 13, 2025
Heart sound analysis is the study of the audible sounds produced by the heart, recorded as a phonocardiogram (PCG), to detect health conditions or even identify individuals. Traditionally, doctors use a stethoscope to listen to heart sounds for diagnosing murmurs, valve problems, or other cardiac issues. With modern technology, these sounds can be recorded as digital audio, enabling computerized analysis using signal processing and deep learning. Focusing on audio-only data (without additional signals like ECG or imaging) is a practical approach, especially since heart sounds alone carry rich information about cardiac function. Below, we discuss the sources of heart sound recordings, challenges in using them, and how data augmentation and synthetic recordings (including simulator-based audio) are improving heart sound analysis.
Collecting real heart sound recordings is the first step for any audio-based analysis. Heart sounds are typically recorded using electronic stethoscopes or microphones placed on the chest. Over the years, several datasets of these audio-only heart recordings have been compiled for research and education:
For example, the Heart Sound and Murmur Library (University of Michigan, 2015) is an open collection of stethoscope recordings. It contains examples of normal heartbeats and various murmurs. Such libraries are relatively small (a few dozen recordings) and meant for teaching, but they provide clear samples of different heart sound types.
A large public dataset assembled for a heart sound classification challenge. It comprises thousands of PCG recordings collected from multiple sources and countries. The recordings include both normal and abnormal heart sounds (murmurs, etc.), captured with different devices in varied environments. This diversity makes it valuable for training models, though it also introduces noise and heterogeneity.
One of the largest heart sound datasets to date, with over 5,000 recordings, focused on pediatric patients. It was created for a recent PhysioNet challenge on murmur detection. Importantly, this dataset provides multiple recording spots per patient (various chest locations) and includes labels for murmurs. Being a big audio-only collection, it supports deep learning models that require lots of data.
Researchers have also used smaller collections from hospitals or labs. Some include specific conditions (e.g., only certain valve diseases) or specific populations. The general trend is that purely audio heart datasets are much smaller than, say, image datasets in other domains, due to the effort needed to record and label each patient's heart sounds.
All these recordings are pure sound (PCG) data. They capture the lub-dub of heartbeats and any extra sounds (murmurs, clicks) but no additional signals. Working with audio-only data is appealing because recording audio is non-invasive and simple compared to imaging or other tests. However, relying on sound alone means the analysis must overcome some challenges inherent to audio data, as discussed next.
Using only real heart sound recordings for automated analysis comes with several challenges:
Compared to fields like image or speech recognition, heart sound datasets are quite limited in size. Collecting heart audio requires clinical access and expertise (for labeling what is normal vs abnormal). Privacy and consent issues also limit sharing patient data. As a result, researchers often have only a few thousand recordings or less, which can be insufficient for training complex deep learning models.
In many heart sound datasets, normal recordings far outnumber abnormal ones. For example, there are many recordings of healthy heartbeats, but relatively fewer examples of rare murmurs or conditions. This imbalance makes it hard for a model to learn the subtleties of abnormalities – it might simply learn to always predict "normal". The model’s performance on detecting actual pathological cases can suffer as a result.
Heart audio recorded in real-life settings often contains noise. There can be background sounds (hospital room noise, stethoscope friction, patient movement) and other body sounds (lung sounds overlapping the heart sounds). Additionally, different stethoscope devices and placement sites produce variations in sound quality and frequency content. This high variability means a model trained on one dataset might not perform well on another if the noise profiles differ. It’s a challenge to make models robust to these differences using limited real data.
Determining the ground truth (what exactly the heart sound signifies) often requires expert listening. Labeling a murmur or diagnosing a condition from sound is sometimes subjective and error-prone. So, real datasets may have label noise or inconsistencies. For tasks like biometric identification using heart sounds, labeling who the sound belongs to is easier, but such use-cases are less common and still experimental.
Because of these challenges, researchers seek ways to enhance and expand the available audio data without having to gather countless new patient recordings. This is where data augmentation and synthetic data generation become crucial.
Data augmentation refers to taking existing real recordings and modifying them in various ways to create "new" training examples. The key idea is to expand the dataset artificially and introduce variations that improve a model’s generalization. For heart sound (audio) data, common augmentation techniques include:
Overlaying recordings with additional noise can help a model learn to focus on the relevant heart sound patterns and become noise-tolerant. For instance, one can add white noise, ambient hospital sounds, or respiratory noises at various levels to a clean heartbeat recording. This teaches the model to handle different signal-to-noise scenarios.
Slightly changing the speed of the audio without altering pitch can simulate different heart rates. A recording can be time-stretched to sound a bit slower or faster (within realistic limits) which is like having the patient’s heart beating at a different rate. This augmentation helps the model cope with heart rate variability.
Although heart sounds don’t exactly have a “pitch” like music, one can alter the frequency content a bit – for example, simulating the effect of different stethoscope frequency responses or chest anatomy. A mild pitch shift can make the sound a bit higher or lower in frequency, which may help the model to not be overly tuned to one particular frequency profile.
Long heart sound recordings can be split into shorter segments (which provides more training samples). Conversely, one might concatenate beats from different recordings to create a new sequence. This can be tricky for preserving realism, but sometimes mixing segments helps ensure the model sees a variety of beat patterns.
Changing the volume (amplitude) simulates varying auscultation pressure or device gain. Applying filters (like bass boost or treble cut) can mimic using different stethoscope hardware. These augmentations ensure the model doesn’t get thrown off by recordings that are louder, quieter, or slightly filtered relative to the training data.
By augmenting the available heart sound recordings in these ways, researchers can greatly increase the number of training examples and the diversity of conditions. For example, a dataset of a few hundred real recordings can be expanded to thousands of augmented samples by applying combinations of these techniques. This has been shown to improve performance; the model learns to recognize the underlying heart sound patterns (normal or abnormal) under various noise and distortion conditions, rather than overfitting to the exact original recordings.
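Several of the augmentations listed above can be sketched with NumPy alone. The function names and parameters below are assumptions made for this example. In particular, `change_speed` is a naive resampler that shifts frequency content along with speed, so it is not the pitch-preserving time-stretch described above, which would need a dedicated algorithm such as a phase vocoder.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal = np.asarray(signal, dtype=float)
    noise_power = np.mean(signal ** 2) / (10 ** (snr_db / 10))
    return signal + rng.standard_normal(signal.shape) * np.sqrt(noise_power)

def apply_gain(signal, gain_db):
    """Scale amplitude by a gain in dB (simulates device gain or pressure)."""
    return np.asarray(signal, dtype=float) * (10 ** (gain_db / 20))

def split_segments(signal, segment_len, hop):
    """Cut a long recording into (possibly overlapping) fixed-length segments."""
    return [signal[s:s + segment_len]
            for s in range(0, len(signal) - segment_len + 1, hop)]

def change_speed(signal, rate):
    """Naive linear-interpolation resampling; changes pitch as well as speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    signal = np.asarray(signal, dtype=float)
    new_len = max(1, int(round(len(signal) / rate)))
    positions = np.linspace(0, len(signal) - 1, new_len)
    return np.interp(positions, np.arange(len(signal)), signal)

# Example: one clean placeholder recording expanded into several variants.
rng = np.random.default_rng(0)
sample_rate = 2000
t = np.arange(4 * sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 50 * t)  # stand-in for a real PCG recording
variants = [add_noise(clean, snr, rng) for snr in (20, 10, 5)]
variants += [apply_gain(clean, g) for g in (-6, 6)]
print(len(variants), len(split_segments(clean, sample_rate, sample_rate // 2)))
```

Combining the functions (for example, gain followed by noise, then segmentation) multiplies the number of distinct training examples obtained from each recording.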
However, augmentation can only produce variations of what already exists in the data. It doesn’t create entirely new heart sound events that were never recorded. For generating completely new heart sound samples (especially of rare conditions), researchers turn to synthetic data generation.
Synthetic generation involves creating artificial heart sound signals that imitate real ones. Unlike simple augmentation (which modifies real recordings), synthetic data can provide brand-new examples, potentially including pathological patterns that are under-represented in real data. Several approaches have emerged for synthesizing heart sounds:
Earlier attempts used mathematical models of the heart’s mechanics and blood flow to synthesize phonocardiograms. For instance, one can model the heart valves opening/closing and generate corresponding sound waves. These models could produce basic normal heartbeat sounds and some murmur-like effects by altering parameters (like simulating a leaky valve). While insightful, purely mathematical models often struggle to capture the full complexity and natural variability of real heart sounds.
In recent years, generative adversarial networks (GANs) have been applied to heart sound data. A GAN is a deep learning model with two competing parts: a generator that produces candidate samples and a discriminator that tries to tell them apart from real ones, so that the generator gradually learns to create realistic fakes. Researchers have trained GANs on collections of real heart sounds so that the generator can output new audio waveforms that sound like heartbeats. One notable use case is generating abnormal heart sounds (e.g., murmurs indicative of disease), because these are less common in datasets. By creating synthetic abnormal samples, the training set can be balanced. Studies have shown that using GAN-generated heart sounds as additional training data improves a model’s ability to detect cardiac abnormalities. The synthetic sounds, if high-quality, can introduce subtle variations of murmurs that the model might not see in the limited real dataset. Progressive GAN architectures have been reported to produce fairly realistic heart cycles, and classifiers trained on a mix of real and GAN-generated data have been reported to detect conditions more accurately than classifiers trained on real data alone.
Beyond GANs, newer generative frameworks such as diffusion probabilistic models have been explored for heart sound synthesis. During training, a diffusion model gradually adds noise to real data and learns to reverse that corruption step by step; at generation time it starts from noise and denoises its way to a new sample. These models have achieved excellent fidelity in audio generation (they are used in some speech synthesis tasks). Researchers have begun applying them to heart sounds, sometimes in creative ways, for example by generating a heart sound conditioned on an ECG signal. In one recent approach, a diffusion model was used to create artificial heart sound waves (PCG) from corresponding ECG recordings. This effectively augments existing ECG datasets with synthetic heart sound data. Even without conditioning on ECG, diffusion models can be trained to generate heart sound clips that are hard to distinguish from real stethoscope recordings. The key advantage of these advanced generative models is the quality of synthetic output: they can capture the timing and timbre of real heartbeats, including subtle murmurs or extra sounds, more convincingly than older methods.
Variational autoencoders (VAEs) and similar generative networks have also been tried for creating heart sound spectrograms or waveforms. Their outputs tend to be slightly blurrier than those of GANs or diffusion models, but they can still add variety to the dataset.
Synthetic heart sounds generated by these methods can significantly increase the training data, especially for rare conditions. For example, if the real dataset has only a handful of recordings of a particular murmur type, a GAN or diffusion model trained on them might produce dozens of plausible new examples of that murmur. These can then be added to training. It is crucial, however, that synthetic sounds are realistic. Poor-quality synthetic data might contain artifacts or unrealistic patterns that could confuse the model. Therefore, researchers usually validate synthetic samples (e.g., have experts or algorithms check that they resemble real heartbeats) before trusting them for model training.
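The "add noise, then learn to remove it" idea behind the diffusion models mentioned above can be made concrete with the forward (noising) half of the process. The sketch below uses the common DDPM-style closed form with a linear noise schedule; the schedule values, step count, and function names are illustrative assumptions, not details taken from any of the heart sound studies discussed here, and the learned denoising network is deliberately left out.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed, commonly used choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar[t] = product of (1 - beta) over all steps up to and including t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_sample(x0, alpha_bar_t, rng):
    """Forward step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    keep = math.sqrt(alpha_bar_t)
    spread = math.sqrt(1.0 - alpha_bar_t)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x0]
```

As the step index grows, alpha_bar shrinks toward zero, so the clean waveform fades and the sample approaches pure Gaussian noise. A trained model would then be asked to undo these steps one at a time.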
Another source of augmented audio-only data is using clinical simulators or manikins. Medical training manikins often have built-in speakers and software that can emulate heart and lung sounds for different conditions. These simulator-based recordings occupy a middle ground between real and fully synthetic data:
A digital stethoscope can be placed on a training manikin (or a specialized simulator device) which is programmed to play a specific heart sound scenario (such as a murmur of a certain type, or a normal heart with a particular rate). The resulting recording is an audio file that is technically "real" in the sense that it was recorded through a stethoscope, but the source of the sound is an artificial simulation. One publicly available dataset, for instance, includes over 500 recordings from a clinical manikin, covering various normal and abnormal heart and lung sounds. These are useful because the exact diagnosis or condition for each recording is known (since the scenario was programmed). They also allow repetition – researchers can generate as many recordings as needed of a certain condition by replaying it or adjusting the simulator.
Simulator-based sounds are consistent (which is good for focused training data on a specific condition) but can lack some variation present in real patients. For example, a manikin’s “aortic stenosis murmur” might always have the same character, whereas real patients with the same condition could have slight differences in their murmur sounds due to anatomy or comorbidities. Therefore, while manikin recordings enhance data volume and provide ground-truth labels, they may not capture the full diversity of real heart sound presentations.
Interestingly, one can also apply the earlier augmentation techniques to simulator recordings. For instance, taking a clear manikin-generated murmur sound and adding noise or slight filtering could make it more realistic. In this way, simulator data can serve as a base which is then diversified through augmentation.
Simulator-based recordings are especially valuable for training and initial algorithm development. They ensure that the algorithm has at least heard examples of the condition it’s supposed to detect. Later on, fine-tuning with real patient recordings can adjust the model to real-world idiosyncrasies. Overall, simulators provide a safe, repeatable, and cost-effective way to get more heart sound data without needing to find numerous patients with each condition.
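The idea of diversifying a clean simulator recording by adding noise can be sketched as follows. This is a minimal illustration that mixes white Gaussian noise into a signal at a chosen signal-to-noise ratio; the function names are hypothetical, the signal is a plain list of samples rather than a real recording, and realistic augmentation would more likely use recorded ambient or stethoscope-friction noise.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def add_noise_at_snr(clean, snr_db, rng):
    """Return clean + white Gaussian noise scaled to the requested SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # Choose the noise level so that 20*log10(rms(clean) / rms(noise)) == snr_db.
    target_noise_rms = rms(clean) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    return [c + gain * n for c, n in zip(clean, noise)]
```

Calling add_noise_at_snr several times with different SNR values and random seeds turns one manikin recording into a family of progressively noisier variants.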
Incorporating augmented and synthetic heart sound recordings has shown clear benefits for machine learning models:
By training on a larger and more diverse dataset (real + augmented + synthetic), models generalize better. Studies have reported that classifiers for detecting abnormal heart sounds achieved higher accuracy when rare abnormal examples were bolstered with synthetic instances. Even modest gains in accuracy can be significant in a clinical context – for example, catching a few more cases of disease that might have been missed.
Perhaps the biggest advantage is improved robustness. A model trained on varied data (different noises, different simulated conditions) is less likely to be thrown off by a slightly different recording. In fact, experiments have shown that when models are tested on an entirely new dataset (from a different hospital or recorded with a different device), those trained with extensive augmentation or synthesis maintain their performance much better. One report noted a marked improvement in cross-dataset evaluation: a classifier trained with synthetic augmented data performed considerably better on an external test set, indicating that it was not overfit to the quirks of the original training set. This robustness is crucial for real-world deployment, where a heart sound AI might encounter sounds from many environments.
Synthetic generation specifically helps address the class imbalance problem. By generating more samples of under-represented classes (e.g. various murmur types, heart defect sounds), the training data becomes more balanced. A model trained on a balanced set is less biased and more sensitive to detecting those abnormal cases. In practical terms, this means fewer false negatives (missing a pathology) because the model had plenty of examples to learn what that pathology sounds like.
With more data available through augmentation, researchers have begun exploring ambitious applications like heart sound biometric identification (using a person’s unique heart sound as an ID). This is a challenging task because each recording can vary with conditions, but having lots of audio data (including simulated variations of an individual’s heart sound) could help algorithms discern person-specific patterns. Augmented data also supports training deep neural networks for tasks like segmentation (finding exact timing of heartbeats) and multi-condition classification (distinguishing between different murmur types), where large datasets are needed for the model to learn fine-grained differences.
Another benefit is the ability to try out scenarios that are rare in reality. For instance, if one wants to test an algorithm’s ability to detect an extremely rare heart defect, creating a synthetic version of that defect sound and inserting it into various backgrounds can allow preliminary testing of the model’s sensitivity. This way, researchers aren't entirely constrained by what they can collect in clinics.
It’s worth noting that while augmented and synthetic data improve models, they must be used carefully. If the synthetic data is too artificial or if augmentation is overdone (creating sounds that no longer resemble real physiological signals), models might learn wrong patterns. The best practice is to combine real and synthetic data and validate the model extensively on real-world recordings to ensure it performs as intended.
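For the class imbalance point above, a first practical question is how many synthetic samples each class would need. The sketch below counts labels and reports the shortfall of every class relative to the largest one; the function name, the class names, and the counts in the usage note are made up for illustration.

```python
from collections import Counter


def synthetic_needed(labels, target=None):
    """Samples each class lacks to reach `target` (default: size of the largest class)."""
    counts = Counter(labels)
    goal = max(counts.values()) if target is None else target
    return {cls: max(0, goal - n) for cls, n in counts.items()}
```

With 90 hypothetical "normal" recordings, 8 "murmur" recordings, and 2 "extra_sound" recordings, the function reports that 82 and 88 synthetic samples would be needed for the two rare classes. Whether to balance fully or only partially remains a judgment call, given the caution above about overusing synthetic data.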
In summary, audio-only heart sound recordings are a powerful resource for non-invasive cardiac diagnosis and potentially for biometric identification. Numerous datasets of heart sounds have been gathered, but they are often limited in size and scope. By focusing on sound alone, one avoids the complexity of additional sensors, but this places more importance on having rich and sufficient audio data. Data augmentation techniques have become a standard tool to enrich heart sound datasets, introducing variability in noise, timing, and frequency that help machine learning models learn robust features. Beyond that, synthetic heart sound generation – through advanced AI models or simulator-based recordings – has opened new avenues to significantly expand the training data with realistic examples of normal and pathological heart sounds. These approaches help overcome the challenges of data scarcity and imbalance, leading to models with higher accuracy and better generalization to real-world conditions.
The combination of real heart recordings with augmented and synthetic data is enabling more reliable heart sound analysis systems. Researchers have demonstrated that this approach can improve detection of abnormalities (like murmurs) and make the algorithms more resilient to variations between different hospitals or recording devices. Looking forward, as generative models continue to improve, we can expect even more lifelike synthetic heart sounds to augment datasets. This will further reduce the dependency on large-scale clinical data collection and allow rapid development of heart sound AI tools. In essence, using sound-only data, enhanced with creative augmentation and synthetic generation, is a promising strategy to advance digital stethoscope applications – helping screen for heart conditions accurately and possibly verifying identity through the subtle acoustics of the heart. This audio-focused approach maintains the simplicity and non-invasiveness of the stethoscope while leveraging modern computational techniques to extract as much information as possible from the heartbeat sound.
Written on November 14, 2025
This document describes extract_meta_from_media.py (v1.1), an enhanced Python script that
computes the global BPM of every .m4a file in ~/Desktop/m4a
and—new in this release—extracts tempo metadata and an instantaneous tempo curve
for deeper musical analysis.
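The script will:
- Scan the .m4a files in the m4a folder on your Desktop.
- Estimate each file's global BPM.
- Read the embedded tempo tag (iTunes tmpo atom), if present.
- Compute an instantaneous tempo curve and print summary statistics.

Open a terminal in ~/Desktop:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install librosa mutagen numpy
Optional but wise: librosa benefits from FFmpeg for broad codec support:
brew install ffmpeg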
Desktop/
├── extract_meta_from_media.py
└── m4a/
    ├── song1.m4a
    ├── song2.m4a
    └── …
The complete v1.1 source code is reproduced below.
#!/usr/bin/env python3
"""
Filename : extract_meta_from_media.py
Version  : 1.1
Author   : Hyunsuk Frank Roh

Description
-----------
Walk through ~/Desktop/m4a, estimate the *global* BPM of every .m4a file,
**and** (new in v1.1) extract extra tempo information:
  • Embedded tempo/BPM tag from the file’s metadata (iTunes ‘tmpo’ atom).
  • An instantaneous tempo curve so you can see how BPM fluctuates over time.

Dependencies
------------
pip install librosa mutagen numpy

Usage
-----
python extract_meta_from_media.py
"""
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import os
from typing import List, Tuple, Optional

import numpy as np
import librosa
from mutagen.mp4 import MP4


# --------------------------------------------------------------------------- #
# Core routines                                                               #
# --------------------------------------------------------------------------- #
def compute_tempo(
    audio_file_path: str,
    sr_target: Optional[int] = None
) -> Tuple[float, List[float]]:
    """
    Return (global_bpm, tempo_curve).

    Parameters
    ----------
    audio_file_path : str
        Path to an audio file (.m4a).
    sr_target : int | None
        Target sample-rate for librosa.load (None = original file rate).

    Returns
    -------
    global_bpm : float
        Single BPM estimate from librosa’s beat tracker.
    tempo_curve : list[float]
        Frame-level BPMs from librosa’s tempo estimator (aggregate=None).
    """
    y, sr = librosa.load(audio_file_path, sr=sr_target)

    # Global BPM via beat tracking
    global_bpm, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Instantaneous tempo curve. librosa >= 0.10 exposes the estimator as
    # librosa.feature.tempo; older releases only provide librosa.beat.tempo.
    tempo_fn = getattr(librosa.feature, "tempo", None) or librosa.beat.tempo
    tempo_curve = tempo_fn(y=y, sr=sr, aggregate=None)

    # Newer librosa releases may return the tempo as a 1-element array,
    # so unwrap it before converting to a plain float.
    return float(np.atleast_1d(global_bpm)[0]), tempo_curve.tolist()


def read_tagged_tempo(audio_file_path: str) -> Optional[float]:
    """
    Fetch embedded tempo/BPM tag (iTunes ‘tmpo’ atom) if present.
    Returns None when no tag is found or the file type is unsupported.
    """
    try:
        audio = MP4(audio_file_path)
        # audio.tags is None for untagged files; ‘tmpo’ is usually a single int
        if audio.tags and "tmpo" in audio.tags:
            return float(audio.tags["tmpo"][0])
    except Exception:
        pass  # Unsupported container or no tag
    return None


# --------------------------------------------------------------------------- #
# Main driver                                                                 #
# --------------------------------------------------------------------------- #
def main() -> None:
    desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
    m4a_folder = os.path.join(desktop_path, "m4a")

    if not os.path.isdir(m4a_folder):
        print(f"Folder not found: {m4a_folder}")
        return

    m4a_files = sorted(
        f for f in os.listdir(m4a_folder) if f.lower().endswith(".m4a")
    )
    if not m4a_files:
        print(f"No .m4a files found in {m4a_folder}")
        return

    for filename in m4a_files:
        file_path = os.path.join(m4a_folder, filename)
        print(f"\nProcessing {filename} …")
        try:
            global_bpm, tempo_curve = compute_tempo(file_path)
            tagged_tempo = read_tagged_tempo(file_path)

            print(f"Estimated global BPM : {global_bpm:.2f}")
            if tagged_tempo is not None:
                print(f"Embedded tempo tag : {tagged_tempo:.2f} BPM")
            else:
                print("Embedded tempo tag : – (none)")

            if tempo_curve:
                arr = np.array(tempo_curve)
                print(
                    "Instantaneous tempo stats:"
                    f" min {arr.min():.2f}"
                    f" | mean {arr.mean():.2f}"
                    f" | max {arr.max():.2f} BPM"
                )
                # Uncomment if you want to peek at the first few entries
                # print('Tempo curve (first 10):', ', '.join(f'{v:.2f}' for v in arr[:10]))
        except Exception as exc:
            print(f"Error processing {filename}: {exc}")


if __name__ == "__main__":
    main()
| Component | v1.0 Behaviour | v1.1 Upgrade |
|---|---|---|
| read_tagged_tempo() | — | Uses mutagen to pull the iTunes BPM tag (tmpo) if it exists. |
| compute_tempo() | Returned a single BPM value. | Also returns a frame-level tempo curve via librosa.beat.tempo(..., aggregate=None). |
| Console output | Only global BPM printed. | Adds embedded tag (if present) plus min/mean/max of the tempo curve for quick insight. |
| Dependencies | librosa, soundfile | Now librosa, mutagen, numpy (soundfile is still auto-pulled by librosa). |
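The frame-level tempo curve holds one value per analysis frame, so relating an entry to a position in the track requires the hop length and the sample rate. The sketch below assumes librosa's default hop_length of 512 samples, which applies here because the script passes no explicit value; the helper names frame_times and tempo_at are hypothetical and not part of the script.

```python
def frame_times(num_frames, sr, hop_length=512):
    """Start time in seconds of each analysis frame."""
    return [i * hop_length / sr for i in range(num_frames)]


def tempo_at(tempo_curve, seconds, sr, hop_length=512):
    """Tempo value of the frame that contains the given timestamp (clamped to the curve)."""
    if not tempo_curve:
        raise ValueError("tempo_curve is empty")
    idx = int(seconds * sr / hop_length)
    idx = min(max(idx, 0), len(tempo_curve) - 1)
    return tempo_curve[idx]
```

Here sr is the rate returned by librosa.load, which is the file's native rate when sr_target is None. librosa.frames_to_time performs the same frame-to-seconds conversion on arrays.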
┌────────────────────────────┐
│        Start Script        │
└────────────────────────────┘
              │
              ▼
┌────────────────────────────┐
│  1. Verify ~/Desktop/m4a   │
└────────────────────────────┘
              │
              ▼
┌────────────────────────────┐
│   2. List all .m4a files   │
└────────────────────────────┘
              │
     ┌────────┴─────────┐
     │ Any files found? │
     └────────┬─────────┘
          Yes │ No
              │
              ▼
┌────────────────────────────────────┐
│ 3. For each file:                  │
│    • Estimate global BPM           │
│    • Read embedded BPM tag         │
│    • Compute tempo curve           │
│    • Print results                 │
└────────────────────────────────────┘
              │
              ▼
┌────────────────────────────┐
│            End             │
└────────────────────────────┘
Run the script from a terminal (in ~/Desktop):
source venv/bin/activate
python extract_meta_from_media.py
Sample output:
Processing song1.m4a …
Estimated global BPM : 128.12
Embedded tempo tag : 128.00 BPM
Instantaneous tempo stats: min 127.50 | mean 128.05 | max 128.60 BPM
Leave the virtual environment when finished:
deactivate
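Once both an estimated BPM and an embedded tag are available, it can help to check whether they agree. Beat trackers commonly lock onto half or double the notated tempo, so a plain difference can be misleading. The sketch below is one possible comparison; the function name and the 4% tolerance are illustrative assumptions and not part of the script.

```python
def tempo_agreement(estimated_bpm, tagged_bpm, tolerance=0.04):
    """Classify the estimate relative to the tag: 'match', 'half', 'double', or 'mismatch'."""
    for label, factor in (("match", 1.0), ("half", 0.5), ("double", 2.0)):
        reference = tagged_bpm * factor
        if abs(estimated_bpm - reference) <= tolerance * reference:
            return label
    return "mismatch"
```

For the sample run above, tempo_agreement(128.12, 128.0) reports a match, while an estimate near 64 BPM against the same tag would be reported as "half" rather than as a failure.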
Written on May 18, 2025
This document presents extract_meta_from_media.py (v1.2), an upgraded
Python script that scans ~/Desktop/media for audio-capable files
(.m4a, .mp3, .mp4), computes each track’s global BPM, and
extracts embedded tempo tags plus an instantaneous tempo curve for detailed
musical analysis.
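The script will:
- Scan for audio-capable files (.m4a, .mp3, .mp4) in the media folder on your Desktop.
- Estimate each file's global BPM.
- Read the embedded tempo tag, if present:
  - iTunes tmpo atom for .m4a/.mp4
  - ID3 TBPM frame (or EasyID3 “bpm”) for .mp3
- Compute an instantaneous tempo curve and print summary statistics.

Open a terminal in ~/Desktop:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install librosa mutagen numpy
Tip: Install FFmpeg for wider codec support:
# macOS (Homebrew)
brew install ffmpeg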
Desktop/
├── extract_meta_from_media.py
└── media/
    ├── song1.m4a
    ├── track2.mp3
    ├── clip3.mp4
    └── …
The complete v1.2 source code is reproduced below.
#!/usr/bin/env python3
"""
Filename : extract_meta_from_media.py
Version  : 1.2
Author   : Hyunsuk Frank Roh

Description
-----------
Walk through ~/Desktop/media, estimate the *global* BPM of every audio-capable
file (.m4a, .mp3, .mp4), **and** extract extra tempo information:
  • Embedded tempo/BPM tag from the file’s metadata
      – iTunes 'tmpo' atom for .m4a / .mp4
      – ID3 'TBPM' (or EasyID3 "bpm") for .mp3
  • An instantaneous tempo curve so you can see how BPM fluctuates over time.

Dependencies
------------
pip install librosa mutagen numpy

Usage
-----
python extract_meta_from_media.py
"""
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

import os
from typing import List, Tuple, Optional

import numpy as np
import librosa
from mutagen.mp4 import MP4
from mutagen import File as MutagenFile


# --------------------------------------------------------------------------- #
# Core routines                                                               #
# --------------------------------------------------------------------------- #
def compute_tempo(
    audio_file_path: str,
    sr_target: Optional[int] = None
) -> Tuple[float, List[float]]:
    """
    Return (global_bpm, tempo_curve).
    """
    y, sr = librosa.load(audio_file_path, sr=sr_target, mono=True)

    # Global BPM via beat tracking
    global_bpm, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Instantaneous tempo curve. librosa >= 0.10 exposes the estimator as
    # librosa.feature.tempo; older releases only provide librosa.beat.tempo.
    tempo_fn = getattr(librosa.feature, "tempo", None) or librosa.beat.tempo
    tempo_curve = tempo_fn(y=y, sr=sr, aggregate=None)

    # Newer librosa releases may return the tempo as a 1-element array,
    # so unwrap it before converting to a plain float.
    return float(np.atleast_1d(global_bpm)[0]), tempo_curve.tolist()


def read_tagged_tempo(audio_file_path: str) -> Optional[float]:
    """
    Return embedded BPM tag (if any) or None.
    """
    ext = os.path.splitext(audio_file_path)[1].lower()
    try:
        if ext in {".m4a", ".mp4"}:
            # MP4 containers store the tempo in the iTunes 'tmpo' atom
            audio = MP4(audio_file_path)
            if audio.tags and "tmpo" in audio.tags:
                return float(audio.tags["tmpo"][0])
        elif ext == ".mp3":
            audio = MutagenFile(audio_file_path)
            if audio and audio.tags:
                # EasyID3-style key first, then the raw ID3 frame
                if "bpm" in audio.tags:
                    return float(audio.tags["bpm"][0])
                if "TBPM" in audio.tags:
                    return float(audio.tags["TBPM"].text[0])
    except Exception:
        pass  # Unreadable file, missing tag, or non-numeric tag value
    return None


# --------------------------------------------------------------------------- #
# Main driver                                                                 #
# --------------------------------------------------------------------------- #
def main() -> None:
    desktop_path = os.path.join(os.path.expanduser("~"), "Desktop")
    media_folder = os.path.join(desktop_path, "media")

    if not os.path.isdir(media_folder):
        print(f"Folder not found: {media_folder}")
        return

    audio_exts = {".m4a", ".mp3", ".mp4"}
    media_files = sorted(
        f for f in os.listdir(media_folder)
        if os.path.splitext(f)[1].lower() in audio_exts
    )
    if not media_files:
        print(f"No supported audio files found in {media_folder}")
        return

    for filename in media_files:
        file_path = os.path.join(media_folder, filename)
        print(f"\nProcessing {filename} …")
        try:
            global_bpm, tempo_curve = compute_tempo(file_path)
            tagged_tempo = read_tagged_tempo(file_path)

            print(f"Estimated global BPM : {global_bpm:.2f}")
            if tagged_tempo is not None:
                print(f"Embedded tempo tag : {tagged_tempo:.2f} BPM")
            else:
                print("Embedded tempo tag : – (none)")

            if tempo_curve:
                arr = np.array(tempo_curve)
                print(
                    "Instantaneous tempo stats:"
                    f" min {arr.min():.2f}"
                    f" | mean {arr.mean():.2f}"
                    f" | max {arr.max():.2f} BPM"
                )
        except Exception as exc:
            print(f"Error processing {filename}: {exc}")


if __name__ == "__main__":
    main()
| Component | v1.1 Behavior | v1.2 Upgrade |
|---|---|---|
| Target folder | ~/Desktop/m4a | ~/Desktop/media with mixed formats |
| Supported extensions | .m4a | .m4a, .mp3, .mp4 |
| read_tagged_tempo() | iTunes tmpo only | Adds ID3 TBPM / EasyID3 “bpm” for .mp3 |
| Error handling | Basic | Robust across multiple formats |
| Console output | Per-track stats for .m4a | Same stats for all supported formats |
┌────────────────────────────┐
│        Start Script        │
└────────────────────────────┘
              │
              ▼
┌────────────────────────────┐
│ 1. Verify ~/Desktop/media  │
└────────────────────────────┘
              │
              ▼
┌────────────────────────────┐
│   2. List .m4a/.mp3/.mp4   │
└────────────────────────────┘
              │
     ┌────────┴─────────┐
     │ Any files found? │
     └────────┬─────────┘
          Yes │ No
              │
              ▼
┌──────────────────────────────────────────────┐
│ 3. For each file:                            │
│    • Estimate global BPM                     │
│    • Read embedded BPM tag (if any)          │
│    • Compute tempo curve                     │
│    • Print results                           │
└──────────────────────────────────────────────┘
              │
              ▼
┌────────────────────────────┐
│            End             │
└────────────────────────────┘
Run the script from a terminal (in ~/Desktop):
source venv/bin/activate
python extract_meta_from_media.py
Sample output:
Processing track2.mp3 …
Estimated global BPM : 124.37
Embedded tempo tag : 125.00 BPM
Instantaneous tempo stats: min 123.90 | mean 124.25 | max 125.10 BPM
Leave the virtual environment when finished:
deactivate
Happy beat tracking!