Node.js bindings for OpenAI's Whisper. Runs locally on CPU.
whisper-node is an npm package that provides Node.js bindings for OpenAI's Whisper speech-to-text model. Because the model runs locally on the CPU, audio never leaves the machine, which matters for applications with data-privacy requirements or unreliable internet connectivity.
To add whisper-node to a project, run `npm install whisper-node`. This installs the package and its dependencies, after which Whisper's transcription API is available to your application, even if you are new to Node.js or machine learning.
With whisper-node, applications can convert spoken language into written text for transcriptions, subtitles, or voice commands. Since inference runs on the CPU, no GPU is required, making the package usable on commodity hardware for a range of applications, from educational software and accessibility tools to voice-operated systems, while keeping user data on the device.
Core dependencies: readline-sync, shelljs
Dev dependencies: @types/node, nodemon, ts-node, typescript
A README file for the whisper-node code repository.
Node.js bindings for OpenAI's Whisper. Transcription is done locally.
npm install whisper-node
npx whisper-node download
Requirement for Windows: install the `make` command from here.
import whisper from 'whisper-node';
const transcript = await whisper("example/sample.wav");
console.log(transcript); // output: [ {start,end,speech} ]
[
  {
    "start": "00:00:14.310", // time stamp begin
    "end": "00:00:16.480", // time stamp end
    "speech": "howdy" // transcription
  }
]
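The returned array is easy to post-process. A minimal sketch (the `Segment` interface and helper names below are illustrative, not part of the whisper-node API) that flattens the segments shown above into plain text and parses a timestamp into seconds:

```typescript
// Shape of each transcript segment, per the output example above (illustrative type).
interface Segment {
  start: string;  // e.g. "00:00:14.310"
  end: string;    // e.g. "00:00:16.480"
  speech: string; // transcribed text
}

// Join all segment texts into one plain-text transcript.
function toPlainText(segments: Segment[]): string {
  return segments.map((s) => s.speech.trim()).join(" ");
}

// Convert an "HH:MM:SS.mmm" timestamp to seconds.
function toSeconds(ts: string): number {
  const [h, m, s] = ts.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}
```

For example, `toPlainText` applied to the sample output above yields `"howdy"`, and `toSeconds("00:00:14.310")` yields `14.31`.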
import whisper from 'whisper-node';

const filePath = "example/sample.wav"; // required

const options = {
  modelName: "base.en", // default
  // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
  whisperOptions: {
    language: 'auto', // default (use 'auto' for auto detect)
    gen_file_txt: false, // outputs .txt file
    gen_file_subtitle: false, // outputs .srt file
    gen_file_vtt: false, // outputs .vtt file
    word_timestamps: true // timestamp for every word
    // timestamp_size: 0 // cannot use along with word_timestamps:true
  }
};

const transcript = await whisper(filePath, options);
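The `gen_file_subtitle` option already writes an .srt file for you. If you instead want to build SRT text in memory from the returned segments, a minimal sketch (the `Segment` type and `toSrt` helper are illustrative, not part of whisper-node):

```typescript
// Shape of each transcript segment, per the output example above (illustrative type).
interface Segment {
  start: string;
  end: string;
  speech: string;
}

// Format segments as SRT cues. SRT uses a comma before the milliseconds,
// so "00:00:14.310" becomes "00:00:14,310".
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (s, i) =>
        `${i + 1}\n${s.start.replace(".", ",")} --> ${s.end.replace(".", ",")}\n${s.speech}\n`
    )
    .join("\n");
}
```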
Files must be .wav and sampled at 16kHz.
Example: convert an .mp3 file with FFmpeg: `ffmpeg -i input.mp3 -ar 16000 output.wav`
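In a Node.js pipeline you might shell out to FFmpeg for the same conversion. A sketch under stated assumptions (ffmpeg is installed and on PATH; the helper names and the `-y` overwrite flag are additions, not from the README):

```typescript
import { execFileSync } from "node:child_process";

// Build the FFmpeg arguments for converting input audio to the
// 16 kHz .wav that whisper-node expects (mirrors the command above).
function whisperWavArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-ar", "16000", output];
}

// Run the conversion synchronously (assumes ffmpeg is on PATH).
function toWhisperWav(input: string, output: string): void {
  execFileSync("ffmpeg", whisperWavArgs(input, output));
}
```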
d.ts file
npm run dev
- runs nodemon and tsc on '/src/test.ts'
npm run build
- runs tsc, outputs to '/dist', and makes 'dist/download.js' executable