birdnet_tiny_forge.features.microspeech package
- class birdnet_tiny_forge.features.microspeech.MicroSpeechExtractor(params)
Bases: FeatureExtractorBase
- run(sample_rate, audio_slice)
Run feature extraction on raw audio data and return features.
Submodules
birdnet_tiny_forge.features.microspeech.extractor module
MicroSpeechExtractor uses the feature extraction chain from the TensorFlow Lite Micro (TFLM) micro_speech example.
- class birdnet_tiny_forge.features.microspeech.extractor.MicroSpeechExtractor(params)
Bases: FeatureExtractorBase
- run(sample_rate, audio_slice)
Run feature extraction on raw audio data and return features.
birdnet_tiny_forge.features.microspeech.tflite_micro_frontend module
- class birdnet_tiny_forge.features.microspeech.tflite_micro_frontend.AudioPreprocessor(params: FeatureParams, detail: str = 'unknown')
Bases: object
Audio Preprocessor
- Args:
params: FeatureParams, an immutable object supplying parameters for the AudioPreprocessor instance
detail: str, used for debug output (optional, for debugging only)
- generate_feature_using_tflm(audio_frame: tensorflow.Tensor) -> tensorflow.Tensor
Generate a single feature for a single audio frame. Uses TensorFlow graph execution and the TensorFlow model converter to generate a TFLM compatible model. This model is then used by the TFLM MicroInterpreter to execute a single inference operation.
- Args:
audio_frame: tf.Tensor, a single audio frame (self.params.window_size_ms) with shape (1, audio_samples_count)
- Returns:
tf.Tensor, a tensor containing a single audio feature with shape (self.params.filter_bank_number_of_channels,)
- generate_tflite_file() -> Path
Create a .tflite model file.
The model output tensor type will depend on the ‘FeatureParams.use_float_output’ parameter.
- Returns:
Path object for the created model file
- reset_tflm()
Reset TFLM interpreter state
Re-initializes TFLM interpreter state and the internal state of all TFLM kernel operators. Useful for resetting Signal library operator noise estimation and other internal state.
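The per-frame interface and the reset call described above suggest a simple driving loop: reset state, slice a clip into overlapping frames, and generate one feature per frame. The sketch below illustrates that pattern only. It does not import this package: `iter_frames`, `StandInPreprocessor`, and `extract_clip` are hypothetical names, and the stand-in works on plain Python lists, whereas the real `generate_feature_using_tflm` expects a `tf.Tensor` of shape (1, audio_samples_count).

```python
from typing import Iterator, List, Sequence


def iter_frames(samples: Sequence[int], window: int, stride: int) -> Iterator[List[int]]:
    """Yield overlapping windows of `window` samples, advancing by `stride`."""
    for start in range(0, len(samples) - window + 1, stride):
        yield list(samples[start:start + window])


class StandInPreprocessor:
    """Hypothetical stand-in, NOT the real AudioPreprocessor.

    It tracks a running peak level across calls, so state carried from one
    frame to the next is easy to observe.
    """

    def __init__(self) -> None:
        self.peak = 0

    def generate_feature(self, frame: Sequence[int]) -> List[int]:
        # The feature depends on every frame seen since the last reset.
        self.peak = max(self.peak, max(abs(s) for s in frame))
        return [self.peak]

    def reset(self) -> None:
        # Plays the role of reset_tflm(): forget everything seen so far.
        self.peak = 0


def extract_clip(pre: StandInPreprocessor, samples: Sequence[int],
                 window: int, stride: int) -> List[List[int]]:
    """Reset state, then compute one feature per frame of the clip."""
    pre.reset()
    return [pre.generate_feature(f) for f in iter_frames(samples, window, stride)]


pre = StandInPreprocessor()
loud = extract_clip(pre, [100] * 8, window=4, stride=2)
quiet = extract_clip(pre, [1] * 8, window=4, stride=2)
print(loud)   # [[100], [100], [100]]
print(quiet)  # [[1], [1], [1]]
```

Without the `reset()` call at the start of `extract_clip`, the quiet clip would inherit the peak from the loud one. The analogous concern with the real class is noise-estimation state leaking between unrelated clips.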
- class birdnet_tiny_forge.features.microspeech.tflite_micro_frontend.FeatureParams(*, sample_rate: int = 16000, window_size_ms: int = 30, window_stride_ms: int = 20, window_scaling_bits: int = 12, filter_bank_number_of_channels: int = 40, filter_bank_lower_band_limit_hz: float = 125.0, filter_bank_upper_band_limit_hz: float = 7500.0, filter_bank_scaling_bits: int = tflite_micro.python.tflite_micro.signal.ops.filter_bank_ops.FILTER_BANK_WEIGHT_SCALING_BITS, filter_bank_alignment: int = 4, filter_bank_channel_block_size: int = 4, filter_bank_post_scaling_bits: int = 6, filter_bank_spectral_subtraction_bits: int = 14, filter_bank_smoothing_bits: int = 10, filter_bank_even_smoothing: float = 0.025, filter_bank_odd_smoothing: float = 0.06, filter_bank_min_signal_remaining: float = 0.05, filter_bank_clamping: bool = False, pcan_strength: float = 0.95, pcan_offset: float = 80.0, pcan_gain_bits: int = 21, pcan_smoothing_bits: int = 10, legacy_output_scaling: float = 25.6, use_float_output: bool = False)
Bases: BaseModel
Feature generator parameters
Defaults are configured to work with the micro_speech_quantized.tflite model.
- filter_bank_alignment: int
filter bank alignment, updates filter bank constant
- filter_bank_channel_block_size: int
filter bank channel block size, updates filter bank constant
- filter_bank_clamping: bool
filter bank noise reduction clamping
- filter_bank_even_smoothing: float
filter bank noise reduction even smoothing
- filter_bank_lower_band_limit_hz: float
filter bank lower band limit
- filter_bank_min_signal_remaining: float
filter bank noise reduction minimum signal remaining
- filter_bank_number_of_channels: int
filter bank channel count
- filter_bank_odd_smoothing: float
filter bank noise reduction odd smoothing
- filter_bank_post_scaling_bits: int
filter bank output log-scaling bits
- filter_bank_scaling_bits: int
filter bank weight scaling bits, updates filter bank constant
- filter_bank_smoothing_bits: int
filter bank noise reduction smoothing bits
- filter_bank_spectral_subtraction_bits: int
filter bank noise reduction spectral subtraction bits
- filter_bank_upper_band_limit_hz: float
filter bank upper band limit
- legacy_output_scaling: float
Final output scaling, legacy from training
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- pcan_gain_bits: int
PCAN gain control bits
- pcan_offset: float
PCAN gain control offset
- pcan_smoothing_bits: int
PCAN gain control smoothing bits
- pcan_strength: float
PCAN gain control strength
- sample_rate: int
audio sample rate
- use_float_output: bool
Use float output if True, otherwise int8 output
- window_scaling_bits: int
input window shaping: scaling bits
- window_size_ms: int
input window size in milliseconds
- window_stride_ms: int
input window stride in milliseconds
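To make the window parameters concrete, the sketch below converts the documented defaults (sample_rate, window_size_ms, window_stride_ms, filter_bank_number_of_channels) into sample counts and an expected feature-matrix shape for one second of audio. The helper names `ms_to_samples` and `frame_count` are hypothetical, and the frame count assumes no padding of the final partial window; the extractor in this package may treat the tail of a clip differently.

```python
def ms_to_samples(duration_ms: int, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * duration_ms // 1000


def frame_count(num_samples: int, window_samples: int, stride_samples: int) -> int:
    """Number of complete windows that fit in the clip (assumes no padding)."""
    if num_samples < window_samples:
        return 0
    return (num_samples - window_samples) // stride_samples + 1


# Defaults taken from the FeatureParams signature above.
SAMPLE_RATE = 16000
WINDOW_SIZE_MS = 30
WINDOW_STRIDE_MS = 20
NUM_CHANNELS = 40

window_samples = ms_to_samples(WINDOW_SIZE_MS, SAMPLE_RATE)
stride_samples = ms_to_samples(WINDOW_STRIDE_MS, SAMPLE_RATE)

# One second of audio at the default sample rate.
frames = frame_count(SAMPLE_RATE, window_samples, stride_samples)
feature_shape = (frames, NUM_CHANNELS)

print(window_samples, stride_samples, feature_shape)  # 480 320 (49, 40)
```

Under these assumptions, each frame passed to generate_feature_using_tflm holds 480 samples, and consecutive frames overlap by 160 samples.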