birdnet_tiny_forge.features.microspeech package

class birdnet_tiny_forge.features.microspeech.MicroSpeechExtractor(params)

Bases: FeatureExtractorBase

run(sample_rate, audio_slice)

Run feature extraction on raw audio data and return features.

Submodules

birdnet_tiny_forge.features.microspeech.extractor module

MicroSpeechExtractor uses the feature extraction chain from the TensorFlow Lite Micro (TFLM) micro_speech example.

class birdnet_tiny_forge.features.microspeech.extractor.MicroSpeechExtractor(params)

Bases: FeatureExtractorBase

run(sample_rate, audio_slice)

Run feature extraction on raw audio data and return features.
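The docstring does not say how run() splits audio_slice into frames. The sketch below assumes the slice is cut into windows of window_size_ms, advanced by window_stride_ms with no padding, as in the micro_speech example. It computes how many feature frames a slice yields under the FeatureParams defaults. frame_count is a hypothetical helper, not part of the package.

```python
def frame_count(num_samples: int, sample_rate: int = 16000,
                window_size_ms: int = 30, window_stride_ms: int = 20) -> int:
    """Number of complete windows in a slice (assumes no padding at the end)."""
    window = sample_rate * window_size_ms // 1000   # 480 samples with the defaults
    stride = sample_rate * window_stride_ms // 1000  # 320 samples with the defaults
    if num_samples < window:
        return 0
    # The first window starts at sample 0, and each later one starts `stride` samples on.
    return 1 + (num_samples - window) // stride


print(frame_count(16000))  # 49 frames for 1 s of 16 kHz audio
```

With the defaults, a 1 s slice yields 49 frames of 40 filter bank channels. That matches the 49 × 40 feature input used by the micro_speech example.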

birdnet_tiny_forge.features.microspeech.tflite_micro_frontend module

class birdnet_tiny_forge.features.microspeech.tflite_micro_frontend.AudioPreprocessor(params: FeatureParams, detail: str = 'unknown')

Bases: object

Audio Preprocessor

Args:

params: FeatureParams, an immutable object supplying parameters for the AudioPreprocessor instance

detail: str, used for debug output (optional, for debugging only)

generate_feature_using_tflm(audio_frame: tensorflow.Tensor) → tensorflow.Tensor

Generate a single feature for a single audio frame. Uses TensorFlow graph execution and the TensorFlow model converter to generate a TFLM-compatible model. The TFLM MicroInterpreter then uses this model to run a single inference operation.

Args:

audio_frame: tf.Tensor, a single audio frame spanning self.params.window_size_ms milliseconds, with shape (1, audio_samples_count)

Returns:

tf.Tensor, a tensor containing a single audio feature with shape (self.params.filter_bank_number_of_channels,)
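generate_feature_using_tflm processes one frame per call, so a full spectrogram comes from looping over frames. The frames go in time order, because the TFLM operators keep internal state between calls (see reset_tflm). Below is a sketch of that loop using NumPy. iter_frames, build_spectrogram and fake_feature are hypothetical helpers. With the real class, feature_fn might be something like `lambda f: preprocessor.generate_feature_using_tflm(tf.convert_to_tensor(f))`, but that wiring is an assumption.

```python
from typing import Callable, Iterator

import numpy as np


def iter_frames(waveform: np.ndarray, sample_rate: int = 16000,
                window_size_ms: int = 30, window_stride_ms: int = 20) -> Iterator[np.ndarray]:
    """Yield successive frames shaped (1, window_samples), in time order."""
    window = sample_rate * window_size_ms // 1000
    stride = sample_rate * window_stride_ms // 1000
    for start in range(0, len(waveform) - window + 1, stride):
        # Add a leading axis to match the documented (1, audio_samples_count) shape.
        yield waveform[start:start + window][np.newaxis, :]


def build_spectrogram(waveform: np.ndarray,
                      feature_fn: Callable[[np.ndarray], np.ndarray],
                      **frame_kwargs) -> np.ndarray:
    """Call feature_fn once per frame and stack the results into (num_frames, channels)."""
    features = [np.asarray(feature_fn(frame)) for frame in iter_frames(waveform, **frame_kwargs)]
    return np.stack(features)


def fake_feature(frame: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for generate_feature_using_tflm: returns 40 zeros per frame."""
    assert frame.shape == (1, 480)
    return np.zeros(40, dtype=np.int8)


audio = np.zeros(16000, dtype=np.int16)  # 1 s of silence at 16 kHz
spectrogram = build_spectrogram(audio, fake_feature)
print(spectrogram.shape)  # (49, 40)
```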

generate_tflite_file() → Path

Create a .tflite model file

The model output tensor type will depend on the ‘FeatureParams.use_float_output’ parameter.

Returns:

Path object for the created model file
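As a quick sanity check on the returned Path, note that TensorFlow Lite models are FlatBuffers with the file identifier "TFL3" stored in bytes 4 to 7. The sketch below checks for that identifier. looks_like_tflite is a hypothetical helper. The demo uses hand-made files instead of a real model, so the check only covers the header, not whether the model is valid.

```python
import tempfile
from pathlib import Path

TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(path: Path) -> bool:
    """Return True if the file carries the TFLite FlatBuffer file identifier."""
    with Path(path).open("rb") as f:
        header = f.read(8)
    # Bytes 0-3 hold the root-table offset; bytes 4-7 hold the file identifier.
    return len(header) == 8 and header[4:8] == TFLITE_FILE_IDENTIFIER


# With the real class this could be:
#   model_path = preprocessor.generate_tflite_file()
#   assert looks_like_tflite(model_path)
# Here two hand-made files stand in for real output.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "fake.tflite"
    good.write_bytes(b"\x1c\x00\x00\x00TFL3" + bytes(16))
    bad = Path(tmp) / "not_a_model.bin"
    bad.write_bytes(b"hello world")
    good_result = looks_like_tflite(good)
    bad_result = looks_like_tflite(bad)
```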

reset_tflm()

Reset TFLM interpreter state

Re-initializes TFLM interpreter state and the internal state of all TFLM kernel operators. Useful for resetting Signal library operator noise estimation and other internal state.
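Here is why a reset matters. Noise estimation is a running average that carries over from frame to frame, so the same input frame produces different output depending on what came before. The toy class below is a float-only simplification of a per-channel spectral-subtraction noise reducer, modeled loosely on the TFLM frontend. It is not the Signal library's fixed-point code. ToyNoiseReducer and its methods are hypothetical. The defaults 0.025, 0.06 and 0.05 are the filter_bank_even_smoothing, filter_bank_odd_smoothing and filter_bank_min_signal_remaining defaults from FeatureParams.

```python
class ToyNoiseReducer:
    """Float caricature of per-channel noise reduction (not TFLM's fixed-point implementation)."""

    def __init__(self, num_channels: int, even_smoothing: float = 0.025,
                 odd_smoothing: float = 0.06, min_signal_remaining: float = 0.05):
        self.num_channels = num_channels
        self.even_smoothing = even_smoothing
        self.odd_smoothing = odd_smoothing
        self.min_signal_remaining = min_signal_remaining
        self.reset()

    def reset(self) -> None:
        # Similar in spirit to reset_tflm(): forget all accumulated noise history.
        self.estimate = [0.0] * self.num_channels

    def apply(self, energies):
        out = []
        for i, signal in enumerate(energies):
            # Even and odd channels use different smoothing factors.
            smoothing = self.even_smoothing if i % 2 == 0 else self.odd_smoothing
            # Exponential moving average of the channel energy is the noise estimate.
            self.estimate[i] = smoothing * signal + (1.0 - smoothing) * self.estimate[i]
            # Subtract the estimate but never drop below a fraction of the input.
            floor = self.min_signal_remaining * signal
            out.append(max(signal - self.estimate[i], floor))
        return out


nr = ToyNoiseReducer(num_channels=2)
frame = [100.0, 100.0]
first = nr.apply(frame)   # fresh state: little noise estimated yet
for _ in range(50):
    nr.apply(frame)
later = nr.apply(frame)   # same frame, but the estimate has grown
nr.reset()
again = nr.apply(frame)   # after reset, output matches the fresh-state output
```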

class birdnet_tiny_forge.features.microspeech.tflite_micro_frontend.FeatureParams(*, sample_rate: int = 16000, window_size_ms: int = 30, window_stride_ms: int = 20, window_scaling_bits: int = 12, filter_bank_number_of_channels: int = 40, filter_bank_lower_band_limit_hz: float = 125.0, filter_bank_upper_band_limit_hz: float = 7500.0, filter_bank_scaling_bits: int = tflite_micro.python.tflite_micro.signal.ops.filter_bank_ops.FILTER_BANK_WEIGHT_SCALING_BITS, filter_bank_alignment: int = 4, filter_bank_channel_block_size: int = 4, filter_bank_post_scaling_bits: int = 6, filter_bank_spectral_subtraction_bits: int = 14, filter_bank_smoothing_bits: int = 10, filter_bank_even_smoothing: float = 0.025, filter_bank_odd_smoothing: float = 0.06, filter_bank_min_signal_remaining: float = 0.05, filter_bank_clamping: bool = False, pcan_strength: float = 0.95, pcan_offset: float = 80.0, pcan_gain_bits: int = 21, pcan_smoothing_bits: int = 10, legacy_output_scaling: float = 25.6, use_float_output: bool = False)

Bases: BaseModel

Feature generator parameters

Defaults are configured to work with the micro_speech_quantized.tflite model
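To make the defaults concrete, the sketch below derives per-window quantities from them. It also checks that the filter bank band fits below the Nyquist frequency. It assumes the FFT length is the next power of two at or above the window length. That is a common choice, but it is an assumption here, not something stated in this documentation. derived_constants and next_pow2 are hypothetical helpers.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def derived_constants(sample_rate: int = 16000, window_size_ms: int = 30,
                      window_stride_ms: int = 20,
                      filter_bank_lower_band_limit_hz: float = 125.0,
                      filter_bank_upper_band_limit_hz: float = 7500.0) -> dict:
    window = sample_rate * window_size_ms // 1000
    stride = sample_rate * window_stride_ms // 1000
    nyquist = sample_rate / 2
    # The band limits must be ordered and fit below the Nyquist frequency.
    if not 0 < filter_bank_lower_band_limit_hz < filter_bank_upper_band_limit_hz <= nyquist:
        raise ValueError("filter bank band limits must satisfy 0 < lower < upper <= sample_rate / 2")
    fft_size = next_pow2(window)  # assumption: power-of-two FFT
    return {
        "window_samples": window,
        "stride_samples": stride,
        "fft_size": fft_size,
        "fft_bins": fft_size // 2 + 1,  # bins of a real-input FFT
    }


defaults = derived_constants()
print(defaults)
```

With the defaults this gives a 480-sample window, a 320-sample stride, a 512-point FFT and 257 frequency bins. The 7500 Hz upper limit sits just under the 8000 Hz Nyquist frequency of 16 kHz audio.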

filter_bank_alignment: int

filter bank alignment, updates filter bank constant

filter_bank_channel_block_size: int

filter bank channel block size, updates filter bank constant

filter_bank_clamping: bool

filter bank noise reduction clamping

filter_bank_even_smoothing: float

filter bank noise reduction even smoothing

filter_bank_lower_band_limit_hz: float

filter bank lower band limit

filter_bank_min_signal_remaining: float

filter bank noise reduction minimum signal remaining

filter_bank_number_of_channels: int

filter bank channel count

filter_bank_odd_smoothing: float

filter bank noise reduction odd smoothing

filter_bank_post_scaling_bits: int

filter bank output log-scaling bits

filter_bank_scaling_bits: int

filter bank weight scaling bits, updates filter bank constant

filter_bank_smoothing_bits: int

filter bank noise reduction smoothing bits

filter_bank_spectral_subtraction_bits: int

filter bank noise reduction spectral subtraction bits

filter_bank_upper_band_limit_hz: float

filter bank upper band limit

legacy_output_scaling: float

Final output scaling factor, a legacy value carried over from training

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

pcan_gain_bits: int

PCAN gain control bits

pcan_offset: float

PCAN gain control offset

pcan_smoothing_bits: int

PCAN gain control smoothing bits

pcan_strength: float

PCAN gain control strength

sample_rate: int

audio sample rate

use_float_output: bool

Use float output if True, otherwise int8 output

window_scaling_bits: int

input window shaping: scaling bits

window_size_ms: int

input window size in milliseconds

window_stride_ms: int

input window stride in milliseconds
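pcan_strength and pcan_offset control how strongly each channel is normalized by its own background level. In the TFLM micro frontend, the PCAN gain for a channel is computed from that channel's noise estimate as (noise_estimate + offset) ** (-strength). The float sketch below shows only that gain step. It leaves out the fixed-point scaling (pcan_gain_bits, pcan_smoothing_bits) and the output compression that follow in the real pipeline. pcan_gain and pcan_normalize are hypothetical helpers, and the defaults are taken from FeatureParams.

```python
def pcan_gain(noise_estimate: float, strength: float = 0.95, offset: float = 80.0) -> float:
    """Per-channel gain: a larger background level gives a smaller gain."""
    return (noise_estimate + offset) ** (-strength)


def pcan_normalize(energies, noise_estimates, strength: float = 0.95, offset: float = 80.0):
    """Scale each channel's energy by the gain derived from its own noise estimate."""
    return [e * pcan_gain(n, strength, offset) for e, n in zip(energies, noise_estimates)]


# Same input energy, but the second channel sits on a loud background.
quiet, noisy = pcan_normalize([1000.0, 1000.0], [0.0, 1000.0])
print(quiet > noisy)  # True
```

A strength of 0 turns the gain into a constant 1. Values closer to 1 divide more fully by the background level.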