birdnet_tiny_forge.pipelines.data_preprocessing package

Submodules

birdnet_tiny_forge.pipelines.data_preprocessing.nodes module

Nodes used in the data preprocessing pipeline.

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.decide_splits(df, test_size=0.2, val_size=0.1, random_state=42)

Given the dataset metadata, populate the split information.

Parameters:

df – metadata dataframe
test_size – fraction of dataset used for testing
val_size – fraction of dataset used for validation
random_state – random seed, presumably used to make the split reproducible

Returns:

metadata dataframe containing split info
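
The split assignment described above can be sketched as follows. This is a minimal sketch in plain Python, not the node's actual implementation: the helper name assign_splits, the shuffled-index approach, and the "train"/"val"/"test" labels are all assumptions, and the real node operates on a pandas dataframe. Whether the real node stratifies by label is not stated in the source.

```python
import random


def assign_splits(n_rows, test_size=0.2, val_size=0.1, random_state=42):
    """Return one split name per row (hypothetical helper, not the node itself)."""
    rng = random.Random(random_state)  # seeded, so the result is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)

    n_test = round(n_rows * test_size)
    n_val = round(n_rows * val_size)

    # Every row starts in "train"; shuffled positions are reassigned below.
    splits = ["train"] * n_rows
    for i in indices[:n_test]:
        splits[i] = "test"
    for i in indices[n_test:n_test + n_val]:
        splits[i] = "val"
    return splits


splits = assign_splits(100)
```

With a dataframe, the returned list could be stored as a new column of the metadata; the column name would be the pipeline's choice.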

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.extract_loudest_slice(audio_clips, audio_slice_duration_ms)

This node uses Kedro’s lazy-loading idiom: it receives a dictionary of callables, each of which loads the actual data when called. For each callable, it extracts a slice of duration audio_slice_duration_ms that contains the maximum of the recording. It doesn’t perform the operation straight away; instead, it creates a new callable, so the processing stays lazy and is done only when it’s time to save the data.

Parameters:

audio_clips – a dictionary of callables returning audio, sample rate.
audio_slice_duration_ms – duration of the extracted slice, in milliseconds

Returns:

dictionary of callables returning sliced audio
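
The lazy idiom described above can be illustrated with a small sketch. It uses plain Python lists in place of real audio arrays, and every name in it (loudest_slice, extract_loudest_slice_lazy, the clip key) is hypothetical. Centring the window on the peak sample is an assumption; the source only says the slice contains the maximum.

```python
def loudest_slice(samples, sample_rate, slice_duration_ms):
    """Cut a window of the requested duration around the peak sample."""
    n = int(sample_rate * slice_duration_ms / 1000)
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    # Assumption: centre the window on the peak, clamped to the clip bounds.
    start = max(0, peak - n // 2)
    start = min(start, max(0, len(samples) - n))
    return samples[start:start + n], sample_rate


def extract_loudest_slice_lazy(audio_clips, slice_duration_ms):
    """Wrap each loader in a new callable; nothing is loaded here."""
    def make_loader(load):
        def lazy():
            samples, sample_rate = load()  # the actual load happens only now
            return loudest_slice(samples, sample_rate, slice_duration_ms)
        return lazy

    return {name: make_loader(load) for name, load in audio_clips.items()}


# A fake loader standing in for a real audio file.
clips = {"clip_a": lambda: ([0.0, 0.1, 0.9, 0.2, 0.0, 0.0], 1000)}
lazy_slices = extract_loudest_slice_lazy(clips, slice_duration_ms=2)
samples, sr = lazy_slices["clip_a"]()
```

The inner make_loader function binds each loader separately, which avoids the common pitfall of every closure in a loop referring to the last loader.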

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.extract_metadata(audio_slices)

Return a pandas dataframe of metadata for each audio slice. This includes its original path, and a label inferred from its path.

Parameters:

audio_slices – a dictionary of callables returning sliced audio

Returns:

pandas dataframe of metadata for the audio slices
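
As an illustration of inferring a label from a path, here is a minimal sketch. It assumes the label is the name of the file's parent directory, which the source does not confirm, and it builds plain dictionaries rather than a dataframe. The helper name rows_from_paths and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def rows_from_paths(audio_slices):
    """Build one metadata row per slice key (hypothetical helper)."""
    rows = []
    for path in sorted(audio_slices):
        # Assumption: the parent directory name is the class label.
        label = PurePosixPath(path).parent.name
        rows.append({"path": path, "label": label})
    return rows


rows = rows_from_paths({"robin/a.wav": None, "wren/b.wav": None})
```

A list of dictionaries like this can be passed to pandas.DataFrame to obtain a dataframe with one row per slice.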

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.plot_slices_sample(audio_slices, n_slices)

Plot a few slices of data as a plotly figure

Parameters:

audio_slices – a dictionary of callables returning sliced audio
n_slices – number of slices to plot

Returns:

plotly figure

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.plot_splits_info(audio_slices_metadata: DataFrame)

Plot split counts broken down by clip class, into a plotly figure

Parameters:

audio_slices_metadata – metadata for the audio slices dataset

Returns:

plotly figure

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.save_labels_dict(audio_slices_metadata: DataFrame)

Create dictionary mapping audio labels to sequential integers
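
A mapping from labels to sequential integers can be sketched in a few lines. Sorting the unique labels first is an assumption made here so that the mapping is deterministic; the source does not say how the node orders them. The helper name build_labels_dict is hypothetical.

```python
def build_labels_dict(labels):
    """Map each distinct label to an integer, starting at 0."""
    # Assumption: alphabetical order defines the integer assignment.
    return {label: index for index, label in enumerate(sorted(set(labels)))}


labels_dict = build_labels_dict(["wren", "robin", "wren", "noise"])
# labels_dict == {"noise": 0, "robin": 1, "wren": 2}
```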

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.slices_filter_short(audio_slices, audio_slice_duration_ms)

Filter out slices that are shorter than audio_slice_duration_ms, as well as files that can’t be opened.

Parameters:

audio_slices – a dictionary of callables returning sliced audio
audio_slice_duration_ms – minimum slice duration, in milliseconds; shorter slices are dropped
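
The filtering rule can be sketched as below. This is an illustration under stated assumptions: each callable is assumed to return a (samples, sample_rate) pair, the helper name filter_short is hypothetical, and the sketch loads every slice eagerly in order to measure it, which may differ from how the real node works.

```python
def filter_short(audio_slices, slice_duration_ms):
    """Keep only loaders whose audio opens and is long enough."""
    kept = {}
    for name, load in audio_slices.items():
        try:
            samples, sample_rate = load()
        except Exception:
            # The file could not be opened or decoded: drop it.
            continue
        duration_ms = len(samples) * 1000 / sample_rate
        if duration_ms >= slice_duration_ms:
            kept[name] = load  # keep the original callable, not the data
    return kept


def broken_loader():
    raise OSError("cannot open file")


slices = {
    "ok": lambda: ([0.0] * 100, 1000),    # 100 ms of audio
    "short": lambda: ([0.0] * 10, 1000),  # 10 ms of audio
    "bad": broken_loader,
}
kept = filter_short(slices, slice_duration_ms=50)
```

Here only the "ok" entry survives: "short" is below the 50 ms threshold and "bad" raises when loaded.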

birdnet_tiny_forge.pipelines.data_preprocessing.nodes.slices_make_canonical(audio_slices, sample_rate, subtype)

Make sure all slices have the same sample rate, number of channels, etc. This function uses the Kedro idiom of taking a dictionary of callables and returning a dictionary of callables to allow for lazy processing of data.

Parameters:

audio_slices – a dictionary of callables returning sliced audio
sample_rate – target sample rate shared by all output slices
subtype – see soundfile’s documentation for details

Returns:

dictionary of callables returning canonicalized sliced audio

birdnet_tiny_forge.pipelines.data_preprocessing.pipeline module

This pipeline pre-processes audio data and decides early on which split (train, test, or validation) each piece of audio data belongs to.

birdnet_tiny_forge.pipelines.data_preprocessing.pipeline.create_pipeline(**kwargs) → Pipeline
