Data Studio

Comprehensive Data Management and Visualization

Powerful ML DataOps for Sensors

Data Management: Covers acquisition, validation, cleaning, transformation, and storage

Workflow Automation: Reduces manual errors and increases efficiency

Collaboration: Allows data scientists, engineers, and field technicians to work together to improve dataset quality and scale

Versioning: Tracks dataset, label, and model changes for reproducibility

Data Studio Key Features

Label and Visualize Data
Label your sensor data with the aid of video synchronization. Built-in tools for merging, modifying, and automatically adding labels let you quickly create a high-quality dataset suitable for building production ML models. Many plot types are available, including line plots, heatmaps, spectrograms, and scatter plots.
Evaluate ML Models
Overlay ML model results on project data to quickly evaluate and compare inference model performance:
  • Quickly compare labels from different sources such as different labelers or models
  • Visualize differences side by side, at a high level, or in a confusion matrix
  • Compare results of one or more models in a single file or across multiple files, or view summary statistics such as accuracy and F1 score for the entire project
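Data Studio computes these comparisons for you. To illustrate what the confusion matrix and summary statistics represent, here is a minimal plain-Python sketch that assumes two label sequences (for example, a human labeler and a model) already aligned segment by segment. The function names are hypothetical and are not part of the Data Studio API.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count (reference_label, predicted_label) pairs for aligned segments."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must be aligned and of equal length")
    return Counter(zip(reference, predicted))


def accuracy(reference, predicted):
    """Fraction of segments where both sources agree."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def macro_f1(reference, predicted):
    """Unweighted mean of per-class F1 scores, computed from the confusion matrix."""
    cm = confusion_matrix(reference, predicted)
    classes = sorted(set(reference) | set(predicted))
    scores = []
    for c in classes:
        tp = cm[(c, c)]
        fp = sum(n for (r, p), n in cm.items() if p == c and r != c)
        fn = sum(n for (r, p), n in cm.items() if r == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labeler = ["walk", "walk", "run", "idle", "run"]
model = ["walk", "run", "run", "idle", "idle"]
print(accuracy(labeler, model))  # 0.6
print(macro_f1(labeler, model))  # about 0.611
```

Macro averaging weights every class equally, so a model that ignores a rare class is penalized even when overall accuracy looks good.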
Live Capture and Inference
Data Studio provides real-time connectivity and visualization features that reduce the effort needed to produce high-quality datasets:
  • Stream data to Data Studio using a simple streaming protocol over USB serial, BLE, or Wi-Fi
  • Capture sensor data while running your machine learning model, and display data and results in real time
  • Simultaneously record video that is automatically synchronized to your time series data
  • Capture contextual metadata for each file to build out your dataset
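Data Studio synchronizes video automatically. The arithmetic behind lining up a video frame with a sensor sample can be sketched as follows, assuming both streams run at constant rates and that the offset between their start times is known. The function and its `offset_s` parameter are hypothetical illustrations, not how Data Studio is necessarily implemented.

```python
def frame_to_sample(frame_index, video_fps, sample_rate_hz, offset_s=0.0):
    """Map a video frame index to the nearest sensor sample index.

    Assumes constant frame and sample rates. offset_s is the number of
    seconds the sensor recording started before the video (hypothetical).
    """
    t = frame_index / video_fps + offset_s  # time of the frame on the sensor clock
    return round(t * sample_rate_hz)


# Frame 30 of a 30 fps video, 100 Hz sensor, both started together -> sample 100
print(frame_to_sample(30, 30, 100))
# Same setup, but the sensor started 0.5 s before the video -> sample 200
print(frame_to_sample(45, 30, 100, offset_s=0.5))
```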
Collaboratively Manage Datasets
SensiML Data Studio is the ideal platform for creating and curating time-series sensor datasets:
  • Collaborate with multiple users to label and collect sensor data
  • Manage contextual file metadata
  • Label regions of data for ML train/test ground truth
  • Compare waveforms and labels across multiple files
  • Import and export data and labels quickly and easily
  • Extend analysis capabilities using Python plug-ins
Customize and Extend
With built-in Python integration, you can tailor Data Studio to meet your specific needs:
  • Quickly see the results of filters or analysis
  • Run your own machine learning models
  • Devise your own custom transforms and segmentation algorithms
  • Visualize data more easily than with Jupyter Notebook plots
  • Import any Python function to add custom functionality
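The exact plug-in interface is defined in the Data Studio documentation and is not reproduced here. The functions below are a hypothetical sketch of the kind of plain-Python transform and segmentation logic a custom plug-in might wrap: a moving-average filter and a fixed-size sliding-window segmenter.

```python
def moving_average(signal, window):
    """Simple moving-average low-pass filter; returns only fully covered positions."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    running = sum(signal[:window])
    out = [running / window]
    for i in range(window, len(signal)):
        # Slide the window: add the newest sample, drop the oldest one
        running += signal[i] - signal[i - window]
        out.append(running / window)
    return out


def sliding_windows(n_samples, window_size, step):
    """Return (start, end) sample ranges for fixed-size, possibly overlapping segments."""
    return [(start, start + window_size)
            for start in range(0, n_samples - window_size + 1, step)]


print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
print(sliding_windows(10, 4, 2))           # [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Keeping the running sum avoids re-summing each window, so the filter runs in linear time regardless of window size.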
Data Source Plugins
With plugins for AWS S3, Google Cloud, and Microsoft OneDrive, it is easy to start working with your time-series data in Data Studio. You can even import model APIs from services such as SageMaker to quickly test and version model results.

Companies often don't have the right data, and get frustrated when they can't build models with data that isn't labeled. That's where companies consistently fail.

ANAND RAO, PARTNER AND GLOBAL AI LEADER AT PRICEWATERHOUSECOOPERS
The Ideal Front-end to a Complete IoT Edge ML Workflow

SensiML Data Studio was designed around an iterative workflow for efficient, incremental performance improvement. It works seamlessly with SensiML Analytics Studio to create accurate models and generate optimized firmware for your target embedded platform.

Whether your workflow uses the complete SensiML Toolkit or third-party model development tools and AI frameworks is up to you. Data Studio supports customization and data import/export, so it can be used either as part of our integrated tool suite or as a standalone ML DataOps tool.
Embedded Sensor DataOps Software To Avoid GIGO
GIGO (Garbage In, Garbage Out) describes the all-too-common result of feeding poor-quality train/test datasets into AI/ML and data analytics tools: poorly performing models.

A machine learning model is essentially a statistical algorithm trained to produce results consistent with its training dataset. Flaws in that dataset therefore reduce the accuracy of the resulting classification and regression models.

The following common dataset quality issues degrade ML model performance:
  • Imbalanced Data: Overrepresentation of a subset of classes introduces bias toward those classes
  • Insufficient Data: Leads to overfitting and models that do not generalize
  • Poor Quality Data: Data with errors, outliers, or noise can mislead the model
  • Lack of Feature Diversity: The represented features fail to capture the complexity of the problem
  • Incorrect Data Splitting: The train/test split doesn't reflect the real-world data distribution
  • Non-Representative Data: Data does not represent the full spectrum of real-world variability
  • Missing Values: Gaps in the data, and poor handling of missing values, degrade model performance
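Several of these issues can be checked with a few lines of code before training. The sketch below is a generic plain-Python illustration, not a Data Studio feature: it measures class imbalance as the ratio of the most to the least frequent class, reports the fraction of missing values, and performs a stratified train/test split so each class keeps roughly the same proportion in both sets. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Ratio of the most to the least frequent class (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def missing_fraction(samples):
    """Fraction of values that are None or NaN in a list of numeric samples."""
    missing = sum(1 for x in samples
                  if x is None or (isinstance(x, float) and math.isnan(x)))
    return missing / len(samples)


def stratified_split(items, labels, test_fraction=0.2, seed=0):
    """Split items so each class keeps roughly the same proportion in train and test."""
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    rng = random.Random(seed)  # seeded for a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


labels = ["walk"] * 8 + ["fall"] * 2
print(imbalance_ratio(labels))                          # 4.0
print(missing_fraction([1.0, None, float("nan"), 2.0]))  # 0.5
train, test = stratified_split(list(range(10)), labels, test_fraction=0.5)
```

For time-series data, segments cut from the same recording are often highly correlated, so in practice it is usually safer to assign whole files to either train or test rather than splitting individual segments.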

SensiML Data Studio provides the features and workflows needed to avoid the dataset quality issues that lead to poorly performing models.

Data Studio Documentation
To learn more about Data Studio and its key features, visit the links below: