Data Scientist Guide

This guide is for you if you analyze data, build Jupyter notebooks, create dashboards, define data specifications, or develop analytics pipelines. HVE Core gives data scientists focused tooling: 13 addressable assets spanning data exploration, visualization, and pipeline development.

TIP

Install the HVE Core extension from the VS Code Marketplace to get all stable artifacts with zero configuration.

Your primary collections are data-science (notebook generation, dashboard creation, and data specification tools) and hve-core (research and planning workflows for larger analytics projects). For clone-based setups, use the hve-core-installer agent with install data-science hve-core.

What HVE Core Does for You

  1. Generates Jupyter notebooks with proper structure, documentation cells, and reproducible analysis patterns
  2. Creates Streamlit dashboards from data specifications or requirements
  3. Builds and validates data specification documents defining schemas, sources, and transformations
  4. Tests generated dashboards for functional correctness
  5. Supports research and planning workflows for complex analytics pipelines
  6. Manages Python virtual environments with uv for reproducible workflows
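
As a concrete illustration of the first capability, the structure a generated notebook follows can be sketched with nbformat, the standard Jupyter library. This is a minimal sketch of the pattern (title cell, documented sections, a parameterized data path), not the agent's actual output; the cell contents and file names are illustrative assumptions.

```python
# Sketch of the documented-notebook pattern, built with nbformat.
# Cell contents and file names are illustrative assumptions.
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

nb = new_notebook()
nb.cells = [
    new_markdown_cell("# Q4 Sales Analysis\n\n**Goal:** revenue trends by segment."),
    new_markdown_cell("## 1. Setup and Data Loading"),
    new_code_cell(
        "import pandas as pd\n\n"
        "DATA_PATH = 'data/sales-q4-2025.parquet'  # single place to change inputs\n"
        "df = pd.read_parquet(DATA_PATH)"
    ),
    new_markdown_cell("## 2. Data Quality Assessment"),
    new_code_cell("df.info()\ndf.isna().mean().sort_values(ascending=False)"),
]
nbformat.write(nb, "q4-sales-analysis.ipynb")
```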

Your Lifecycle Stages

NOTE

Data scientists primarily operate in these lifecycle stages:

  • Stage 2: Discovery. Research data sources, explore datasets, investigate patterns.
  • Stage 3: Product Definition. Define data schemas, sources, and transformation requirements.
  • Stage 6: Implementation. Build notebooks, create dashboards, develop pipelines.
  • Stage 7: Review. Validate analysis, review data quality, test dashboards.
  • Stage 8: Delivery. Package notebooks, dashboards, and documentation for stakeholders.

Stage Walkthrough

  1. Stage 2: Discovery. Use the task-researcher agent to investigate data sources, explore available datasets, and research analytical approaches.
  2. Stage 3: Product Definition. Run the gen-data-spec agent to define data schemas, sources, and transformation requirements as structured specification documents.
  3. Stage 6: Notebook Development. Generate analysis notebooks with the gen-jupyter-notebook agent and create dashboards with the gen-streamlit-dashboard agent.
  4. Stage 7: Validation. Test generated dashboards with the test-streamlit-dashboard agent and review analysis results for accuracy and completeness.
  5. Stage 8: Delivery. Package notebooks, dashboards, and documentation for sharing with stakeholders and engineering teams.

Starter Prompts

Select gen-jupyter-notebook agent:

Create a data analysis notebook for the Q4 sales transactions dataset in
data/sales-q4-2025.parquet. Include data quality assessment, revenue trend
analysis by product category and region, and customer cohort segmentation
using RFM scoring with matplotlib visualizations.
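
To see the shape of the analysis this prompt asks for, RFM segmentation is a few lines of pandas. A hedged sketch, assuming the parquet file has customer_id, order_date, and revenue columns (the prompt does not specify its schema):

```python
import pandas as pd

# Assumed schema: customer_id, order_date, revenue
df = pd.read_parquet("data/sales-q4-2025.parquet")
snapshot = df["order_date"].max()

# Recency (days since last order), frequency (order count), monetary (total revenue)
rfm = df.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("revenue", "sum"),
)

# Quintile scores; rank first so duplicate values don't break the bin edges.
# Recent buyers score high on R; frequent, high-spend buyers score high on F and M.
rfm["r"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[5, 4, 3, 2, 1]).astype(int)
rfm["f"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["m"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["rfm_score"] = rfm[["r", "f", "m"]].sum(axis=1)
```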

Select gen-data-spec agent:

Define a data specification for the customer event ingestion pipeline.
Source is a Kafka topic with Avro encoding, target is a Delta Lake table.
Include timestamp normalization, PII hashing transformations, quality
rules for null checks, and partitioning by event_date and event_type.
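
The transformations this spec names are easy to prototype before the pipeline exists. A sketch in pandas, assuming salted SHA-256 as the PII hashing policy and event_ts, email, and user_id as column names; none of these are dictated by the spec format.

```python
import hashlib
import pandas as pd

SALT = b"rotate-per-environment"  # assumption: salted SHA-256 as the PII policy

def hash_pii(value: str) -> str:
    """Deterministic pseudonymization so joins still work after hashing."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

def transform(events: pd.DataFrame) -> pd.DataFrame:
    out = events.copy()
    # Timestamp normalization: coerce everything to timezone-aware UTC
    out["event_ts"] = pd.to_datetime(out["event_ts"], utc=True)
    out["event_date"] = out["event_ts"].dt.date  # partition column from the spec
    # PII hashing on assumed identifier columns
    for col in ("email", "user_id"):
        out[col] = out[col].astype(str).map(hash_pii)
    # Quality rule: drop rows that fail the null checks on key columns
    return out.dropna(subset=["event_ts", "event_type"])
```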

Select gen-streamlit-dashboard agent:

Build a dashboard for API latency and error rate metrics from the
Prometheus endpoint at /metrics. Include P50/P95/P99 latency percentiles,
error rate breakdown by endpoint (5xx vs 4xx), and a 30-day daily active
users trend. Set refresh interval to 5 minutes.
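
For a sense of what the generated dashboard might contain, here is a minimal Streamlit sketch. The metric name http_request_duration_seconds, the endpoint URL, and the line-based parsing are all assumptions about the Prometheus exposition output, not part of the prompt.

```python
import requests
import streamlit as st

st.set_page_config(page_title="API Performance", layout="wide")
st.title("API Latency and Error Rates")

@st.cache_data(ttl=300)  # cache expiry doubles as the 5-minute refresh interval
def fetch_metrics(url: str = "http://localhost:9090/metrics") -> str:
    return requests.get(url, timeout=10).text

def latency_ms(text: str, quantile: str) -> float:
    # Scan exposition-format lines such as:
    #   http_request_duration_seconds{quantile="0.95"} 0.087
    for line in text.splitlines():
        if line.startswith("http_request_duration_seconds") and f'quantile="{quantile}"' in line:
            return float(line.rsplit(" ", 1)[1]) * 1000.0  # seconds -> ms
    return float("nan")

raw = fetch_metrics()
c50, c95, c99 = st.columns(3)
c50.metric("P50 latency (ms)", f"{latency_ms(raw, '0.5'):.1f}")
c95.metric("P95 latency (ms)", f"{latency_ms(raw, '0.95'):.1f}")
c99.metric("P99 latency (ms)", f"{latency_ms(raw, '0.99'):.1f}")
```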

Select test-streamlit-dashboard agent:

Validate the dashboard at dashboards/api-performance.json. Check that all
queries return data for the last 7 days, panels render without errors, and
the refresh rate does not exceed Prometheus scrape intervals.
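
The kind of checks this agent performs can be approximated with Streamlit's built-in testing API. A sketch, assuming the dashboard is also available as a runnable app script (the script path below is a hypothetical stand-in; the JSON file in the prompt is the agent's input, not what AppTest loads):

```python
from streamlit.testing.v1 import AppTest

# Run the dashboard headlessly; the script path is a hypothetical stand-in
at = AppTest.from_file("dashboards/api_performance_app.py", default_timeout=30)
at.run()

# Panels should render without raising
assert not at.exception, f"dashboard raised: {at.exception}"

# Every metric panel should show a value, i.e. its query returned data
assert all(m.value not in (None, "nan") for m in at.metric), "empty metric panel"
```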

Select task-researcher agent:

Research data sources for predicting customer churn in the SaaS platform.
Identify internal sources like usage telemetry and billing history,
external benchmark datasets, data freshness requirements for daily
granularity, and GDPR privacy constraints for EU customer data.

Key Agents and Workflows

Agent                    | Purpose                                     | Docs
gen-jupyter-notebook     | Jupyter notebook generation                 | Agent file
gen-streamlit-dashboard  | Streamlit dashboard creation                | Agent file
gen-data-spec            | Data specification document creation        | Agent file
test-streamlit-dashboard | Dashboard functional testing                | Agent file
task-researcher          | Data source and pattern research            | Task Researcher
task-planner             | Analytics pipeline planning                 | Task Planner
memory                   | Session context and preference persistence  | Agent file

Prompts complement the agents for cross-cutting workflows:

Prompt       | Purpose                                                        | Invoke
git-commit   | Stage and commit changes with conventional message formatting  | /git-commit
pull-request | Create a pull request with structured description              | /pull-request

Python environment management follows the uv virtual environment instructions, keeping analysis environments reproducible and isolated.

Tips

Do                                                                       | Don't
Start with the gen-data-spec agent to define schemas before coding      | Jump straight to notebook coding without data specifications
Use the gen-jupyter-notebook agent for structured, documented notebooks | Create raw notebooks without documentation cells
Test dashboards with the test-streamlit-dashboard agent                 | Deploy dashboards without functional validation
Research data sources with the task-researcher agent first              | Assume data availability without investigation
Use uv for reproducible Python environments                             | Install packages globally or skip environment isolation
Working with Other Roles

  • Data Scientist + Engineer: Analytics pipelines bridge data exploration with production integration. Engineers implement production-grade versions of prototype analyses. See the Engineer Guide.
  • Data Scientist + TPM: Data requirements feed into product specifications. Analytics capabilities shape feature definitions. See the TPM Guide.

Next Steps

TIP

  • Explore the data science collection: Data Science Collection
  • Set up your Python environment: uv Projects
  • See how analytics fits the project lifecycle: AI-Assisted Project Lifecycle

