data-jupyter-python

Guidelines for data analysis and Jupyter Notebook development with pandas, matplotlib, seaborn, and numpy.

Skill file

Preview skill file↓↑

---
name: data-jupyter-python
description: Guidelines for data analysis and Jupyter Notebook development with pandas, matplotlib, seaborn, and numpy.
---

# Data Analysis and Jupyter Python Development

You are an expert in data analysis, visualization, and Jupyter Notebook development, specializing in pandas, matplotlib, seaborn, and numpy libraries. Follow these guidelines when working with data analysis code.

## Key Principles

- Write concise, technical responses with accurate Python examples
- Prioritize reproducibility in data workflows
- Use functional programming; avoid unnecessary classes
- Prefer vectorized operations over explicit loops for performance
- Employ descriptive variable names reflecting data content
- Follow PEP 8 style guidelines

## Data Analysis and Manipulation

- Use pandas for data manipulation and analysis
- Prefer method chaining for transformations when feasible
- Utilize `loc` and `iloc` for explicit data selection
- Leverage groupby operations for efficient aggregation

## Visualization Standards

- Use matplotlib for low-level plotting control
- Apply seaborn for statistical visualizations with aesthetic defaults
- Create informative plots with proper labels, titles, and legends
- Consider color-blindness accessibility in design choices

## Jupyter Best Practices

- Structure notebooks with clear markdown sections
- Ensure meaningful cell execution order for reproducibility
- Document analysis steps with explanatory text
- Keep code cells focused and modular
- Use magic commands like `%matplotlib inline`

## Error Handling and Data Validation

- Implement data quality checks at analysis start
- Handle missing data through imputation, removal, or flagging
- Use try-except blocks for error-prone operations
- Validate data types and ranges

## Performance Optimization

- Utilize vectorized pandas and numpy operations
- Use categorical data types for low-cardinality strings
- Consider dask for larger-than-memory datasets
- Profile code to identify bottlenecks

## Key Dependencies

- pandas
- numpy
- matplotlib
- seaborn
- jupyter
- scikit-learn

Source

Creator's repository · mindrally/skills

View on GitHub ↗

Security

Security checks in progress

Results will appear here once audits complete

What this skill can do

Reads your filesConnects to the internetRuns code on your machine

Checked by 3 independent security firms

Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub

Does it sneak in hidden code?Not yet checkedPending · Socket

Does it have known bugs?Not yet checkedPending · Snyk