employee-performance-analytics-hr

SQL and Python-based employee performance analytics with KPI aggregation, departmental insights, and HR dashboard generation
Skill file

Preview skill file↓↑
---
name: employee-performance-analytics-hr
description: SQL and Python-based employee performance analytics with KPI aggregation, departmental insights, and HR dashboard generation
triggers:
  - analyze employee performance metrics
  - create hr analytics dashboard
  - calculate departmental kpis
  - visualize productivity trends
  - generate employee performance reports
  - build hr data pipeline
  - assess workforce efficiency metrics
  - aggregate employee performance data
---

# Employee Performance Analytics HR Skill

> Skill by [ara.so](https://ara.so) — Data Skills collection.

## Overview

Employee Performance Analytics is a Python and SQL-based HR analytics tool that transforms employee data into actionable insights. It uses SQLite for KPI aggregation and pandas/matplotlib for visualization, generating departmental performance reports, efficiency metrics, and workload analysis.

The project provides an end-to-end analytics pipeline: data loading → SQL feature engineering → Python analysis → visualization exports.

## Installation

```bash
# Clone the repository
git clone https://github.com/AmirhosseinHonardoust/Employee-Performance-Analytics.git
cd Employee-Performance-Analytics

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Dependencies

```txt
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
sqlite3  # Built-in with Python
numpy>=1.24.0
```

## Project Structure

```
employee-performance-analytics/
├── data/
│   └── employees.csv          # Raw employee data
├── src/
│   ├── create_db.py          # CSV to SQLite loader
│   ├── queries.sql           # SQL KPI queries
│   ├── analyze_performance.py # Main analysis script
│   └── utils.py              # Helper functions
└── outputs/
    ├── department_kpis.csv
    ├── performance_summary.csv
    └── charts/               # Generated visualizations
```

## Data Schema

The project expects employee data with these columns:

| Column | Type | Description |
|--------|------|-------------|
| `employee_id` | int | Unique identifier |
| `name` | string | Employee name |
| `department` | string | Department (Engineering, Sales, etc.) |
| `role` | string | Job title |
| `date` | date | Record date (YYYY-MM-DD) |
| `tasks_completed` | int | Daily tasks completed |
| `hours_worked` | float | Hours worked |
| `rating` | float | Performance rating (1-5) |
| `projects` | int | Active projects |
| `absences` | int | 1 if absent, 0 otherwise |

## Key Commands

### 1. Load Data into SQLite

```bash
python src/create_db.py --csv data/employees.csv --db hr.db
```

**Options:**
- `--csv`: Path to input CSV file
- `--db`: Output SQLite database path
- `--table`: Table name (default: `employees`)

### 2. Run Performance Analysis

```bash
python src/analyze_performance.py --db hr.db --sql src/queries.sql --outdir outputs
```

**Options:**
- `--db`: Path to SQLite database
- `--sql`: Path to SQL queries file
- `--outdir`: Output directory for CSV reports and charts

## Core SQL Queries

The `queries.sql` file contains three main analytical views:

### Department KPIs

```sql
-- Department-level performance metrics
CREATE VIEW IF NOT EXISTS department_kpis AS
SELECT
    department,
    COUNT(DISTINCT employee_id) AS employee_count,
    AVG(rating) AS avg_rating,
    SUM(tasks_completed) AS total_tasks,
    SUM(hours_worked) AS total_hours,
    ROUND(AVG(CAST(absences AS FLOAT)), 2) AS absence_rate,
    ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS tasks_per_hour
FROM employees
GROUP BY department
ORDER BY avg_rating DESC;
```

### Employee Summary

```sql
-- Individual employee performance aggregation
CREATE VIEW IF NOT EXISTS employee_summary AS
SELECT
    employee_id,
    name,
    department,
    role,
    SUM(tasks_completed) AS total_tasks,
    SUM(hours_worked) AS total_hours,
    AVG(rating) AS avg_rating,
    COUNT(DISTINCT projects) AS project_count,
    SUM(absences) AS total_absences,
    ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS efficiency
FROM employees
GROUP BY employee_id, name, department, role
ORDER BY efficiency DESC;
```

### Daily Productivity

```sql
-- Day-by-day productivity tracking
CREATE VIEW IF NOT EXISTS daily_productivity AS
SELECT
    date,
    department,
    SUM(tasks_completed) AS daily_tasks,
    SUM(hours_worked) AS daily_hours,
    AVG(rating) AS daily_rating
FROM employees
GROUP BY date, department
ORDER BY date, department;
```

## Python API Usage

### Creating Database from CSV

```python
import pandas as pd
import sqlite3

def create_database(csv_path, db_path, table_name='employees'):
    """Load CSV into SQLite database."""
    df = pd.read_csv(csv_path)
    
    # Data validation
    required_cols = ['employee_id', 'name', 'department', 'date', 
                     'tasks_completed', 'hours_worked', 'rating']
    assert all(col in df.columns for col in required_cols), "Missing required columns"
    
    # Create database
    conn = sqlite3.connect(db_path)
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    conn.close()
    print(f"✓ Database created: {db_path}")

# Usage
create_database('data/employees.csv', 'hr.db')
```

### Running SQL Queries

```python
import sqlite3
import pandas as pd

def execute_sql_file(db_path, sql_file_path):
    """Execute SQL script and return results."""
    conn = sqlite3.connect(db_path)
    
    with open(sql_file_path, 'r') as f:
        sql_script = f.read()
    
    # Execute all statements
    cursor = conn.cursor()
    cursor.executescript(sql_script)
    conn.commit()
    
    # Fetch view results
    dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
    emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
    daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
    
    conn.close()
    
    return dept_kpis, emp_summary, daily_prod
```

### Generating Visualizations

```python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_department_ratings(dept_kpis, output_path):
    """Bar chart of average rating by department."""
    plt.figure(figsize=(12, 7))
    
    sns.barplot(
        data=dept_kpis,
        x='department',
        y='avg_rating',
        palette='viridis'
    )
    
    plt.title('Average Performance Rating by Department', fontsize=16, weight='bold')
    plt.xlabel('Department', fontsize=12)
    plt.ylabel('Average Rating', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0, 5)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

def plot_performance_vs_hours(emp_summary, output_path):
    """Scatter plot of tasks vs hours worked."""
    plt.figure(figsize=(12, 7))
    
    scatter = plt.scatter(
        emp_summary['total_hours'],
        emp_summary['total_tasks'],
        c=emp_summary['avg_rating'],
        cmap='RdYlGn',
        s=100,
        alpha=0.6,
        edgecolors='black'
    )
    
    plt.colorbar(scatter, label='Avg Rating')
    plt.title('Tasks Completed vs Hours Worked', fontsize=16, weight='bold')
    plt.xlabel('Total Hours Worked', fontsize=12)
    plt.ylabel('Total Tasks Completed', fontsize=12)
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

def plot_efficiency_distribution(emp_summary, output_path):
    """Histogram of task completion rate."""
    plt.figure(figsize=(12, 7))
    
    plt.hist(
        emp_summary['efficiency'].dropna(),
        bins=30,
        color='steelblue',
        edgecolor='black',
        alpha=0.7
    )
    
    plt.axvline(
        emp_summary['efficiency'].median(),
        color='red',
        linestyle='--',
        linewidth=2,
        label=f"Median: {emp_summary['efficiency'].median():.2f}"
    )
    
    plt.title('Task Completion Rate Distribution', fontsize=16, weight='bold')
    plt.xlabel('Tasks per Hour', fontsize=12)
    plt.ylabel('Number of Employees', fontsize=12)
    plt.legend()
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()
```

## Complete Analysis Pipeline

```python
import os
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

class HRAnalyzer:
    """Complete HR analytics pipeline."""
    
    def __init__(self, db_path, sql_path, output_dir):
        self.db_path = db_path
        self.sql_path = sql_path
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        (self.output_dir / 'charts').mkdir(exist_ok=True)
        
    def load_data(self):
        """Execute SQL and load analytical views."""
        conn = sqlite3.connect(self.db_path)
        
        # Execute SQL script
        with open(self.sql_path, 'r') as f:
            conn.executescript(f.read())
        
        # Load views
        self.dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
        self.emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
        self.daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
        
        conn.close()
        print("✓ Data loaded from SQL views")
        
    def export_reports(self):
        """Save CSV reports."""
        self.dept_kpis.to_csv(
            self.output_dir / 'department_kpis.csv',
            index=False
        )
        self.emp_summary.to_csv(
            self.output_dir / 'performance_summary.csv',
            index=False
        )
        print(f"✓ Reports saved to {self.output_dir}")
        
    def generate_visualizations(self):
        """Create all performance charts."""
        charts_dir = self.output_dir / 'charts'
        
        # Department ratings
        plt.figure(figsize=(12, 7))
        sns.barplot(data=self.dept_kpis, x='department', y='avg_rating', palette='viridis')
        plt.title('Average Rating by Department', fontsize=16, weight='bold')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        plt.savefig(charts_dir / 'avg_rating_by_department.png', dpi=300)
        plt.close()
        
        # Performance vs hours
        plt.figure(figsize=(12, 7))
        plt.scatter(
            self.emp_summary['total_hours'],
            self.emp_summary['total_tasks'],
            c=self.emp_summary['avg_rating'],
            cmap='RdYlGn',
            s=100,
            alpha=0.6
        )
        plt.colorbar(label='Avg Rating')
        plt.title('Performance vs Hours Worked', fontsize=16, weight='bold')
        plt.xlabel('Total Hours')
        plt.ylabel('Total Tasks')
        plt.tight_layout()
        plt.savefig(charts_dir / 'performance_vs_hours.png', dpi=300)
        plt.close()
        
        # Efficiency distribution
        plt.figure(figsize=(12, 7))
        plt.hist(self.emp_summary['efficiency'].dropna(), bins=30, color='steelblue', edgecolor='black')
        plt.axvline(self.emp_summary['efficiency'].median(), color='red', linestyle='--', linewidth=2)
        plt.title('Task Completion Rate Distribution', fontsize=16, weight='bold')
        plt.xlabel('Tasks per Hour')
        plt.ylabel('Count')
        plt.tight_layout()
        plt.savefig(charts_dir / 'task_completion_rate.png', dpi=300)
        plt.close()
        
        print(f"✓ Visualizations saved to {charts_dir}")
        
    def run_full_analysis(self):
        """Execute complete analytics pipeline."""
        self.load_data()
        self.export_reports()
        self.generate_visualizations()
        print("✓ Analysis complete!")

# Usage
analyzer = HRAnalyzer(
    db_path='hr.db',
    sql_path='src/queries.sql',
    output_dir='outputs'
)
analyzer.run_full_analysis()
```

## Common Patterns

### Custom KPI Queries

```python
def get_top_performers(db_path, n=10):
    """Retrieve top N employees by efficiency."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        name, 
        department, 
        efficiency, 
        avg_rating
    FROM employee_summary
    ORDER BY efficiency DESC
    LIMIT ?
    """
    top_performers = pd.read_sql_query(query, conn, params=(n,))
    conn.close()
    return top_performers

def get_department_trends(db_path, department):
    """Get time-series data for specific department."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        date, 
        daily_tasks, 
        daily_hours, 
        daily_rating
    FROM daily_productivity
    WHERE department = ?
    ORDER BY date
    """
    trends = pd.read_sql_query(query, conn, params=(department,))
    conn.close()
    return trends
```

### Filtering and Aggregation

```python
def analyze_by_role(db_path, role_filter):
    """Aggregate performance by specific role."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        role,
        AVG(rating) as avg_rating,
        AVG(tasks_completed) as avg_tasks,
        AVG(hours_worked) as avg_hours
    FROM employees
    WHERE role LIKE ?
    GROUP BY role
    """
    role_stats = pd.read_sql_query(query, conn, params=(f'%{role_filter}%',))
    conn.close()
    return role_stats
```

### Adding New Metrics

```python
def calculate_workload_balance(emp_summary):
    """Calculate workload balance score."""
    emp_summary['workload_score'] = (
        emp_summary['total_tasks'] / emp_summary['total_tasks'].max() * 0.4 +
        emp_summary['total_hours'] / emp_summary['total_hours'].max() * 0.3 +
        emp_summary['avg_rating'] / 5 * 0.3
    )
    return emp_summary
```

## Configuration

### Custom Chart Styling

```python
# Set global matplotlib style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Custom colors
DEPT_COLORS = {
    'Engineering': '#3498db',
    'Sales': '#e74c3c',
    'Finance': '#2ecc71',
    'HR': '#f39c12',
    'Support': '#9b59b6'
}
```

### Database Configuration

```python
# For larger datasets, enable performance optimizations
def optimize_database(db_path):
    """Apply SQLite performance settings."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    cursor.execute("PRAGMA journal_mode = WAL")
    cursor.execute("PRAGMA synchronous = NORMAL")
    cursor.execute("PRAGMA cache_size = 10000")
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_dept ON employees(department)")
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_date ON employees(date)")
    
    conn.commit()
    conn.close()
```

## Troubleshooting

### Missing Columns Error

```python
# Validate CSV before loading
required_columns = [
    'employee_id', 'name', 'department', 'role', 
    'date', 'tasks_completed', 'hours_worked', 
    'rating', 'projects', 'absences'
]

df = pd.read_csv('data/employees.csv')
missing = set(required_columns) - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")
```

### SQL View Not Found

```python
# Check if views exist
def verify_views(db_path):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='view'")
    views = [row[0] for row in cursor.fetchall()]
    conn.close()
    
    expected = ['department_kpis', 'employee_summary', 'daily_productivity']
    missing = set(expected) - set(views)
    if missing:
        print(f"⚠ Missing views: {missing}. Re-run queries.sql")
    else:
        print("✓ All views exist")
```

### Division by Zero in Efficiency

```python
# Safe efficiency calculation
emp_summary['efficiency'] = emp_summary.apply(
    lambda row: row['total_tasks'] / row['total_hours'] 
    if row['total_hours'] > 0 else None,
    axis=1
)
```

### Date Parsing Issues

```python
# Ensure proper date format
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
df = df.dropna(subset=['date'])
```

## Advanced Use Cases

### Time Series Analysis

```python
def plot_monthly_trends(db_path, department):
    """Monthly productivity trends for a department."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        strftime('%Y-%m', date) as month,
        AVG(rating) as avg_rating,
        SUM(tasks_completed) as total_tasks
    FROM employees
    WHERE department = ?
    GROUP BY month
    ORDER BY month
    """
    df = pd.read_sql_query(query, conn, params=(department,))
    conn.close()
    
    fig, ax1 = plt.subplots(figsize=(14, 7))
    ax2 = ax1.twinx()
    
    ax1.plot(df['month'], df['avg_rating'], 'b-', linewidth=2, label='Avg Rating')
    ax2.bar(df['month'], df['total_tasks'], alpha=0.3, color='gray', label='Total Tasks')
    
    ax1.set_xlabel('Month')
    ax1.set_ylabel('Average Rating', color='b')
    ax2.set_ylabel('Total Tasks', color='gray')
    plt.title(f'{department} Performance Trends')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
```

### Comparative Analysis

```python
def compare_departments(dept_kpis):
    """Generate department comparison report."""
    comparison = dept_kpis[['department', 'avg_rating', 'tasks_per_hour', 'absence_rate']]
    
    # Normalize metrics
    for col in ['avg_rating', 'tasks_per_hour']:
        comparison[f'{col}_norm'] = (
            (comparison[col] - comparison[col].min()) / 
            (comparison[col].max() - comparison[col].min())
        )
    
    comparison['performance_index'] = (
        comparison['avg_rating_norm'] * 0.5 + 
        comparison['tasks_per_hour_norm'] * 0.5
    )
    
    return comparison.sort_values('performance_index', ascending=False)
```

This skill provides comprehensive guidance for using the Employee Performance Analytics project to build HR dashboards, analyze workforce metrics, and generate actionable insights from employee data.
Source

Creator's repository · aradotso/data-skills
View on GitHub ↗
Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk