Add SharePoint Connector for Microsoft SharePoint Online and On-Premises #22

# Enhancement: SharePoint Connector

## 🎯 Overview
Add support for ingesting documents and content from Microsoft SharePoint Online and SharePoint On-Premises environments, enabling organizations to include their SharePoint-based knowledge repositories in their AI-powered development workflows.

## 📋 Problem Statement
Many organizations store critical documentation, policies, procedures, and knowledge assets in Microsoft SharePoint. Currently, qdrant-loader cannot access this content, creating gaps in organizational knowledge bases and limiting the effectiveness of AI-powered development tools that rely on comprehensive documentation.

**Common SharePoint content includes:**
- Technical documentation and wikis
- Project documentation and specifications
- Policies and procedures
- Training materials and guides
- Meeting notes and decisions
- File attachments (PDFs, Office documents, etc.)

## 🚀 Proposed Solution
Implement a SharePoint connector that integrates with both SharePoint Online (Office 365) and SharePoint On-Premises environments using the Office365-REST-Python-Client library.

### Key Features
- **Multi-Environment Support**: SharePoint Online and SharePoint On-Premises (2013 and later)
- **Multiple Authentication Methods**: App-Only, username/password, certificate-based, interactive
- **Document Libraries**: Process files from SharePoint document libraries
- **SharePoint Lists**: Extract content from custom lists and built-in lists
- **Rich Metadata**: Capture SharePoint-specific metadata (author, created/modified dates, custom columns)
- **File Conversion Integration**: Leverage existing file conversion for Office documents, PDFs, etc.
- **Incremental Updates**: Change detection for efficient synchronization
- **Attachment Processing**: Handle file attachments with parent-child relationships

## 🏗️ Technical Implementation

### Configuration Structure
```yaml
sources:
  sharepoint:
    company-intranet:
      base_url: "https://company.sharepoint.com/sites/intranet"
      source: "company-intranet"
      source_type: "sharepoint"
      
      # Authentication
      authentication_method: "client_credentials"  # or "username_password", "certificate", "interactive"
      client_id: "${SHAREPOINT_CLIENT_ID}"
      client_secret: "${SHAREPOINT_CLIENT_SECRET}"
      tenant_id: "${SHAREPOINT_TENANT_ID}"
      
      # Content Selection
      document_libraries:
        - "Documents"
        - "Shared Documents"
        - "Policies"
      
      sharepoint_lists:
        - "Announcements"
        - "Project Updates"
        - "Knowledge Base"
      
      # File Processing
      enable_file_conversion: true
      download_attachments: true
      file_extensions: [".docx", ".pdf", ".xlsx", ".pptx", ".txt", ".md"]
      max_file_size: 52428800  # 50MB
      
      # Filtering
      exclude_paths:
        - "Forms/"
        - "_catalogs/"
      include_content_types:
        - "Document"
        - "Page"
        - "List Item"
      
      # Metadata
      custom_columns:
        - "Department"
        - "Category"
        - "Tags"
```
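A minimal sketch of how the keys above could be validated once loaded. The class name `SharePointConfigSketch` and the validation rules are assumptions for illustration; a real implementation would presumably use the project's Pydantic base classes rather than a plain dataclass, and only a subset of the YAML keys is modelled here.

```python
from dataclasses import dataclass, field

# Hypothetical: the four method names come from the YAML comment above.
AUTH_METHODS = {"client_credentials", "username_password", "certificate", "interactive"}


@dataclass
class SharePointConfigSketch:
    """Stand-in for a Pydantic model mirroring part of the YAML structure."""

    base_url: str
    source: str
    authentication_method: str = "client_credentials"
    source_type: str = "sharepoint"
    document_libraries: list[str] = field(default_factory=list)
    sharepoint_lists: list[str] = field(default_factory=list)
    file_extensions: list[str] = field(default_factory=list)
    exclude_paths: list[str] = field(default_factory=list)
    max_file_size: int = 50 * 1024 * 1024  # 52428800 bytes, as in the YAML

    def __post_init__(self) -> None:
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError(f"base_url must be an http(s) URL: {self.base_url!r}")
        if self.authentication_method not in AUTH_METHODS:
            raise ValueError(
                f"unknown authentication_method: {self.authentication_method!r}"
            )
        if self.max_file_size <= 0:
            raise ValueError("max_file_size must be positive")
        # Normalise extensions so ".PDF" and "pdf" both become ".pdf".
        self.file_extensions = [
            e.lower() if e.startswith(".") else f".{e.lower()}"
            for e in self.file_extensions
        ]
```

Rejecting an unknown `authentication_method` at load time means a typo in the YAML surfaces before any network call is attempted.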

### Authentication Methods

#### 1. App-Only (Client Credentials) - Recommended for Production
```python
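from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext

# Azure AD App registration required
client_credentials = ClientCredential(client_id, client_secret)
ctx = ClientContext(site_url).with_credentials(client_credentials)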
```

#### 2. Username/Password - Development & Testing
```python
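from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

user_credentials = UserCredential(username, password)
ctx = ClientContext(site_url).with_credentials(user_credentials)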
```

#### 3. Certificate-Based - Enterprise Security
```python
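from office365.sharepoint.client_context import ClientContext

# For high-security environments; cert_path points to the PEM file
ctx = ClientContext(site_url).with_client_certificate(tenant_id, client_id, thumbprint, cert_path)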
```

#### 4. Interactive - Development
```python
from office365.sharepoint.client_context import ClientContext

# Browser-based authentication for development
ctx = ClientContext(site_url).with_interactive(tenant_id, client_id)
```
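One way the connector could pick between the four methods is a small dispatcher keyed on `authentication_method`. The sketch below is an assumption about how that might look, not a settled design: `REQUIRED_SETTINGS`, `missing_settings`, and `build_context` are hypothetical names, and the settings keys (`thumbprint`, `cert_path`, `username`, `password`) are illustrative. The library calls use the method names from the Office365-REST-Python-Client README as I understand them (`with_credentials`, `with_client_certificate`, `with_interactive`); they should be checked against the pinned version.

```python
# Hypothetical mapping from authentication_method to the settings it needs.
REQUIRED_SETTINGS = {
    "client_credentials": ("client_id", "client_secret"),
    "username_password": ("username", "password"),
    "certificate": ("tenant_id", "client_id", "thumbprint", "cert_path"),
    "interactive": ("tenant_id", "client_id"),
}


def missing_settings(method: str, settings: dict) -> list[str]:
    """Return the required keys that are absent or empty for this method."""
    if method not in REQUIRED_SETTINGS:
        raise ValueError(f"unsupported authentication_method: {method!r}")
    return [key for key in REQUIRED_SETTINGS[method] if not settings.get(key)]


def build_context(site_url: str, method: str, settings: dict):
    """Create a ClientContext for the configured method (library must be installed)."""
    missing = missing_settings(method, settings)
    if missing:
        raise ValueError(f"{method} is missing settings: {', '.join(missing)}")

    # Imported lazily so the validation above works without the dependency.
    from office365.runtime.auth.client_credential import ClientCredential
    from office365.runtime.auth.user_credential import UserCredential
    from office365.sharepoint.client_context import ClientContext

    ctx = ClientContext(site_url)
    if method == "client_credentials":
        return ctx.with_credentials(
            ClientCredential(settings["client_id"], settings["client_secret"])
        )
    if method == "username_password":
        return ctx.with_credentials(
            UserCredential(settings["username"], settings["password"])
        )
    if method == "certificate":
        return ctx.with_client_certificate(
            settings["tenant_id"],
            client_id=settings["client_id"],
            thumbprint=settings["thumbprint"],
            cert_path=settings["cert_path"],
        )
    return ctx.with_interactive(settings["tenant_id"], settings["client_id"])
```

Validating settings before touching the library keeps configuration errors separate from authentication failures, which should make both easier to report.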

### Content Processing Strategy

#### Document Libraries
- Enumerate all files in configured document libraries
- Extract file metadata (title, author, created/modified dates, custom properties)
- Download and convert files using existing file conversion pipeline
- Create parent-child relationships for files with attachments
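The enumeration step would need to apply the `file_extensions`, `max_file_size`, and `exclude_paths` settings before downloading anything. A rough sketch of that filter follows; `should_process` is a hypothetical helper, and it assumes each `exclude_paths` entry names a single folder (such as `Forms/`) that may appear anywhere in the library-relative path.

```python
from pathlib import PurePosixPath


def should_process(
    relative_path: str,
    size_bytes: int,
    file_extensions: list[str],
    exclude_paths: list[str],
    max_file_size: int,
) -> bool:
    """Decide whether a library file passes the configured filters."""
    path = PurePosixPath(relative_path)
    # Assumption: "Forms/" means "any folder called Forms on the way to the file".
    excluded = {entry.strip("/") for entry in exclude_paths}
    if excluded.intersection(path.parts[:-1]):
        return False
    # An empty extension list is treated as "accept every extension".
    if file_extensions and path.suffix.lower() not in file_extensions:
        return False
    return size_bytes <= max_file_size
```

Filtering on the listing metadata (path and size) avoids downloading files that would be discarded anyway.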

#### SharePoint Lists
- Process list items as structured documents
- Extract list item fields as metadata
- Handle rich text fields and attachments
- Support custom content types
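To make the list-item mapping concrete, here is a sketch that turns one item's field values into content plus metadata, flattening a rich text field to plain text with the standard library. The function names, the `Body` field, and the output dictionary shape are assumptions for illustration; the real connector would emit the project's existing Document model and might route rich text through the existing conversion pipeline instead.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, inserting a space at block boundaries."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        if tag in self.BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag) -> None:
        if tag in self.BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def rich_text_to_plain(html: str) -> str:
    """Strip tags from a rich text field and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def list_item_to_document(
    fields: dict, body_field: str, metadata_fields: list[str]
) -> dict:
    """Map one list item's fields to title, content, and selected metadata."""
    return {
        "title": fields.get("Title", ""),
        "content": rich_text_to_plain(fields.get(body_field) or ""),
        "metadata": {name: fields[name] for name in metadata_fields if name in fields},
    }
```

Columns listed in `custom_columns` would be passed as `metadata_fields`, so columns missing from a given item are simply skipped.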

#### Pages and Wiki Content
- Extract SharePoint pages and wiki content
- Process web parts and embedded content
- Maintain page hierarchy and navigation structure

## 📦 Dependencies

### Primary Library
- **Office365-REST-Python-Client** (>=2.6.0)
  - Actively maintained and comprehensive
  - Supports both SharePoint REST API and Microsoft Graph API
  - Multiple authentication methods
  - Python 3.12 compatible

### Integration Points
- **Existing File Conversion**: Leverage `markitdown` for Office documents, PDFs
- **Document Model**: Extend existing Document model with SharePoint-specific metadata
- **State Management**: Use existing change detection and state tracking
- **Configuration System**: Follow existing Pydantic-based configuration pattern
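For the state-management integration, incremental updates come down to comparing what was seen on the previous run with the current listing. The project's existing state tracking is not shown in this issue, so the sketch below is a standalone illustration: `detect_changes` is a hypothetical helper that keys items by path and compares last-modified timestamps.

```python
from datetime import datetime


def detect_changes(
    previous: dict[str, datetime], current: dict[str, datetime]
) -> dict[str, list[str]]:
    """Classify items by comparing last-modified times across two runs."""
    return {
        # Present now, unknown on the previous run.
        "added": sorted(k for k in current if k not in previous),
        # Known before, but modified since it was last ingested.
        "updated": sorted(
            k for k in current if k in previous and current[k] > previous[k]
        ),
        # Ingested before, no longer listed.
        "deleted": sorted(k for k in previous if k not in current),
    }
```

Only the `added` and `updated` items would need downloading and conversion; `deleted` items would be removed from the index.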

## 🔧 Implementation Plan

### Phase 1: Core Infrastructure (Week 1-2)
- [ ] Create SharePoint connector package structure
- [ ] Implement SharePointConfig with authentication options
- [ ] Set up basic SharePoint connection and authentication
- [ ] Add Office365-REST-Python-Client dependency

### Phase 2: Document Library Support (Week 3-4)
- [ ] Implement document library enumeration
- [ ] Add file download and metadata extraction
- [ ] Integrate with existing file conversion pipeline
- [ ] Implement basic error handling and logging

### Phase 3: SharePoint Lists Support (Week 5)
- [ ] Add SharePoint list processing
- [ ] Extract list item content and metadata
- [ ] Handle rich text fields and attachments
- [ ] Support custom content types

### Phase 4: Advanced Features (Week 6)
- [ ] Implement incremental updates and change detection
- [ ] Add support for SharePoint pages and wiki content
- [ ] Enhance metadata extraction with custom columns
- [ ] Add comprehensive filtering options

### Phase 5: Testing and Documentation (Week 7)
- [ ] Add comprehensive unit and integration tests
- [ ] Create documentation and configuration examples
- [ ] Add error handling for common SharePoint scenarios
- [ ] Performance optimization and testing

## 🎯 Benefits

### For Organizations
- **Unified Knowledge Base**: Include SharePoint content in AI-powered development workflows
- **Comprehensive Search**: Search across SharePoint and other sources simultaneously
- **Existing Investment**: Leverage existing SharePoint content without migration
- **Enterprise Integration**: Native support for enterprise authentication and security

### For Developers
- **Contextual AI Assistance**: Access SharePoint documentation through Cursor, Windsurf, etc.
- **Cross-Platform Search**: Find information across Git, Confluence, Jira, and SharePoint
- **Rich Metadata**: Leverage SharePoint's rich metadata for better search results
- **File Format Support**: Process Office documents, PDFs, and other SharePoint files

## 🔍 Use Cases

1. **Enterprise Documentation**: Access company policies, procedures, and guidelines
2. **Project Knowledge**: Include project documentation stored in SharePoint
3. **Training Materials**: Process training content and knowledge base articles
4. **Cross-Team Collaboration**: Search across team sites and document libraries
5. **Compliance Documentation**: Include regulatory and compliance documents
6. **Meeting Notes**: Process meeting minutes and decision records

## 📊 Success Criteria

- [ ] Successfully connect to SharePoint Online and On-Premises environments
- [ ] Support all major authentication methods (App-Only, username/password, certificate)
- [ ] Process document libraries with file conversion integration
- [ ] Extract and process SharePoint list content
- [ ] Maintain rich metadata from SharePoint (author, dates, custom columns)
- [ ] Implement efficient change detection and incremental updates
- [ ] Handle file attachments with parent-child relationships
- [ ] Achieve processing time < 30 seconds for typical documents
- [ ] Test coverage > 90% for new components
- [ ] Zero breaking changes to existing functionality

## 🔒 Security Considerations

- **Authentication**: Support enterprise-grade authentication methods
- **Permissions**: Respect SharePoint permissions and access controls
- **Data Privacy**: Handle sensitive content according to organizational policies
- **Secure Storage**: Store credentials securely using environment variables
- **Audit Trail**: Log access and processing activities for compliance
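The `${SHAREPOINT_CLIENT_ID}`-style placeholders in the configuration keep secrets out of the YAML file. The project's config loader may already resolve these; purely as an illustration of the idea, a hypothetical `expand_env` helper could substitute them and fail loudly when a variable is unset, instead of passing an empty secret to the authentication call.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace ${NAME} placeholders, raising KeyError for unset variables."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

Passing `env` explicitly makes the helper easy to test without modifying the process environment.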

## 📚 Configuration Examples

### Basic SharePoint Online Setup
```yaml
sources:
  sharepoint:
    main-site:
      base_url: "https://company.sharepoint.com/sites/main"
      source: "main-site"
      source_type: "sharepoint"
      authentication_method: "client_credentials"
      client_id: "${SHAREPOINT_CLIENT_ID}"
      client_secret: "${SHAREPOINT_CLIENT_SECRET}"
      document_libraries: ["Documents"]
      enable_file_conversion: true
```

### Advanced Multi-Library Configuration
```yaml
sources:
  sharepoint:
    knowledge-base:
      base_url: "https://company.sharepoint.com/sites/kb"
      source: "knowledge-base"
      source_type: "sharepoint"
      authentication_method: "client_credentials"
      client_id: "${SHAREPOINT_CLIENT_ID}"
      client_secret: "${SHAREPOINT_CLIENT_SECRET}"
      
      document_libraries:
        - "Technical Documentation"
        - "Policies and Procedures"
        - "Training Materials"
      
      sharepoint_lists:
        - "FAQ"
        - "Best Practices"
        - "Announcements"
      
      enable_file_conversion: true
      download_attachments: true
      custom_columns: ["Department", "Category", "Priority"]
      
      exclude_paths:
        - "Archive/"
        - "Templates/"
```

## 🏷️ Related Issues

This enhancement complements existing features:
- File conversion support (#16) - for processing SharePoint documents
- Multiple projects support (#20) - for organizing SharePoint content by project

## 📖 References

- [Office365-REST-Python-Client Documentation](https://github.com/vgrem/Office365-REST-Python-Client)
- [SharePoint REST API Reference](https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service)
- [Microsoft Graph SharePoint API](https://docs.microsoft.com/en-us/graph/api/resources/sharepoint)
- [Azure AD App Registration Guide](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app)

---

**Labels**: `enhancement`, `feature-request`, `connector`, `sharepoint`, `enterprise`
**Priority**: High - addresses common enterprise requirement
**Effort**: Medium - leverages existing architecture and proven libraries
