Tags · Encamina/enmarcha

v10.0.2

Merge pull request #183 from Encamina/@mramos/fix_OpenAPI

Update Swashbuckle packages to 8.1.4

Dec 17, 2025
2f55b35

v10.0.1

Merge pull request #182 from Encamina/@rliberoff/update-semantic-kernel

Update Semantic Kernel dependencies to 1.68.0, bump version

Dec 17, 2025
d3ecc8c

v10.0.0

Merge pull request #181 from Encamina/upgrade-to-NET10

Upgrade to net10

Dec 16, 2025
ac3cb41

v10.0.0-preview-09

Merge pull request #180 from Encamina/@ddiaz/migrating-middlewares

Add SemanticKernel rate limit middleware

Nov 19, 2025
37ee1c9

v10.0.0-preview-08

Merge pull request #179 from Encamina/@ddiaz/new-version-update

Update version suffix to preview-08

Nov 18, 2025
3581bfa

v10.0.0-preview-07

Merge pull request #177 from Encamina/@hramos/smart-chunking

# Summary
This is an improvement to PDF processing for use in RAG.
## Technical details
- PDF -> Markdown conversion using the Mistral Document AI 2505 model.
- Refinement of the resulting Markdown with GPT-4.1.
- Document segmentation (chunking) with the following rules:
  - Token limit per chunk: 1024.
  - A header hierarchy is respected (H1, H2, ... and bold text).
  - Header inheritance: chunks keep the context of higher-level headers to preserve content coherence.
  - Very small chunks (e.g., < 30 tokens) are filtered out to avoid noise in the index.
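
The chunking rules above can be sketched roughly as follows. This is a minimal Python illustration, not the C# code shipped in the pull request: the names `Chunk`, `count_tokens`, and `split_markdown` are hypothetical, token counts are approximated by whitespace-separated words, and bold text is not treated as a header level here.

```python
import re
from dataclasses import dataclass

# Matches ATX headers such as "# Title" or "### Subsection".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    headers: list[str]  # inherited header path, outermost first
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def split_markdown(markdown: str, max_tokens: int = 1024, min_tokens: int = 30) -> list[Chunk]:
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # (level, title) of the currently open headers
    buffer: list[str] = []

    def flush() -> None:
        text = "\n".join(buffer).strip()
        buffer.clear()
        # Drop very small chunks to avoid noise in the index.
        if count_tokens(text) >= min_tokens:
            chunks.append(Chunk([title for _, title in path], text))

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous header path
            level = len(match.group(1))
            # Close headers at the same or a deeper level, then open the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
            continue
        # Start a new chunk when adding this line would exceed the limit.
        if buffer and count_tokens("\n".join(buffer + [line])) > max_tokens:
            flush()
        buffer.append(line)
    flush()
    return chunks
```

Each chunk carries the titles of its enclosing headers, so a fragment from a deep subsection can still be indexed with its section context. A single line longer than `max_tokens` is kept whole in this sketch; a fuller splitter would fall back to finer delimiters.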

## Included files
- `src\Encamina.Enmarcha.SemanticKernel.Connectors.Document\Connectors\MistralAIDocumentConnector.cs`
  - Orchestrates extraction from PDFs, calls to MistralAI (HTTP endpoint) and subsequent refinement with a chat model (GPT-4.1).
  - Manages PDF splitting, the HTTP request to Mistral, and the logic to send parts for LLM refinement.
  - Configurable via `MistralAIDocumentConnectorOptions` (Endpoint, ApiKey, ModelName, SplitPageNumber, LLMPostProcessing).

- `src\Encamina.Enmarcha.SemanticKernel.Connectors.Document\Utils\MistralAIHelper.cs`
  - Utilities for:
    - Splitting PDFs by pages (`SplitPdfByPagesAsync`) using PdfPig.
    - Building a base64 data URL of the PDF to send to the service (`BuildPdfDataUrlAsync`).
    - Extracting and combining Markdown from Mistral's JSON response (`ExtractAndCombineMarkdown`), replacing image references with filenames.
    - Splitting Markdown into manageable parts for LLM refinement (`SplitMarkdownForRefinement`).
    - Normalizing and extracting embedded images (`ExtractImageDataFromPage` / `ReplaceImagesInMarkdown`).
  - Implements merging and cleanup during page extraction.

- `src\Encamina.Enmarcha.AI\TextSplitters\EnrichedMarkdownCharacterSplitter.cs`
  - Splitter that:
    - Respects header hierarchies (#, ##, ###, ...) and treats H1 as main sections.
    - Performs recursive splitting by header levels and by delimiters when necessary.
    - Extracts metadata (H1...H6 and Bold) and maintains inherited context across chunks.
    - Avoids very small chunks and prioritizes keeping paragraphs/semantic blocks together.

- `src\Encamina.Enmarcha.AI\OpenAI\Abstractions\ModelInfo.cs`
  - Add GPT-5 models:
    - GPT-5
    - GPT-5-mini
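
Two of the helper steps listed for `MistralAIHelper.cs` (building a base64 data URL and combining per-page Markdown) might look like the following Python sketch. The function names are hypothetical Python stand-ins, not the C# methods, and the response shape (`pages`, each with an `index` and a `markdown` field) is an assumption made for illustration.

```python
import base64
import json


def build_pdf_data_url(pdf_bytes: bytes) -> str:
    # Encode the raw PDF bytes and wrap them in a data URL.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def combine_markdown(response_json: str) -> str:
    # Assumption: the response looks like
    # {"pages": [{"index": 0, "markdown": "..."}, ...]}.
    pages = json.loads(response_json).get("pages", [])
    ordered = sorted(pages, key=lambda page: page.get("index", 0))
    # Join the pages in order, separated by a blank line.
    return "\n\n".join(page.get("markdown", "").strip() for page in ordered)
```

Sorting by page index before joining keeps the combined Markdown in document order even if the pages arrive out of order.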

## Rules and transformations applied
- Preserve all textual content from the PDF (do not remove text); only correct/structure it into Markdown.
- Merge tables split across pages when they share identical headers or are direct continuations.
- Fix malformed tables, lists, and markdown; remove repeated footers/headers and HTML pagination comments.
- Correct common OCR errors (hyphenated/split words, extra spaces, stray characters).
- Do not generate automatic links or HTML entities; do not add new content that changes the original text.
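
According to the notes above, these rules are applied by the GPT-4.1 refinement step. To make a few of them concrete, here is a deterministic regex stand-in in Python, which is not how the pull request implements them; `clean_ocr_markdown` is a hypothetical name. It covers only HTML comment removal, rejoining words hyphenated across a line break, and collapsing extra spaces.

```python
import re


def clean_ocr_markdown(text: str) -> str:
    # Remove HTML comments, such as pagination markers.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # Rejoin words split by a hyphen at a line break: "seg-\nmentation" -> "segmentation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces or tabs inside a line, keeping leading indentation.
    text = re.sub(r"(?<=\S)[ \t]{2,}", " ", text)
    return text
```

A regex cannot tell an OCR line-break hyphen from a genuine compound such as "well-known" broken across lines, which is one reason a model-based refinement pass can be preferable for this step.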

Oct 22, 2025
db3e833

v10.0.0-preview-06

Merge pull request #176 from Encamina/@mramos/update_Microsoft.Azure.Cosmos

Update Microsoft.Azure.Cosmos to `3.49.0`

Oct 14, 2025
95a04db

v10.0.0-preview-05

Merge pull request #174 from Encamina/@mramos/traceparent_agentsSdk

Implement telemetry correlation in M365 Agents SDK

Oct 8, 2025
d1ba74c

v10.0.0-preview-04

Merge pull request #172 from Encamina/@mramos/activity_propagation_logs

Update `TelemetryInitializerMiddleware` and `TelemetryAgentIdInitializer` to work with Agents 365 SDK

Oct 7, 2025
3e300eb

v10.0.0-preview-03

Merge pull request #171 from LuisM000/@lmarcos/fix_conversation_state_logger_middleware

Update conversation state access in middleware

Sep 16, 2025
a32bfc7
