Support XLSX files #2403

Merged
merged 9 commits into from
Oct 3, 2024

shatfield4
Collaborator

@shatfield4 shatfield4 commented Oct 1, 2024

Pull Request Type

  • ✨ feat
  • πŸ› fix
  • ♻️ refactor
  • πŸ’„ style
  • πŸ”¨ chore
  • πŸ“ docs

Relevant Issues

resolves #2398

What is in this change?

  • Allow upload of XLSX files: iterate over each sheet and create a folder containing one document per sheet
  • Use the xlsx package to convert each sheet to CSV format
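
The per-sheet conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names (`escapeCsvCell`, `sheetToCsv`, `sheetsToCsvDocs`) and the output fields (`folder`, `title`, `content`) are hypothetical, and the input is assumed to be an array of `{ name, data }` objects where `data` is an array of rows, which is the shape a parser such as node-xlsx is expected to return.

```javascript
// Hypothetical sketch: one CSV document per worksheet, grouped under a folder
// named after the workbook. No XLSX library is imported here; `sheets` is
// assumed to already be parsed into [{ name: string, data: any[][] }].

// Quote a cell only when it contains a comma, double quote, or line break.
function escapeCsvCell(value) {
  if (value === null || value === undefined) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join cells with commas and rows with newlines.
// Array.from turns holes in sparse rows into empty cells.
function sheetToCsv(rows) {
  return rows
    .map((row) => Array.from(row, (cell) => escapeCsvCell(cell)).join(","))
    .join("\n");
}

// Build one document per non-empty sheet.
function sheetsToCsvDocs(workbookName, sheets) {
  const docs = [];
  for (const sheet of sheets) {
    const csv = sheetToCsv(sheet.data || []);
    if (csv.trim() === "") continue; // skip sheets with no content
    docs.push({ folder: workbookName, title: `${sheet.name}.csv`, content: csv });
  }
  return docs;
}
```

Called as `sheetsToCsvDocs("report", parsedSheets)`, this would yield one entry per non-empty sheet. Keeping sheets as separate documents means each one can be embedded and selected independently.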

Additional Information

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@shatfield4 shatfield4 linked an issue Oct 1, 2024 that may be closed by this pull request
@timothycarambat timothycarambat merged commit b658f50 into master Oct 3, 2024
@timothycarambat timothycarambat deleted the 2398-feat-xslx-file-upload-support branch October 3, 2024 20:45
lohawk-azalea pushed a commit to azalea-gograbcode/anything-llm that referenced this pull request Oct 4, 2024
* patch scrollbar on msgs
resolves Mintplex-Labs#2190

* remove system setting cap on messages (use at own risk)

* Bug/make swagger json output openapi 3 compliant (Mintplex-Labs#2219)

update source to ensure swagger.json is openapi 3.0.0 compliant

* Feature/use escape key to close documents modal (Mintplex-Labs#2222)

* Add ability to use Esc keypress to close modal for documents

* move escape close to hook

---------

Co-authored-by: Mr Simon C <iamontheinternet@yahoo.com>

* Feature/add searchapi web browsing (Mintplex-Labs#2224)

* Add SearchApi to web browsing

* UI modifications for SearchAPI

---------

Co-authored-by: Sebastjan Prachovskij <sebastjan.prachovskij@gmail.com>

* fixed the typo in LLMs (Mintplex-Labs#2225)

(not a big deal, just to avoid someone pointing it out)

* Ollama sequential embedding (Mintplex-Labs#2230)

* ollama: Switch from parallel to sequential chunk embedding

* throw error on empty embeddings

---------

Co-authored-by: John Blomberg <john.jb.blomberg@gmail.com>

* remove log

* remove Jazzicons & Add default pfps (Mintplex-Labs#2233)

remove Jazzicons
update pfps

* update docs showing no need for manual port forwarding of Server in G… (Mintplex-Labs#2247)

update docs showing no need for manual port forwarding of Server in GHCodespaces

* Add verbose logging to GH loader
connect Mintplex-Labs#2243

* match user prompts exactly not partially (Mintplex-Labs#2245)

* Milvus bug fix (Mintplex-Labs#2183)

* patch no text results for milvus chunks

* wrap addDocumentToNamespace in try catch for handling milvus errors

* lint

* revert milvus db changes

* add try catch to handle grpc error from milvus

* Fix UI for slash cmd presets (Mintplex-Labs#2260)

* fix ui for slash cmd presets

* hide scroll

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* Add support for custom agent skills via plugins (Mintplex-Labs#2202)

* Add support for custom agent skills via plugins
Update Admin.systemPreferences to updated endpoint (legacy has deprecation notice

* lint

* dev build

* patch safeJson
patch label loading

* allow plugins with no config options

* lint

* catch invalid setupArgs in frontend

* update link to docs page for agent skills

* remove unneeded files

---------

Co-authored-by: shatfield4 <seanhatfield5@gmail.com>

* update 128k label for Azure models
resolves Mintplex-Labs#2264

* Add Gemini `exp` models (Mintplex-Labs#2268)

Add Gemini `exp` models
resolves Mintplex-Labs#2263

* Update OpenAI models and prices Mintplex-Labs#2261 (Mintplex-Labs#2269)

* Update OpenAI models

* Sort OpenAI models by created timestamp in ascending order

* Update OpenAI models price

* uncheck fallback listing (even if old)
closes Mintplex-Labs#2261

* linting

---------

Co-authored-by: Yaner <1468275133@qq.com>

* UI Cleanup (Mintplex-Labs#2270)

Remove FineTuningBanner
remove AgentAlert for first time users

* Patch missing folder autogenerate for plugins (Mintplex-Labs#2273)

* bump ref to browser ext

* bump Perplexity models (Mintplex-Labs#2275)

* Make `userId` actually optional for workspaceThread endpoint (Mintplex-Labs#2276)

* update doc links and readme

* Support `@agent` custom skills (Mintplex-Labs#2280)

* Support `@agent` custom skills

* move import

* Patch UI bug with agent skill web-search and sql-connector (Mintplex-Labs#2282)

* Patch UI bug with agent skill

* wrap call in try/catch for failures
res?. optional call for settings since null is default

* uncheck

* Patch 11Labs selection UI bug (Mintplex-Labs#2284)

* Patch 11Labs selection UI bug

* remove log

* 1943 add fireworksai support (Mintplex-Labs#2300)

* Issue Mintplex-Labs#1943: Add support for LLM provider - Fireworks AI

* Update UI selection boxes
Update base AI keys for future embedder support if needed
Add agent capabilites for FireworksAI

* class only return

---------

Co-authored-by: Aaron Van Doren <vandoren96+1@gmail.com>

* Appearance setting for show/hide scroll bar on chat window (Mintplex-Labs#2187)

* implement appearance setting for show/hide scrollbar

* put back comments

* revert backend for show_scrollbar

* show scrollbar save to localstorage

* old model function

* lint

* edit

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>

* Fix gitlab data connector for self-hosted instances (Mintplex-Labs#2315) (Mintplex-Labs#2316)

* Fix gitlab data connector for self-hosted instances (Mintplex-Labs#2315)

* Linting fix.

* Add more verbose error messages in embed chat (Mintplex-Labs#2306)

* publish embed updates

* server sided error messages

* publish embed chat widget

* sync submodule

* unset change

* update embed to merged changes for error

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* Load all branches in gitlab data connector (Mintplex-Labs#2325)

* Fix gitlab data connector for self-hosted instances (Mintplex-Labs#2315)

* Linting fix.

* Load all branches in the GitLab data connector Mintplex-Labs#2319

* Mintplex-Labs#2319 lint fixes.

* update fetch on fail

---------

Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>

* Add ability to copy/paste images, files, and text from web, local, or otherwise (Mintplex-Labs#2326)

* Fix custom domain in confluence (Mintplex-Labs#2328)

confluence custom domain fix

* Enable Mistral Multimodal (Mintplex-Labs#2343)

* Enable Mistral Multimodal

* remove console

* Added JSONSchema for `plugin.json` files (Mintplex-Labs#2344)

Added JSONSchema for agent skill plugin manifest files

Signed-off-by: Jaid <6216144+Jaid@users.noreply.github.com>

* Make streaming behavior more natural (Mintplex-Labs#2336)

* fix scrolling behavior + add cursor to streaming chats

* lint

* linting

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* Dont overwrite content in input on paste
linting

* Add select/unselect all context menu to directory component (Mintplex-Labs#2337)

add select/unselect all context menu to directory component

* PR#2355 Continued + expanded scope (Mintplex-Labs#2365)

* Mintplex-Labs#2317 Fetch pinned documents once per folder to reduce the number of queries.

* Reorder the lines to keeps const declarations together.

* Add some comments to functions
move pinned document fetch for folder to function
move watched documents per-folder to also function the same
remove unused function in documents model

---------

Co-authored-by: Błażej Owczarczyk <blazeyy@gmail.com>

* Export embedded chat history (Mintplex-Labs#2329)

export embedded chat history

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>

* Bulk document removal from workspace

* wip improve remove document ux

* fix border ui bugs when adding files to workspace

* sort workspacedirectory put adding files at top

* fix workspace file row ui shifting

* fix selected items bug when adding another item with items already selected on workspace

* fix tooltip

* lint

* refactor

* fix bug where unadding single item while selected would stay selected

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* Workspace agent autoselection (Mintplex-Labs#2357)

* refactor agent to add fallback to workspace, then to chat provider/model

* commenting
update logic for bedrock and fireworks fallbacks

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* Support attachments in developer API (Mintplex-Labs#2373)

* support attachments in developer api

* lint

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>

* 1417 completion timeout (Mintplex-Labs#2374)

* Refactor handleDefaultStreamResponseV2 function for better error handling

* run yarn lint

* small error handling changes

* update error handling flow and scope of vars

* add back space

---------

Co-authored-by: Roman <rrojaski@gmail.com>

* Support more Confluence URL formats (Mintplex-Labs#2118)

* support more confluence url formats

* use pattern matching for confluence urls and manual splitting as fallback

* rework entire Confluence flow to prevent issues with custom, local, and cloud spaces

* remove dep

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>

* Add dropdown for confluence connector deployment (Mintplex-Labs#2376)

* Added an option to fetch issues from gitlab. Made the file fetching a… (Mintplex-Labs#2335)

* Added an option to fetch issues from gitlab. Made the file fetching asynchornous to improve performance. Mintplex-Labs#2334

* Fixed a typo in loadGitlabRepo.

* Convert issues to markdown.

* Fixed an issue with time estimate field names in issueToMarkdown.

* handle rate limits more gracefully + update checkbox to toggle switch

* lint

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>
Co-authored-by: shatfield4 <seanhatfield5@gmail.com>

* 1959 filetype filters (Mintplex-Labs#2378)

* Updated the `GitHubRepoLoader` class to use the new import syntax and adjust the `recursiveLoader` method accordingly.

* add @langchain/community to collector package.json

* fix: Improve handling of complex ignore patterns in GitLabRepoLoader

* refactor: use ignore package for simplified ignore logic

* run yarn lint

* add @langchain/community@^0.2.23

* remove unused dep
lint

---------

Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>

* Support DeepSeek (Mintplex-Labs#2377)

* add deepseek support

* lint

* update deepseek context length

* add deepseek to onboarding

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>

* Once again, modify Confluence to support every esoteric combination and undocumented way of running Confluence ever devised
resolves Mintplex-Labs#2379

* Patch bug with pasted text not being detected (Mintplex-Labs#2386)

* [FEAT] Add Llama 3.2 models to Fireworks AI's LLM selection dropdown (Mintplex-Labs#2384)

Add Llama 3.2 3B and 1B models to Fireworks AI LLM selection

* Added voyage-3 and voyage-3-lite. (Mintplex-Labs#2394)

* Add 3GB file size limit to body parser middlewares (Mintplex-Labs#2390)

* Tavily search web search agent support (Mintplex-Labs#2395)

* support tavily search web search agent

* lint

* remove unneeded comments

* Add exception handling for special case files like `Dockerfile` and `Jenkinsfile` (Mintplex-Labs#2410)

* Support XLSX files (Mintplex-Labs#2403)

* support xlsx files

* lint

* create seperate docs for each xlsx sheet

* lint

* use node-xlsx pkg for parsing xslx files

* lint

* update error handling

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

---------

Signed-off-by: Jaid <6216144+Jaid@users.noreply.github.com>
Co-authored-by: timothycarambat <rambat1010@gmail.com>
Co-authored-by: Mr Simon C <iamontheinternet@yahoo.com>
Co-authored-by: Sebastjan Prachovskij <sebastjan.prachovskij@gmail.com>
Co-authored-by: amrrs <1littlecoder@gmail.com>
Co-authored-by: John Blomberg <john.jb.blomberg@gmail.com>
Co-authored-by: Sean Hatfield <seanhatfield5@gmail.com>
Co-authored-by: Yaner <1468275133@qq.com>
Co-authored-by: Aaron Van Doren <vandoren96+1@gmail.com>
Co-authored-by: Blazej Owczarczyk <blazeyy@gmail.com>
Co-authored-by: Jaid <6216144+Jaid@users.noreply.github.com>
Co-authored-by: Roman <rrojaski@gmail.com>
Co-authored-by: Emil Rofors (aider) <emirof@gmail.com>
Co-authored-by: a4v2d4 <61302808+a4v2d4@users.noreply.github.com>
Co-authored-by: Bahtiar Ariyanki <bahtiarariyanki@Bahtiars-MacBook-Air.local>
shatfield4 added a commit that referenced this pull request Oct 8, 2024
* Support XLSX files (#2403)

* support xlsx files

* lint

* create seperate docs for each xlsx sheet

* lint

* use node-xlsx pkg for parsing xslx files

* lint

* update error handling

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>

* wip chat window

* ux+ux improvements and update new colors

* chat window dark mode

* remove comment

---------

Co-authored-by: timothycarambat <rambat1010@gmail.com>
@Tofusito

Hello!

First of all, thank you for including this feature in AnythingLLM.

Before this, I was manually converting XLSX files to CSV before uploading, but I wasn’t able to achieve satisfactory results with those conversions. I noticed that you now convert to CSV and can embed the files for RAG processing. Could you clarify which embedding tool you are using for this?

I’ve tried the included one, Nomic, and JinAI, but I’m still facing issues. I’m unable to access all the sheets generated by the converter.

Thanks in advance for your help!

Development

Successfully merging this pull request may close these issues.

[FEAT]: .XSLX File upload support
3 participants