- Los Angeles
Highlights
Starred repositories
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
A stock market text sentiment analysis website. A股舆情分析, web-crawler, bayesian algorithm, SQL, django, data-visualization.
FinHack®,一个易于拓展的量化金融框架,它在当前版本中集成了数据采集、因子计算、因子挖掘、因子分析、机器学习、策略编写、量化回测、实盘接入等全流程的量化投研工作。
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
A guidance language for controlling large language models.
an intro to retrieval augmented large language model
ArchGuard Co-mate is an AI-powered architecture copilot, design and governance tools.
ClickPrompt - Streamline your prompt design, with ClickPrompt, you can easily view, share, and run these prompts with just one click. ClickPrompt 用于一键轻松查看、分享和执行您的 Prompt。
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
Similarities: a toolkit for similarity calculation and semantic search. 相似度计算、匹配搜索工具包,支持亿级数据文搜文、文搜图、图搜图,python3开发,开箱即用。
pycorrector is a toolkit for text error correction. 文本纠错,实现了Kenlm,T5,MacBERT,ChatGLM3,Qwen2.5等模型应用在纠错场景,开箱即用。
The hanzi similar tool.(汉字相似度计算工具,中文形近字算法。可用于手写汉字识别纠正,文本混淆等。)
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task。涵盖68个领域、共计916万词的专业词典知识库,可用于文本分类、知识增强、领域词汇库扩充等自然语言处理应用。
💫 Industrial-strength Natural Language Processing (NLP) in Python
中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
SpaCy 中文模型 | Models for SpaCy that support Chinese
中英文敏感词、语言检测、中外手机/电话归属地/运营商查询、名字推断性别、手机号抽取、身份证抽取、邮箱抽取、中日文人名库、中文缩写库、拆字词典、词汇情感值、停用词、反动词表、暴恐词表、繁简体转换、英文模拟中文发音、汪峰歌词生成器、职业名称词库、同义词库、反义词库、否定词库、汽车品牌词库、汽车零件词库、连续英文切割、各种中文词向量、公司名字大全、古诗词库、IT词库、财经词库、成语词库、地名词库、…
项目介绍: 智能交互金融智能聊天。具体实现用户在所有关于股票话题的智能问答。其中难点是问题 分类、数据预处理、参数提取。 ☆个人工作: 实现金融智能聊天,实现所有股票问题的精确回答。通过提取通用特征将5亿+条训练语料缩减为10w条,语料内存占用量从10G减少到2M,并将精确度提高98%以上。设计划分股票问题为问股、选股、诊股、百科四个话题。设计利用TF-IDF,无监督训练得到分类。提供可靠稳…
100+ Chinese Word Vectors 上百种预训练中文词向量
A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.