- Los Angeles
Highlights
Starred repositories
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
A stock market text sentiment analysis website. A股舆情分析, web-crawler, bayesian algorithm, SQL, django, data-visualization.
FinHack®,一个易于拓展的量化金融框架,它在当前版本中集成了数据采集、因子计算、因子挖掘、因子分析、机器学习、策略编写、量化回测、实盘接入等全流程的量化投研工作。
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
A guidance language for controlling large language models.
an intro to retrieval augmented large language model
ArchGuard Co-mate is an AI-powered architecture copilot, design and governance tools.
ClickPrompt - Streamline your prompt design, with ClickPrompt, you can easily view, share, and run these prompts with just one click. ClickPrompt 用于一键轻松查看、分享和执行您的 Prompt。
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
Similarities: a toolkit for similarity calculation and semantic search. 相似度计算、匹配搜索工具包,支持亿级数据文搜文、文搜图、图搜图,python3开发,开箱即用。
pycorrector is a toolkit for text error correction. 文本纠错,实现了Kenlm,T5,MacBERT,ChatGLM3,Qwen2.5等模型应用在纠错场景,开箱即用。
The hanzi similar tool.(汉字相似度计算工具,中文形近字算法。可用于手写汉字识别纠正,文本混淆等。)
中文分词 词性标注 命名实体识别 依存句法分析 成分句法分析 语义依存分析 语义角色标注 指代消解 风格转换 语义相似度 新词发现 关键词短语提取 自动摘要 文本分类聚类 拼音简繁转换 自然语言处理
DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task。涵盖68个领域、共计916万词的专业词典知识库,可用于文本分类、知识增强、领域词汇库扩充等自然语言处理应用。
💫 Industrial-strength Natural Language Processing (NLP) in Python
中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
SpaCy 中文模型 | Models for SpaCy that support Chinese
中英文敏感词、语言检测、中外手机/电话归属地/运营商查询、名字推断性别、手机号抽取、身份证抽取、邮箱抽取、中日文人名库、中文缩写库、拆字词典、词汇情感值、停用词、反动词表、暴恐词表、繁简体转换、英文模拟中文发音、汪峰歌词生成器、职业名称词库、同义词库、反义词库、否定词库、汽车品牌词库、汽车零件词库、连续英文切割、各种中文词向量、公司名字大全、古诗词库、IT词库、财经词库、成语词库、地名词库、…
项目介绍: 智能交互金融智能聊天。具体实现用户在所有关于股票话题的智能问答。其中难点是问题 分类、数据预处理、参数提取。 ☆个人工作: 实现金融智能聊天,实现所有股票问题的精确回答。通过提取通用特征将5亿+条训练语料缩减为10w条,语料内存占用量从10G减少到2M,并将精确度提高98%以上。设计划分股票问题为问股、选股、诊股、百科四个话题。设计利用TF-IDF,无监督训练得到分类。提供可靠稳…
100+ Chinese Word Vectors 上百种预训练中文词向量
A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.