kiminh

Ramsey kiminh

132 followers · 2.9k following

Starred repositories

ShaoQiBNU / adTips

Advertising system fundamentals and survey

1 Updated Mar 1, 2026

Liao2025227 / efficient-long-sequence-transformer

Efficient and memory-optimized training methods for long-sequence modeling, including hybrid attention mechanisms (FlashAttention, Performer, Sparse Attention) for scalable long-context learning.

1 Updated Apr 7, 2026

DoodlingAWorld / sasrec-longseq

Efficiency and longer-sequence experiments on SASRec: vectorized negative sampling, full-softmax vs sampled loss, sequence-length scaling (Table V), and throughput profiling. PyTorch, CPU-friendly.

Python 1 Updated Jun 30, 2026

cjpcool / delta-rec

Customized linear attention for scaling to long sequences in recommendation systems

1 Updated Jul 17, 2026

athrva98 / FlashNystrom

Tensor-core CUDA kernels for Nyström attention, with linear-time forward and backward passes and exact autograd gradients. Faster than flash-attention at long sequence lengths.

Python 2 1 Updated Jul 24, 2026

tc-mb / llama.cpp-omni

Omni inference in C/C++

C++ 218 62 Updated Jul 22, 2026

Alpha-Park / genpark-marketing-banner-agent-skill

GenPark seasonal promo web page banner matching scheduler skill.

Python 2 Updated Jul 2, 2026

Alpha-Park / genpark-search-auto-suggest-skill

GenPark search autocomplete suggestions and semantic query expansion agent skill.

Python 2 Updated Jun 29, 2026

juanmmm21 / query-parser-autocomplete

A query parser with operator support, spellcheck, and prefix/n-gram based autocomplete suggestions.

Python 1 Updated Jul 9, 2026

z4chariah14 / Drifting-model-reconstruction

A small-scale implementation of Generative Modeling via Drifting

Python 1 Updated Jul 20, 2026

uhulahu / TIGER-Hybrid

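A sequential recommendation system based on the Generative Retrieval paradigm. Building on a reproduction of the TIGER architecture, this project systematically diagnoses and improves the Semantic ID tokenizer, Sinkhorn, the layer-by-layer generation bottleneck, and the collision suffix.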

Jupyter Notebook 3 Updated Jul 14, 2026

naver / fast

Personalized preference alignment from limited data

Python 3 Updated Dec 8, 2025

naver / timehash

Timehash: Hierarchical temporal indexing for efficient "open now" search in large-scale POI systems.

Python 5 2 Updated May 14, 2026

peter-cui-yi / open-open-reasoning

Forked from 5SSjw/open-open-reasoning

HTML 1 Updated Jul 14, 2026

VerdureChen / Beyond-Polarization

Implementation of "Beyond Polarization: The Generative Constraint of Chain-of-Thought in Pointwise Reranking."

Python 1 Updated Jun 3, 2026

lirji / recsys

Infrastructure for search, recommendation, and advertising platforms

Java 2 Updated Jul 23, 2026

nguyentuongbachhy / CONGA

CONGA: COntrastive Nested Graph Architecture for Continual Sequential Recommendation

Python 1 Updated Jul 20, 2026

Ygier / SALM

Official implementation of our ACM RecSys 2026 paper.

Python 1 Updated Jul 19, 2026

WillDreamer / T2PO

[ICML 2026 Spotlight] T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Python 51 Updated May 27, 2026

Liu-Jingzhe / BONSAI

Python 2 Updated Apr 22, 2026

RaykeshR / Project-Open-ended-Text-Generation

Jupyter Notebook 2 Updated Jan 19, 2026

gmeehan96 / SparseColdStart

Python 1 Updated Apr 22, 2026

Zcy233035 / rl-explainer

rl-explainer

Svelte 195 5 Updated Mar 9, 2026

saitejasrivilli / search-engine-rs

Rust 1 Updated Jul 20, 2026

MagicAgent-Search / PCTD

PCTD: Preference-Guided Counterfactual Task Decomposition for Agent Tool Retrieval

Python 5 2 Updated Jul 17, 2026

arlecchino2 / recsys-examples

Forked from NVIDIA/recsys-examples

Examples for Recommenders - easy to train and deploy on accelerated infrastructure.

Python 1 Updated Feb 7, 2026

susmitsingh01 / triton-llm-kernels-lab

A hands-on lab implementing LLM inference kernels from scratch using Triton. Covers fused attention, Flash Attention, GQA, RoPE, INT8 quantization, KV cache optimization, and speculative decoding —…

Jupyter Notebook 1 Updated Jul 4, 2026

v-code01 / kvdivergence

Does quantizing the KV cache change the greedy output? Yes: q8_0 KV changes the generated text on 83% of prompts, q4_0 on 100% (often from the start) - with flash attention held constant, so KV pre…

Python 1 Updated Jul 9, 2026

michaelxu2288 / gpt2-forward-pass

GPT-2 forward-pass CUDA kernels with optimizations such as flash attention, KV cache, cuBLAS/CUTLASS, split-K, and tensor cores

Cuda 1 Updated Jul 4, 2026

correaswebert / flash-attention

Flash Attention 2 inference with KV caching deployed on GPT-2

Python 1 1 Updated Mar 3, 2026

Starred topics

arrays

parameter-server

lottery-ticket-hypothesis

covariate-shift

submodular-optimization