Basic Information

Name: Su Jianlin (苏剑林)

Birthday: 1993 (month X, day Y)

Master's: School of Mathematics, Sun Yat-sen University

Bachelor's: School of Mathematical Sciences, South China Normal University

Location: Guangzhou, Guangdong

Hometown: Yunfu, Guangdong

Hobbies: reading, research, tinkering

Idol: Richard Feynman

Email: bojone@spaces.ac.cn

Homepage: https://jianlin.su

Weibo: https://weibo.com/bojone

Twitter/X: https://x.com/Jianlin_S

Code: https://github.com/bojone

Academic: Google Scholar

Ramblings

A graduate student in pure mathematics at Sun Yat-sen University, with a bachelor's degree from South China Normal University. Emigrated from the Oort Cloud to Earth in 1993; having forgotten the way home, he gazes up at the stars, hoping to find a road back through spacetime.

He loves all the sciences and is fond of drilling into dead ends, so he hits walls constantly, but on the rare occasion the drill breaks through, he finds it all worthwhile. Partial to physics, astronomy, and computing, he likes to think and aspires to crack open the nutshell of science. Though good at rational analysis, he is also prone to acting on emotion, and he worships Feynman. In idle moments he reads Jin Yong to feign refinement, plays Chinese chess to pass the time, occasionally gets the urge to simmer a pot of kaishui baicai, and now and then scratches the itch to fire up the data-mining excavator in homage to Lanxiang.

Supposed to be studying pure mathematics, he instead neglects his proper trade: addicted to neural networks, deluded about artificial intelligence, and never having published a stack of papers at ACL, AAAI, CVPR, ICLR and the like. He currently focuses on natural language processing, attempting to unravel the mysteries of language. He enjoys writing and regularly spins tall tales on his blog, which his readers have fortunately not yet tired of. Scientific Spaces (https://kexue.fm) awaits your visit; sincere or not, feel free to drop by.

Micro-musings

  • 2025-11-19 01:16

    Recommended papers:

    An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

    Back to Basics: Let Denoising Generative Models Denoise

    Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

    Decoupling Positional and Symbolic Attention Behavior in Transformers

    How Memory in Optimization Algorithms Implicitly Modifies the Loss

    Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?

    L2M: Mutual Information Scaling Law for Long-Context Language Modeling

    Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression

    On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication

    On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling

    Scaling Laws and In-Context Learning: A Unified Theoretical Framework

    https://papers.cool/arxiv/2510.09827,2511.13720,2511.09465,2511.11579,2502.02132,2511.00674,2503.04725,2511.13421,2511.00025,2505.22491,2511.06232

  • 2025-11-13 13:00

    Mom: "Why do kids these days have so many illnesses?"
    Son: "It's not that there are more illnesses; it's that more of them can be treated now. In the old days those kids simply died young."

    What an incisive answer. Lesson learned!

    Source: https://www.zhihu.com/question/1926923396882621109/answer/1970943451643224638 (comment section)

  • 2025-11-10 17:06

    More than two years after joining the company, this is my second visit to the Beijing headquarters.

  • 2025-10-24 16:34

    Recommended papers:

    Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization

    AlphaFlow: Understanding and Improving MeanFlow Models

    Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets

    Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

    From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics

    On residual network depth

    On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization

    Optimal Scaling Needs Optimal Norm

    Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks

    Who Said Neural Networks Aren't Linear?

    Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

    https://papers.cool/arxiv/2510.04988,2510.20771,2510.04327,2510.02300,2510.06954,2510.03470,2510.19953,2510.03871,2510.11354,2510.08570,2510.04212

  • 2025-10-06 11:07

    Mad-dog logic: certain people may differ greatly from a mad dog, but as long as I decide those differences don't matter, then those people are mad dogs.

  • 2025-10-03 23:08

    I am a rather dull person: I can only grind through derivations step by step, I have little intuition, and I usually cannot understand anything beyond what can be derived.

  • 2025-10-02 21:23

    Recommended papers:

    Conda: Column-Normalized Adam for Training Large Language Models Faster

    DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

    Efficient Hyperparameter Tuning via Trajectory Invariance Principle

    Muon Outperforms Adam in Tail-End Associative Memory Learning

    Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training

    Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws

    https://papers.cool/arxiv/2509.24218,2509.26469,2509.25049,2509.26030,2505.13738,2509.19189

  • 2025-09-16 11:04

    Recommended papers:

    Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching

    Attention as an Adaptive Filter

    Causal Attention with Lookahead Keys

    Depth-Aware Initialization for Stable and Efficient Neural Network Training

    Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models

    Flow Straight and Fast in Hilbert Space: Functional Rectified Flow

    Limitations of Normalization in Attention Mechanism

    Predicting the Order of Upcoming Tokens Improves Language Modeling

    Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

    Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport

    The Optimiser Hidden in Plain Sight: Training with the Loss Landscape's Induced Metric

    Transition Models: Rethinking the Generative Learning Objective

    UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

    Understanding Transformers through the Lens of Pavlovian Conditioning

    https://papers.cool/arxiv/2509.00336,2509.04154,2509.07301,2509.05018,2508.21106,2509.10384,2508.17821,2508.19228,2305.17212,2508.08369,2509.03594,2509.04394,2508.18756,2508.08289

  • 2025-08-10 23:52

    Recommended papers:

    Accelerating Newton-Schulz Iteration for Orthogonalization via Chebyshev-type Polynomials

    Zero-Variance Gradients for Variational Autoencoders

    https://papers.cool/arxiv/2506.10935,2508.03587

  • 2025-08-09 19:20

    For the record: 80,000 followers on Zhihu.

Selected Works

title: Variational Inference: A Unified Framework of Generative Models and Some Revelations
author: Su Jianlin
journal: arXiv preprint arXiv:1807.05936
year: 2018

title: Using deep Residual Networks to search for galaxy-Ly $\alpha$ emitter lens candidates based on spectroscopic selection
author: Li Rui; Shu Yiping; Su Jianlin; Feng Haicheng; Zhang Guobao; Wang Jiancheng; Liu Hongtao
journal: Monthly Notices of the Royal Astronomical Society
volume: 482
number: 1
pages: 313--320
year: 2018
publisher: Oxford University Press

title: f-VAEs: Improve VAEs with Conditional Flows
author: Su Jianlin; Wu Guang
journal: arXiv preprint arXiv:1809.05861
year: 2018

title: Training Generative Adversarial Networks Via Turing Test
author: Su Jianlin
journal: arXiv preprint arXiv:1810.10948
year: 2018

title: GAN-QP: A novel GAN framework without gradient vanishing and Lipschitz constraint
author: Su Jianlin
journal: arXiv preprint arXiv:1811.07296
year: 2018

title: Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification
author: Ren Hao; Su Jianlin; Lu Hong
journal: arXiv preprint arXiv:1901.10112
year: 2019

title: Artist Style Transfer Via Quadratic Potential
author: Bhalley Rahul; Su Jianlin
journal: arXiv preprint arXiv:1902.11108
year: 2019

title: O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks
author: Su Jianlin
journal: arXiv preprint arXiv:1903.01931
year: 2019

title: Rectified Exponential Units for Convolutional Neural Networks
author: Ying Yao; Su Jianlin; Shan Peng; Miao Ligang; Wang Xiaolian; Peng Silong
journal: IEEE Access
year: 2019
publisher: IEEE

title: A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
author: Zhepei Wei; Jianlin Su; Yue Wang; Yuan Tian; Yi Chang
journal: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
year: 2020
publisher: ACL

title: Whitening Sentence Representations for Better Semantics and Faster Retrieval
author: Jianlin Su; Jiarun Cao; Weijie Liu; Yangyiwen Ou
journal: arXiv preprint arXiv:2103.15316
year: 2021

title: RoFormer: Enhanced Transformer with Rotary Position Embedding
author: Jianlin Su; Yu Lu; Shengfeng Pan; Bo Wen; Yunfeng Liu
journal: arXiv preprint arXiv:2104.09864
year: 2021

Bygone Days

Su Jianlin, just turned 16 this year (2009), lives in a small village in Yunfu, Guangdong Province.

I have been interested in science since childhood. Mathematics is my strong suit, though since the third year of junior high I have had to add "chemistry" to that list.

I first touched a computer in September 2006 and got online in January 2007; looking back, that was a pretty fast progression (before that I knew nothing about computers at all). In April 2007 I discovered BBS forums and later set up an IT-themed BBS of my own, and for a while IT drew me away from science. From September 2008 onward I refocused on science, and out of that effort this blog was born.

Now (July 2012) I have graduated from high school. I have been through a lot and matured a great deal; I feel I have learned to cherish things more, and I have picked up all sorts of things I enjoy. I used to be very introverted and shy, but I have become much more outgoing and have learned to goof around and go wild with my friends. My passion for science has only grown, though my interests have shifted somewhat: mathematics is still my core, I love physics and am enchanted by astronomy, while chemistry and biology have become side hobbies. ^_^ I hope to keep sharing my scientific life with all of you here at Scientific Spaces.

At present (January 2018) I am a second-year graduate student at Sun Yat-sen University, majoring in pure mathematics (with a focus on applied mathematics for biology), though I spend much of my time on machine learning, especially natural language processing. I want to learn and understand everything, but the spirit is willing and the flesh is weak~ Keep at it, one small step at a time.

Now (July 2019) I have finally graduated and have fallen completely down the machine learning rabbit hole. These days I do odd jobs in the machine learning algorithms team at Zhuiyi Technology~

(Unfinished, but let's not say "to be continued"~)