<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Large Model | Yequan&#39;s Academic</title>
    <link>http://localhost:1313/tag/large-model/</link>
      <atom:link href="http://localhost:1313/tag/large-model/index.xml" rel="self" type="application/rss+xml" />
    <description>Large Model</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 16 Aug 2025 00:00:00 +0000</lastBuildDate>
    <image>
      <url>http://localhost:1313/media/icon_hu041b0395efa72cb92c3618e7883e8354_359622_512x512_fill_lanczos_center_3.png</url>
      <title>Large Model</title>
      <link>http://localhost:1313/tag/large-model/</link>
    </image>
    
    <item>
      <title>Not All Layers of LLMs Are Necessary During Inference</title>
      <link>http://localhost:1313/publication/ijcai2025-adainfer/</link>
      <pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate>
      <guid>http://localhost:1313/publication/ijcai2025-adainfer/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Few-Shot Learner Generalizes Across AI-Generated Image Detection</title>
      <link>http://localhost:1313/publication/icml2025/</link>
      <pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate>
      <guid>http://localhost:1313/publication/icml2025/</guid>
      <description></description>
    </item>
    
    <item>
      <title>52B to 1T: Lessons Learned via Tele-FLM Series</title>
      <link>http://localhost:1313/publication/arxiv2024-tele-flm-1t/</link>
      <pubDate>Wed, 03 Jul 2024 00:00:00 +0000</pubDate>
      <guid>http://localhost:1313/publication/arxiv2024-tele-flm-1t/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Masked Structural Growth for 2x Faster Language Model Pre-training</title>
      <link>http://localhost:1313/publication/iclr2024-msg/</link>
      <pubDate>Tue, 07 May 2024 00:00:00 +0000</pubDate>
      <guid>http://localhost:1313/publication/iclr2024-msg/</guid>
      <description></description>
    </item>
    
    <item>
      <title>FLM Family</title>
      <link>http://localhost:1313/project/flm/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid>http://localhost:1313/project/flm/</guid>
      <description>&lt;p&gt;FLM is a large language model jointly developed by the Cognitive Team (Cofe-AI) of BAAI, together with Tsinghua University, ICT, Nanyang Technological University, and University of Electronic Science and Technology. The project aims to develop a cost-effective, fully open-source, and highly effective large model. The FLM series is currently in its second generation.&lt;/p&gt;
&lt;h2 id=&#34;1-flm-2&#34;&gt;1. FLM-2&lt;/h2&gt;
&lt;p&gt;FLM-2 is a more significant attempt. It is currently in progress.&lt;/p&gt;
&lt;h2 id=&#34;2-flm-101b&#34;&gt;2. FLM-101B&lt;/h2&gt;
&lt;p&gt;FLM-101B inherits the structure of FreeLM and employs the Growth Strategy (&lt;a href=&#34;https://zhuanlan.zhihu.com/p/627026484&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;MSG&lt;/a&gt;) to reduce costs by more than 70%. Additionally, it utilizes &lt;a href=&#34;https://zhuanlan.zhihu.com/p/625356129&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;loss prediction&lt;/a&gt; to determine the optimal hyperparameters. FLM-101B represents a significant milestone, not only validating the feasibility of individual sub-technologies but also successfully implementing them at the system level. We see the relationship between FLM-101B and MSG as analogous to that between GPT-3 and the Transformer architecture: it is not merely a matter of scaling up, but the first successful implementation at the system level.&lt;/p&gt;
&lt;p&gt;For details, please refer to &lt;a href=&#34;https://zhuanlan.zhihu.com/p/655875712&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FLM-101B and How to train it with a $100,000 Budget&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;3-freelm&#34;&gt;3. FreeLM&lt;/h2&gt;
&lt;p&gt;FreeLM is at generation 0, with the objective of validating the feasibility of integrating relevant knowledge learning stages into language model training.&lt;/p&gt;
&lt;p&gt;For details, please refer to &lt;a href=&#34;https://zhuanlan.zhihu.com/p/626425789&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FreeLM&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;4-concepts-for-large-modes-development&#34;&gt;4. Concepts for Large Model Development&lt;/h2&gt;
&lt;p&gt;Our team&amp;rsquo;s philosophy about the development of large models is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Both system capabilities and research capabilities are essential.&lt;/li&gt;
&lt;li&gt;Without system capabilities, it is not possible to develop large models, as it would be impossible to control costs.&lt;/li&gt;
&lt;li&gt;Without research capabilities, one can only follow in the footsteps of others; when the leaders in large models choose to close their source code, further breakthroughs become impossible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We welcome researchers with strong system and research capabilities to contact us!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;&lt;font color=purple&gt;Easter egg&lt;/font&gt;&lt;/strong&gt;&lt;/em&gt;: This page was generated by an early version of FLM-2, without further editing.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
