stat.ML.xml

<rss version="2.0"><channel><title>Chat Arxiv stat.ML</title><link>https://github.com/qhduan/cn-chat-arxiv</link><description>This is an arXiv RSS feed for stat.ML</description><item><title>This study examines the scaling behavior of large language models in a transfer learning setting and finds that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly affect downstream performance.</title><link>https://arxiv.org/abs/2402.04177</link><description>&lt;p&gt;
Scaling Laws for Downstream Task Performance of Large Language Models
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2402.04177
&lt;/p&gt;
&lt;p&gt;
This study examines the scaling behavior of large language models in a transfer learning setting and finds that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly affect downstream performance.
&lt;/p&gt;
&lt;p&gt;
Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has mainly focused on scaling laws for the pretraining (upstream) loss. However, in transfer learning settings, where an LLM is first pretrained on an unsupervised dataset and then finetuned on a downstream task, we usually also care about downstream performance. In this work, we study scaling behavior in a transfer learning setting in which LLMs are finetuned for machine translation. Specifically, we study how the choice and size of the pretraining data affect downstream performance (translation quality), using two evaluation metrics: downstream cross-entropy and BLEU score. Our experiments show that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly affect the scaling behavior. With sufficient alignment, both downstream cross-entropy and BLEU score improve progressively.
&lt;/p&gt;
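&lt;p&gt;
The abstract does not state a functional form for the downstream scaling law. As a hedged illustration only, the sketch below assumes a saturating power law in the pretraining-data size D, L(D) = E + A * D^(-alpha), a form commonly used for cross-entropy in the scaling-law literature, and fits it with the Python standard library by scanning the offset E and running a log-log least-squares regression for each candidate. The function names, the grid, and the synthetic data are hypothetical; this is not the paper's fitting procedure.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_saturating_power_law(sizes, losses, e_grid):
    """Fit the assumed form L(D) = E + A * D**(-alpha) by grid search over E.

    For a fixed E, log(L - E) is linear in log(D) with slope -alpha, so A and
    alpha come from a log-log regression; the E with the smallest squared
    error in L is kept.
    """
    log_d = [math.log(d) for d in sizes]
    best = None
    for e in e_grid:
        if not min(losses) > e:
            continue  # log(L - E) needs every observed loss to exceed E
        log_a, slope = ols(log_d, [math.log(loss - e) for loss in losses])
        a, alpha = math.exp(log_a), -slope
        sse = sum((e + a * d ** -alpha - loss) ** 2 for d, loss in zip(sizes, losses))
        if best is None or best[0] > sse:
            best = (sse, e, a, alpha)
    _, e, a, alpha = best
    return e, a, alpha


def predict(size, e, a, alpha):
    """Evaluate the fitted law at a (possibly larger) pretraining-data size."""
    return e + a * size ** -alpha


# Synthetic, noise-free losses generated from made-up parameters E=1.5, A=20, alpha=0.3.
sizes = [1e8, 3e8, 1e9, 3e9, 1e10]
losses = [predict(d, 1.5, 20.0, 0.3) for d in sizes]
e, a, alpha = fit_saturating_power_law(sizes, losses, [i * 0.01 for i in range(190)])
print(f"E={e:.2f}  A={a:.2f}  alpha={alpha:.3f}")
print(f"extrapolated loss at D=1e11: {predict(1e11, e, a, alpha):.4f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
In practice one would fit a separate curve for each pretraining dataset and finetuning-set size, since the abstract reports that both change the scaling behavior.
&lt;/p&gt;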
&lt;p&gt;
Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the pretraining data and its size affect downstream performance (translation quality) as judged by two metrics: downstream cross-entropy and BLEU score. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and BLEU score improve monotonically with 
&lt;/p&gt;</description></item><item><title>This study proposes time-uniform confidence sphere sequences that contain the mean of a random vector with high probability simultaneously across sample sizes, and extends and unifies the analysis under different distributional assumptions.</title><link>https://arxiv.org/abs/2311.08168</link><description>&lt;p&gt;
Time-Uniform Confidence Spheres for Means of Random Vectors
&lt;/p&gt;
&lt;p&gt;
https://arxiv.org/abs/2311.08168
&lt;/p&gt;
&lt;p&gt;
This study proposes time-uniform confidence sphere sequences that contain the mean of a random vector with high probability simultaneously across sample sizes, and extends and unifies the analysis under different distributional assumptions.
&lt;/p&gt;
&lt;p&gt;
We derive and study time-uniform confidence spheres, i.e., confidence sphere sequences (CSSs) that contain the mean of a random vector with high probability simultaneously across all sample sizes. Inspired by the original work of Catoni and Giulini, we unify and extend their analysis to cover the sequential setting and to handle a variety of distributional assumptions. Our results include an empirical-Bernstein CSS for bounded random vectors (which yields a novel empirical-Bernstein confidence interval whose asymptotic width scales proportionally to the true unknown variance), CSSs for sub-$\psi$ random vectors (including sub-gamma, sub-Poisson, and sub-exponential distributions), and CSSs for heavy-tailed random vectors (with only two moments). Finally, we provide two CSSs that are robust to contamination by Huber noise. The first is a robust version of our empirical-Bernstein CSS, and the second extends recent work on univariate sequences.
&lt;/p&gt;
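&lt;p&gt;
The empirical-Bernstein CSS in the abstract is for bounded random vectors. As a much simpler, swapped-in illustration, the sketch below implements a univariate empirical-Bernstein confidence sequence for observations in [0, 1], following the predictable plug-in construction of Waudby-Smith and Ramdas as we understand it: a predictable bet size lambda_t, a lambda-weighted running mean as the center, and a radius equal to log(2/alpha) plus a variance penalty, divided by the sum of the bet sizes. It is not the paper's vector-valued CSS; the cap on lambda_t, the initial guesses 1/2 and 1/4, and all names are assumptions.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def eb_confidence_sequence(xs, alpha=0.05, cap=0.5):
    """Running (lower, upper) bounds on the mean of observations in [0, 1].

    Univariate predictable plug-in empirical-Bernstein confidence sequence,
    intended to hold simultaneously over all t with probability at least
    1 - alpha for i.i.d. data (a 1-D stand-in, not the paper's CSS).
    """
    log_term = math.log(2 / alpha)  # two-sided: alpha/2 per side
    sum_lam = sum_lam_x = sum_penalty = 0.0
    mu_hat, var_hat = 0.5, 0.25  # predictable estimates before any data
    sum_x, sum_sq = 0.0, 0.25
    bounds = []
    for t, x in enumerate(xs, start=1):
        # Predictable bet size: depends only on data up to time t - 1.
        lam = min(math.sqrt(2 * log_term / (var_hat * t * math.log(1 + t))), cap)
        # Variance proxy v_t = 4 (x_t - mu_hat_{t-1})^2 and psi_E(lam).
        v = 4 * (x - mu_hat) ** 2
        psi = (-math.log(1 - lam) - lam) / 4
        sum_lam += lam
        sum_lam_x += lam * x
        sum_penalty += v * psi
        center = sum_lam_x / sum_lam
        radius = (log_term + sum_penalty) / sum_lam
        bounds.append((max(0.0, center - radius), min(1.0, center + radius)))
        # Update the plug-in mean and variance for the next step.
        sum_x += x
        mu_hat = (0.5 + sum_x) / (t + 1)
        sum_sq += (x - mu_hat) ** 2
        var_hat = sum_sq / (t + 1)
    return bounds


xs = [0.2, 0.8] * 1000  # deterministic toy stream with mean 0.5
cs = eb_confidence_sequence(xs)
for t in (10, 100, 2000):
    lo, hi = cs[t - 1]
    print(f"t={t:5d}  [{lo:.3f}, {hi:.3f}]  width={hi - lo:.3f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
Because the guarantee is time-uniform, the interval can be inspected after every observation and sampling can stop at a data-dependent time without invalidating coverage, which a fixed-sample confidence interval does not allow. The deterministic stream above only exercises the code.
&lt;/p&gt;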
&lt;p&gt;
arXiv:2311.08168v2 Announce Type: replace-cross  Abstract: We derive and study time-uniform confidence spheres -- confidence sphere sequences (CSSs) -- which contain the mean of random vectors with high probability simultaneously across all sample sizes. Inspired by the original work of Catoni and Giulini, we unify and extend their analysis to cover both the sequential setting and to handle a variety of distributional assumptions. Our results include an empirical-Bernstein CSS for bounded random vectors (resulting in a novel empirical-Bernstein confidence interval with asymptotic width scaling proportionally to the true unknown variance), CSSs for sub-$\psi$ random vectors (which includes sub-gamma, sub-Poisson, and sub-exponential), and CSSs for heavy-tailed random vectors (two moments only). Finally, we provide two CSSs that are robust to contamination by Huber noise. The first is a robust version of our empirical-Bernstein CSS, and the second extends recent work in the univariate se
&lt;/p&gt;</description></item><item><title>BART-SIMP is a novel framework for flexible spatial covariate modeling and prediction that combines a Gaussian process spatial model with a Bayesian additive regression tree model to provide reliable uncertainty estimates, and it has been successfully applied to predicting anthropometric responses in household cluster samples from Kenya.</title><link>http://arxiv.org/abs/2309.13270</link><description>&lt;p&gt;
BART-SIMP: a novel framework for flexible spatial covariate modeling and prediction using Bayesian additive regression trees. (arXiv:2309.13270v1 [stat.ME])
&lt;/p&gt;
&lt;p&gt;
http://arxiv.org/abs/2309.13270
&lt;/p&gt;
&lt;p&gt;
BART-SIMP is a novel framework for flexible spatial covariate modeling and prediction that combines a Gaussian process spatial model with a Bayesian additive regression tree model to provide reliable uncertainty estimates, and it has been successfully applied to predicting anthropometric responses in household cluster samples from Kenya.
&lt;/p&gt;
&lt;p&gt;
Prediction is a classic challenge in spatial statistics, and incorporating spatial covariates into a model with latent spatial effects can greatly improve predictive performance. We aim to develop flexible regression models that allow for nonlinearities and interactions in the covariate structure. Machine learning models that allow for spatial dependence in the residuals have been proposed in the spatial setting, but they fail to provide reliable uncertainty estimates. In this paper, we study a novel combination of a Gaussian process spatial model and a Bayesian additive regression tree (BART) model. Combining Markov chain Monte Carlo (MCMC) with the Integrated Nested Laplace Approximation (INLA) technique reduces the computational burden of the method. We study the method's performance via simulations and use the model to predict anthropometric responses collected via household cluster samples in Kenya.
&lt;/p&gt;
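&lt;p&gt;
The abstract combines a Gaussian process spatial model with BART but does not write out the model equation. A hedged reading is an additive model y_i = f(x_i) + u(s_i) + eps_i, where f is a sum of regression trees over the covariates x_i, u is a latent zero-mean Gaussian process over the locations s_i, and eps_i is independent noise. The sketch below only simulates from that assumed structure with the standard library: the exponential covariance and its parameters, and the fixed "stumps" standing in for BART's sum of trees, are hypothetical choices, and it does not implement the MCMC and INLA inference described in the paper.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math
import random


def exp_cov(locs, sigma2=1.0, rho=0.3):
    """Exponential covariance K_ij = sigma2 * exp(-dist_ij / rho) (assumed kernel)."""
    return [[sigma2 * math.exp(-math.dist(p, q) / rho) for q in locs] for p in locs]


def cholesky(k, jitter=1e-9):
    """Lower-triangular L with L L^T = K (plus a small diagonal jitter)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(k[i][i] + jitter - s)
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]
    return low


def sum_of_stumps(x, stumps):
    """Toy stand-in for a BART sum of trees; each stump is (feature, split, left, right)."""
    return sum(right if x[f] > split else left for f, split, left, right in stumps)


def simulate(n, stumps, noise_sd=0.1, seed=0):
    """Draw y_i = f(x_i) + u(s_i) + eps_i with u a zero-mean Gaussian process."""
    rng = random.Random(seed)
    locs = [(rng.random(), rng.random()) for _ in range(n)]  # spatial sites s_i
    covs = [(rng.random(), rng.random()) for _ in range(n)]  # covariates x_i
    low = cholesky(exp_cov(locs))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(n)]  # u = L z
    y = [sum_of_stumps(x, stumps) + ui + rng.gauss(0.0, noise_sd) for x, ui in zip(covs, u)]
    return locs, covs, u, y


stumps = [(0, 0.5, -1.0, 1.0), (1, 0.3, 0.0, 0.5)]  # hypothetical tree fits
locs, covs, u, y = simulate(50, stumps)
print(len(y), round(sum(y) / len(y), 3))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
Drawing u through a Cholesky factor of the covariance matrix is the textbook way to sample a Gaussian process at a finite set of sites; it costs O(n^3), which is one reason large spatial models turn to approximations such as INLA.
&lt;/p&gt;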
&lt;p&gt;
Prediction is a classic challenge in spatial statistics and the inclusion of spatial covariates can greatly improve predictive performance when incorporated into a model with latent spatial effects. It is desirable to develop flexible regression models that allow for nonlinearities and interactions in the covariate structure. Machine learning models have been suggested in the spatial context, allowing for spatial dependence in the residuals, but fail to provide reliable uncertainty estimates. In this paper, we investigate a novel combination of a Gaussian process spatial model and a Bayesian Additive Regression Tree (BART) model. The computational burden of the approach is reduced by combining Markov chain Monte Carlo (MCMC) with the Integrated Nested Laplace Approximation (INLA) technique. We study the performance of the method via simulations and use the model to predict anthropometric responses, collected via household cluster samples in Kenya.
&lt;/p&gt;</description></item></channel></rss>