人工智能对齐 (AI alignment) (Chinese Wikipedia)

Analysis of the information sources cited in the references of the Chinese-language Wikipedia article "人工智能对齐" (AI alignment).
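The tallying behind a page like this can be sketched as follows: group each cited URL by its website and count citations per site. This is only an illustrative sketch, not the site's actual pipeline; the `refs` sample and the crude last-two-labels domain rule are assumptions (a real pipeline would use the Public Suffix List).

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of reference URLs from the article's citations.
refs = [
    "https://arxiv.org/abs/2109.13916",
    "https://arxiv.org/abs/2206.13353",
    "https://dl.acm.org/doi/10.1145/3442188.3445922",
    "https://www.deepmind.com/blog/specification-gaming",
]

def registrable_domain(url: str) -> str:
    """Crude registrable-domain extraction: keep the last two host labels.
    (Assumption: no multi-label public suffixes like .co.uk in the sample.)"""
    host = urlparse(url).netloc.lower()
    return ".".join(host.split(".")[-2:])

# Count citations per website, most-cited first.
counts = Counter(registrable_domain(u) for u in refs)
for domain, n in counts.most_common():
    print(domain, n)  # arxiv.org appears twice, so it is listed first
```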

[Table: refs / Website / Global rank / Chinese rank. The website column was lost in extraction; only the unlabeled rank pairs (from "1st place" down to "low place") survive, so the rank-to-site mapping is not recoverable here.]

80000hours.org

aaai.org

ojs.aaai.org

aclanthology.org

acm.org

dl.acm.org

analyticsindiamag.com

archive.org

arstechnica.com

arxiv.org

  • Hendrycks, Dan; Carlini, Nicholas; Schulman, John; Steinhardt, Jacob. Unsolved Problems in ML Safety. 2022-06-16. arXiv:2109.13916 [cs.LG].
  • Carlsmith, Joseph. Is Power-Seeking AI an Existential Risk?. 2022-06-16. arXiv:2206.13353 [cs.CY].
  • Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma; Brynjolfsson, Erik. On the Opportunities and Risks of Foundation Models. Stanford CRFM. 2022-07-12 [2022-12-07]. arXiv:2108.07258. (Archived from the original on 2023-02-10.)
  • Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, J.; Hilton, Jacob; Kelton, Fraser; Miller, Luke E.; Simens, Maddie; Askell, Amanda; Welinder, P.; Christiano, P.; Leike, J.; Lowe, Ryan J. Training language models to follow instructions with human feedback. 2022. arXiv:2203.02155 [cs.CL].
  • Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter. Reward (Mis)design for Autonomous Driving (PDF). 2022-03-11 [2022-12-07]. arXiv:2104.13906. (Archived (PDF) from the original on 2023-02-10.)
  • Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan. Concrete Problems in AI Safety. 2016-06-21. arXiv:1606.06565 [cs.AI].
  • Mohseni, Sina; Wang, Haotao; Yu, Zhiding; Xiao, Chaowei; Wang, Zhangyang; Yadawa, Jay. Taxonomy of Machine Learning Safety: A Survey and Primer. 2022-03-07. arXiv:2106.04823 [cs.LG].
  • Manheim, David; Garrabrant, Scott. Categorizing Variants of Goodhart's Law. 2018. arXiv:1803.04585 [cs.AI].
  • Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Madotto, Andrea; Fung, Pascale. Survey of Hallucination in Natural Language Generation. 2022-02-01 [2022-12-09]. arXiv:2202.03629. (Archived from the original on 2023-02-10.)
  • Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff. Emergent Abilities of Large Language Models. 2022-06-15. arXiv:2206.07682 [cs.CL].
  • Leike, Jan; Martic, Miljan; Krakovna, Victoria; Ortega, Pedro A.; Everitt, Tom; Lefrancq, Andrew; Orseau, Laurent; Legg, Shane. AI Safety Gridworlds. 2017-11-28. arXiv:1711.09883 [cs.LG].
  • Turner, Alexander Matt; Smith, Logan; Shah, Rohin; Critch, Andrew; Tadepalli, Prasad. Optimal Policies Tend to Seek Power. Neural Information Processing Systems. 2021-12-03, 34 [2022-12-12]. arXiv:1912.01683. (Archived from the original on 2023-02-10.)
  • Everitt, Tom; Lea, Gary; Hutter, Marcus. AGI Safety Literature Review. 2018-05-21. arXiv:1805.01109 [cs.AI].
  • Hendrycks, Dan; Burns, Collin; Basart, Steven; Critch, Andrew; Li, Jerry; Song, Dawn; Steinhardt, Jacob. Aligning AI With Shared Human Values. International Conference on Learning Representations. 2021-07-24. arXiv:2008.02275.
  • Perez, Ethan; Huang, Saffron; Song, Francis; Cai, Trevor; Ring, Roman; Aslanides, John; Glaese, Amelia; McAleese, Nat; Irving, Geoffrey. Red Teaming Language Models with Language Models. 2022-02-07. arXiv:2202.03286 [cs.CL].
  • Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nisan; Lowe, Ryan; Leike, Jan; Christiano, Paul. Recursively Summarizing Books with Human Feedback. 2021-09-27. arXiv:2109.10862 [cs.CL].
  • Christiano, Paul; Shlegeris, Buck; Amodei, Dario. Supervising strong learners by amplifying weak experts. 2018-10-19. arXiv:1810.08575 [cs.LG].
  • Leike, Jan; Krueger, David; Everitt, Tom; Martic, Miljan; Maini, Vishal; Legg, Shane. Scalable agent alignment via reward modeling: a research direction. 2018-11-19 [2022-12-14]. arXiv:1811.07871 [cs.LG]. (Archived from the original on 2022-12-18.)
  • Evans, Owain; Cotton-Barratt, Owen; Finnveden, Lukas; Bales, Adam; Balwit, Avital; Wills, Peter; Righetti, Luca; Saunders, William. Truthful AI: Developing and governing AI that does not lie. 2021-10-13. arXiv:2110.06674 [cs.CY].
  • Nakano, Reiichiro; Hilton, Jacob; Balaji, Suchir; Wu, Jeff; Ouyang, Long; Kim, Christina; Hesse, Christopher; Jain, Shantanu; Kosaraju, Vineet; Saunders, William; Jiang, Xu. WebGPT: Browser-assisted question-answering with human feedback. 2022-06-01. arXiv:2112.09332 [cs.CL].
  • Menick, Jacob; Trebacz, Maja; Mikulik, Vladimir; Aslanides, John; Song, Francis; Chadwick, Martin; Glaese, Mia; Young, Susannah; Campbell-Gillingham, Lucy; Irving, Geoffrey; McAleese, Nat. Teaching language models to support answers with verified quotes. DeepMind. 2022-03-21 [2022-12-16]. arXiv:2203.11147. (Archived from the original on 2023-02-10.)
  • Askell, Amanda; Bai, Yuntao; Chen, Anna; Drain, Dawn; Ganguli, Deep; Henighan, Tom; Jones, Andy; Joseph, Nicholas; Mann, Ben; DasSarma, Nova; Elhage, Nelson. A General Language Assistant as a Laboratory for Alignment. 2021-12-09. arXiv:2112.00861 [cs.CL].
  • Demski, Abram; Garrabrant, Scott. Embedded Agency. 2020-10-06. arXiv:1902.09469 [cs.AI].
  • Everitt, Tom; Ortega, Pedro A.; Barnes, Elizabeth; Legg, Shane. Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings. 2019-09-06. arXiv:1902.09980 [cs.AI].

bbc.com

books.google.com

ca.gov

leginfo.legislature.ca.gov

dagstuhl.de

drops.dagstuhl.de

  • Fürnkranz, Johannes; Hüllermeier, Eyke; Rudin, Cynthia; Slowinski, Roman; Sanner, Scott. Marc Herbstritt (ed.). Preference Learning. Dagstuhl Reports. 2014, 4 (3): 27 pages [2022-12-13]. doi:10.4230/DAGREP.4.3.1. (Archived from the original on 2023-02-10.)

deepmind.com

distill.pub

doi.org

edge.org

elsevier.com

linkinghub.elsevier.com

futureoflife.org

  • Future of Life Institute. Asilomar AI Principles. Future of Life Institute. 2017-08-11 [2022-07-18]. (Archived from the original on 2022-10-10.)

gcrinstitute.org

georgetown.edu

cset.georgetown.edu

googleusercontent.com

lh3.googleusercontent.com

gov.uk

harvard.edu

ui.adsabs.harvard.edu

infoq.com

jair.org

longtermrisk.org

marktechpost.com

medium.com

deepmindsafetyresearch.medium.com

mit.edu

direct.mit.edu

most.gov.cn

nature.com

neurips.cc

proceedings.neurips.cc

nih.gov

ncbi.nlm.nih.gov

nips.cc

papers.nips.cc

nscai.gov

  • NSCAI Final Report (PDF). Washington, DC: The National Security Commission on Artificial Intelligence. 2021 [2023-01-03]. (Archived (PDF) from the original on 2023-02-15.)

nytimes.com

nyu.edu

bhr.stern.nyu.edu

openai.com

openreview.net

pearson.com

penguinrandomhouse.com

quantamagazine.org

reuters.com

sagepub.com

journals.sagepub.com

science.org

scientificamerican.com

semanticscholar.org

api.semanticscholar.org

smallake.kr

springer.com

link.springer.com

ssrn.com

papers.ssrn.com

stanford.edu

fsi.stanford.edu

technologyreview.com

theguardian.com

theregister.com

towardsdatascience.com

un.org

unite.ai

universitypressscholarship.com

oxford.universitypressscholarship.com

utexas.edu

cs.utexas.edu

venturebeat.com

web.archive.org

wikipedia.org

en.wikipedia.org

wiley.com

onlinelibrary.wiley.com

worldcat.org

wsj.com

wwnorton.co.uk