人工智能对齐 (AI alignment) (Chinese Wikipedia)

Analysis of the information sources cited in the references of the Chinese-language Wikipedia article "人工智能对齐" (AI alignment).
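The tallying behind a page like this can be sketched as follows: group each cited URL by its website and count citations per site. This is only an illustrative sketch, not the site's actual pipeline; the `refs` sample and the crude last-two-labels domain rule are assumptions (a real pipeline would use the Public Suffix List).

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of reference URLs from the article's citations.
refs = [
    "https://arxiv.org/abs/2109.13916",
    "https://arxiv.org/abs/2206.13353",
    "https://dl.acm.org/doi/10.1145/3442188.3445922",
    "https://www.deepmind.com/blog/specification-gaming",
]

def registrable_domain(url: str) -> str:
    """Crude registrable-domain extraction: keep the last two host labels.
    (Assumption: no multi-label public suffixes like .co.uk in the sample.)"""
    host = urlparse(url).netloc.lower()
    return ".".join(host.split(".")[-2:])

# Count citations per website, most-cited first.
counts = Counter(registrable_domain(u) for u in refs)
for domain, n in counts.most_common():
    print(domain, n)  # arxiv.org appears twice, so it is listed first
```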

[Table: refs / Website / Global rank / Chinese rank. The website column was lost in extraction; only the unlabeled rank pairs (from "1st place" down to "low place") survive, so the rank-to-site mapping is not recoverable here.]

80000hours.org

aaai.org

ojs.aaai.org

aclanthology.org

acm.org

dl.acm.org

analyticsindiamag.com

archive.org

arstechnica.com

arxiv.org

  • Hendrycks, Dan; Carlini, Nicholas; Schulman, John; Steinhardt, Jacob. Unsolved Problems in ML Safety. 2022-06-16. arXiv:2109.13916 [cs.LG].
  • Carlsmith, Joseph. Is Power-Seeking AI an Existential Risk?. 2022-06-16. arXiv:2206.13353 [cs.CY].
  • Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma; Brynjolfsson, Erik. On the Opportunities and Risks of Foundation Models. Stanford CRFM. 2022-07-12 [2022-12-07]. arXiv:2108.07258. (Archived from the original on 2023-02-10.)
  • Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, J.; Hilton, Jacob; Kelton, Fraser; Miller, Luke E.; Simens, Maddie; Askell, Amanda; Welinder, P.; Christiano, P.; Leike, J.; Lowe, Ryan J. Training language models to follow instructions with human feedback. 2022. arXiv:2203.02155 [cs.CL].
  • Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter. Reward (Mis)design for Autonomous Driving (PDF). 2022-03-11 [2022-12-07]. arXiv:2104.13906. (Archived (PDF) from the original on 2023-02-10.)
  • Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan. Concrete Problems in AI Safety. 2016-06-21. arXiv:1606.06565 [cs.AI].
  • Mohseni, Sina; Wang, Haotao; Yu, Zhiding; Xiao, Chaowei; Wang, Zhangyang; Yadawa, Jay. Taxonomy of Machine Learning Safety: A Survey and Primer. 2022-03-07. arXiv:2106.04823 [cs.LG].
  • Manheim, David; Garrabrant, Scott. Categorizing Variants of Goodhart's Law. 2018. arXiv:1803.04585 [cs.AI].
  • Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Madotto, Andrea; Fung, Pascale. Survey of Hallucination in Natural Language Generation. 2022-02-01 [2022-12-09]. arXiv:2202.03629. (Archived from the original on 2023-02-10.)
  • Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff. Emergent Abilities of Large Language Models. 2022-06-15. arXiv:2206.07682 [cs.CL].
  • Leike, Jan; Martic, Miljan; Krakovna, Victoria; Ortega, Pedro A.; Everitt, Tom; Lefrancq, Andrew; Orseau, Laurent; Legg, Shane. AI Safety Gridworlds. 2017-11-28. arXiv:1711.09883 [cs.LG].
  • Turner, Alexander Matt; Smith, Logan; Shah, Rohin; Critch, Andrew; Tadepalli, Prasad. Optimal Policies Tend to Seek Power. Neural Information Processing Systems. 2021-12-03, 34 [2022-12-12]. arXiv:1912.01683. (Archived from the original on 2023-02-10.)
  • Everitt, Tom; Lea, Gary; Hutter, Marcus. AGI Safety Literature Review. 2018-05-21. arXiv:1805.01109 [cs.AI].
  • Hendrycks, Dan; Burns, Collin; Basart, Steven; Critch, Andrew; Li, Jerry; Song, Dawn; Steinhardt, Jacob. Aligning AI With Shared Human Values. International Conference on Learning Representations. 2021-07-24. arXiv:2008.02275.
  • Perez, Ethan; Huang, Saffron; Song, Francis; Cai, Trevor; Ring, Roman; Aslanides, John; Glaese, Amelia; McAleese, Nat; Irving, Geoffrey. Red Teaming Language Models with Language Models. 2022-02-07. arXiv:2202.03286 [cs.CL].
  • Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nisan; Lowe, Ryan; Leike, Jan; Christiano, Paul. Recursively Summarizing Books with Human Feedback. 2021-09-27. arXiv:2109.10862 [cs.CL].
  • Christiano, Paul; Shlegeris, Buck; Amodei, Dario. Supervising strong learners by amplifying weak experts. 2018-10-19. arXiv:1810.08575 [cs.LG].
  • Leike, Jan; Krueger, David; Everitt, Tom; Martic, Miljan; Maini, Vishal; Legg, Shane. Scalable agent alignment via reward modeling: a research direction. 2018-11-19 [2022-12-14]. arXiv:1811.07871 [cs.LG]. (Archived from the original on 2022-12-18.)
  • Evans, Owain; Cotton-Barratt, Owen; Finnveden, Lukas; Bales, Adam; Balwit, Avital; Wills, Peter; Righetti, Luca; Saunders, William. Truthful AI: Developing and governing AI that does not lie. 2021-10-13. arXiv:2110.06674 [cs.CY].
  • Nakano, Reiichiro; Hilton, Jacob; Balaji, Suchir; Wu, Jeff; Ouyang, Long; Kim, Christina; Hesse, Christopher; Jain, Shantanu; Kosaraju, Vineet; Saunders, William; Jiang, Xu. WebGPT: Browser-assisted question-answering with human feedback. 2022-06-01. arXiv:2112.09332 [cs.CL].
  • Menick, Jacob; Trebacz, Maja; Mikulik, Vladimir; Aslanides, John; Song, Francis; Chadwick, Martin; Glaese, Mia; Young, Susannah; Campbell-Gillingham, Lucy; Irving, Geoffrey; McAleese, Nat. Teaching language models to support answers with verified quotes. DeepMind. 2022-03-21 [2022-12-16]. arXiv:2203.11147. (Archived from the original on 2023-02-10.)
  • Askell, Amanda; Bai, Yuntao; Chen, Anna; Drain, Dawn; Ganguli, Deep; Henighan, Tom; Jones, Andy; Joseph, Nicholas; Mann, Ben; DasSarma, Nova; Elhage, Nelson. A General Language Assistant as a Laboratory for Alignment. 2021-12-09. arXiv:2112.00861 [cs.CL].
  • Demski, Abram; Garrabrant, Scott. Embedded Agency. 2020-10-06. arXiv:1902.09469 [cs.AI].
  • Everitt, Tom; Ortega, Pedro A.; Barnes, Elizabeth; Legg, Shane. Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings. 2019-09-06. arXiv:1902.09980 [cs.AI].

bbc.com

books.google.com

ca.gov

leginfo.legislature.ca.gov

dagstuhl.de

drops.dagstuhl.de

  • Fürnkranz, Johannes; Hüllermeier, Eyke; Rudin, Cynthia; Slowinski, Roman; Sanner, Scott. Marc Herbstritt (ed.). Preference Learning. Dagstuhl Reports. 2014, 4 (3): 27 pages [2022-12-13]. doi:10.4230/DAGREP.4.3.1. (Archived from the original on 2023-02-10.)

deepmind.com

distill.pub

doi.org

edge.org

elsevier.com

linkinghub.elsevier.com

futureoflife.org

  • Future of Life Institute. Asilomar AI Principles. Future of Life Institute. 2017-08-11 [2022-07-18]. (Archived from the original on 2022-10-10.)

gcrinstitute.org

georgetown.edu

cset.georgetown.edu

googleusercontent.com

lh3.googleusercontent.com

gov.uk

harvard.edu

ui.adsabs.harvard.edu

infoq.com

jair.org

longtermrisk.org

marktechpost.com

medium.com

deepmindsafetyresearch.medium.com

mit.edu

direct.mit.edu

most.gov.cn

nature.com

neurips.cc

proceedings.neurips.cc

nih.gov

ncbi.nlm.nih.gov

nips.cc

papers.nips.cc

nscai.gov

  • NSCAI Final Report (PDF). Washington, DC: The National Security Commission on Artificial Intelligence. 2021 [2023-01-03]. (Archived (PDF) from the original on 2023-02-15.)

nytimes.com

nyu.edu

bhr.stern.nyu.edu

openai.com

openreview.net

pearson.com

penguinrandomhouse.com

quantamagazine.org

reuters.com

sagepub.com

journals.sagepub.com

science.org

scientificamerican.com

semanticscholar.org

api.semanticscholar.org

smallake.kr

springer.com

link.springer.com

ssrn.com

papers.ssrn.com

stanford.edu

fsi.stanford.edu

technologyreview.com

theguardian.com

theregister.com

towardsdatascience.com

un.org

unite.ai

universitypressscholarship.com

oxford.universitypressscholarship.com

utexas.edu

cs.utexas.edu

venturebeat.com

web.archive.org

wikipedia.org

en.wikipedia.org

wiley.com

onlinelibrary.wiley.com

worldcat.org

wsj.com

wwnorton.co.uk