Alinhamento da inteligência artificial (Portuguese Wikipedia)

Lin, Stephanie; Hilton, Jacob; Evans, Owain (2022). «TruthfulQA: Measuring How Models Mimic Human Falsehoods». Dublin, Ireland: Association for Computational Linguistics. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (em inglês): 3214–3252. doi:10.18653/v1/2022.acl-long.229
Shuster, Kurt; Poff, Spencer; Chen, Moya; Kiela, Douwe; Weston, Jason (novembro de 2021). Retrieval Augmentation Reduces Hallucination in Conversation. EMNLP-Findings 2021. Punta Cana, Dominican Republic: Association for Computational Linguistics. pp. 3784–3803. doi:10.18653/v1/2021.findings-emnlp.320. Consultado em 23 de julho de 2022

acm.org

dl.acm.org

Prunkl, Carina; Whittlestone, Jess (7 de fevereiro de 2020). «Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society». New York NY USA: ACM. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (em inglês): 138–143. ISBN 978-1-4503-7110-0. doi:10.1145/3375627.3375803

analyticsindiamag.com

Bhattacharyya, Sreejani (14 de fevereiro de 2022). «DeepMind's "red teaming" language models with language models: What is it?». Analytics India Magazine. Consultado em 23 de julho de 2022

arstechnica.com

Edwards, Ben (26 de abril de 2022). «Adept's AI assistant can browse, search, and use web apps like a human». Ars Technica. Consultado em 9 de setembro de 2022

arxiv.org

Hendrycks, Dan; Carlini, Nicholas (16 de junho de 2022). «Unsolved Problems in ML Safety». arXiv:2109.13916 [cs.LG]
Carlsmith, Joseph (16 de junho de 2022). «Is Power-Seeking AI an Existential Risk?». arXiv:2206.13353 [cs.CY]
Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine (12 de julho de 2022). «On the Opportunities and Risks of Foundation Models». Stanford CRFM. arXiv:2108.07258
Ouyang, Long; Wu, Jeff (2022). «Training language models to follow instructions with human feedback». arXiv:2203.02155 [cs.CL]
Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter (11 de março de 2022). «Reward (Mis)design for Autonomous Driving» (PDF). arXiv:2104.13906
Amodei, Dario; Olah, Chris (21 de junho de 2016). «Concrete Problems in AI Safety». arXiv:1606.06565 [cs.AI]
Doshi-Velez, Finale; Kim, Been (2 de março de 2017). «Towards A Rigorous Science of Interpretable Machine Learning». arXiv:1702.08608 [cs, stat]
Mohseni, Sina; Wang, Haotao (7 de março de 2022). «Taxonomy of Machine Learning Safety: A Survey and Primer». arXiv:2106.04823 [cs.LG]
Manheim, David; Garrabrant, Scott. «Categorizing Variants of Goodhart's Law». arXiv:1803.04585 [cs.AI]
Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Madotto, Andrea (1 de fevereiro de 2022). «Survey of Hallucination in Natural Language Generation». ACM Computing Surveys. arXiv:2202.03629. doi:10.1145/3571730
Wei, Jason; Tay, Yi (15 de junho de 2022). «Emergent Abilities of Large Language Models». arXiv:2206.07682 [cs.CL]
Leike, Jan; Martic, Miljan (28 de novembro de 2017). «AI Safety Gridworlds». arXiv:1711.09883 [cs.LG]
Turner, Alexander Matt; Smith, Logan; Shah, Rohin; Critch, Andrew; Tadepalli, Prasad (3 de dezembro de 2021). «Optimal Policies Tend to Seek Power». Neural Information Processing Systems. 34. arXiv:1912.01683
Everitt, Tom; Lea, Gary (21 de maio de 2018). «AGI Safety Literature Review». arXiv:1805.01109 [cs.AI]
Hendrycks, Dan; Burns, Collin; Basart, Steven; Critch, Andrew; Li, Jerry; Song, Dawn; Steinhardt, Jacob (24 de julho de 2021). «Aligning AI With Shared Human Values». International Conference on Learning Representations. arXiv:2008.02275
Perez, Ethan; Huang, Saffron (7 de fevereiro de 2022). «Red Teaming Language Models with Language Models». arXiv:2202.03286 [cs.CL]
Wu, Jeff; Ouyang, Long (27 de setembro de 2021). «Recursively Summarizing Books with Human Feedback». arXiv:2109.10862 [cs.CL]
Christiano, Paul; Shlegeris, Buck (19 de outubro de 2018). «Supervising strong learners by amplifying weak experts». arXiv:1810.08575 [cs.LG]
Leike, Jan; Krueger, David (19 de novembro de 2018). «Scalable agent alignment via reward modeling: a research direction». arXiv:1811.07871 [cs.LG]
Evans, Owain; Cotton-Barratt, Owen (13 de outubro de 2021). «Truthful AI: Developing and governing AI that does not lie». arXiv:2110.06674 [cs.CY]
Nakano, Reiichiro; Hilton, Jacob (1 de junho de 2022). «WebGPT: Browser-assisted question-answering with human feedback». arXiv:2112.09332 [cs.CL]
Menick, Jacob; Trebacz, Maja; Mikulik, Vladimir; Aslanides, John; Song, Francis; Chadwick, Martin; Glaese, Mia; Young, Susannah; Campbell-Gillingham, Lucy (21 de março de 2022). «Teaching language models to support answers with verified quotes». DeepMind. arXiv:2203.11147
Askell, Amanda; Bai, Yuntao (9 de dezembro de 2021). «A General Language Assistant as a Laboratory for Alignment». arXiv:2112.00861 [cs.CL]
Everitt, Tom; Lea, Gary; Hutter, Marcus (21 de maio de 2018). «AGI Safety Literature Review». arXiv:1805.01109
Demski, Abram; Garrabrant, Scott (6 de outubro de 2020). «Embedded Agency». arXiv:1902.09469 [cs.AI]
Everitt, Tom; Ortega, Pedro A. (6 de setembro de 2019). «Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings». arXiv:1902.09980 [cs.AI]

basicbooks.com

MacAskill, William (2022). What we owe the future. New York, NY: Basic Books. ISBN 978-1-5416-1862-6. OCLC 1314633519

bbc.com

Wakefield, Jane (2 de fevereiro de 2022). «DeepMind AI rivals average human competitive coder». BBC News. Consultado em 9 de setembro de 2022
Wakefield, Jane (27 de setembro de 2015). «Intelligent Machines: Do we really need to fear AI?». BBC News. Consultado em 9 de fevereiro de 2021. Arquivado do original em 8 de novembro de 2020

berkeley.edu

people.eecs.berkeley.edu

«Human Compatible: AI and the Problem of Control». Consultado em 22 de julho de 2022

books.google.com

Rochon, Louis-Philippe; Rossi, Sergio (27 de fevereiro de 2015). The Encyclopedia of Central Banking (em inglês). [S.l.]: Edward Elgar Publishing. ISBN 978-1-78254-744-0

ca.gov

leginfo.legislature.ca.gov

California Assembly. «Bill Text - ACR-215 23 Asilomar AI Principles.». Consultado em 18 de julho de 2022

cam.ac.uk

turingarchive.kings.cam.ac.uk

Turing, Alan (1951). Intelligent machinery, a heretical theory (Discurso). Aula dada à 'Sociedade 51'. Manchester. Consultado em 22 de julho de 2022
Turing, Alan (15 de maio de 1951). «Can digital computers think?». Automatic Calculating Machines. Episódio 2. BBC

cityam.com

Barber, Lynsey (31 de julho de 2016). «Phew! Facebook's AI chief says intelligent machines are not a threat to humanity». CityAM. Consultado em 26 de agosto de 2022

dagstuhl.de

drops.dagstuhl.de

Fürnkranz, Johannes; Hüllermeier, Eyke; Rudin, Cynthia; Slowinski, Roman; Sanner, Scott (2014). Marc Herbstritt. «Preference Learning». Dagstuhl Reports (em inglês). 4 (3): 27 pages. doi:10.4230/DAGREP.4.3.1

deepmind.com

Krakovna, Victoria; Uesato, Jonathan; Mikulik, Vladimir; Rahtz, Matthew; Everitt, Tom; Kumar, Ramana; Kenton, Zac; Leike, Jan; Legg, Shane (21 de abril de 2020). «Specification gaming: the flip side of AI ingenuity». Deepmind. Consultado em 26 de agosto de 2022
Orseau, Laurent; Armstrong, Stuart (1 de janeiro de 2016). «Safely Interruptible Agents». Consultado em 20 de julho de 2022
Menick, Jacob; Trebacz, Maja; Mikulik, Vladimir; Aslanides, John; Song, Francis; Chadwick, Martin; Glaese, Mia; Young, Susannah; Campbell-Gillingham, Lucy (21 de março de 2022). «Teaching language models to support answers with verified quotes». DeepMind. arXiv:2203.11147
Krakovna, Victoria; Legg, Shane. «Specification gaming: the flip side of AI ingenuity». Deepmind. Consultado em 6 de janeiro de 2021. Arquivado do original em 26 de janeiro de 2021

distill.pub

Irving, Geoffrey; Askell, Amanda (19 de fevereiro de 2019). «AI Safety Needs Social Scientists». Distill. 4 (2). ISSN 2476-0757. doi:10.23915/distill.00014

doi.org

dx.doi.org

Gabriel, Iason (1 de setembro de 2020). «Artificial Intelligence, Values, and Alignment». Minds and Machines. 30 (3): 411–437. ISSN 1572-8641. doi:10.1007/s11023-020-09539-2. Consultado em 23 de julho de 2022
Kober, Jens; Bagnell, J. Andrew; Peters, Jan (1 de setembro de 2013). «Reinforcement learning in robotics: A survey». The International Journal of Robotics Research (em inglês). 32 (11): 1238–1274. ISSN 0278-3649. doi:10.1177/0278364913495721
Stray, Jonathan (2020). «Aligning AI Optimization to Community Well-Being». International Journal of Community Well-Being (em inglês). 3 (4): 443–463. ISSN 2524-5295. PMC 7610010. PMID 34723107. doi:10.1007/s42413-020-00086-3
Russell, Stuart; Dewey, Daniel; Tegmark, Max (31 de dezembro de 2015). «Research Priorities for Robust and Beneficial Artificial Intelligence». AI Magazine. 36 (4): 105–114. ISSN 2371-9621. doi:10.1609/aimag.v36i4.2577
Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore (6 de maio de 2021). «Cooperative AI: machines must learn to find common ground». Nature (em inglês). 593 (7857): 33–36. Bibcode:2021Natur.593...33D. ISSN 0028-0836. PMID 33947992. doi:10.1038/d41586-021-01170-0
Prunkl, Carina; Whittlestone, Jess (7 de fevereiro de 2020). «Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society». New York NY USA: ACM. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (em inglês): 138–143. ISBN 978-1-4503-7110-0. doi:10.1145/3375627.3375803
Irving, Geoffrey; Askell, Amanda (19 de fevereiro de 2019). «AI Safety Needs Social Scientists». Distill. 4 (2). ISSN 2476-0757. doi:10.23915/distill.00014
Wiener, Norbert (6 de maio de 1960). «Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers.». Science (em inglês). 131 (3410): 1355–1358. ISSN 0036-8075. PMID 17841602. doi:10.1126/science.131.3410.1355
Lin, Stephanie; Hilton, Jacob; Evans, Owain (2022). «TruthfulQA: Measuring How Models Mimic Human Falsehoods». Dublin, Ireland: Association for Computational Linguistics. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (em inglês): 3214–3252. doi:10.18653/v1/2022.acl-long.229
Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Madotto, Andrea (1 de fevereiro de 2022). «Survey of Hallucination in Natural Language Generation». ACM Computing Surveys. arXiv:2202.03629. doi:10.1145/3571730
Grace, Katja; Salvatier, John; Dafoe, Allan; Zhang, Baobao; Evans, Owain (31 de julho de 2018). «Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts». Journal of Artificial Intelligence Research. 62: 729–754. ISSN 1076-9757. doi:10.1613/jair.1.11222
Zhang, Baobao; Anderljung, Markus; Kahn, Lauren; Dreksler, Noemi; Horowitz, Michael C.; Dafoe, Allan (2 de agosto de 2021). «Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers». Journal of Artificial Intelligence Research. 71. ISSN 1076-9757. doi:10.1613/jair.1.12895
Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart (2017). The Off-Switch Game. pp. 220–227. doi:10.24963/ijcai.2017/32
Fürnkranz, Johannes; Hüllermeier, Eyke; Rudin, Cynthia; Slowinski, Roman; Sanner, Scott (2014). Marc Herbstritt. «Preference Learning». Dagstuhl Reports (em inglês). 4 (3): 27 pages. doi:10.4230/DAGREP.4.3.1
Wiegel, Vincent (1 de dezembro de 2010). «Wendell Wallach and Colin Allen: moral machines: teaching robots right from wrong». Ethics and Information Technology. 12 (4): 359–361. ISSN 1572-8439. doi:10.1007/s10676-010-9239-1. Consultado em 23 de julho de 2022
Banzhaf; Goodman; Sheneman; Trujillo; Worzel, eds. (2020). Genetic Programming Theory and Practice XVII. Col: Genetic and Evolutionary Computation. Cham: Springer International Publishing. ISBN 978-3-030-39957-3. doi:10.1007/978-3-030-39958-0. Consultado em 23 de julho de 2022
Lehman, Joel; Clune, Jeff; Misevic, Dusan; Adami, Christoph; Altenberg, Lee; Beaulieu, Julie; Bentley, Peter J.; Bernard, Samuel; Beslon, Guillaume (2020). «The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities». Artificial Life (em inglês). 26 (2): 274–306. ISSN 1064-5462. PMID 32271631. doi:10.1162/artl_a_00319
Shuster, Kurt; Poff, Spencer; Chen, Moya; Kiela, Douwe; Weston, Jason (novembro de 2021). Retrieval Augmentation Reduces Hallucination in Conversation. EMNLP-Findings 2021. Punta Cana, Dominican Republic: Association for Computational Linguistics. pp. 3784–3803. doi:10.18653/v1/2021.findings-emnlp.320. Consultado em 23 de julho de 2022
Zhang, Xiaoge; Chan, Felix T.S.; Yan, Chao; Bose, Indranil (2022). «Towards risk-aware artificial intelligence and machine learning systems: An overview». Decision Support Systems (em inglês). 159: 113800. doi:10.1016/j.dss.2022.113800
McCarthy, John; Minsky, Marvin L.; Rochester, Nathaniel; Shannon, Claude E. (15 de dezembro de 2006). «A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955». AI Magazine (em inglês). 27 (4): 12. ISSN 2371-9621. doi:10.1609/aimag.v27i4.1904
Cohen, Michael K.; Hutter, Marcus; Osborne, Michael A. (29 de agosto de 2022). «Advanced artificial agents intervene in the provision of reward». AI Magazine (em inglês). 43 (3): 282–293. ISSN 0738-4602. doi:10.1002/aaai.12064
Sotala, Kaj; Yampolskiy, Roman (19 de dezembro de 2014). «Responses to catastrophic AGI risk: a survey». Physica Scripta. 90 (1): 018001. Bibcode:2015PhyS...90a8001S. doi:10.1088/0031-8949/90/1/018001

doi.org

Gabriel, Iason (1 de setembro de 2020). «Artificial Intelligence, Values, and Alignment». Minds and Machines. 30 (3): 411–437. ISSN 1572-8641. doi:10.1007/s11023-020-09539-2. Consultado em 23 de julho de 2022
Wiegel, Vincent (1 de dezembro de 2010). «Wendell Wallach and Colin Allen: moral machines: teaching robots right from wrong». Ethics and Information Technology. 12 (4): 359–361. ISSN 1572-8439. doi:10.1007/s10676-010-9239-1. Consultado em 23 de julho de 2022

edge.org

Edge.org. «The Myth Of AI | Edge.org». Consultado em 19 de julho de 2022

elsevier.com

linkinghub.elsevier.com

Zhang, Xiaoge; Chan, Felix T.S.; Yan, Chao; Bose, Indranil (2022). «Towards risk-aware artificial intelligence and machine learning systems: An overview». Decision Support Systems (em inglês). 159: 113800. doi:10.1016/j.dss.2022.113800

erichorvitz.com

Horvitz, Eric (27 de junho de 2016). «Reflections on Safety and Artificial Intelligence» (PDF). Eric Horvitz. Consultado em 20 de abril de 2020

futureoflife.org

Future of Life Institute (11 de agosto de 2017). «Asilomar AI Principles». Consultado em 18 de julho de 2022
Selman, Bart, Intelligence Explosion: Science or Fiction? (PDF)

gcrinstitute.org

Baum, Seth (1 de janeiro de 2021). «2020 Survey of Artificial General Intelligence Projects for Ethics, Risk, and Policy». Consultado em 20 de julho de 2022

gov.uk

The National AI Strategy of the UK, 2021 (actions 9 and 10 of the section "Pillar 3 - Governing AI Effectively")

harvard.edu

ui.adsabs.harvard.edu

Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore (6 de maio de 2021). «Cooperative AI: machines must learn to find common ground». Nature (em inglês). 593 (7857): 33–36. Bibcode:2021Natur.593...33D. ISSN 0028-0836. PMID 33947992. doi:10.1038/d41586-021-01170-0
Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Madotto, Andrea (1 de fevereiro de 2022). «Survey of Hallucination in Natural Language Generation». ACM Computing Surveys. arXiv:2202.03629. doi:10.1145/3571730
Sotala, Kaj; Yampolskiy, Roman (19 de dezembro de 2014). «Responses to catastrophic AGI risk: a survey». Physica Scripta. 90 (1): 018001. Bibcode:2015PhyS...90a8001S. doi:10.1088/0031-8949/90/1/018001

infoq.com

Dominguez, Daniel (19 de maio de 2022). «DeepMind Introduces Gato, a New Generalist AI Agent». InfoQ. Consultado em 9 de setembro de 2022
Alford, Anthony (13 de julho de 2021). «EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J». InfoQ. Consultado em 23 de julho de 2022

jair.org

Grace, Katja; Salvatier, John; Dafoe, Allan; Zhang, Baobao; Evans, Owain (31 de julho de 2018). «Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts». Journal of Artificial Intelligence Research. 62: 729–754. ISSN 1076-9757. doi:10.1613/jair.1.11222
Zhang, Baobao; Anderljung, Markus; Kahn, Lauren; Dreksler, Noemi; Horowitz, Michael C.; Dafoe, Allan (2 de agosto de 2021). «Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers». Journal of Artificial Intelligence Research. 71. ISSN 1076-9757. doi:10.1613/jair.1.12895

longtermrisk.org

Clifton, Jesse (2020). «Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda». Center on Long-Term Risk. Consultado em 18 de julho de 2022

lukemuehlhauser.com

Muehlhauser, Luke (29 de janeiro de 2016). «Sutskever on Talking Machines». Luke Muehlhauser. Consultado em 26 de agosto de 2022

machinethoughts.wordpress.com

McAllester (10 de agosto de 2014). «Friendly AI and the Servant Mission». Machine Thoughts

marktechpost.com

Kumar, Nitish (23 de dezembro de 2021). «OpenAI Researchers Find Ways To More Accurately Answer Open-Ended Questions Using A Text-Based Web Browser». MarkTechPost. Consultado em 23 de julho de 2022

medium.com

deepmindsafetyresearch.medium.com

Ortega, Pedro A.; Maini, Vishal; DeepMind safety team (27 de setembro de 2018). «Building safe artificial intelligence: specification, robustness, and assurance». DeepMind Safety Research - Medium. Consultado em 18 de julho de 2022
Medium. «DeepMind Safety Research». Medium. Consultado em 18 de julho de 2022
Kenton, Zachary; Everitt, Tom; Weidinger, Laura; Gabriel, Iason; Mikulik, Vladimir; Irving, Geoffrey (30 de março de 2021). «Alignment of Language Agents». DeepMind Safety Research - Medium. Consultado em 23 de julho de 2022
Ortega, Pedro A.; Maini, Vishal; DeepMind safety team (27 de setembro de 2018). «Building safe artificial intelligence: specification, robustness, and assurance». Medium. Consultado em 26 de agosto de 2022

medium.com

Chollet, François (8 de dezembro de 2018). «The implausibility of intelligence explosion». Medium. Consultado em 26 de agosto de 2022

mit.edu

direct.mit.edu

Lehman, Joel; Clune, Jeff; Misevic, Dusan; Adami, Christoph; Altenberg, Lee; Beaulieu, Julie; Bentley, Peter J.; Bernard, Samuel; Beslon, Guillaume (2020). «The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities». Artificial Life (em inglês). 26 (2): 274–306. ISSN 1064-5462. PMID 32271631. doi:10.1162/artl_a_00319

nature.com

Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore (6 de maio de 2021). «Cooperative AI: machines must learn to find common ground». Nature (em inglês). 593 (7857): 33–36. Bibcode:2021Natur.593...33D. ISSN 0028-0836. PMID 33947992. doi:10.1038/d41586-021-01170-0

neurips.cc

proceedings.neurips.cc

Turner, Alexander Matt; Smith, Logan; Shah, Rohin; Critch, Andrew; Tadepalli, Prasad (3 de dezembro de 2021). «Optimal Policies Tend to Seek Power». Neural Information Processing Systems. 34. arXiv:1912.01683
Armstrong, Stuart; Mindermann, Sören (2018). Occam's razor is insufficient to infer the preferences of irrational agents. NeurIPS 2018. 31. Montréal: Curran Associates, Inc. Consultado em 21 de julho de 2022

nih.gov

ncbi.nlm.nih.gov

Stray, Jonathan (2020). «Aligning AI Optimization to Community Well-Being». International Journal of Community Well-Being (em inglês). 3 (4): 443–463. ISSN 2524-5295. PMC 7610010. PMID 34723107. doi:10.1007/s42413-020-00086-3
Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore (6 de maio de 2021). «Cooperative AI: machines must learn to find common ground». Nature (em inglês). 593 (7857): 33–36. Bibcode:2021Natur.593...33D. ISSN 0028-0836. PMID 33947992. doi:10.1038/d41586-021-01170-0
Wiener, Norbert (6 de maio de 1960). «Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers.». Science (em inglês). 131 (3410): 1355–1358. ISSN 0036-8075. PMID 17841602. doi:10.1126/science.131.3410.1355
Lehman, Joel; Clune, Jeff; Misevic, Dusan; Adami, Christoph; Altenberg, Lee; Beaulieu, Julie; Bentley, Peter J.; Bernard, Samuel; Beslon, Guillaume (2020). «The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities». Artificial Life (em inglês). 26 (2): 274–306. ISSN 1064-5462. PMID 32271631. doi:10.1162/artl_a_00319

nips.cc

papers.nips.cc

Hadfield-Menell, Dylan; Russell, Stuart J; Abbeel, Pieter; Dragan, Anca (2016). Cooperative Inverse Reinforcement Learning. NIPS'16. 29. ISBN 978-1-5108-3881-9. Consultado em 21 de julho de 2022

nscai.gov

NSCAI Final Report (PDF). Washington, DC: The National Security Commission on Artificial Intelligence. 2021

nytimes.com

The Ezra Klein Show (4 de junho de 2021). «If 'All Models Are Wrong,' Why Do We Give Them So Much Power?». The New York Times. ISSN 0362-4331. Consultado em 18 de julho de 2022
Johnson, Steven; Iziev, Nikita (15 de abril de 2022). «A.I. Is Mastering Language. Should We Trust What It Says?». The New York Times. ISSN 0362-4331. Consultado em 18 de julho de 2022
Marcus, Gary; Davis, Ernest (6 de setembro de 2019). «How to Build Artificial Intelligence We Can Trust». The New York Times. Consultado em 9 de fevereiro de 2021. Arquivado do original em 22 de setembro de 2020

nyu.edu

bhr.stern.nyu.edu

Barrett, Paul M.; Hendrix, Justin; Sims, J. Grant (setembro de 2021). How Social Media Intensifies U.S. Political Polarization-And What Can Be Done About It (Relatório). Center for Business and Human Rights, NYU

openai.com

Zaremba, Wojciech; Brockman, Greg; OpenAI (10 de agosto de 2021). «OpenAI Codex». OpenAI. Consultado em 23 de julho de 2022
OpenAI (15 de fevereiro de 2022). «Aligning AI systems with human intent». OpenAI. Consultado em 18 de julho de 2022
«Faulty Reward Functions in the Wild». OpenAI (em inglês). 22 de dezembro de 2016. Consultado em 10 de setembro de 2022
Amodei, Dario; Christiano, Paul; Ray, Alex (13 de junho de 2017). «Learning from Human Preferences». OpenAI. Consultado em 21 de julho de 2022
Hilton, Jacob; Gao, Leo (13 de abril de 2022). «Measuring Goodhart's Law». OpenAI. Consultado em 9 de setembro de 2022
Irving, Geoffrey; Amodei, Dario (3 de maio de 2018). «AI Safety via Debate». OpenAI. Consultado em 23 de julho de 2022
Leike, Jan; Schulman, John; Wu, Jeffrey (24 de agosto de 2022). «Our approach to alignment research». OpenAI. Consultado em 9 de setembro de 2022
Baker, Bowen; Kanitscheider, Ingmar; Markov, Todor; Wu, Yi; Powell, Glenn; McGrew, Bob; Mordatch, Igor (17 de setembro de 2019). «Emergent Tool Use from Multi-Agent Interaction». OpenAI. Consultado em 26 de agosto de 2022

openreview.net

Pan, Alexander; Bhatia, Kush; Steinhardt, Jacob (14 de fevereiro de 2022). The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models. International Conference on Learning Representations. Consultado em 21 de julho de 2022

pearson.com

Russell, Stuart J.; Norvig, Peter (2020). Artificial intelligence: A modern approach. [S.l.]: Pearson. pp. 31–34. ISBN 978-1-292-40113-3
Russell, Stuart J.; Norvig, Peter (2020). Artificial intelligence: A modern approach 4th ed. [S.l.]: Pearson. pp. 4–5. ISBN 978-1-292-40113-3. OCLC 1303900751

penguinrandomhouse.com

Russell, Stuart J. (2020). Human compatible: Artificial intelligence and the problem of control. [S.l.]: Penguin Random House. ISBN 9780525558637. OCLC 1113410915

quantamagazine.org

Rorvig, Mordechai (14 de abril de 2022). «Researchers Gain New Understanding From Simple AI». Quanta Magazine. Consultado em 18 de julho de 2022
Wolchover, Natalie (21 de abril de 2015). «Concerns of an Artificial Intelligence Pioneer». Quanta Magazine. Consultado em 18 de julho de 2022
Ornes, Stephen (18 de novembro de 2019). «Playing Hide-and-Seek, Machines Invent New Tools». Quanta Magazine. Consultado em 26 de agosto de 2022

reddit.com

Schmidhuber, Jürgen (6 de março de 2015). «I am Jürgen Schmidhuber, AMA!» (Reddit Comment). r/MachineLearning. Consultado em 23 de julho de 2022

reuters.com

Shepardson, David (24 de maio de 2018). «Uber disabled emergency braking in self-driving car: U.S. agency». Reuters. Consultado em 20 de julho de 2022

sagepub.com

journals.sagepub.com

Kober, Jens; Bagnell, J. Andrew; Peters, Jan (1 de setembro de 2013). «Reinforcement learning in robotics: A survey». The International Journal of Robotics Research (em inglês). 32 (11): 1238–1274. ISSN 0278-3649. doi:10.1177/0278364913495721

science.org

Wiener, Norbert (6 de maio de 1960). «Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers.». Science (em inglês). 131 (3410): 1355–1358. ISSN 0036-8075. PMID 17841602. doi:10.1126/science.131.3410.1355

scientificamerican.com

Marcus, Gary (6 de junho de 2022). «Artificial General Intelligence Is Not as Imminent as You Might Think». Scientific American. Consultado em 26 de agosto de 2022
Shermer, Michael (1 de março de 2017). «Artificial Intelligence Is Not a Threat—Yet». Scientific American. Consultado em 26 de agosto de 2022

scottaaronson.blog

Aaronson, Scott (17 de junho de 2022). «OpenAI!». Shtetl-Optimized

smallake.kr

Li, Yuxi (25 de novembro de 2018). «Deep Reinforcement Learning: An Overview» (PDF). Lecture Notes in Networks and Systems Book Series

springer.com

link.springer.com

Banzhaf; Goodman; Sheneman; Trujillo; Worzel, eds. (2020). Genetic Programming Theory and Practice XVII. Col: Genetic and Evolutionary Computation. Cham: Springer International Publishing. ISBN 978-3-030-39957-3. doi:10.1007/978-3-030-39958-0. Consultado em 23 de julho de 2022

stanford.edu

fsi.stanford.edu

Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine (12 de julho de 2022). «On the Opportunities and Risks of Foundation Models». Stanford CRFM. arXiv:2108.07258

technologyreview.com

Heaven, Will Douglas (27 de janeiro de 2022). «The new version of GPT-3 is much better behaved (and should be less toxic)». MIT Technology Review. Consultado em 18 de julho de 2022
Heaven, Will Douglas (20 de julho de 2020). «OpenAI's new language generator GPT-3 is shockingly good—and completely mindless». MIT Technology Review. Consultado em 23 de julho de 2022

theguardian.com

Naughton, John (2 de outubro de 2021). «The truth about artificial intelligence? It isn't that honest». The Observer. ISSN 0029-7712. Consultado em 18 de julho de 2022
Naughton, John (2 de outubro de 2021). «The truth about artificial intelligence? It isn't that honest». The Observer. ISSN 0029-7712. Consultado em 23 de julho de 2022
The Guardian (8 de setembro de 2020). «A robot wrote this entire article. Are you scared yet, human?». The Guardian. ISSN 0261-3077. Consultado em 23 de julho de 2022

theregister.com

Richardson, Tim (22 de setembro de 2021). «UK publishes National Artificial Intelligence Strategy». The Register

towardsdatascience.com

Harris, Jeremie (16 de junho de 2021). «The case against (worrying about) existential risk from AI». Medium. Consultado em 26 de agosto de 2022
Moltzau, Alex (24 de agosto de 2019). «Debating the AI Safety Debate». Towards Data Science. Consultado em 23 de julho de 2022

un.org

Nações Unidas (2021). Our Common Agenda: Report of the Secretary-General (PDF) (Relatório). Nova Iorque: Nações Unidas
Secretary-General’s report on “Our Common Agenda”, 2021.

unite.ai

Anderson, Martin (5 de abril de 2022). «The Perils of Using Quotations to Authenticate NLG Content». Unite.AI. Consultado em 21 de julho de 2022

universitypressscholarship.com

oxford.universitypressscholarship.com

Wallach, Wendell; Allen, Colin (2009). Moral Machines: Teaching Robots Right from Wrong. New York: Oxford University Press. ISBN 978-0-19-537404-9. Consultado em 23 de julho de 2022

utexas.edu

cs.utexas.edu

Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter (11 de março de 2022). «Reward (Mis)design for Autonomous Driving» (PDF). arXiv:2104.13906

venturebeat.com

Wiggers, Kyle (5 de fevereiro de 2022). «Despite recent progress, AI-powered chatbots still have a long way to go». VentureBeat. Consultado em 23 de julho de 2022
Wiggers, Kyle (23 de setembro de 2021). «OpenAI unveils model that can summarize books of any length». VentureBeat. Consultado em 23 de julho de 2022
Wiggers, Kyle (20 de setembro de 2021). «Falsehoods more likely with large language models». VentureBeat. Consultado em 23 de julho de 2022

vetta.org

Shane (31 de agosto de 2009). «Funding safe AGI». vetta project

washingtonpost.com

Rossi, Francesca. «Opinion | How do you teach a machine to be moral?». Washington Post. ISSN 0190-8286

web.archive.org

Krakovna, Victoria; Legg, Shane. «Specification gaming: the flip side of AI ingenuity». Deepmind. Consultado em 6 de janeiro de 2021. Arquivado do original em 26 de janeiro de 2021
Wakefield, Jane (27 de setembro de 2015). «Intelligent Machines: Do we really need to fear AI?». BBC News. Consultado em 9 de fevereiro de 2021. Arquivado do original em 8 de novembro de 2020
Marcus, Gary; Davis, Ernest (6 de setembro de 2019). «How to Build Artificial Intelligence We Can Trust». The New York Times. Consultado em 9 de fevereiro de 2021. Arquivado do original em 22 de setembro de 2020

wiley.com

onlinelibrary.wiley.com

Cohen, Michael K.; Hutter, Marcus; Osborne, Michael A. (29 de agosto de 2022). «Advanced artificial agents intervene in the provision of reward». AI Magazine (em inglês). 43 (3): 282–293. ISSN 0738-4602. doi:10.1002/aaai.12064

worldcat.org

Gabriel, Iason (1 de setembro de 2020). «Artificial Intelligence, Values, and Alignment». Minds and Machines. 30 (3): 411–437. ISSN 1572-8641. doi:10.1007/s11023-020-09539-2. Consultado em 23 de julho de 2022
Russell, Stuart J. (2020). Human compatible: Artificial intelligence and the problem of control. [S.l.]: Penguin Random House. ISBN 9780525558637. OCLC 1113410915
Kober, Jens; Bagnell, J. Andrew; Peters, Jan (1 September 2013). «Reinforcement learning in robotics: A survey». The International Journal of Robotics Research (in English). 32 (11): 1238–1274. ISSN 0278-3649. doi:10.1177/0278364913495721
Stray, Jonathan (2020). «Aligning AI Optimization to Community Well-Being». International Journal of Community Well-Being (in English). 3 (4): 443–463. ISSN 2524-5295. PMC 7610010. PMID 34723107. doi:10.1007/s42413-020-00086-3
Russell, Stuart; Dewey, Daniel; Tegmark, Max (31 December 2015). «Research Priorities for Robust and Beneficial Artificial Intelligence». AI Magazine. 36 (4): 105–114. ISSN 2371-9621. doi:10.1609/aimag.v36i4.2577
Dafoe, Allan; Bachrach, Yoram; Hadfield, Gillian; Horvitz, Eric; Larson, Kate; Graepel, Thore (6 May 2021). «Cooperative AI: machines must learn to find common ground». Nature (in English). 593 (7857): 33–36. Bibcode:2021Natur.593...33D. ISSN 0028-0836. PMID 33947992. doi:10.1038/d41586-021-01170-0
Irving, Geoffrey; Askell, Amanda (19 February 2019). «AI Safety Needs Social Scientists». Distill. 4 (2): 10.23915/distill.00014. ISSN 2476-0757. doi:10.23915/distill.00014
Wiener, Norbert (6 May 1960). «Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers.». Science (in English). 131 (3410): 1355–1358. ISSN 0036-8075. PMID 17841602. doi:10.1126/science.131.3410.1355
The Ezra Klein Show (4 June 2021). «If 'All Models Are Wrong,' Why Do We Give Them So Much Power?». The New York Times. ISSN 0362-4331. Retrieved 18 July 2022
Johnson, Steven; Iziev, Nikita (15 April 2022). «A.I. Is Mastering Language. Should We Trust What It Says?». The New York Times. ISSN 0362-4331. Retrieved 18 July 2022
Russell, Stuart J.; Norvig, Peter (2020). Artificial intelligence: A modern approach 4th ed. [S.l.]: Pearson. pp. 4–5. ISBN 978-1-292-40113-3. OCLC 1303900751
Naughton, John (2 October 2021). «The truth about artificial intelligence? It isn't that honest». The Observer. ISSN 0029-7712. Retrieved 18 July 2022
Wells, Georgia; Deepa Seetharaman; Horwitz, Jeff (5 November 2021). «Is Facebook Bad for You? It Is for About 360 Million Users, Company Surveys Suggest». Wall Street Journal. ISSN 0099-9660. Retrieved 19 July 2022
Grace, Katja; Salvatier, John; Dafoe, Allan; Zhang, Baobao; Evans, Owain (31 July 2018). «Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts». Journal of Artificial Intelligence Research. 62: 729–754. ISSN 1076-9757. doi:10.1613/jair.1.11222
Zhang, Baobao; Anderljung, Markus; Kahn, Lauren; Dreksler, Noemi; Horowitz, Michael C.; Dafoe, Allan (2 August 2021). «Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers». Journal of Artificial Intelligence Research. 71. ISSN 1076-9757. doi:10.1613/jair.1.12895
Shanahan, Murray (2015). The technological singularity. Cambridge, Massachusetts: [s.n.] ISBN 978-0-262-33182-1. OCLC 917889148
Rossi, Francesca. «Opinion | How do you teach a machine to be moral?». Washington Post. ISSN 0190-8286
Christian, Brian (2020). The alignment problem: Machine learning and human values. [S.l.]: W. W. Norton & Company. ISBN 978-0-393-86833-3. OCLC 1233266753
Wiegel, Vincent (1 December 2010). «Wendell Wallach and Colin Allen: moral machines: teaching robots right from wrong». Ethics and Information Technology. 12 (4): 359–361. ISSN 1572-8439. doi:10.1007/s10676-010-9239-1. Retrieved 23 July 2022
MacAskill, William (2022). What we owe the future. New York, NY: Basic Books. ISBN 978-1-5416-1862-6. OCLC 1314633519
Naughton, John (2 October 2021). «The truth about artificial intelligence? It isn't that honest». The Observer. ISSN 0029-7712. Retrieved 23 July 2022
Lehman, Joel; Clune, Jeff; Misevic, Dusan; Adami, Christoph; Altenberg, Lee; Beaulieu, Julie; Bentley, Peter J.; Bernard, Samuel; Beslon, Guillaume (2020). «The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities». Artificial Life (in English). 26 (2): 274–306. ISSN 1064-5462. PMID 32271631. doi:10.1162/artl_a_00319
The Guardian (8 September 2020). «A robot wrote this entire article. Are you scared yet, human?». The Guardian. ISSN 0261-3077. Retrieved 23 July 2022
Christian, Brian (2020). «Chapter 5: Shaping». The alignment problem: Machine learning and human values. [S.l.]: W. W. Norton & Company. ISBN 978-0-393-86833-3. OCLC 1233266753
McCarthy, John; Minsky, Marvin L.; Rochester, Nathaniel; Shannon, Claude E. (15 December 2006). «A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955». AI Magazine (in English). 27 (4). 12 pages. ISSN 2371-9621. doi:10.1609/aimag.v27i4.1904
Cohen, Michael K.; Hutter, Marcus; Osborne, Michael A. (29 August 2022). «Advanced artificial agents intervene in the provision of reward». AI Magazine (in English). 43 (3): 282–293. ISSN 0738-4602. doi:10.1002/aaai.12064

wsj.com

Wells, Georgia; Deepa Seetharaman; Horwitz, Jeff (5 November 2021). «Is Facebook Bad for You? It Is for About 360 Million Users, Company Surveys Suggest». Wall Street Journal. ISSN 0099-9660. Retrieved 19 July 2022

wwnorton.co.uk

Christian, Brian (2020). The alignment problem: Machine learning and human values. [S.l.]: W. W. Norton & Company. ISBN 978-0-393-86833-3. OCLC 1233266753
Christian, Brian (2020). «Chapter 5: Shaping». The alignment problem: Machine learning and human values. [S.l.]: W. W. Norton & Company. ISBN 978-0-393-86833-3. OCLC 1233266753