AI-Alignment (German Wikipedia)

Analysis of the information sources cited in the references of the German-language Wikipedia article "AI-Alignment".
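The per-website tally that follows can be approximated with a short script: pull every external URL out of the article's reference list, reduce each to a registrable domain, and count occurrences. Below is a minimal sketch in Python using only the standard library; the sample reference_urls and the two-label domain heuristic (which, for example, folds dl.acm.org into acm.org but would misparse suffixes like co.uk) are illustrative assumptions, not the tool that produced this page.

  from collections import Counter
  from urllib.parse import urlparse

  # Hypothetical sample of external URLs collected from the article's references.
  reference_urls = [
      "https://arxiv.org/abs/2209.00626",
      "https://dl.acm.org/doi/10.1145/3571730",
      "https://proceedings.mlr.press/v162/langosco22a.html",
      "https://arxiv.org/abs/2206.13353",
  ]

  def registrable_domain(url: str) -> str:
      # Keep the last two host labels: "dl.acm.org" -> "acm.org".
      # A real tool would consult the public-suffix list instead.
      host = urlparse(url).hostname or ""
      labels = host.split(".")
      return ".".join(labels[-2:]) if len(labels) >= 2 else host

  # Count references per domain and print them, most-cited first.
  counts = Counter(registrable_domain(u) for u in reference_urls)
  for domain, refs in counts.most_common():
      print(f"{refs:>4}  {domain}")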

[Table: number of references per cited website, with each website's global and German-Wikipedia popularity rank]

80000hours.org

aaai.org

ojs.aaai.org

aclanthology.org

acm.org

dl.acm.org

analyticsindiamag.com

arstechnica.com

arxiv.org

  • Ngo, Richard; Chan, Lawrence; Mindermann, Sören (February 22, 2023). "The alignment problem from a deep learning perspective". arXiv:2209.00626 [cs.AI].
  • Carlsmith, Joseph (June 16, 2022). "Is Power-Seeking AI an Existential Risk?". arXiv:2206.13353 [cs.CY].
  • Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, J.; Hilton, Jacob; Kelton, Fraser; Miller, Luke E.; Simens, Maddie; Askell, Amanda; Welinder, P.; Christiano, P.; Leike, J.; Lowe, Ryan J. (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155 [cs.CL].
  • Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan (June 21, 2016). "Concrete Problems in AI Safety". arXiv:1606.06565 [cs.AI].
  • Doshi-Velez, Finale; Kim, Been (March 2, 2017). "Towards A Rigorous Science of Interpretable Machine Learning". arXiv:1702.08608 [stat.ML].
  • Mohseni, Sina; Wang, Haotao; Yu, Zhiding; Xiao, Chaowei; Wang, Zhangyang; Yadawa, Jay (March 7, 2022). "Taxonomy of Machine Learning Safety: A Survey and Primer". arXiv:2106.04823 [cs.LG].
  • Hendrycks, Dan; Carlini, Nicholas; Schulman, John; Steinhardt, Jacob (June 16, 2022). "Unsolved Problems in ML Safety". arXiv:2109.13916 [cs.LG].
  • Manheim, David; Garrabrant, Scott (2018). "Categorizing Variants of Goodhart's Law". arXiv:1803.04585 [cs.AI].
  • Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Madotto, Andrea; Fung, Pascale (February 1, 2022). "Survey of Hallucination in Natural Language Generation". ACM Computing Surveys. 55 (12): 1–38. arXiv:2202.03629. doi:10.1145/3571730. S2CID 246652372.
  • Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette; Bosselut, Antoine; Brunskill, Emma; Brynjolfsson, Erik (July 12, 2022). "On the Opportunities and Risks of Foundation Models". Stanford CRFM. arXiv:2108.07258.
  • Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (October 26, 2022). "Emergent Abilities of Large Language Models". Transactions on Machine Learning Research. arXiv:2206.07682. ISSN 2835-8856.
  • Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.
  • Pan, Alexander; Chan, Jun Shern; Zou, Andy; Li, Nathaniel; Basart, Steven; Woodside, Thomas; Ng, Jonathan; Zhang, Hanlin; Emmons, Scott; Hendrycks, Dan (April 3, 2023). "Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark". Proceedings of the 40th International Conference on Machine Learning. PMLR. arXiv:2304.03279.
  • Perez, Ethan; Ringer, Sam; Lukošiūtė, Kamilė; Nguyen, Karina; Chen, Edwin; Heiner, Scott; Pettit, Craig; Olsson, Catherine; Kundu, Sandipan; Kadavath, Saurav; Jones, Andy; Chen, Anna; Mann, Ben; Israel, Brian; Seethor, Bryan (December 19, 2022). "Discovering Language Model Behaviors with Model-Written Evaluations". arXiv:2212.09251 [cs.CL].
  • Leike, Jan; Martic, Miljan; Krakovna, Victoria; Ortega, Pedro A.; Everitt, Tom; Lefrancq, Andrew; Orseau, Laurent; Legg, Shane (November 28, 2017). "AI Safety Gridworlds". arXiv:1711.09883 [cs.LG].
  • Everitt, Tom; Lea, Gary; Hutter, Marcus (May 21, 2018). "AGI Safety Literature Review". arXiv:1805.01109 [cs.AI].
  • Gao, Leo; Schulman, John; Hilton, Jacob (October 19, 2022). "Scaling Laws for Reward Model Overoptimization". arXiv:2210.10760 [cs.LG].
  • Hendrycks, Dan; Burns, Collin; Basart, Steven; Critch, Andrew; Li, Jerry; Song, Dawn; Steinhardt, Jacob (July 24, 2021). "Aligning AI With Shared Human Values". International Conference on Learning Representations. arXiv:2008.02275.
  • Perez, Ethan; Huang, Saffron; Song, Francis; Cai, Trevor; Ring, Roman; Aslanides, John; Glaese, Amelia; McAleese, Nat; Irving, Geoffrey (February 7, 2022). "Red Teaming Language Models with Language Models". arXiv:2202.03286 [cs.CL].
  • Bhattacharyya, Sreejani (February 14, 2022). "DeepMind's 'red teaming' language models with language models: What is it?". Analytics India Magazine. Archived from the original on February 13, 2023 (Memento in the Internet Archive). Retrieved July 23, 2022.
  • Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nisan; Lowe, Ryan; Leike, Jan; Christiano, Paul (September 27, 2021). "Recursively Summarizing Books with Human Feedback". arXiv:2109.10862 [cs.CL].
  • Pearce, Hammond; Ahmad, Baleegh; Tan, Benjamin; Dolan-Gavitt, Brendan; Karri, Ramesh (2022). "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions". 2022 IEEE Symposium on Security and Privacy (SP). San Francisco, CA, USA: IEEE. pp. 754–768. arXiv:2108.09293. doi:10.1109/SP46214.2022.9833571. ISBN 978-1-66541-316-9.
  • Christiano, Paul; Shlegeris, Buck; Amodei, Dario (October 19, 2018). "Supervising strong learners by amplifying weak experts". arXiv:1810.08575 [cs.LG].
  • Leike, Jan; Krueger, David; Everitt, Tom; Martic, Miljan; Maini, Vishal; Legg, Shane (November 19, 2018). "Scalable agent alignment via reward modeling: a research direction". arXiv:1811.07871.
  • Saunders, William; Yeh, Catherine; Wu, Jeff; Bills, Steven; Ouyang, Long; Ward, Jonathan; Leike, Jan (June 13, 2022). "Self-critiquing models for assisting human evaluators". arXiv:2206.05802 [cs.CL].
  • Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; Askell, Amanda; Kernion, Jackson; Jones, Andy; Chen, Anna; Goldie, Anna; Mirhoseini, Azalia; McKinnon, Cameron; Chen, Carol; Olsson, Catherine; Olah, Christopher; Hernandez, Danny; Drain, Dawn (December 15, 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
  • Evans, Owain; Cotton-Barratt, Owen; Finnveden, Lukas; Bales, Adam; Balwit, Avital; Wills, Peter; Righetti, Luca; Saunders, William (October 13, 2021). "Truthful AI: Developing and governing AI that does not lie". arXiv:2110.06674 [cs.CY].
  • Alford, Anthony (July 13, 2021). "EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J". InfoQ. Archived from the original on February 10, 2023 (Memento in the Internet Archive). Retrieved July 23, 2022.
  • Rae, Jack W.; Borgeaud, Sebastian; Cai, Trevor; Millican, Katie; Hoffmann, Jordan; Song, Francis; Aslanides, John; Henderson, Sarah; Ring, Roman; Young, Susannah; Rutherford, Eliza; Hennigan, Tom; Menick, Jacob; Cassirer, Albin; Powell, Richard (January 21, 2022). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446.
  • Nakano, Reiichiro; Hilton, Jacob; Balaji, Suchir; Wu, Jeff; Ouyang, Long; Kim, Christina; Hesse, Christopher; Jain, Shantanu; Kosaraju, Vineet; Saunders, William; Jiang, Xu; Cobbe, Karl; Eloundou, Tyna; Krueger, Gretchen; Button, Kevin (June 1, 2022). "WebGPT: Browser-assisted question-answering with human feedback". arXiv:2112.09332 [cs.CL].
  • Kumar, Nitish (December 23, 2021). "OpenAI Researchers Find Ways To More Accurately Answer Open-Ended Questions Using A Text-Based Web Browser". MarkTechPost.
  • Askell, Amanda; Bai, Yuntao; Chen, Anna; Drain, Dawn; Ganguli, Deep; Henighan, Tom; Jones, Andy; Joseph, Nicholas; Mann, Ben; DasSarma, Nova; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Kernion, Jackson; Ndousse, Kamal (December 9, 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].
  • Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon (July 22, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
  • Laskin, Michael; Wang, Luyu; Oh, Junhyuk; Parisotto, Emilio; Spencer, Stephen; Steigerwald, Richie; Strouse, D. J.; Hansen, Steven; Filos, Angelos; Brooks, Ethan; Gazeau, Maxime; Sahni, Himanshu; Singh, Satinder; Mnih, Volodymyr (October 25, 2022). "In-context Reinforcement Learning with Algorithm Distillation". arXiv:2210.14215 [cs.LG].
  • Shah, Rohin; Varma, Vikrant; Kumar, Ramana; Phuong, Mary; Krakovna, Victoria; Uesato, Jonathan; Kenton, Zac (November 2, 2022). "Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals". Medium. arXiv:2210.01790. Retrieved April 2, 2023.
  • Hubinger, Evan; van Merwijk, Chris; Mikulik, Vladimir; Skalse, Joar; Garrabrant, Scott (December 1, 2021). "Risks from Learned Optimization in Advanced Machine Learning Systems". arXiv:1906.01820.
  • Demski, Abram; Garrabrant, Scott (October 6, 2020). "Embedded Agency". arXiv:1902.09469 [cs.AI].
  • Everitt, Tom; Ortega, Pedro A.; Barnes, Elizabeth; Legg, Shane (September 6, 2019). "Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings". arXiv:1902.09980 [cs.AI].
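Most entries above identify a paper only by its arXiv number. A small helper along the following lines could resolve those identifiers to clickable links; this is a sketch assuming arXiv's public URL patterns (/abs/ and /pdf/) and Python 3.9+ for str.removeprefix, with example identifiers taken from the list above.

  def arxiv_links(identifier: str) -> tuple[str, str]:
      # "arXiv:2209.00626" -> ("https://arxiv.org/abs/2209.00626",
      #                        "https://arxiv.org/pdf/2209.00626")
      paper_id = identifier.removeprefix("arXiv:")
      return (f"https://arxiv.org/abs/{paper_id}",
              f"https://arxiv.org/pdf/{paper_id}")

  for cited in ("arXiv:2209.00626", "arXiv:1606.06565"):
      abs_url, pdf_url = arxiv_links(cited)
      print(abs_url, "|", pdf_url)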

berkeley.edu

aima.cs.berkeley.edu

cityam.com

dagstuhl.de

drops.dagstuhl.de

deepmind.com

distill.pub

docs.google.com

doi.org

edge.org

elsevier.com

linkinghub.elsevier.com

erichorvitz.com

forbes.com

futureoflife.org

gcrinstitute.org

georgetown.edu

cset.georgetown.edu

gov.uk

handle.net

hdl.handle.net

harvard.edu

ui.adsabs.harvard.edu

ieee.org

ieeexplore.ieee.org

infoq.com

jair.org

longtermrisk.org

lukemuehlhauser.com

machinethoughts.wordpress.com

marktechpost.com

medium.com

deepmindsafetyresearch.medium.com

medium.com

mit.edu

direct.mit.edu

mlr.press

proceedings.mlr.press

  • Langosco, Lauro Langosco Di; Koch, Jack; Sharkey, Lee D.; Pfau, Jacob; Krueger, David (June 28, 2022). "Goal Misgeneralization in Deep Reinforcement Learning". Proceedings of the 39th International Conference on Machine Learning. PMLR. pp. 12004–12019. Retrieved March 11, 2023.

neurips.cc

proceedings.neurips.cc

  • Zhuang, Simon; Hadfield-Menell, Dylan (2020). "Consequences of Misaligned AI". Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc. pp. 15763–15773. Retrieved March 11, 2023.

nih.gov

ncbi.nlm.nih.gov

nscai.gov

  • NSCAI Final Report (PDF; 14 MB). Washington, DC: The National Security Commission on Artificial Intelligence. 2021. Archived (PDF) from the original on February 15, 2023 (Memento in the Internet Archive). Retrieved October 17, 2022.

nytimes.com

nyu.edu

bhr.stern.nyu.edu

openai.com

openreview.net

pearson.com

penguinrandomhouse.com

quantamagazine.org

reddit.com

reuters.com

safe.ai

sagepub.com

journals.sagepub.com

science.org

scientificamerican.com

scottaaronson.blog

semanticscholar.org

api.semanticscholar.org

springer.com

link.springer.com

stanford.edu

fsi.stanford.edu

technologyreview.com

theguardian.com

theregister.com

towardsdatascience.com

un.org

unite.ai

venturebeat.com

vetta.org

vice.com

washingtonpost.com

web.archive.org

whatweowethefuture.com

wikipedia.org

en.wikipedia.org

  • Gao, Leo; Schulman, John; Hilton, Jacob (October 19, 2022). "Scaling Laws for Reward Model Overoptimization". arXiv:2210.10760 [cs.LG].

wiley.com

onlinelibrary.wiley.com

worldcat.org

wsj.com