An Alignment Journal: Coming Soon

[Image: etchings of artificial and natural objects]

Contact us if you’re interested in participating as an author, reviewer, or editor, or if you know someone who might be.

Cross-posted to LessWrong. Please go there for comments and discussion.

Experimental Infrastructure for Foundational Alignment Research

This is the first in a series of “build-in-the-open” updates regarding the incubation of a new peer-reviewed journal dedicated to AI alignment. Later updates will contain much more detail, but we want to put this out soon to draw community participation early. Fill out this form to express your interest in participating as an author, reviewer, editor, developer, manager, or board member, or to recommend someone who might be interested.

The Core Bet

Peer review is a crucial public good: it applies scarce researcher time to sort new ideas for focused attention from the community, but it is under-supplied because individual reviewers are poorly incentivized. Peer review in alignment research is particularly fragmented. While some parts of the alignment research community are served by existing venues, such as journals and ML conferences, there are significant gaps, arising from a combination of factors including the lack of appropriate reviewer pools for some kinds of work. Moreover, none of these institutions move as fast as we think they could in this era, mainly because of inertia. Various preprint servers and online forums avoid these problems, but generally at the expense of quality certification and institutional legitimacy, and their review coverage can suffer when attention is misallocated by trends and hype.

Our bet is that we can create a venue that provides institutional leverage (coordination, compensation) and legibility (citations, archival records, stable indexing) without the institutional friction that kills speed. Instead, we will operate at a small, agile scale that allows for dedicated tooling and rapid experimentation.

Operational Design

We are designing the journal around a few specific, high-leverage hypotheses:

  1. Reviewer Attention as the Scarce Resource: The “uncompensated committee” model is flawed. We are experimenting with attributed and paid peer review, calibrated for quality and speed. We will invest in getting a reviewer’s full, focused attention.
  2. The “Reviewer Abstract”: Instead of a binary Accept/Reject, or an undigested public transcript of the review discussion, we will output signals with higher information density in the review process.[1] Accepted papers will ship with a reviewer-written guide: Who is this for? What is the core contribution? What are the specific caveats?
  3. Automation: We are betting that targeted use of LLM-powered automation can ease several steps of the editorial cycle: flagging checkable errors, identifying and filtering candidate reviewers, auditing reviewer comments against the paper’s actual content, preemptively asking authors to consider addressing likely reviewer objections, and preparing multi-format publication. Our goal is to avoid wasting editor, author, and reviewer labor on mundane work and to make decisions more auditable and reversible. (A minimal sketch of one such check follows this list.)

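To make the automation idea concrete, here is a minimal sketch of one such check: auditing a reviewer’s claims against the paper’s text. It assumes a generic `call_llm` client, and the prompt wording and JSON schema are illustrative assumptions, not a committed design.

```python
# Hypothetical sketch of one automated editorial check: auditing reviewer
# claims against a paper's text. `call_llm` stands in for whichever LLM
# client the journal eventually adopts; the prompt and schema are assumptions.

import json
from typing import Callable

AUDIT_PROMPT = """\
You are auditing a peer review. For each factual claim the reviewer makes
about the paper, say whether the paper's text supports it.

Paper text:
{paper}

Reviewer comment:
{review}

Respond with a JSON list of objects of the form
{{"claim": "...", "supported": true, "evidence": "..."}}.
"""


def audit_review(
    paper_text: str,
    review_text: str,
    call_llm: Callable[[str], str],
) -> list[dict]:
    """Return reviewer claims that the paper's text does not appear to support."""
    raw = call_llm(AUDIT_PROMPT.format(paper=paper_text, review=review_text))
    findings = json.loads(raw)  # a production version would validate and retry here
    # Surface only unsupported claims for editor attention.
    return [f for f in findings if not f["supported"]]
```

The same pattern, a structured prompt in and machine-checkable output back, would plausibly carry over to the other steps, such as reviewer matching or error flagging.
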
Our forthcoming formal description of the journal will have much more detail. Contact us to help shape it.

Scope

“AI Alignment” is a broad and often contested label. To provide a high-signal environment from day one, we are making a deliberate choice regarding our starting point:

  • Initial Focus: Foundational Research. At launch, we will lean toward works that contribute conceptual and theoretical understanding of AI Alignment. This includes, but is not limited to, advances in the theory of agents, formal safety proofs, computational and learning-theoretic properties of AI models, scalable oversight, theoretical underpinnings of interpretability, as well as empirical work that informs any of the above. We’ve chosen this because it is an area commonly reported to be under-served by the current conference cycle.[2]
  • Gap Strategy: Our priority is work that would benefit from a more rigorous assessment than a blog post but that doesn’t have the right shape for a high chance of acceptance at an ML conference (e.g., work that does not advance capability metrics on a widely accepted benchmark). We want to build a home for the careful, often difficult-to-evaluate foundational work that the field relies on for long-term progress. In the long term, the exact shape of this gap will be determined by the editorial board.
  • Academic truth-seeking: Within the topic area scope, papers will be assessed primarily on theoretical soundness and on the question: does the work deepen our understanding? Although we are motivated to found a journal by the desire for a flourishing future, it is neither feasible nor appropriate for an academic venue to assess papers against their contribution to any political agenda, nor to place excessive certainty in our ability to assess their long-term wider-world impact. We will maintain an ethics review for demonstrable, immediate harms.

This is just a starting point. The current team is not the final arbiter of what constitutes “alignment” for all time. While we are setting the initial direction to get the engine running, the long-term responsibility for expanding, narrowing, or shifting the scope will belong to the editorial board. Our job right now is to build a vessel sturdy enough to support those debates.

Governance

This project is in its incubation phase. As the “plumbing” of the journal grows, editorial and strategic authority will be taken up by an editorial board of respected researchers from the alignment community. The journal will be philanthropically funded, so our funders will naturally influence how the journal develops, but we are committed to building a self-sustaining, public-good institution that belongs to the field.

Advisory board

We are grateful for the advice and support from the initial members of our advisory board:

Institutional stewardship

This project could fail. Poor execution could create a status-chasing bottleneck, further degrade the signal-to-noise ratio in alignment research, or simply waste researchers’ time. Poor coordination with other initiatives could hinder rather than help the field.

To reduce this risk, we will engage as a good citizen with the alignment research community. We will track and publish our own performance metrics (turnaround times, reviewer load, and author satisfaction) and solicit the wider community’s assessment of whether we are participating cooperatively and productively in the publication ecosystem. Continuing the journal will be contingent upon positive community feedback and the editorial board’s continuing reassessment of counterfactually positive impact. Accepted papers will remain online, regardless of the ultimate fate of the project.

Next steps

Join the founding team

A journal is only as good as its community, and you could be part of it. We want participation in the Alignment Journal—as an editor, author, or reviewer—to be credibly status-accruing and a justifiable use of time toward your career goals. We aim to make participation:

  • Time-Efficient: Respecting your expertise by using automation to remove the grunt work.
  • Visible: Ensuring that high-quality editorial and review work is recognized as a first-class contribution to the field.
  • Impactful: Giving participants a direct hand in shaping the standards and content of alignment research.

If you believe this infrastructure is a missing piece of the safety ecosystem, we want your help.

  • Editors: We need people with the judgment to steer the journal and act as moderators of a fair, rigorous review process.
  • Reviewers: We are building a pool of deep experts across technical and conceptual alignment, interpretability, and governance.
  • Authors: If your work is rigorous and important for AI Alignment, we want to hear what review experience you would value.
  • Governance: If you have experience building high-trust community institutions or designing governance transitions, we specifically want to hear from you.

We’ll soon share an initial description of our design and plans for the journal with much more detail, so reach out now if you’d like to shape it.

Support us online

We welcome you to follow us on all the usual platforms:

Above all, our content will be hosted at our main site, alignmentjournal.org.

Contributors to this document

In addition to the authors, we are grateful to Geoffrey Irving, Victoria Krakovna, and David Duvenaud for their support of and feedback on this post.

Participation in this post does not indicate that all authors commit to every detail of the journal strategy outlined here, in perpetuity. This is the first stage in an ongoing consultation, and we expect to adjust our positions in the face of new evidence about the best strategies. All responsibility for mistakes in content or execution resides with the current managing editors, Jess Riedel and Dan MacKinlay.


  [1] We intend to experiment with a variety of possible ratings, certifications, and other quality signals. This is our starting proposal, as it is one we have some experience with.

  [2] The practical implications of the emphasis on achieving state-of-the-art results on benchmarks in machine learning research are complicated and contentious, and, we argue, not yet well understood even inside the field. For an opinionated introduction, see Moritz Hardt’s book, The Emerging Science of Machine Learning Benchmarks.