Diagnostics

Permanent link /diagnostics/llm-capacity-benchmark

LLM Capacity Benchmark

Lightweight evaluation to check if a model and its surrounding UI respect consent and context limits.

Scholarly metadata

Authorship

Ethotechnics Institute Diagnostics Lab · Ethotechnics Institute · diagnostics@ethotechnics.org

Contact: diagnostics@ethotechnics.org

Publication details

Published: Dec 3, 2025
Last updated: Jan 9, 2026
Version: v1.1.0
DOI: Pending Zenodo deposit

License: CC BY 4.0

Credit Ethotechnics Institute Diagnostics Lab, include tool name + version, and link to the canonical permalink.

Archive snapshot: Wayback capture

Changelog

v1.1.0 · 2026-01-09 — Published method cards, transparency notes, and replicability guidance for each diagnostic.
v1.0.0 · 2025-12-03 — Initial diagnostics suite release.

Copy citation (APA/BibTeX)

Cite this page Formats: APA, MLA, Chicago, BibTeX, RIS

Version

v1.1.0

Last updated

Jan 9, 2026

Permalink

https://ethotechnics.org/diagnostics/llm-capacity-benchmark

DOI

Pending Zenodo deposit

APA

Ethotechnics Institute Diagnostics Lab. (2026). LLM Capacity Benchmark. Ethotechnics Institute. https://ethotechnics.org/diagnostics/llm-capacity-benchmark

MLA

Ethotechnics Institute Diagnostics Lab. "LLM Capacity Benchmark." Ethotechnics Institute, 2026, https://ethotechnics.org/diagnostics/llm-capacity-benchmark.

Chicago

Ethotechnics Institute Diagnostics Lab. "LLM Capacity Benchmark." Ethotechnics Institute. Jan 9, 2026. https://ethotechnics.org/diagnostics/llm-capacity-benchmark.

BibTeX

@misc{diagnostic_llm-capacity-benchmark,
  title={LLM Capacity Benchmark},
  author={Ethotechnics Institute Diagnostics Lab},
  year={2026},
  howpublished={Ethotechnics Institute},
  url={https://ethotechnics.org/diagnostics/llm-capacity-benchmark},
  version={v1.1.0}
}

RIS

TY  - WEB
TI  - LLM Capacity Benchmark
AU  - Ethotechnics Institute Diagnostics Lab
PY  - 2026
UR  - https://ethotechnics.org/diagnostics/llm-capacity-benchmark
ER  -

Sample prompts or flows to benchmark.
Current consent or disclosure copy.
Stakeholder who owns model and UI decisions.

Estimated time: 30–45 minutes

Result pages always include the off-ramp to ethotechnics.com/studio before finalizing recommendations.

Inputs

Representative prompt set and usage flows.
Current consent and disclosure copy.
Stakeholder context for model limitations and risks.

Procedure

Run prompts through the interface and capture disclosures.
Score consent journey checkpoints against rubric.
Document gaps and map recommendations to mechanism language.

Outputs

Readiness summary highlighting consent gaps.
UI and governance mitigation guidance.
Escalation note with studio facilitation path.

Measures

Consent and disclosure coverage across the user journey.
Context boundary alignment between model behavior and UI framing.
User control availability and visibility in the flow.

Does not measure

Model accuracy, toxicity, or bias metrics.
Infrastructure performance or latency.
Legal review of terms or policy compliance.

Assumptions

Prompts and scenarios are representative of real use.
Consent copy and disclosure states are production-ready.
Reviewers have access to product and policy context.

Instrument prompts

User prompt set with context variants.
Disclosure checkpoints and UI states list.
Consent copy and opt-out flows.

Rubric

Consent clarity score (1–5) per checkpoint.
Context alignment score (1–5) for model outputs.
Control visibility score (1–5) for exit paths.

Scoring logic

Aggregate checkpoint scores into readiness tiers.
Flag any score ≤2 as a mandatory mitigation.
Summarize recommendations by mechanism category.

Validation notes

Piloted with early-stage AI pilots and consent-heavy workflows to refine rubric language.

Paired reviewers reconcile scores in a short calibration session; discrepancies drop after alignment.

Scoring drifts if reviewers lack model context.
Missing edge cases can inflate readiness scores.
UI copy changes after scoring can invalidate results.

Replicability

Compile prompt set and UI flow map.
Run the rubric with two reviewers.
Document scores and differences in a shared sheet.
Publish the summary with linked mechanism recommendations.

Example outputs

Consent checkpoint scorecard with mitigation notes.
Anonymized readiness summary deck excerpt.

Review a benchmark output sample

Request via Studio

LLM Capacity Benchmark

Consent and context readiness summary.

When to use the LLM Capacity Benchmark.

Method, transparency, and replicability.

Preview a benchmark summary.

Schedule a facilitated benchmark.