{"openapi":"3.1.0","info":{"title":"arXiv API","version":"1.0.0","description":"Search the entire arXiv scholarly-preprint corpus as an API — millions of papers across physics, mathematics, computer science, quantitative biology and finance, statistics, electrical engineering and economics. Query by free text, title, author and/or subject category (e.g. q=transformer&category=cs.AI), with paging and sort by relevance, submission or last-update date, or pull full metadata for any paper by its arXiv id (e.g. 1706.03762 → \"Attention Is All You Need\"). Every result carries the title, full author list, abstract, primary and cross-list categories, DOI, journal reference, comments and a direct PDF link. Ideal for literature-review and research tools, citation managers, ML/AI paper trackers, academic search and discovery, and science newsletters.","contact":{"name":"PremiumApi","url":"https://www.oanor.com/by/premiumapi"}},"servers":[{"url":"https://api.oanor.com/arxiv-api","description":"oanor gateway"}],"tags":[{"name":"Papers"},{"name":"Reference"},{"name":"Meta"}],"components":{"securitySchemes":{"oanorKey":{"type":"apiKey","in":"header","name":"x-oanor-key","description":"Get your key at https://www.oanor.com/developer/keys"}}},"security":[{"oanorKey":[]}],"paths":{"/v1/paper":{"get":{"operationId":"get_v1_paper","tags":["Papers"],"summary":"Full metadata for one or more papers by arXiv id","description":"","parameters":[{"name":"id","in":"query","required":true,"description":"arXiv id(s), comma-separated up to 20, e.g. 1706.03762","schema":{"type":"string"},"example":"1706.03762"}],"security":[{"oanorKey":[]}],"responses":{"200":{"description":"OK","content":{"application/json":{"example":{"data":{"paper":{"id":"1706.03762v7","title":"Attention Is All You Need","authors":["Ashish Vaswani","Noam Shazeer","Niki Parmar","Jakob Uszkoreit","Llion Jones","Aidan N. 
Gomez","Lukasz Kaiser","Illia Polosukhin"],"comment":"15 pages, 5 figures","pdf_url":"https://arxiv.org/pdf/1706.03762v7","summary":"The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. 
We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.","updated":"2023-08-02T00:41:18Z","published":"2017-06-12T17:57:34Z","categories":["cs.CL","cs.LG"],"abstract_url":"http://arxiv.org/abs/1706.03762v7","primary_category":"cs.CL"}},"meta":{"timestamp":"2026-06-01T00:04:40.463Z","request_id":"46a34ebe-3e5f-4493-95e3-772b3d2bd437"},"status":"ok","message":"Paper retrieved","success":true}}}},"401":{"description":"Missing or invalid x-oanor-key header"},"402":{"description":"Active subscription required"},"429":{"description":"Rate-limit or monthly quota reached"},"502":{"description":"Upstream did not respond"}}}},"/v1/search":{"get":{"operationId":"get_v1_search","tags":["Papers"],"summary":"Search the arXiv corpus","description":"","parameters":[{"name":"q","in":"query","required":false,"description":"Free-text query (all fields), e.g. transformer","schema":{"type":"string"},"example":"transformer"},{"name":"category","in":"query","required":false,"description":"Subject category, e.g. cs.AI, math.CO, stat.ML","schema":{"type":"string"},"example":"cs.AI"},{"name":"author","in":"query","required":false,"description":"Author name, e.g. 
Hinton","schema":{"type":"string"}},{"name":"title","in":"query","required":false,"description":"Title term","schema":{"type":"string"}},{"name":"sort","in":"query","required":false,"description":"relevance | submitted | updated (default relevance)","schema":{"type":"string"},"example":"submitted"},{"name":"start","in":"query","required":false,"description":"Pagination offset (0-30000)","schema":{"type":"string"},"example":"0"},{"name":"max_results","in":"query","required":false,"description":"Results per page (1-100, default 10)","schema":{"type":"string"},"example":"10"}],"security":[{"oanorKey":[]}],"responses":{"200":{"description":"OK","content":{"application/json":{"example":{"data":{"sort":"submitted","count":10,"query":"all:transformer AND cat:cs.AI","start":0,"total":20611,"papers":[{"id":"2605.31590v1","title":"TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation","authors":["Ruotong Liao","Guowen Huang","Qing Cheng","Guangyao Zhai","Lei Zhang","Xun Xiao","Thomas Seidl","Daniel Cremers","Volker Tresp"],"comment":"17 pages, 13 figures","pdf_url":"https://arxiv.org/pdf/2605.31590v1","summary":"Text-to-video (T2V) generation faces challenging questions when generating videos with long horizons containing multiple events. Inspired by the intrinsics of the diffusion process, we probe video diffusion transformers (DiTs) and uncover intrinsic turning points in the DiT denoising trajectory where conditioning text affects generation from global layout to fine-grained details. Building on this finding, we present TunerDiT, a simple yet effective progressive steering method that requires no additional training for multi-event generation. TunerDiT comprises two steering handles: (1) Event-Partitioned Masking that enforces event boundaries while allowing cross-event transition bands; (2) Cross-Event Prompt Fusion that injects neighboring event semantics for late-stage refinement. 
We contribute a self-curated prompt suite for benchmarking multi-event generation, i.e., Meve. TunerDiT achieves state-of-the-art performance across 8 metrics and offers a tunable trade-off between video consistency and event separation, compared with other training-free methods. The improvement in text alignment increases with the event count, indicating a scaling possibility with increasing event count.","updated":"2026-05-29T17:56:09Z","published":"2026-05-29T17:56:09Z","categories":["cs.CV","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31590v1","primary_category":"cs.CV"},{"id":"2605.31564v1","title":"What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation","authors":["Qing Wang","Jacob Devasier","Chengkai Li"],"pdf_url":"https://arxiv.org/pdf/2605.31564v1","summary":"We present the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation. We analyze MDLM generation trajectories -- the order in which tokens are unmasked during iterative decoding -- and find that, unlike autoregressive LLMs which generate text linearly, MDLMs naturally prioritize entities first, followed by relational and function words, with structural tokens resolved last. We further identify a previously undocumented failure mode of supervised fine-tuning: SFT disrupts this strategy by prematurely anchoring structural sentence-ending tokens early in the decoding trajectory, effectively fixing the output length which can lead to omitted or hallucinated information. To address this, we propose lambda-scaled structural decoding, a training-free inference-time modification that downweights structural token confidence and recovers +9.4 BLEU-4. Finally, we introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process to explicitly incorporate relational graph structure. 
Cross-dataset evaluation on LAGRANGE reveals that previous baselines overfit to dataset-specific patterns, while LLM- and MDLM-based approaches generalize significantly better.","updated":"2026-05-29T17:29:35Z","published":"2026-05-29T17:29:35Z","categories":["cs.CL","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31564v1","primary_category":"cs.CL"},{"id":"2605.31558v1","title":"Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization","authors":["Felipe Urrutia","Juan José Alegría","Cinthia Sanchez Macias","Jorge Salas","Cristian B. Calderon","Cristobal Rojas"],"pdf_url":"https://arxiv.org/pdf/2605.31558v1","summary":"Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task requiring positional reasoning and a letter task requiring symbolic reasoning. Using a recently introduced metric that classifies attention-head behavior as positional or symbolic for a given prompt, we show that successful learning is associated with the emergence of pure heads, i.e., heads that express themselves as either positional or symbolic. Despite the tasks' structural equivalence, they impose different mechanistic demands: the number task requires both positional and symbolic heads, whereas the letter task requires only symbolic heads. We then identify the computational roles of these heads, characterize the basic functions they implement, and give theoretical constructions showing how single-layer RoPE-based attention can realize these functions through geometrically interpretable query, key, and value operations. 
This analysis yields a quantitative separation between positional and symbolic mechanisms in their robustness to longer sequences, formalized through a novel notion of discrepancy. We empirically validate the resulting predictions in both controlled and real-world models, showing that symbolic mechanisms extrapolate more reliably to longer sequences while positional mechanisms face sharper limitations.","updated":"2026-05-29T17:22:04Z","published":"2026-05-29T17:22:04Z","categories":["cs.LG","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31558v1","primary_category":"cs.LG"},{"id":"2605.31535v1","title":"RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video","authors":["Ulrich Prestel","Stefan Andreas Baumann","Nick Stracke","Björn Ommer"],"comment":"Project Page: https://compvis.github.io/rayder","pdf_url":"https://arxiv.org/pdf/2605.31535v1","summary":"Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data, largely due to the brittleness of training on realistic videos and the hard-to-predict scaling behavior of multi-network system designs. We introduce RayDer, a unified, feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone, turning self-supervised NVS into a well-posed single-model scaling problem. A minimal dynamic state, treated as a nuisance factor, absorbs time-varying content and enables stable training on unconstrained real-world video. Importantly, RayDer keeps static-scene NVS as its target task: dynamic content is leveraged purely as scalable supervision, not reconstructed as in dynamic-scene (4D) NVS. Across multiple model sizes and orders of magnitude in data, RayDer exhibits clean power-law scaling with data and compute, and outperforms static-scene data mixtures. On a large number of benchmarks, RayDer achieves strong zero-shot open-set performance competitive with state-of-the-art supervised approaches. 
Project Page: https://compvis.github.io/rayder","updated":"2026-05-29T16:50:27Z","published":"2026-05-29T16:50:27Z","categories":["cs.CV","cs.AI","cs.LG"],"abstract_url":"http://arxiv.org/abs/2605.31535v1","primary_category":"cs.CV"},{"id":"2605.31500v1","title":"On Efficient Scaling of GNNs via IO-Aware Layers Implementations","authors":["Daria Fomina","Daniil Krasylnikov","Alexey Boykov","Andrey Dolgovyazov","Vyacheslav Zhdanovskiy","Fedor Velikonivtsev"],"comment":"International Conference on Machine Learning (ICML) 2026, Spotlight Paper","pdf_url":"https://arxiv.org/pdf/2605.31500v1","summary":"Graph Neural Networks (GNNs) are bottlenecked by sparse, irregular memory access. Popular frameworks such as DGL and PyTorch Geometric support general message passing, but complex layers often materialize edge-wise intermediates, increasing memory traffic and limiting scalability on large graphs. We take an I/O- and arithmetic-intensity--centric view and show that widely used layers fall into three kernel families: SpMM-based convolutions, reduction-based aggregations, and attention-based layers (GATv2/Graph Transformer). For each family, we develop GPU kernels that reduce data movement, improve locality, and remain robust across realistic graphs. We also study graph reordering and find that its impact depends on the kernel mapping: it benefits neighbor-parallel (gather-dominated) kernels more consistently than feature-parallel designs. Empirically, our fused attention kernels reach up to $\\textbf{3.9}\\times$ speedup for Graph Transformer (median $\\textbf{1.6}\\times$), with Tensor Core (block-sparse) variants up to $\\textbf{7.3}\\times$ on locally dense graphs; for GATv2 we reach up to $\\textbf{8.5}\\times$ speedup (median $\\textbf{2.0}\\times$) while reducing peak memory by up to $\\textbf{76}\\times$ (median $\\textbf{6}\\times$). Our degree-aware reduction kernels achieve up to $\\textbf{10}\\times$ speedup (median $\\textbf{2.6}\\times$). 
For SpMM-based layers, properly cached cuSPARSE achieves up to $\\textbf{8}\\times$ speedup over DGL and outperforms evaluated custom baselines in the majority of evaluations. We release our implementations as drop-in replacements to support reproducible, hardware-aware GNN acceleration.","updated":"2026-05-29T16:22:45Z","published":"2026-05-29T16:22:45Z","categories":["cs.LG","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31500v1","primary_category":"cs.LG"},{"id":"2605.31393v1","title":"Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models","authors":["Pedro Dal Bianco","Jean Paul Nunes Reinhold","Oscar Stanchi","Facundo Quiroga","Franco Ronchetti","Ulisses Brisolara Corrêa"],"comment":"Accepted at GenSign (https://genai4sl.github.io/) at CVPR 2026. Non proceedings track","pdf_url":"https://arxiv.org/pdf/2605.31393v1","summary":"Sign language translation (SLT) remains constrained by limited paired sign-video/text corpora and heavy-tailed target vocabularies. We study target-side augmentation in which GPT-4o generates controlled paraphrase variants of reference sentences while the sign input remains unchanged. A Signformer-style pose-based Transformer is trained under a two-stage schedule: pre-training on the augmented corpus followed by fine-tuning on the original references. We evaluate on three datasets spanning complementary challenges: PHOENIX14T (German Sign Language), with moderate lexical diversity; GSL (Greek Sign Language), with highly controlled, repetitive recordings; and LSA-T (Argentinian Sign Language), with severe long-tail sparsity. On PHOENIX14T, augmentation improves BLEU-4 from 9.56 to 10.33. The near-saturated GSL baseline and extremely sparse LSA-T setting reveal the limits of the approach. To our knowledge, this is the first study to apply LLM-generated target-side paraphrases and LLM-as-a-Judge evaluation to SLT. 
The semantic evaluation reveals gains in fidelity that lexical overlap metrics understate.","updated":"2026-05-29T14:58:21Z","published":"2026-05-29T14:58:21Z","categories":["cs.CL","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31393v1","primary_category":"cs.CL"},{"id":"2605.31295v1","title":"Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation","authors":["Ioannis Prokopiou","Pantelis Vikatos","Maximos Kaliakatsos-Papakostas","Theodoros Giannakopoulos","Themos Stafylakis"],"comment":"Accepted at EUSIPCO 2026 (34th European Signal Processing Conference), 5 pages, 2 figures","pdf_url":"https://arxiv.org/pdf/2605.31295v1","summary":"Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. 
Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.","updated":"2026-05-29T13:31:56Z","published":"2026-05-29T13:31:56Z","categories":["cs.SD","cs.AI","cs.IR","cs.LG"],"abstract_url":"http://arxiv.org/abs/2605.31295v1","primary_category":"cs.SD"},{"id":"2605.31286v1","title":"DeMaVLA: A Vision-Language-Action Foundation Model for Generalizable Deformable Manipulation","authors":["Taiyi Su","Jian Zhu","Tianjian Wang","Youzhang He","Zitai Huang","Jianjun Zhang","Chong Ma","Hanyang Wang","Tianjiao Zhang","Munan Yin","Weihao Ding","Yi Xu"],"comment":"14 pages, 2 figures","pdf_url":"https://arxiv.org/pdf/2605.31286v1","summary":"Real-world household robots require Vision-Language-Action (VLA) foundation models that can acquire reusable manipulation skills across diverse objects, task conditions, and household environments. Deformable-object folding is a representative challenge, requiring robots to handle clothing items from random initial states across varying categories, geometries, materials, and scenes. However, existing VLA systems commonly train separate policies for different object categories, while naively mixed multi-task training often suffers from task interference and degraded performance. To move beyond category-specific folding policies, we introduce DeMaVLA, a VLA foundation model for generalizable Deformable Manipulation. DeMaVLA adopts a VLM backbone with an action expert and formulates continuous action generation using flow matching. To improve efficiency, the action expert is constructed by pruning every other transformer layer while preserving layer-wise alignment with the VLM backbone, reducing training and inference cost. 
DeMaVLA is first pre-trained on approximately 5,000 hours of selected real-world dual-arm demonstrations to acquire general manipulation priors. It is then post-trained on mixed folding data that aggregates self-collected demonstrations and corrective trajectories from real-robot failures across multiple folding tasks through a human-in-the-loop Data Aggregation~(DAgger) pipeline. Experiments show that DeMaVLA achieves competitive performance on RoboTwin and strong real-world results on our household folding benchmark. These results highlight the value of scalable real-world data, efficient action generation, and corrective learning for general-purpose VLA policies in deformable-object manipulation.","updated":"2026-05-29T13:20:08Z","published":"2026-05-29T13:20:08Z","categories":["cs.RO","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31286v1","primary_category":"cs.RO"},{"id":"2605.31196v1","title":"Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration","authors":["Jun Wang","Xiaohao Xu","Xiaonan Huang"],"comment":"31 pages, 9 figures","pdf_url":"https://arxiv.org/pdf/2605.31196v1","summary":"Safe human--robot collaboration requires more than visual description: a monitor must determine whether the robot body is safely separated, already colliding with the scene or a person, or about to collide. We call this capability collision grounding: binding visual observations to robot body geometry, camera viewpoint, scene layout, human proximity, and temporal motion in order to infer present and imminent contact. We introduce TouchSafeBench, a physics-grounded benchmark for evaluating collision grounding in vision-language models (VLMs). Built in Habitat~3.0, TouchSafeBench contains 2,940 simulated indoor co-presence episodes across social navigation and social rearrangement, with synchronized multi-view RGB-D observations, top-down trajectory maps, calibrated camera metadata, and simulator-derived contact labels. 
We study two deployment-facing tasks: classifying the current safety state and warning about imminent collision before contact. Across three frontier or robotics-oriented VLMs and nine visual representations, current models remain far from reliable: the best average Macro-F1 stays below 50\\%, explicit depth is not automatically transformed into robot-body collision evidence, and robot--scene contact is consistently harder than human-contact risk. TouchSafeBench reveals a central limitation of embodied VLMs: visual fluency does not imply physical accountability. Reliable robot safety monitors will need representations that explicitly bind viewpoint, robot morphology, metric geometry, and future collision. We will release the benchmark upon acceptance.","updated":"2026-05-29T12:04:38Z","published":"2026-05-29T12:04:38Z","categories":["cs.CV","cs.AI","cs.CL","cs.RO"],"abstract_url":"http://arxiv.org/abs/2605.31196v1","primary_category":"cs.CV"},{"id":"2605.31064v1","title":"Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA","authors":["Hao Chen","Xing Tang","Qirui Liu","Weijie Shi","Shiwei Li","Fuyuan Lyu","Weihong Luo","Xiku Du","Xiuqiang He"],"comment":"Accepted by KDD 2026 ADS track","pdf_url":"https://arxiv.org/pdf/2605.31064v1","summary":"Large Language Models (LLMs) have significantly advanced online data services, particularly in the domain of financial question answering (FinQA). However, such systems remain susceptible to numerical reasoning hallucinations, which critically undermine reliability in high-stakes financial applications. Although retrieval-augmented generation (RAG) has been widely adopted to ground responses in external knowledge, it introduces three persistent challenges: noise sensitivity, calculation fragility, and an auditability crisis. 
Existing model-centric approaches, which primarily focus on optimizing either the retriever or generator in isolation, still struggle to address these issues in an integrated manner. In this work, we pioneer a data-centric paradigm and propose a novel framework, the Data-centric Reasoning Compiler (DCRC). The framework operates through three cohesive phases: (1) adversarial data construction, which synthesizes training examples with controlled noise to teach robustness; (2) multi-stage training that cultivates a Data-centric Structuring Agent (DSA) capable of explicit evidence auditing and program synthesis; and (3) a compile-and-execute inference process, where the DSA transforms user queries and retrieved documents into verifiable, executable reasoning programs. This data-driven framework ensures faithful numerical reasoning by design. We conduct extensive experiments on established offline benchmarks and further validate our framework through deployment in a real-world online financial QA system.","updated":"2026-05-29T09:35:11Z","published":"2026-05-29T09:35:11Z","categories":["cs.IR","cs.AI"],"abstract_url":"http://arxiv.org/abs/2605.31064v1","primary_category":"cs.IR"}]},"meta":{"timestamp":"2026-06-01T17:08:25.241Z","request_id":"4a706d9a-04ca-4e47-9cfe-68bf97f075a4"},"status":"ok","message":"Papers searched","success":true}}}},"401":{"description":"Missing or invalid x-oanor-key header"},"402":{"description":"Active subscription required"},"429":{"description":"Rate-limit or monthly quota reached"},"502":{"description":"Upstream did not respond"}}}},"/v1/categories":{"get":{"operationId":"get_v1_categories","tags":["Reference"],"summary":"The arXiv subject taxonomy","description":"","parameters":[],"security":[{"oanorKey":[]}],"responses":{"200":{"description":"OK","content":{"application/json":{"example":{"data":{"note":"Use a top-level id (e.g. cs) or a sub-category (e.g. 
cs.AI, math.CO, stat.ML) as the category= filter on /v1/search.","count":20,"categories":[{"id":"cs","name":"Computer Science"},{"id":"math","name":"Mathematics"},{"id":"physics","name":"Physics (general)"},{"id":"astro-ph","name":"Astrophysics"},{"id":"cond-mat","name":"Condensed Matter"},{"id":"gr-qc","name":"General Relativity & Quantum Cosmology"},{"id":"hep-ex","name":"High Energy Physics — Experiment"},{"id":"hep-lat","name":"High Energy Physics — Lattice"},{"id":"hep-ph","name":"High Energy Physics — Phenomenology"},{"id":"hep-th","name":"High Energy Physics — Theory"},{"id":"math-ph","name":"Mathematical Physics"},{"id":"nlin","name":"Nonlinear Sciences"},{"id":"nucl-ex","name":"Nuclear Experiment"},{"id":"nucl-th","name":"Nuclear Theory"},{"id":"quant-ph","name":"Quantum Physics"},{"id":"q-bio","name":"Quantitative Biology"},{"id":"q-fin","name":"Quantitative Finance"},{"id":"stat","name":"Statistics"},{"id":"eess","name":"Electrical Engineering & Systems Science"},{"id":"econ","name":"Economics"}]},"meta":{"timestamp":"2026-06-01T00:04:40.768Z","request_id":"91fdaebe-cef3-4c63-84da-28245c44944d"},"status":"ok","message":"Categories retrieved","success":true}}}},"401":{"description":"Missing or invalid x-oanor-key header"},"402":{"description":"Active subscription required"},"429":{"description":"Rate-limit or monthly quota reached"},"502":{"description":"Upstream did not respond"}}}},"/v1/meta":{"get":{"operationId":"get_v1_meta","tags":["Meta"],"summary":"Source & usage notes","description":"","parameters":[],"security":[{"oanorKey":[]}],"responses":{"200":{"description":"OK","content":{"application/json":{"example":{"data":{"note":"Scholarly preprints. /v1/search = search by q (all fields), title, author and/or category, with start/max_results paging and sort=relevance|submitted|updated (e.g. q=transformer&category=cs.AI); /v1/paper = full metadata for one or more arXiv ids (e.g. 
id=1706.03762, comma-separate up to 20); /v1/categories = the arXiv taxonomy. Each paper carries title, authors, abstract, categories, DOI, journal ref and a PDF link. Please respect arXiv's terms; data courtesy of arXiv.org.","source":"arXiv API (export.arxiv.org)","endpoints":["/v1/search","/v1/paper","/v1/categories","/v1/meta"]},"meta":{"timestamp":"2026-06-01T00:04:40.837Z","request_id":"3269f376-7bad-4dc8-80ed-925d7cb8a8ef"},"status":"ok","message":"Meta retrieved","success":true}}}},"401":{"description":"Missing or invalid x-oanor-key header"},"402":{"description":"Active subscription required"},"429":{"description":"Rate-limit or monthly quota reached"},"502":{"description":"Upstream did not respond"}}}}},"x-oanor-pricing":[{"slug":"free","name":"Free","price_cents_month":0,"monthly_call_quota":3900,"rps_limit":2,"hard_limit":true},{"slug":"starter","name":"Starter","price_cents_month":560,"monthly_call_quota":58000,"rps_limit":8,"hard_limit":true},{"slug":"pro","name":"Pro","price_cents_month":1440,"monthly_call_quota":290000,"rps_limit":20,"hard_limit":true},{"slug":"mega","name":"Mega","price_cents_month":3790,"monthly_call_quota":1450000,"rps_limit":50,"hard_limit":true}],"x-oanor-marketplace-url":"https://www.oanor.com/api/arxiv-api"}