Sign languages are the primary means of communication for millions of deaf individuals worldwide [1], [2]. Isolated Sign Language Recognition (ISLR) remains an open research area at the intersection of computer vision (CV) and natural language processing (NLP) [1], [3]. As in spoken language research, there is a large discrepancy in the efficacy of state-of-the-art solutions between high-resource and low-resource languages. Unlike the widely studied American (ASL), British (BSL) and Chinese (CSL) Sign Languages, Italian Sign Language (LIS) remains under-resourced, lacking the large-scale annotated corpora required to train a deep neural network that can recognise it effectively [2].

To overcome this limitation, recent research has pivoted toward transfer learning and few-shot recognition, leveraging models pre-trained on large multilingual datasets [4]. One of these models is SignCLIP, which utilises contrastive learning to project spoken language text and sign language videos into a shared embedding space. It is pre-trained on Spreadthesign, a dataset containing approximately 500,000 video clips in up to 44 different sign languages [1], including LIS. However, downstream evaluations for LIS recognition and text-video retrieval tasks were entirely omitted in its original benchmarks [1].

Consequently, the applicability of these multilingual priors to low-resource LIS datasets remains unexamined. To address this research gap, we present the first investigation for LIS that explores the performance of zero-shot, few-shot, and fine-tuning paradigms using SignCLIP as a foundation model, evaluating both cross-modal video-text retrieval and ISLR. For this evaluation, we use two datasets: A3LIS-147, introduced in [5], and SignIT [2]. These datasets enable complementary evaluations by contrasting a controlled, balanced multi-signer environment with domain-specific vocabulary against naturalistic, unbalanced, core-vocabulary signs, respectively.

Related works

Italian sign language recognition

LIS ISLR research has largely targeted small-scale, controlled settings, utilising the A3LIS-147 dataset as the primary benchmark [6], [7]. Early approaches with Hidden Markov Models (HMMs) have been improved upon by more recent work reaching an accuracy of 80.4% with fully-supervised CNN models (Inception3D and SlowFast) [6].

The above work suffers from several structural limitations in the context of scalable SLR. The architectures employed cannot adapt to out-of-dictionary vocabulary without retraining. Furthermore, they are optimised for clean, artifact-free datasets, so their performance may degrade on the noisy, out-of-distribution data encountered during real-world deployment [8].

To address the latter, the SignIT dataset was recently introduced to benchmark LIS ISLR on real-world data. Baseline evaluations of the SignIT dataset demonstrate that current state-of-the-art approaches struggle to effectively classify LIS signs at the gloss level, as opposed to the categorical level [2]¹.

Zero-shot, few-shot and cross-lingual recognition

Early zero-shot SLR attempts struggled due to cross-lingual complexities between signs and natural language, as well as high variation in sign execution [10], [11], resulting in a pivot towards few-shot, visual retrieval paradigms [4].

Bilge et al. introduced Few-Shot Sign Language Recognition (FSSLR) via a meta-learning framework across sign languages, proving sparse source examples can generalise to unseen target languages. They discovered “synonym” subsets between languages failed to yield higher performance, suggesting signs are heavily diversified rather than net-iconic [4].

Similarly, Vandendriessche et al. (2025) embedded pose keypoints for distance-based visual retrieval, enabling one-shot ISLR that generalises to out-of-domain vocabularies without any retraining. Both frameworks operate entirely within the visual domain, achieving high cross-lingual transferability but lacking any inherent coupling to natural language text or semantic meaning [8].

In contrast, Cheng et al. utilise contrastive learning in CiCo to model retrieval as a cross-lingual problem, successfully aligning a single sign language video modality directly to a spoken language text space (e.g., ASL to English). CiCo trains a domain-agnostic sign encoder before the domain-aware retrieval stage [12].

SignCLIP, multilingual corpora, and multilingual sign language

SignCLIP aligns multilingual signs to a single text space in English (as a matter of efficiency). The authors rely on the ‘Iconicity Hypothesis’, which holds that universal motion primitives are semantically shared across sign languages, and adapt the distributional hypothesis to sign language. The model captures the core meaning of a sign as a ‘cluster centre’ in the embedding space, preserving the individual variance of different signers. However, mapping these clusters is made difficult by the Spreadthesign corpus, which is skewed towards only one video per sign per language [1].

SignCLIP uses cross-lingual contrastive learning with prefixed language-identifying tokens, e.g., ‘<en> <ase> {word}’ for ASL. Ultimately, the authors note that the model’s zero-shot performance on out-of-domain data is deficient, and they posit that few-shot learning or fine-tuning is necessary to achieve meaningful performance [1].

The authors do not investigate the underlying architectural or semantic mechanisms that cause this failure, leaving the specific limitations of their cross-modal alignment unexamined.²

Datasets

A3LIS-147 characteristics

SignIT characteristics

Preprocessing

We follow the same pipeline used for the training of the frozen SignCLIP backbone [1], including:

Methodology

We investigate whether SignCLIP’s multilingual pretraining generalises to LIS, a language present in the pretraining corpus but excluded from the original evaluation. Our approach tests this through three phases: Zero-shot evaluation, to assess the frozen multilingual prior’s native LIS structure; few-shot adaptation, to evaluate whether this structure supports recognition from a minimal number of examples using a frozen backbone; and a fine-tuning ablation, to determine the performance ceiling of lightweight fine-tuning whilst preserving cross-modal alignment.

Dual-dataset evaluation

Zero-shot evaluation

The zero-shot evaluation applies the frozen SignCLIP checkpoint directly to both datasets. Predictions are generated by computing the cosine similarity between the video embedding and the text embedding of the English gloss (prompted as <en> <lis> [gloss]).
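The ranking step described above can be sketched as follows. This is a minimal illustration, not the SignCLIP implementation: the toy three-dimensional vectors stand in for real encoder outputs, and the helper names `cosine`, `build_prompt`, and `rank_glosses` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_prompt(gloss, lang_tag="<lis>"):
    """Prefix an English gloss with the language tags described in the text."""
    return f"<en> {lang_tag} {gloss}"

def rank_glosses(video_emb, text_embs):
    """Return gloss labels sorted from most to least similar to the video."""
    scores = {gloss: cosine(video_emb, emb) for gloss, emb in text_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings (hypothetical values); in practice each text embedding would
# come from encoding build_prompt(gloss) with the frozen text encoder.
text_embs = {
    "water": [1.0, 0.0, 0.0],
    "home": [0.0, 1.0, 0.0],
    "open": [0.0, 0.0, 1.0],
}
video_emb = [0.2, 0.9, 0.1]

ranking = rank_glosses(video_emb, text_embs)
print(build_prompt(ranking[0]), ranking)
```

The position of the correct gloss in `ranking` is the rank from which retrieval metrics are then derived.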

We report Recall@1, 5, 10, and Median Rank. To better investigate the “Iconicity Hypothesis” and transfer ability, we perform a per-class analysis stratified by Category, Median Rank, qualitative ASL/BSL similarity (iconicity proxy), and Spreadthesign presence.
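For readers unfamiliar with these metrics, the sketch below computes Recall@K and Median Rank from a list of 1-indexed ranks of the correct gloss, one rank per query video. The rank values are invented for illustration and do not come from our experiments.

```python
from statistics import median

def recall_at_k(ranks, k):
    """Fraction of queries whose correct item appears within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def median_rank(ranks):
    """Median position of the correct item across all queries."""
    return median(ranks)

# Hypothetical ranks of the correct gloss for eight query videos.
ranks = [1, 3, 7, 12, 50, 2, 9, 100]

metrics = {f"R@{k}": recall_at_k(ranks, k) for k in (1, 5, 10)}
metrics["MedR"] = median_rank(ranks)
print(metrics)
```

Lower Median Rank is better, whereas higher Recall@K is better.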

Translation. We manually translated A3LIS-147 using Spreadthesign. Remaining out-of-vocabulary (OOV) terms were translated as accurately as possible. We also recreated the unavailable categories. Both are listed in Appendix D.

Few-shot evaluation

We evaluate few-shot ISLR to determine whether the strong few-shot results reported in the SignCLIP paper [1] generalise to LIS.

Fine-tuning and loss function ablation on A3LIS-147

We initialise from the baseline checkpoint and fine-tune on A3LIS-147 using a 70/10/20 signer-stratified split. This ensures that our evaluation measures generalisation to unseen signers (see Appendix D for the exact partition). Each configuration is trained for 50 epochs and evaluated across zero-shot retrieval, linear probing, and prototypical retrieval. For all the details about the hyperparameters used, see Appendix B.
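Prototypical retrieval, in the spirit of [14], can be sketched as below. We assume here that each class prototype is the mean of L2-normalised support embeddings and that queries are assigned to the prototype with the highest cosine similarity; the two-dimensional embeddings and function names are hypothetical and chosen only for illustration.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_prototypes(support):
    """Map each label to the unit-norm mean of its normalised support embeddings."""
    prototypes = {}
    for label, vectors in support.items():
        unit = [l2_normalise(v) for v in vectors]
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        prototypes[label] = l2_normalise(mean)
    return prototypes

def classify(query, prototypes):
    """Return the label whose prototype has the highest cosine similarity."""
    q = l2_normalise(query)

    def score(label):
        return sum(a * b for a, b in zip(q, prototypes[label]))

    return max(prototypes, key=score)

# Two support examples per class (toy values, not real video embeddings).
support = {
    "caldo": [[1.0, 0.1], [0.9, 0.2]],
    "freddo": [[0.1, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
prediction = classify([0.8, 0.3], prototypes)
print(prediction)
```

Because only the support embeddings are averaged, the backbone stays frozen and new glosses can be added without retraining.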

The text Transformer $f_{\theta_{t}}$ and CNN backbone $f_{\theta_{CNN}}$ are frozen to preserve pre-trained semantic anchors. We unfreeze the visual adaptation parameters $\Theta_{adapt} = \{\theta_{MLP}, \theta_{v}, \tau\}$, denoting the video token MLP, video Transformer encoder, and logit-scale temperature, respectively.

Because contrastive models exhibit high sensitivity to objectives and batch scales on low-resource datasets, we conduct an ablation evaluating the following optimisation regimes:
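As a point of reference for these regimes, the symmetric InfoNCE objective [15] commonly used in CLIP-style training can be written as below. This is the textbook form and not necessarily the exact loss implemented in SignCLIP. Here $s_{ij}$ (an assumed symbol) denotes the cosine similarity between video embedding $i$ and text embedding $j$ in a batch of size $N$, and $\tau$ is the temperature; implementations that learn a logit scale multiply by $1/\tau$ instead.

```latex
% Video-to-text and text-to-video terms, averaged into one symmetric loss.
\mathcal{L}_{v \to t} = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N} \exp(s_{ij}/\tau)},
\qquad
\mathcal{L}_{t \to v} = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N} \exp(s_{ji}/\tau)},
\qquad
\mathcal{L} = \tfrac{1}{2}\left(\mathcal{L}_{v \to t} + \mathcal{L}_{t \to v}\right)
```

With small batches the denominators contain few negatives, which is one reason batch scale matters on low-resource datasets.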

SignIT fine-tuning

The single best-performing fine-tuning regime identified on A3LIS-147 is applied to SignIT (details in Appendix B). To address the dataset’s naturalistic acquisition and long-tailed distribution, we apply light spatial augmentation, to preserve semantic meaning, and heavier temporal augmentation (aug_sigma_temporal: 0.25, aug_sigma_spatial: 0.15, aug_sigma_noise: 0.002, aug_p_flip: 0.0, aug_strength_max: 3.5). SignIT’s richer macro-categories and the prior literature on the dataset motivated additional experiments on category-level zero-shot and few-shot retrieval. For these experiments, we report recall, precision, and F1 alongside R@1 to allow better comparison with the original authors.

Experiments

Zero-shot complete-dataset evaluations

Baseline zero-shot evaluations in Table 1 and Table 2 show poor overall performance, in line with the findings of Jiang et al. for out-of-domain transfer [1]. However, there is stratification between categories. In SignIT, the ‘Food’ domain achieves the highest exact retrieval (10.96% R@1), while ‘Emotions’ demonstrates superior neighbourhood alignment (51.60% R@10). Similarly, A3LIS-147 exhibits a split between early recall (‘Common Life’, 7.22% R@1) and broader neighbourhood density (‘Public Institute’, MedR 42.5). This variance indicates that while overall cross-lingual transfer is weak, the model successfully transfers universal, cross-lingual iconic primitives from the pretraining distribution for specific semantic clusters. For gloss-level and further category details, see Appendix C.

Zero-shot median-rank stratification, iconicity, and OOV analysis

SignIT category	R@1	R@5	R@10	MedR
Animals	0.041	0.149	0.3108	23.6
Colors	0.0572	0.2321	0.4353	18.2
Emotions	0.04	0.3117	0.516	13.2
Family	0.0071	0.0155	0.0496	42.1
Food	0.1096	0.3614	0.4947	13.7
Overall	0.0506	0.1876	0.326	24.0

A3LIS-147 category	R@1	R@5	R@10	MedR
Common Life	0.0722	0.2167	0.2722	48.9
Education	0.0433	0.1233	0.17	61.0
Highway	0.025	0.125	0.25	50.1
Hospital	0.0263	0.0895	0.1684	46.0
Public Institute	0.0447	0.1342	0.2	42.5
Railway Station	0.0083	0.0333	0.0833	47.9
Overall	0.0356	0.1114	0.1732	49.2

Tier (A3LIS-147)	MedR Range	Portion	Cum. MedR
Great	1–3	0.0537	1.6
Good	3.1–15	0.1678	7.2
Fair	15.1–40	0.2282	18.4
Neutral	40.1–74	0.3087	33.0
Adverse	74.1–148	0.2416	49.2

Tier (SignIT)	MedR Range	Portion	Cum. MedR
Great	1–3	0.043	1.9
Good	3.1–10	0.226	5.6
Fair	10.1–25	0.366	12.3
Neutral	25.1–47	0.226	18.5
Adverse	47.1–93	0.140	24.0

LIS Sign in STS	R@1	R@5	R@10	MedR
No	0.0413	0.1109	0.1543	51.1
Yes	0.0337	0.1169	0.1888	48.4
Yes, but different	0.0286	0.0786	0.1357	47.9

Iconicity Proxy (UK/US)	R@1	R@5	R@10	MedR
Kind of	0.0143	0.0571	0.1214	60.1
No	0.026	0.1135	0.1698	50.5
Yes	0.0667	0.1256	0.2	42.0

SignCLIP’s cross-lingual alignment induces a structurally bimodal transfer effect. We argue that since the pre-trained text encoder operates in an English-centric semantic space, language prefix identifiers provide insufficient separation. Consequently, the objective forces visually disparate sign videos toward a quasi-singular text anchor. This semantic asymmetry creates an optimisation conflict that marginalises low-resource languages, resulting in negative transfer, evidenced by the adverse tiers in A3LIS-147 (24.16%) and SignIT (14.0%) in Table 3 and Table 4.

For iconic signs, the shared anchor is beneficial (achieving a MedR of 42.0 in Table 6); for non-iconic signs, the anchor provides a weak or adversarial signal, collapsing retrieval accuracy (MedR 60.1). Pre-training exposure does not overcome this issue: Table 5 shows that OOV LIS signs marginally outperform in-vocabulary signs at R@1 (4.13% vs. 3.37%).

We believe data scaling is unlikely to resolve these failures. Shared human articulatory constraints result in heavy overlap in the discriminative features between languages, a problem further complicated by high individual signer-variance (Figure 1 in Appendix). Thus, diversification within synonym classes [4] and cross-lingual “false friends” lead to gradient conflicts. Our findings suggest these factors limit zero-shot performance for any architecture imposing a single joint embedding space without language-gated alignment. These issues can be resolved by monolingual fine-tuning (Table 8), likely at the expense of multilingual understanding, but this remains unexamined.

Linear probing on the frozen backbone achieves 66.78% R@1 on A3LIS-147 in Table 7, confirming that the video encoder learns robust representations.

A3LIS fine-tuning ablation

Table 7 shows that GlobalNCE yields the strongest fine-tuning performance on A3LIS-147, and its linear probe (80.20% R@1) matches the previous SOTA of 80.4% [6]. We attribute this to its global negative sampling across distributed batches, which provides the density of hard negatives required to stabilise contrastive gradients. ProLIP comes within 0.33 percentage points of GlobalNCE in zero-shot R@1 (75.84% vs. 76.17%) while adapting only the final MLP layer and logit scale, making it the preferred regime when compute or overfitting risk is the primary concern.

SignIT few-shot and fine-tuning ablation

Method	Mode	R@1	R@5	R@10	MedR
Baseline	Zero	0.0369	0.1309	0.1946	40
Baseline	Proto	0.6477	0.9094	0.9698	1
Baseline	LP	0.6678	0.9329	0.9698	1
GlobalNCE 16	Zero	0.7617	0.9262	0.9698	1
GlobalNCE 16	Proto	0.7886	0.9430	0.9732	1
GlobalNCE 16	LP	0.8020	0.9430	0.9732	1
ProLIP 16	Zero	0.7584	0.9161	0.953	1
ProLIP 16	Proto	0.7718	0.9396	0.9597	1
ProLIP 16	LP	0.7785	0.9364	0.9564	1

Table 8 shows that augmentation on SignIT improves generalisation. Our results trail the LLaVA-OneVision video+pose model (Acc 0.238) reported by the SignIT authors [2]. However, we outperform all non-video baselines they evaluated, including pose-only LLaVA (Acc 0.121), establishing a competitive keypoint-only result.

SignIT macro-category retrieval

Model	Mode	R@1	R@5	R@10	MedR
Baseline	Zero	0.0359	0.1692	0.2769	22.0
Baseline	Proto	0.0974	0.3077	0.4462	13.0
Baseline	LP	0.0923	0.3641	0.5333	10.0
Fine-tune	Zero	0.1385	0.4308	0.5538	7.0
Fine-tune	Proto	0.1487	0.4256	0.5692	8.0
Fine-tune	LP	0.1538	0.4308	0.5744	7.0
Fine-tune + Aug	Zero	0.1436	0.4308	0.6154	8.0
Fine-tune + Aug	Proto	0.1744	0.4103	0.6103	8.0
Fine-tune + Aug	LP	0.1744	0.4462	0.5897	8.0

Zero-shot retrieval on categories achieves an F1-score (0.48) that is competitive with some fully supervised video baselines, such as I3D (0.34 F1) [2]. Because this relies on measuring the distance between visual embeddings and the textual embeddings of broad macro-categories, these results highlight an advantage of contrastive pretraining: the latent space is semantically organised, allowing the model to generalise to categorical distributions it never explicitly encountered during pretraining. Our strongest few-shot linear-probe configuration reaches 64.62% R@1, approaching the performance of SignIT’s best fully supervised MLP (0.726 Accuracy) [2].

Sign language identification

Model	Mode	R@1	Precision	Recall	F1
Baseline	Zero	0.3744	0.54	0.34	0.30
Baseline	Proto	0.4103	0.3909	0.3921	0.3844
Baseline	LP	0.5846	0.61	0.52	0.55
Fine-tune	Zero	0.4872	0.48	0.55	0.48
Fine-tune	Proto	0.5641	0.5219	0.5371	0.5251
Fine-tune	LP	0.6462	0.64	0.59	0.61
Fine-tune + Aug	Zero	0.4974	0.49	0.52	0.48
Fine-tune + Aug	Proto	0.5949	0.5561	0.5708	0.5503
Fine-tune + Aug	LP	0.6103	0.68	0.57	0.59

Random Chance	R@1	R@2	MedR
0.1250	0.3510	0.6523	2.0 / 8
False positives: lsf - 20, bsl - 688, ngt - 227, and lse - 32.

The sign language identification results in Table 10 complicate our earlier finding that in-vocabulary LIS signs do not outperform OOV signs. This simplified retrieval task suggests that SignCLIP does learn some language separation, as shown by the R@2 (65.23%). However, performance drops sharply at R@1 (35.10%), with substantial confusion between LIS, BSL, and NGT (Appendix A.3). It may be worth investigating whether this is due to higher inter-language iconicity.
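The R@1 and random-chance figures can be reproduced from the per-language prediction counts listed in Appendix A.3. The short check below assumes every query is a LIS video, so that the share of `<lis>` predictions equals R@1; the variable names are illustrative.

```python
# Predicted language tag counts for LIS queries, as listed in Appendix A.3.
counts = {
    "lis": 523, "ase": 0, "dgs": 0, "lsf": 20,
    "bsl": 688, "ngt": 227, "lse": 32, "csl": 0,
}

total = sum(counts.values())             # number of LIS query videos
chance = 1 / len(counts)                 # uniform guess over 8 candidate languages
proportions = {lang: n / total for lang, n in counts.items()}
r_at_1 = proportions["lis"]              # correct predictions over all queries
false_positives = total - counts["lis"]  # queries assigned to another language

print(total, round(chance, 4), round(r_at_1, 4), false_positives)
```

BSL alone attracts more predictions than LIS, which is the confusion discussed above.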

Conclusion

This work demonstrates that SignCLIP’s contrastive alignment induces a structurally bimodal transfer effect on LIS, beneficial for iconic vocabulary, adverse for non-iconic signs, indicating a geometric limitation of the shared embedding space paradigm rather than a data-scaling problem. Few-shot and fine-tuning strategies mitigate these limitations, confirming that the video encoder learns discriminative representations that zero-shot retrieval cannot exploit without fine-tuning in a monolingual context.

We see two promising directions for future research. Since pretraining exposure to LIS signs does not guarantee positive transfer, fine-tuning on the LIS-specific Spreadthesign subset could be adequate for OOD LIS. A more effective multilingual embedding space requires language-conditioned projections that both allow for iconicity transfer and decouple text anchors for non-iconic glosses across sign languages.

References

[1] Z. Jiang, G. Sant, A. Moryossef, M. Müller, R. Sennrich, and S. Ebling, “SignCLIP: Connecting Text and Sign Language by Contrastive Learning.” 2024.
[2] A. Micieli, G. M. Farinella, and F. Ragusa, “SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition.” 2025.
[3] M. Boháček and M. Hrúz, “Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries,” in Proceedings of the Face & Gestures Conference, IEEE, 2023.
[4] Y. C. Bilge, N. Ikizler-Cinbis, and R. G. Cinbis, “Cross-lingual few-shot sign language recognition,” Pattern Recognition, vol. 151, 2024.
[5] M. Fagiani, E. Principi, S. Squartini, and F. Piazza, “A New Italian Sign Language Database,” in Advances in Brain Inspired Cognitive Systems: 5th International Conference (BICS 2012), Springer, 2012.
[6] F. M. Vargas, “"LIStudio": Computer Vision Models for studying Italian Sign Language,” Master's thesis, 2024.
[7] M. Marchisio, A. Mazzei, and D. Sammaruga, “Introducing Deep Learning with Data Augmentation and Corpus Construction for LIS,” in CLiC-it 2023: 9th Italian Conference on Computational Linguistics, 2023.
[8] T. Vandendriessche, M. D. Coster, A. Lejon, and J. Dambre, “Representing Signs As Signs: One-Shot ISLR To Facilitate Functional Sign Language Technologies.” 2024.
[9] G. Caligiore, R. Mineo, C. Spampinato, E. Ragonese, S. Palazzo, and S. Fontana, “Multisource Approaches to Italian Sign Language (LIS) Recognition: Insights from the MultiMedaLIS Dataset,” in CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, 2024.
[10] Y. C. Bilge, N. Ikizler-Cinbis, and R. G. Cinbis, “Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?,” in Proceedings of the British Machine Vision Conference (BMVC), 2019.
[11] R. Rastgoo, K. Kiani, S. Escalera, and M. Sabokrou, “Multi-Modal Zero-Shot Sign Language Recognition.” 2021.
[12] Y. Cheng, F. Wei, J. Bao, D. Chen, and W. Zhang, “CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[13] Z. Jiang et al., “Segment, Embed, and Align.” 2024.
[14] J. Snell, K. Swersky, and R. Zemel, “Prototypical Networks for Few-shot Learning,” in Advances in Neural Information Processing Systems, vol. 30, Curran Associates, Inc., 2017.
[15] A. van den Oord, Y. Li, and O. Vinyals, “Representation Learning with Contrastive Predictive Coding.” 2019.
[16] P. Khosla et al., “Supervised Contrastive Learning.” 2021.
[17] T. Koleilat, H. Asgariandehkordi, H. Rivaz, and Y. Xiao, “MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation.” 2024.
[18] C. Tang, X. Yang, J. Lv, and Z. He, “Zero-shot learning by mutual information estimation and maximization.” 2020.
[19] M. Fahes, T.-H. Vu, A. Bursuc, P. Perez, and R. de Charette, “Fine-tuning CLIP’s Last Visual Projector: A Few-Shot Cornucopia.” 2024.

A.1 Leave-one-signer-out baseline linear-probe

Metric	Mean	Std. Dev.
R@1	0.7148	0.0631
R@5	0.9403	0.0276
R@10	0.9725	0.0176

Signer variability, presented in Figure 1, primarily degrades R@1, as shown by its $\pm$6.3% standard deviation. Broader retrieval remains robust. This variance underscores cross-signer generalisation as a persistent difficulty.
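A leave-one-signer-out protocol can be sketched as follows: each signer is held out in turn, a probe is fitted on the remaining signers, and the per-fold scores are summarised by their mean and standard deviation. The signer IDs, items, and fold scores below are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def leave_one_signer_out(samples):
    """Yield (held_out_signer, train_items, test_items) for every signer."""
    signers = sorted({signer for signer, _ in samples})
    for held_out in signers:
        train = [item for signer, item in samples if signer != held_out]
        test = [item for signer, item in samples if signer == held_out]
        yield held_out, train, test

def summarise(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)

# (signer_id, video_id) pairs; placeholders for real dataset entries.
samples = [("s1", "a"), ("s1", "b"), ("s2", "c"), ("s3", "d"), ("s3", "e")]
folds = list(leave_one_signer_out(samples))

# Hypothetical per-fold R@1 values, one per held-out signer.
fold_r1 = [0.70, 0.65, 0.79]
avg, sd = summarise(fold_r1)
print(len(folds), round(avg, 4), round(sd, 4))
```

No video from the held-out signer ever appears in the training portion of its fold, which is what makes the score a measure of cross-signer generalisation.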

A.2 Complete A3LIS fine-tuning ablation

In Section 4.5, we presented a condensed view of our A3LIS-147 fine-tuning ablation, highlighting the performance of the default SignCLIP objective (NCE) against our best-performing GlobalNCE regime. Table 11 presents the comprehensive results across all evaluated loss functions, batch sizes, and sampling strategies.

A.3 Sign language identification scores

B.1 SignIT with augmentation fine-tuning hyperparameters

B.2 A3LIS and no augmentation fine-tuning hyperparameters

Note that for ProLIP, two additional hyperparameters are set: prolip_lambda: 0.5 and prolip_lambda_mode: inv_n.

C.1 SignIT glosses by median rank

Method	Mode	R@1	R@5	R@10	MedR
Baseline	Zero	0.0369	0.1309	0.1946	40
Baseline	Proto	0.6477	0.9094	0.9698	1
Baseline	LP	0.6678	0.9329	0.9698	1
InfoNCE 128	Zero	0.7248	0.906	0.9396	1
InfoNCE 128	Proto	0.7584	0.9430	0.9765	1
InfoNCE 128	LP	0.7617	0.9597	0.9799	1
SupCon 32x4	Zero	0.5912	0.8591	0.9128	1
SupCon 32x4	Proto	0.7013	0.9128	0.9664	1
SupCon 32x4	LP	0.7785	0.9396	0.9765	1
Cross-Entropy 16	Zero	0.0503	0.1611	0.245	33
Cross-Entropy 16	Proto	0.772	0.946	0.987	1
Cross-Entropy 16	LP	0.7651	0.9463	0.9799	1
GlobalNCE 16	Zero	0.7617	0.9262	0.9698	1
GlobalNCE 16	Proto	0.7886	0.9430	0.9732	1
GlobalNCE 16	LP	0.802	0.943	0.9732	1
ProLIP 16	Zero	0.7584	0.9161	0.953	1
ProLIP 16	Proto	0.7718	0.9396	0.9597	1
ProLIP 16	LP	0.7785	0.9364	0.9564	1
DHN-NCE 64	Zero	0.7081	0.8926	0.9295	1
DHN-NCE 64	Proto	0.7651	0.9497	0.9732	1
DHN-NCE 64	LP	0.7617	0.9564	0.9765	1

Target language	Count	Proportion
`<en> <lis>`	523	0.351
`<en> <ase>`	0	0
`<en> <dgs>`	0	0
`<en> <lsf>`	20	0.0134
`<en> <bsl>`	688	0.4618
`<en> <ngt>`	227	0.1523
`<en> <lse>`	32	0.0215
`<en> <csl>`	0	0

Parameter	Value
Base Checkpoint	signclip_v1_1
Model Architecture	`MMFusionSeparate`
Video Encoder	`MMBertForEncoder` (12 layers, dim: 609)
Text Encoder	`BertModel` (`bert-base-cased`)
Loss Function	`GlobalNCE`
Optimiser	Adam ( $𝛽_{1} = 0.9, 𝛽_{2} = 0.98$ )
Base Learning Rate	5.0e-05
LR Scheduler	Polynomial Decay (122 warmup updates)
Weight Decay	0.02
Gradient Clipping	2.0 (Max Norm)
Max Epochs	50
Batch Size	16
Precision	FP16 Mixed Precision
Max Sequence Length	Video: 256 frames / Text: 64 tokens
Pose Components	`reduced_face`
Data Augmentation	Temporal ( $𝜎 = 0.25$ ), Spatial ( $𝜎 = 0.15$ ), Noise ( $𝜎 = 0.002$ )

Parameter	Value
Base Checkpoint	signclip_v1_1
Model Architecture	`MMFusionSeparate`
Video Encoder	`MMBertForEncoder` (12 layers, dim: 609)
Text Encoder	`BertModel` (`bert-base-cased`)
Loss Function	(depends on experiment)
Video SupCon Weight	`0.5`
Optimiser	Adam ( $𝛽_{1} = 0.9, 𝛽_{2} = 0.98$ )
Base Learning Rate	5.0e-05
LR Scheduler	Polynomial Decay (122 warmup updates)
Weight Decay	0.01
Gradient Clipping	2.0 (Max Norm)
Max Epochs	50
Batch Size	16
Precision	FP16 Mixed Precision
Max Sequence Length	Video: 256 frames / Text: 64 tokens
Pose Components	`reduced_face`
Data Augmentation	Temporal Augmentation Enabled

2. Good (3.1-10): anger, brown, cake, chocolate, cow, fear, fuchsia, giraffe, grey, joy, light colors, orange, pizza, relatives, rooster, salt, sheep, snail, tiger, vegetable, wine.

3. Fair (10.1-25): apple, banana, bird, blue, butterfly, candy, cat, dark colors, disgust, donkey, family, fish, frog, fruit, grandfather, green, horse, light blue, lion, meat, monkey, parents, pasta, pear, pig, pineapple, pink, purple, rabbit, rice, spider, turtle, yellow, zebra.

4. Neutral / Random (25.1-47): aunt, black, brother-in-law, bull, cousin, crocodile, dad, daughter-in-law, dog, elephant, goat, goose, grandmother, milk, parrot, red, sadness, sky blue, uncle, water, wolf.

5. Adverse (47.1-93): boyfriend, brother, hen, husband, mom, mouse, nephew, sister, snake, son, son-in-law, white, wife.

C.2 A3LIS-147 glosses by median rank

1. Great (1-3): caldo, data, falconara, freddo, giudizio, iniezione, scadenza, senigallia.

2. Good (3.1-15): abitare, affitto, ancona, aperto, avviso, consegnare, dirigente, dolore, emergenza, jesi, macerata, modello, modulo, multa, notte, pomeriggio, presente, pubblica, ritirare_il_numero, sciopero, sostegno, traffico, tratta, vacanze, verde.

3. Fair (15.1-40): acqua, allegare, ambulanza, annullato, arrivo, ascoli, banca, binario, cambio, commissione, compilare, costo, cura, domenica, esame, fermo, giallo, giovedì, giorno, infermiere, infezione, istituto, marche, mattina, medico, operazione, partenza, promosso, provincia, ritardo, s.benedetto, tassa, torino, università.

4. Neutral / Random (40.1-74): abbonamento, allergia, amministrazione, andata, andata_e_ritorno, assente, assistente_alla_comunicazione, bidello, biglietto, bocciato, casa, casello, chiuso, cibo, civitanova, comune, diploma, disinfettare, fano, venerdì, giorni, ieri, laurea, litro, lunedì, martedì, mercoledì, mesi, obliterare, ospedale, pesaro-urbino, posta, rallentamenti, regione, ricevuta, ritorno, roma, rosso, segretario, sera, sindaco, stazione, strada, treno.

5. Adverse (74.1-148): asilo_nido, assessore, assistente, autostrada, domani, elementari, ente_pubblico, entro, flebo, impiegato, interprete, lingua_dei_segni, malattia, mangiare, marca_da_bollo, medie, nota, oggi, obliteratrice, orari, preside, professore, pronto_soccorso, registro, sabato, sala_d’attesa, scuola, scuola_materna, sil, superiori, sportello, studente, tecnico, telefono, ufficio_informazioni, voto.
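The tier assignment used for the lists above can be sketched as follows, using the A3LIS-147 median-rank boundaries (3, 15, 40, 74, 148). We assume here that a gloss's median rank is the median over the ranks of its individual query videos; the per-video ranks in the example are invented for illustration.

```python
from statistics import median

# Upper median-rank bound of each A3LIS-147 tier, in ascending order.
A3LIS_TIERS = [(3, "Great"), (15, "Good"), (40, "Fair"), (74, "Neutral"), (148, "Adverse")]

def tier_for(med_rank, tiers=A3LIS_TIERS):
    """Return the first tier whose upper bound covers the median rank."""
    for upper, name in tiers:
        if med_rank <= upper:
            return name
    raise ValueError(f"median rank {med_rank} exceeds the largest tier bound")

def tier_glosses(ranks_by_gloss):
    """Map each gloss to a tier based on the median of its per-video ranks."""
    return {gloss: tier_for(median(ranks)) for gloss, ranks in ranks_by_gloss.items()}

# Hypothetical per-video ranks for three glosses.
ranks_by_gloss = {
    "caldo": [1, 2, 4],
    "abitare": [5, 12, 30],
    "scuola": [90, 120, 75],
}
tiers = tier_glosses(ranks_by_gloss)
print(tiers)
```

The SignIT tiers follow the same logic with the boundaries 3, 10, 25, 47, and 93.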

C.3 SignIT median-rank category proportions

C.4 A3LIS-147 median-rank category proportions

The following table provides the full mapping used for our A3LIS-147 analysis, including category classification, presence in the Spreadthesign (STS) corpus, and our qualitative iconicity proxy (visual similarity to the corresponding ASL/BSL signs).

Italian	English	Category	In STS?	Iconicity Proxy
abbonamento	subscription	railway station	yes but different	no
abitare	live	common life	yes	no
acqua	water	common life	yes	no
affitto	rent	common life	yes	no
allegare	attach	education	no	yes
allergia	allergy	hospital	yes	no
ambulanza	ambulance	hospital	yes	no
amministrazione	administration	public institute	yes	yes
ancona	ancona	public institute	no	no
andata	one way	railway station	no	no
andata_e_ritorno	round trip	railway station	no	no
annullato	cancelled	railway station	yes	yes
aperto	open	common life	yes	yes
arrivo	arrival	railway station	yes	no
ascoli	ascoli	public institute	no	no
asilo_nido	day nursery	education	yes but different	no
assente	absent	education	yes	no
assessore	assessor	public institute	no	no
assistente	assistant	public institute	yes	no
assistente_alla_comunicazione	communication assistant	public institute	no	no
autostrada	motorway	highway	yes	kind of
avviso	notice	education	yes	yes
banca	bank	public institute	yes	no
bidello	janitor	education	no	no
biglietto	ticket	railway station	yes	yes
binario	platform	railway station	yes but different	no
bocciato	failed	education	yes but different	no
caldo	hot	common life	yes but different	no
cambio	change	railway station	no	no
casa	home	common life	yes	no
casello	toll gate	highway	yes	yes
chiuso	closed	common life	yes but different	yes
cibo	food	common life	yes	yes
civitanova	civitanova	public institute	no	no
commissione	commission	education	yes but different	no
compilare	compile	public institute	yes	no
comune	municipality	public institute	yes	no
consegnare	deliver	common life	yes	yes
costo	cost	common life	yes	kind of
cura	care	hospital	yes	yes
data	date	public institute	yes	no
diploma	diploma	education	yes	yes
dirigente	executive	public institute	yes	yes
disinfettare	disinfect	hospital	no	no
dolore	pain	hospital	yes but different	no
domani	tomorrow	railway station	yes but different	kind of
domenica	sunday	railway station	yes	no
elementari	elementary school	education	no	no
emergenza	emergency	hospital	yes	no
ente_pubblico	public body	public institute	no	no
entro	within	education	no	no
esame	exam	education	yes	no
falconara	falconara	public institute	no	no
fano	fano	public institute	no	no
fermo	still	railway station	no	no
flebo	intravenous drip	hospital	no	no
freddo	cold	common life	yes	yes
giallo	yellow	hospital	yes	no
giorni	days	railway station	yes	no
giorno	day	railway station	yes	no
giovedì	thursday	railway station	yes but different	no
giudizio	judgement	education	no	yes
ieri	yesterday	railway station	yes	yes
impiegato	employee	public institute	yes but different	no
infermiere	nurse	hospital	yes but different	kind of
infezione	infection	hospital	no	no
iniezione	injection	hospital	no	yes
interprete	interpreter	public institute	yes	no
inviare_sms	messaging	common life	no	no
istituto	institute	education	yes	no
jesi	jesi	public institute	no	no
laurea	graduation	education	yes	no
lingua_dei_segni	sign language	common life	yes but different	no
litro	litre	common life	yes	yes
lunedì	monday	railway station	yes	no
macerata	macerata	public institute	no	no
malattia	illness	hospital	yes	no
mangiare	eat	common life	yes	yes
marca_da_bollo	revenue stamp	public institute	no	no
marche	marche	public institute	no	no
martedì	tuesday	railway station	yes	no
mattina	morning	railway station	yes	kind of
medico	doctor	hospital	yes	yes
medie	middle school	education	no	no
mercoledì	wednesday	railway station	yes	no
mesi	months	railway station	yes	no
modello	model	public institute	yes	no
modulo	form	public institute	yes	yes
multa	fine	highway	yes	yes
nota	note	education	yes	kind of
notte	night	railway station	yes	yes
obliterare	stamp	railway station	no	no
obliteratrice	stamping machine	railway station	no	no
oggi	today	railway station	yes	no
operazione	operation	hospital	no	no
orari	times	railway station	no	no
ospedale	hospital	hospital	yes	no
partenza	departure	railway station	yes	no
pesaro-urbino	pesaro-urbino	public institute	no	no
pomeriggio	afternoon	railway station	yes	yes
posta	mail	public institute	yes	kind of
presente	present	education	yes	no
preside	headmaster	education	yes	no
professore	professor	education	yes	no
promosso	promoted	education	no	yes
pronto_soccorso	first aid	hospital	yes	yes
provincia	province	public institute	yes but different	no
pubblica	public	public institute	yes	yes
rallentamenti	slowdowns	highway	no	yes
regione	region	public institute	yes	kind of
registro	log book	education	yes	yes
ricevuta	receipt	public institute	no	no
ritardo	delay	railway station	no	no
ritirare_il_numero	take the number	public institute	no	no
ritorno	return	railway station	no	no
roma	rome	public institute	yes	no
rosso	red	hospital	yes	kind of
s.benedetto	s.benedetto	public institute	no	no
sabato	saturday	railway station	yes	no
sala_d’attesa	waiting room	hospital	yes	no
scadenza	expiration	education	yes	no
sciopero	strike	railway station	yes	yes
scontrino	receipt	public institute	yes	kind of
scuola	school	education	yes	no
scuola_materna	nursery school	education	yes	no
segretario	secretary	education	yes	no
senigallia	senigallia	public institute	no	no
sera	evening	railway station	yes	kind of
sil	silence sign	common life	no	no
sindaco	mayor	public institute	yes	no
sostegno	aid	education	yes	kind of
sportello	reception window	public institute	yes	yes
stazione	station	railway station	yes	no
strada	street	highway	yes	yes
studente	student	education	yes	no
superiori	high school	education	yes	yes
tassa	fee	public institute	yes	kind of
tecnico	technician	highway	yes	yes
telefono	telephone	common life	yes	yes
torino	turin	public institute	no	no
traffico	traffic	highway	yes	no
tratta	section	highway	no	yes
treno	train	railway station	yes	kind of
ufficio_informazioni	information office	public institute	no	no
università	university	education	yes	no
vacanze	vacation	common life	yes	yes
venerdì	friday	railway station	yes	no
verde	green	hospital	yes	no
voto	voting	education	yes	yes

The Limits of Cross-Lingual Transfer: Evaluating SignCLIP on LIS

Introduction