A compound term used within the Natural Language Toolkit (NLTK) library, this refers to a specific implementation of a metric used to evaluate the quality of machine translation. Specifically, it computes a weighted harmonic mean of unigram precision and recall, incorporating stemming and synonymy matching, along with a fragmentation penalty. An example of its use involves comparing a generated translation against one or more reference translations, yielding a score that reflects the similarity in meaning and wording between the two.
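A minimal usage sketch (assuming NLTK 3.6 or later, where inputs must be pre-tokenized, and that the WordNet corpus has already been fetched with nltk.download):

```python
from nltk.translate.meteor_score import meteor_score

# Assumes the "wordnet" corpus (and "omw-1.4" on newer NLTK builds) is downloaded.
reference = "the cat sat on the mat".split()
hypothesis = "the cat was sitting on the mat".split()

# The function takes a list of tokenized references and one tokenized hypothesis.
score = meteor_score([reference], hypothesis)
print(f"METEOR: {score:.4f}")
```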
The significance of this metric lies in its ability to provide a more nuanced assessment of translation quality than simpler metrics like BLEU. By considering recall, stemming, and synonymy, it better captures semantic similarity. Its value is particularly apparent in situations where paraphrasing or lexical variation is present. Its historical context is rooted in the need for improved automated evaluation tools that correlate more closely with human judgments of translation quality, addressing limitations found in earlier, more simplistic methods.
The subsequent discussion will delve into the practical application of this NLTK functionality, examining its parameters, interpreting its output, and comparing its performance against alternative evaluation methodologies, providing a detailed understanding of its capabilities and limitations within the broader context of machine translation evaluation.
1. Harmonic Mean
The harmonic mean is a critical component within the NLTK’s implementation for evaluating machine translation quality. It provides a balanced measure of precision and recall, which are both essential indicators of translation accuracy. This averaging method is particularly suitable when dealing with rates and ratios, offering a more representative score than a simple arithmetic mean in contexts where both precision and recall need to be simultaneously high.
Balancing Precision and Recall
The harmonic mean serves to penalize a model that heavily favors either precision or recall at the expense of the other. High precision means the generated translation contains mostly relevant content, but it may miss crucial information. High recall means the generated translation captures most of the relevant information, but it might also include irrelevant content. The harmonic mean ensures that both these metrics are reasonably high, leading to a more accurate overall assessment. For instance, if a system translates “The cat sat on the mat” as “cat mat,” it has perfect precision but low recall. Conversely, if it translates it as “The cat sat on the mat near the door in the house,” it recovers every reference word (high recall) but pads the output with extra content (low precision). The harmonic mean would penalize both scenarios, favoring a translation that balances the two.
Sensitivity to Low Values
The harmonic mean is more sensitive to low values than the arithmetic mean. This characteristic is beneficial because a low precision or recall score significantly degrades the overall evaluation. If either precision or recall is close to zero, the harmonic mean will also be close to zero, regardless of the other metric’s value. This behavior is desirable because a translation with extremely low precision or recall is practically useless, even if the other metric is high. Imagine a system that pads every output with hundreds of extraneous words: it could achieve near-perfect recall (every reference word appears somewhere) while its precision approaches zero. The harmonic mean would correctly score such output near zero, whereas an arithmetic mean would misleadingly award it a score near 0.5.
Mathematical Formulation
Mathematically, the harmonic mean is calculated as the reciprocal of the arithmetic mean of the reciprocals of the values. In its balanced form this is 2 / ((1 / Precision) + (1 / Recall)), equivalently 2PR / (P + R). The NLTK implementation generalizes this to a weighted harmonic mean, F_mean = PR / (αP + (1 − α)R), with a default α of 0.9 that weights recall more heavily than precision. Either way, the formula underscores the interdependence of precision and recall; improving one metric without a corresponding improvement in the other has a diminishing effect on the final score. In simpler terms, doubling precision only substantially increases the harmonic mean if recall is also reasonably high. If recall remains low, the increase in the overall score is limited.
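A short sketch of the weighted form (the α default reflects the NLTK source at the time of writing; verify against your installed version):

```python
def weighted_harmonic_mean(precision: float, recall: float, alpha: float = 0.9) -> float:
    """F_mean = P*R / (alpha*P + (1 - alpha)*R); alpha = 0.5 recovers 2PR / (P + R)."""
    denom = alpha * precision + (1 - alpha) * recall
    return (precision * recall / denom) if denom > 0 else 0.0

print(weighted_harmonic_mean(1.0, 0.5, alpha=0.5))  # balanced: ~0.667
print(weighted_harmonic_mean(1.0, 0.5, alpha=0.9))  # recall-weighted: ~0.526
```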
In summary, the harmonic mean within the NLTK’s translation assessment provides a crucial mechanism for simultaneously optimizing both precision and recall in machine translation. Its sensitivity to low values and its mathematical formulation ensure that the final evaluation score accurately reflects the overall quality and usefulness of the translation, making it a robust and reliable metric for comparing different translation systems or evaluating improvements to a single system over time. Its use ensures a balanced and realistic assessment of translation performance.
2. Unigram Precision
Unigram precision forms a foundational element. Within its implementation, this term quantifies the proportion of individual words (unigrams) in the machine-generated translation that also appear in the reference translation. A higher unigram precision indicates a greater degree of lexical overlap between the generated and reference translations, suggesting the generated translation is accurately conveying the intended meaning, at least at the word level. The metric’s design acknowledges that a good translation should, at a minimum, accurately reproduce the words present in the reference. Without reasonable unigram precision, the overall quality is fundamentally compromised. For instance, if a reference translation states “The quick brown fox jumps over the lazy dog,” and a machine translation outputs “quick fox lazy,” the unigram precision would be perfect (3/3), since every output word appears in the reference, even though the translation itself is plainly inadequate. If, however, the output were “automobile vertebrate slow,” the unigram precision would be zero, signaling a complete failure to capture the lexical content of the reference.
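A simplified sketch of the computation (hypothetical helper name; one-to-one exact matching only, whereas the real metric also aligns stems and synonyms):

```python
from collections import Counter

def unigram_precision(hypothesis: list[str], reference: list[str]) -> float:
    """Fraction of hypothesis tokens matched one-to-one against reference tokens."""
    available = Counter(reference)
    matched = 0
    for token in hypothesis:
        if available[token] > 0:     # each reference token may be matched only once
            matched += 1
            available[token] -= 1
    return matched / len(hypothesis) if hypothesis else 0.0

reference = "the quick brown fox jumps over the lazy dog".split()
print(unigram_precision("quick fox lazy".split(), reference))              # 1.0
print(unigram_precision("automobile vertebrate slow".split(), reference))  # 0.0
```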
It is essential to recognize that unigram precision, in isolation, offers an incomplete picture of translation quality. A translation could achieve perfect unigram precision by simply reproducing a subset of the reference, thereby omitting crucial information. Furthermore, unigram precision does not account for word order, semantic nuances, or the presence of synonyms or paraphrases. Consequently, the metric relies on other components, such as unigram recall, stemming, and synonym matching, to address these limitations and provide a more comprehensive evaluation. The fragmentation penalty further discourages translations that achieve high precision by only matching isolated words or phrases while ignoring the overall coherence and fluency of the text.
In summary, unigram precision represents a crucial, yet insufficient, component. While its use alone cannot fully assess translation quality, it forms an indispensable base upon which other factors are incorporated to achieve a more accurate and nuanced evaluation. Understanding unigram precision is therefore essential for interpreting the metric’s output and appreciating its role within the broader framework of machine translation assessment.
3. Unigram Recall
Unigram recall, as a component of the aforementioned library function, measures the proportion of unigrams present in the reference translation that are also found in the machine-generated translation. A higher unigram recall score suggests the generated translation comprehensively covers the content of the reference translation. Its integration into the overall scoring mechanism is critical because it addresses a significant shortcoming of relying solely on precision. While precision assesses the accuracy of the generated translation, recall evaluates its completeness. For example, if the reference translation is “The cat sat on the mat,” and the machine translation is “The cat sat,” the precision is high, but the recall is low, indicating that some information has been omitted. In such scenarios, the inclusion of unigram recall ensures the evaluation system penalizes translations that, while accurate, are not exhaustive.
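A companion sketch to the precision example above, again with a hypothetical helper and one-to-one exact matching:

```python
from collections import Counter

def unigram_recall(hypothesis: list[str], reference: list[str]) -> float:
    """Fraction of reference tokens recovered by the hypothesis (one-to-one matching)."""
    available = Counter(hypothesis)
    matched = 0
    for token in reference:
        if available[token] > 0:
            matched += 1
            available[token] -= 1
    return matched / len(reference) if reference else 0.0

reference = "the cat sat on the mat".split()
print(unigram_recall("the cat sat".split(), reference))  # 3/6 = 0.5: accurate but incomplete
```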
The practical significance of understanding the interplay between unigram recall and this function lies in its effect on the translation process itself. Translation systems often employ various strategies to optimize for different metrics. Without adequately considering recall, a system might prioritize generating concise and accurate translations that, however, leave out crucial details. By explicitly incorporating recall into the evaluation process, system developers are incentivized to produce translations that are not only accurate but also comprehensive. The weighting assigned to recall, relative to precision, within the metric can be adjusted to reflect the specific requirements of the translation task. For instance, in scenarios where completeness is paramount, a higher weight can be assigned to recall.
In summary, unigram recall is a vital element. Its contribution lies in its ability to counterbalance the potential biases introduced by precision-focused evaluation, thereby encouraging the development of translation systems that generate both accurate and comprehensive translations. The challenge lies in striking the appropriate balance between precision and recall, and the aforementioned NLTK function provides the mechanisms necessary to fine-tune this balance according to the specific needs of a translation task. Understanding this relationship is essential for both evaluating existing translation systems and developing new and improved methodologies.
4. Stemming Impact
Stemming, a process of reducing words to their root or base form, significantly influences the performance when assessing machine translation quality. By removing suffixes and prefixes, stemming aims to consolidate variations of the same word, thereby allowing for a more generalized comparison between translated and reference texts. The extent of this impact is multifaceted, affecting both the calculated precision and recall values and the overall interpretability of the metric.
Enhancement of Matching
Stemming enables the identification of matches between words that might otherwise be missed due to morphological differences. For instance, the words “running” and “runs” are both reduced to the stem “run” (irregular forms such as “ran” escape suffix stripping and remain unmatched). Without stemming, a translation containing “running” might not be recognized as a match for a reference containing “runs,” leading to an underestimation of translation quality. This is particularly relevant in languages with rich morphology, where words can have numerous inflections. Within the NLTK’s implementation, this enhanced matching capability contributes to a more lenient and, arguably, more accurate assessment of translation accuracy.
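A quick demonstration with NLTK’s Porter stemmer (outputs reflect that specific algorithm and may differ across stemmer variants):

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["running", "runs", "run", "ran"]])
# ['run', 'run', 'run', 'ran'] -- suffix stripping cannot recover the irregular "ran"
```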
Potential for Overgeneralization
While stemming can improve matching, it also introduces the risk of overgeneralization. By reducing words to their stems, subtle differences in meaning can be lost. For example, a Porter stemmer reduces both “general” and “generally” to the truncated stem “gener,” even though the two words have distinct functions within a sentence. In the context of its usage, this overgeneralization can lead to an inflation of the score, as the metric might incorrectly identify matches between words that are not truly semantically equivalent. Careful consideration of the stemming algorithm used and its potential for overgeneralization is, therefore, crucial.
Influence on Precision and Recall
The application of stemming directly affects both precision and recall. By increasing the number of identified matches, stemming generally leads to higher recall values, as more words from the reference translation are found in the machine translation. However, it can also impact precision, particularly if overgeneralization occurs. If the stemming process leads to the identification of incorrect matches, the precision score may decrease. The overall effect on the score depends on the balance between these two competing influences and the specific characteristics of the translations being evaluated.
Algorithm Dependency
The impact of stemming is highly dependent on the specific stemming algorithm employed. Different algorithms, such as Porter, Lancaster, and Snowball stemmers, vary in their aggressiveness and accuracy. A more aggressive stemmer might reduce words more drastically, leading to greater overgeneralization but potentially higher recall. A less aggressive stemmer might be more accurate but less effective at identifying matches between morphologically related words. The choice of stemming algorithm should, therefore, be guided by the specific requirements of the translation task and the characteristics of the languages involved.
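A side-by-side comparison is straightforward; the chosen stemmer object can then be supplied to the metric via its stemmer parameter:

```python
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

words = ["running", "generously", "happiness", "relational"]
for stemmer in (PorterStemmer(), LancasterStemmer(), SnowballStemmer("english")):
    print(type(stemmer).__name__, [stemmer.stem(w) for w in words])
# Lancaster is typically the most aggressive; Snowball ("Porter2") is a common default.
```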
In conclusion, stemming represents a double-edged sword. While its application enhances the ability to recognize semantic similarities between translated and reference texts, it also introduces the risk of overgeneralization and can differentially affect precision and recall depending on the algorithm used. Therefore, careful consideration of stemming’s impact and its interaction with the metric’s other components is essential for accurate and meaningful evaluation of machine translation quality.
5. Synonymy Matching
Synonymy matching represents a crucial component within a machine translation evaluation framework, significantly influencing its ability to accurately assess translation quality. This component addresses the limitations of purely lexical matching by accounting for cases where a machine translation employs synonyms or near-synonyms of words present in the reference translation. Without synonymy matching, the assessment would unfairly penalize translations that, while conveying the same meaning, utilize different vocabulary; accounting for synonyms therefore yields a more robust and nuanced evaluation of semantic similarity.
The inclusion of synonymy matching in an evaluation metric like this provides a mechanism for recognizing valid paraphrases and alternative word choices. For example, if a reference translation uses the word “happy,” and the machine translation uses the word “joyful,” a purely lexical comparison would treat these as mismatches. However, with synonymy matching enabled, these words are recognized as semantically equivalent, contributing to a higher and more accurate evaluation score. The practical implication is that translation systems are not penalized for utilizing valid alternative expressions, fostering greater flexibility and naturalness in machine-generated translations. The utilization of WordNet or similar lexical resources is common to identify synonymous terms and their relationship to the translated text.
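A sketch of the general approach (hypothetical helper names; NLTK’s internal matching logic may differ in detail): treat two words as synonyms when one appears among the lemma names of the other’s WordNet synsets. Note that whether any given pair, such as “happy”/“joyful,” actually matches depends entirely on WordNet’s coverage.

```python
from nltk.corpus import wordnet  # requires nltk.download("wordnet")

def wordnet_lemmas(word: str) -> set[str]:
    """All lemma names reachable from the word's WordNet synsets."""
    return {lemma.name() for synset in wordnet.synsets(word) for lemma in synset.lemmas()}

def is_synonym_match(candidate: str, reference: str) -> bool:
    return candidate == reference or candidate in wordnet_lemmas(reference)

print(is_synonym_match("large", "big"))       # True: both are lemmas of Synset('large.a.01')
print(is_synonym_match("automobile", "cat"))  # False
```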
In summary, synonymy matching enhances the overall accuracy and reliability by compensating for lexical variations that do not necessarily indicate a loss of meaning or translation quality. By integrating synonym recognition, this metric moves beyond superficial word-by-word comparisons, offering a more comprehensive and semantically grounded assessment of machine translation performance. Challenges remain in accurately identifying synonyms within specific contexts and managing the potential for false positives, but the benefits of synonymy matching in capturing semantic equivalence outweigh these limitations in many translation scenarios.
6. Fragmentation Penalty
The fragmentation penalty functions as an integral component within the assessment, specifically designed to mitigate the inflation of scores arising from translations that exhibit discontinuous matches. It addresses the issue of translations achieving high precision and recall through isolated, disjointed segments, rather than coherent and fluent phrases. This mechanism actively penalizes such fragmented translations, ensuring that a high score reflects not only lexical similarity but also structural integrity.
Quantifying Discontinuity
The fragmentation penalty operates by assessing the contiguity of matching n-grams (sequences of n words) between the generated translation and the reference translation. A lower penalty is applied when matches are continuous, indicating that the translation system has successfully captured coherent phrases. Conversely, a higher penalty is imposed when matches are scattered, suggesting that the translation lacks fluency and structural coherence. For instance, consider a reference translation: “The quick brown fox jumps over the lazy dog.” A fragmented translation like “The fox dog lazy” has perfect unigram precision (every output word appears in the reference), yet its four matched words fall into four separate, non-contiguous chunks and therefore incur a substantial fragmentation penalty. This penalization reflects the diminished quality of the fragmented translation despite its lexical overlap with the reference.
Impact on Overall Score
The fragmentation penalty directly affects the overall score by reducing it proportionally to the degree of fragmentation observed in the translation. The penalty factor is typically a function of the number of disjointed matching segments. A translation with numerous short, disconnected matches will suffer a greater penalty than a translation with fewer, longer, continuous matches. The specific mathematical formulation of the penalty can vary, but it generally aims to diminish the contribution of translations that sacrifice fluency for lexical accuracy. The extent of the score reduction is configurable, allowing for the adjustment of the penalty’s influence based on the specific requirements of the translation task.
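In NLTK’s implementation the penalty takes the standard METEOR form, Penalty = γ · (chunks / matches)^β, with defaults γ = 0.5 and β = 3, and the final score is F_mean · (1 − Penalty); the defaults shown reflect the NLTK source at the time of writing. A sketch:

```python
def fragmentation_penalty(chunks: int, matches: int,
                          gamma: float = 0.5, beta: float = 3.0) -> float:
    """Penalty = gamma * (chunks / matches) ** beta; zero when nothing matches."""
    return gamma * (chunks / matches) ** beta if matches else 0.0

# "The fox dog lazy" vs "The quick brown fox jumps over the lazy dog":
# four matched words falling into four separate chunks -> maximal fragmentation.
penalty = fragmentation_penalty(chunks=4, matches=4)  # 0.5 * 1.0**3 = 0.5
print(penalty)  # the final score is then f_mean * (1 - penalty)
```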
Incentivizing Coherence
By penalizing fragmentation, it incentivizes translation systems to generate outputs that are not only lexically accurate but also structurally coherent and fluent. This encourages the development of models that prioritize the capture of meaningful phrases and idiomatic expressions, rather than simply maximizing the number of individual word matches. The penalty promotes translations that read more naturally and are more easily understood by human readers. This bias towards coherence is particularly valuable in scenarios where the primary goal is to produce human-readable translations, as opposed to translations intended solely for machine processing.
Contextual Dependence
The effectiveness of the fragmentation penalty can be influenced by the specific characteristics of the languages involved and the nature of the translation task. In some languages, a more flexible word order may be permissible without significantly impacting comprehensibility. In such cases, a relatively lenient fragmentation penalty might be appropriate. Conversely, in languages with strict word order requirements, a more stringent penalty may be necessary to ensure that translations adhere to the expected grammatical structure. Similarly, the optimal penalty level can vary depending on the domain of the translated text. Technical or scientific texts, for instance, may tolerate a higher degree of fragmentation than literary or journalistic texts.
In conclusion, the fragmentation penalty serves as a critical mechanism. It encourages the generation of fluent and coherent translations, preventing the inflation of scores by fragmented outputs. Its impact on the overall score and its incentivization of coherence make it an indispensable tool for evaluating machine translation systems and promoting the development of high-quality translation models. Considering contextual factors when configuring the penalty ensures that the metric continues to provide an accurate and meaningful assessment of translation quality across diverse languages and tasks.
7. NLTK Implementation
The NLTK implementation provides the accessible realization of the aforementioned evaluation metric. Its presence within the library facilitates its widespread use in the natural language processing community, rendering a previously complex evaluation process readily available. This integration is not merely a packaging of the algorithm, but a specific design choice with implications for its application and interpretation.
Module Availability
The integration within NLTK as a readily available module ensures a standardized implementation. Users can directly import the function without needing to implement the underlying algorithms themselves. This contrasts with situations where such metrics are only available through research publications, necessitating custom coding and potential variations in implementation. This availability promotes reproducibility and comparability across different research and development efforts. For instance, a researcher comparing different translation models can rely on the consistent behavior of the NLTK implementation to ensure a fair comparison. Should it be absent, each researcher might use a slightly different interpretation of the method, making comparisons harder.
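In current NLTK releases the import is a single line:

```python
from nltk.translate.meteor_score import meteor_score, single_meteor_score

# single_meteor_score compares one hypothesis against one reference;
# meteor_score takes a list of references and reports the best-matching score.
```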
Parameter Exposure
The implementation exposes various parameters that control its behavior. These parameters include weights for precision and recall, stemming algorithms, and synonym databases. This granularity enables users to fine-tune its behavior to suit specific translation tasks and language characteristics. For example, when evaluating translations in a domain where accuracy is paramount, users can increase the weight assigned to precision. Conversely, in scenarios where completeness is more important, a higher weight can be given to recall. The ability to customize these parameters provides flexibility and allows for more meaningful evaluation results. Without such parameter exposure, the metric would be a rigid black box, potentially ill-suited to diverse translation scenarios.
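Concretely, the exposed knobs in the NLTK signature are alpha (the precision/recall balance in the weighted harmonic mean), beta and gamma (the fragmentation penalty’s exponent and weight), plus the stemmer and wordnet resources. The defaults below reflect the NLTK source at the time of writing; verify against your installed version:

```python
from nltk.corpus import wordnet
from nltk.stem.porter import PorterStemmer
from nltk.translate.meteor_score import meteor_score

reference = "the cat sat on the mat".split()
hypothesis = "a cat was on the mat".split()

score = meteor_score(
    [reference],
    hypothesis,
    stemmer=PorterStemmer(),  # default; any NLTK stemmer with a .stem() method
    wordnet=wordnet,          # synonym resource
    alpha=0.9,                # recall weight in the harmonic mean (default 0.9)
    beta=3.0,                 # fragmentation penalty exponent (default 3.0)
    gamma=0.5,                # fragmentation penalty weight (default 0.5)
)
print(f"{score:.4f}")
```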
Data Dependency
This function’s specific usage is inherently reliant on the availability of supporting resources, such as stemmers and synonym databases (e.g., WordNet). The NLTK module provides utilities for accessing and managing these resources. The performance depends heavily on the quality and coverage of these external datasets. In scenarios where a particular language or domain is poorly represented in the available datasets, the accuracy of the score may be compromised. The implementation documentation typically provides guidance on selecting and preparing appropriate data sources. An insufficient dataset would lead to less reliable assessments.
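The supporting corpora are fetched once via NLTK’s downloader:

```python
import nltk

nltk.download("wordnet")  # synonym database (English-centric coverage)
nltk.download("omw-1.4")  # Open Multilingual WordNet data, used by newer NLTK releases
```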
Computational Efficiency
The practical value of the NLTK implementation is partially determined by its computational efficiency. Machine translation evaluation can be computationally intensive, particularly when dealing with large datasets. The implementation must strike a balance between accuracy and speed. While it might not be the most optimized implementation possible, its inclusion in NLTK suggests a reasonable level of performance for typical use cases. In situations where computational resources are limited, users may need to consider alternative implementations or techniques to accelerate the evaluation process. The built-in functionality prioritizes ease of use over peak efficiency to reach a broader audience.
These facets of the NLTK implementation underscore its significance in making this type of translation evaluation accessible and practical. Its availability, parameterization, data dependency, and computational efficiency collectively determine its utility in real-world applications. Understanding these aspects is crucial for effectively utilizing the function to assess machine translation quality and driving improvements in translation system design.
8. Evaluation Metric
The term “evaluation metric” broadly refers to a quantitative measure employed to assess the performance of a system or algorithm. In the context of machine translation, an evaluation metric quantifies the quality of a translated text compared to a reference translation. The “nltk translate meteor_score” is a specific instantiation of such a metric, residing within the NLTK library. The understanding of “evaluation metric” is therefore foundational; it establishes the category to which “nltk translate meteor_score” belongs. Without the concept of an “evaluation metric,” the purpose and significance of this function within NLTK would remain undefined.
The practical significance of viewing “nltk translate meteor_score” as an “evaluation metric” lies in its utility for comparing different translation systems or assessing the impact of modifications to a single system. For example, a researcher might use this tool to compare the performance of two different neural machine translation architectures. The resulting scores would provide a basis for determining which architecture produces higher-quality translations. Additionally, developers can track the progress of system improvements over time by monitoring changes in scores after implementing new features or training the system on additional data. This facilitates evidence-based decision-making in the development and optimization of machine translation technology.
In summary, “nltk translate meteor_score” is a member of the category of “evaluation metrics,” enabling the quantifiable assessment of machine translation quality. Its function as such is critical for comparing systems, tracking improvements, and guiding the development of more effective translation technologies. Challenges remain in designing metrics that perfectly correlate with human judgments of translation quality, but the continued development and refinement of metrics like this within tools like NLTK are essential for advancing the field of machine translation.
9. Translation Quality
Translation quality, as a concept, represents the fidelity with which a translated text conveys the meaning, intent, and style of the original source text. It serves as the ultimate benchmark against which machine translation systems are evaluated. This metric, available through the NLTK library, provides a means to quantify translation quality by assessing various aspects such as lexical similarity, semantic equivalence, and fluency. Consequently, translation quality is the overarching goal, while this tool is an instrument designed to measure progress toward that goal. For example, a machine translation system that produces highly accurate and fluent translations will receive a high score when evaluated, indicating superior translation quality. Conversely, a system that generates inaccurate or incoherent translations will receive a low score, reflecting poor quality. The correlation is direct; improved translation quality, by human standards, should lead to higher function scores.
The significance of this assessment in driving improvements in machine translation technology is undeniable. By providing a quantifiable measure of quality, this tool enables researchers and developers to objectively compare different translation approaches, fine-tune model parameters, and identify areas for improvement. For instance, if a particular machine translation system consistently scores poorly, developers can analyze the system’s outputs to identify specific weaknesses, such as inaccurate handling of idiomatic expressions or poor lexical choice. The results can then guide targeted interventions, such as retraining the model on a larger dataset or incorporating a more sophisticated lexicon. Without an objective metric, assessing the impact of such interventions becomes challenging, hindering progress in machine translation. The iterative process of evaluation, analysis, and refinement, facilitated by this tool, is essential for advancing the state-of-the-art in machine translation.
In summary, translation quality constitutes the core objective, and provides a quantitative mechanism for its assessment. It serves as a crucial feedback loop for improving translation systems and advancing the field of machine translation. While challenges remain in perfectly aligning automated metrics with human perception of quality, the continued refinement and utilization of metrics such as this one is essential for achieving the ultimate goal: machine translation that seamlessly bridges linguistic and cultural divides. The practical use of this tool in analyzing and adjusting system performance ultimately contributes to the broader aim of high-quality translation.
Frequently Asked Questions
This section addresses common inquiries and misconceptions regarding the “nltk translate meteor_score” function, clarifying its purpose, functionality, and limitations within the broader context of machine translation evaluation.
Question 1: What is the primary purpose of the “nltk translate meteor_score” function?
The primary purpose is to provide an automated metric for evaluating the quality of machine-generated translations. It quantifies the similarity between a candidate translation and one or more reference translations, producing a score that reflects the overall quality of the machine-generated output.
Question 2: How does “nltk translate meteor_score” differ from simpler metrics like BLEU?
Unlike BLEU, which relies primarily on n-gram precision, this function incorporates both precision and recall, uses stemming to normalize word forms, includes synonymy matching to account for lexical variations, and applies a fragmentation penalty to discourage discontinuous matches. These features enable a more nuanced and comprehensive assessment of translation quality compared to simpler metrics.
Question 3: What types of input data are required to use “nltk translate meteor_score”?
This function requires two inputs: one or more reference translations (human-generated or gold-standard) supplied as a list, and a single candidate translation (the machine-generated hypothesis). In current NLTK versions, both must be pre-tokenized into lists of words or subword units; evaluating an entire corpus means calling the function once per sentence pair.
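For example (a minimal sketch using whitespace tokenization):

```python
from nltk.translate.meteor_score import meteor_score

reference = "the cat sat on the mat".split()   # tokenized reference
hypothesis = "the cat sat on a mat".split()    # tokenized candidate
score = meteor_score([reference], hypothesis)  # references are passed as a list
```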
Question 4: Can the parameters of “nltk translate meteor_score” be customized?
Yes, several parameters can be customized. These include the weights assigned to precision and recall, the stemming algorithm used, and the synonym database employed. Customization allows users to tailor the metric to specific translation tasks and language characteristics.
Question 5: What are the limitations of using “nltk translate meteor_score” for translation evaluation?
While it offers a more comprehensive assessment than some alternatives, the metric does not perfectly correlate with human judgments of translation quality. It may still be susceptible to rewarding translations that are grammatically correct but semantically inaccurate or that lack fluency. Additionally, its performance depends on the quality and coverage of the synonym database used.
Question 6: Is “nltk translate meteor_score” suitable for evaluating translations in all languages?
It can be applied to translations in various languages; however, its effectiveness may vary depending on the availability of appropriate stemming algorithms and synonym resources for a given language. Languages with limited resources may present challenges in achieving accurate and reliable evaluation results.
These answers illuminate the key aspects of this term, providing a foundation for effective utilization and interpretation within the context of machine translation evaluation.
The subsequent section will delve into comparative analyses, examining its performance relative to other machine translation evaluation techniques.
Enhancing Machine Translation Evaluation
This section presents a series of practical recommendations aimed at maximizing the effectiveness of the metric when evaluating machine translation systems. Adhering to these guidelines promotes more accurate and meaningful assessments of translation quality.
Tip 1: Leverage Multiple Reference Translations: Employing several reference translations provides a more comprehensive benchmark against which to evaluate machine-generated outputs. Variations in phrasing and lexical choice among multiple references can mitigate biases introduced by a single reference, resulting in a more robust assessment (see the sketch following this list).
Tip 2: Customize Parameter Weights: Adjust the weights assigned to precision and recall to reflect the specific requirements of the translation task. In scenarios where accuracy is paramount, prioritize precision. Conversely, for tasks where completeness is more critical, emphasize recall (the sketch following this list varies the corresponding alpha parameter).
Tip 3: Select an Appropriate Stemming Algorithm: The choice of stemming algorithm can significantly impact results. Consider the morphological characteristics of the languages involved and select a stemmer that balances aggressiveness and accuracy to avoid overgeneralization or under-stemming.
Tip 4: Utilize a High-Quality Synonym Database: The effectiveness of synonymy matching depends on the quality and coverage of the synonym database employed. Ensure that the database is comprehensive and relevant to the domain of the translated text to accurately capture semantic equivalence.
Tip 5: Calibrate the Fragmentation Penalty: Fine-tune the fragmentation penalty to strike a balance between rewarding fluency and penalizing discontinuous matches. The optimal penalty level may vary depending on the linguistic characteristics of the languages and the expected level of fluency in the translated text.
Tip 6: Consider Contextual Factors: When interpreting results, consider contextual factors such as the domain of the translated text, the intended audience, and the purpose of the translation. These factors can influence the relative importance of different evaluation criteria.
Tip 7: Supplement with Human Evaluation: While automated metrics provide a valuable tool for quantitative assessment, it is crucial to supplement them with human evaluation. Human evaluators can assess aspects of translation quality, such as naturalness, idiomaticity, and cultural appropriateness, that are not easily captured by automated metrics.
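A combined sketch of Tips 1 and 2, passing several tokenized references and sweeping the alpha parameter that balances precision against recall (parameter names and behavior reflect current NLTK releases; verify against your installed version):

```python
from nltk.translate.meteor_score import meteor_score

references = [
    "it is a guide to action that ensures the military will forever heed party commands".split(),
    "it is the guiding principle which guarantees the military forces always being under the command of the party".split(),
]
hypothesis = "it is a guide to action which ensures that the military always obeys the commands of the party".split()

# Tip 1: with multiple references, the reported score is the best match across them.
print(meteor_score(references, hypothesis))

# Tip 2: alpha sets the recall weight in the harmonic mean (higher alpha = recall matters more).
for alpha in (0.1, 0.5, 0.9):
    print(alpha, round(meteor_score(references, hypothesis, alpha=alpha), 4))
```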
By adhering to these guidelines, the user can harness the full potential of the metric, achieving more accurate and insightful evaluations of machine translation systems. Careful configuration supports a balanced, valid, and reliable assessment of translation system output.
The final section provides a synthesis of the information, highlighting key advantages, disadvantages, and future research directions.
Conclusion
This exploration has elucidated the function within NLTK, detailing its constituent components: harmonic mean of precision and recall, stemming influence, synonymy matching, and fragmentation penalty. Its role as an automated evaluation metric for machine translation quality has been thoroughly examined, highlighting its advantages over simpler metrics and outlining its practical application, parameter customization, and inherent limitations. These analyses emphasize the necessity of thoughtful utilization, recognizing its strengths in capturing semantic similarities while acknowledging potential biases and dependencies on external data.
Continued research should focus on refining automated evaluation methodologies to more closely align with human assessments of translation quality. While it represents a significant advancement in machine translation evaluation, it remains a tool, not a replacement, for human judgment. Future development should prioritize reducing bias and improving its applicability across diverse languages and domains, thereby contributing to the ultimate goal of achieving seamless and accurate cross-lingual communication.