Though we did not specifically encode this bias into our statistical framework, our mixture-modeling strategy captured it properly (see Figure 2G). Our approach also captured other forms of bias and variability present within the thesauri (e.g., a preference for certain parts of speech; see Figure S4A), because the annotation rates for individual mixture components varied considerably across thesauri (Figures S4A and S4B). Finally, we note that the continued production and aggregation of manually curated thesauri is unlikely to be a fruitful strategy for collecting undocumented general-English near-synonymy: it would require approximately 2,000 independently collected, WordNet-sized dictionaries to uncover 90% of the undocumented relationships (Figure 2H). Therefore, alternative approaches will be essential to uncover a considerable fraction of undocumented English near-synonymy. In the following section, we use one such approach to uncover previously undocumented English near-synonyms.

Experimental Validation of Undocumented English Near-Synonyms

Figure 2. Relationships among basic English words are undocumented. The overlap among the (A) headwords and (B) synonymous relationships annotated within nine general-English thesauri. (C) The number of known (above x-axis) and undocumented (below x-axis) headwords belonging to each of the ten headword-specific mixture-model components (see Supporting Information Text S1). (D) The number of known (above x-axis) and undocumented (below x-axis) synonymous relationships belonging to each mixture component. The blue bars indicate undocumented relationships paired to known headwords, while the red bars indicate undocumented relationships paired to latent headwords.
(E) The number of synonymous relationships is shown as a function of the total number of headwords in the English language. The width of the line indicates the 99% confidence interval for the estimate (see Supporting Information Text S1). (F) The distribution over the number of synonyms annotated per headword (gray) is compared to the theoretical distribution obtained using the best-fitting statistical annotation model (blue). The R2 value indicates the fraction of variance in synonym number explained by the model. For reference, log-Gaussian and geometric models were fit to the data as well (red and green, respectively), though their quality of fit was several thousand orders of magnitude worse than that of the best-fitting annotation model (according to marginal likelihood). (G) Box-whisker plots depicting the mean relative word frequencies (1,000 bootstrapped resamples) for each of the ten headword-specific mixture components. For reference, the probability of headword annotation, marginalized over all possible synonym pairs, is plotted in green. (H) The three curves indicate the expected fraction of undocumented synonymy that would be discovered upon repeatedly and independently constructing additional lexical resources (x-axis) identical to the total dataset (blue), WordNet only (red), and WordNet plus Webster's New World (green). doi:10.1371/journal.pcbi.1003799.g

…constructed with a bias for writing over reading, following from the observation that thesauri are typically used to add richness and variety when composing text. In support of this hypothesis, we found that headwords in our dictionaries tended to be shorter and more frequent than non-headwords (see Figures S3A and S3B, respectively).
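The diminishing-returns claim behind Figure 2H (roughly 2,000 WordNet-sized dictionaries to reach 90% of the undocumented relationships) can be illustrated with a simple coverage model. This is only a sketch under an assumption the paper does not state: that each independently constructed resource documents any given undocumented relationship with a fixed small probability p, so expected coverage after k resources is 1 − (1 − p)^k. The value of p below is hypothetical, chosen purely to match the ~2,000-resource scale.

```python
import math

def expected_coverage(p: float, k: int) -> float:
    """Expected fraction of undocumented relationships recovered after k
    independently built resources, each documenting any given
    relationship with probability p."""
    return 1.0 - (1.0 - p) ** k

def resources_needed(p: float, target: float) -> int:
    """Smallest k whose expected coverage reaches the target fraction."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# Hypothetical per-resource discovery probability, chosen so that
# ~2,000 resources reach 90% coverage (matching Figure 2H's scale):
p = 1.0 - 0.1 ** (1.0 / 2000.0)  # ~0.00115

print(resources_needed(p, 0.90))   # ~2000 resources for 90% coverage
print(expected_coverage(p, 9))     # nine thesauri recover only ~1%
```

The second print shows why the nine thesauri in the study's dataset would, under this toy model, capture only a tiny fraction of undocumented near-synonymy, motivating the alternative approaches discussed in the text.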
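Panel F's comparison of the annotation model against log-Gaussian and geometric alternatives rests on likelihoods of the synonyms-per-headword distribution. As a hedged sketch of how such a comparison works in general (toy counts and the geometric family only; the paper's actual annotation model, data, and marginal-likelihood computation are not reproduced here), one can fit a geometric model by maximum likelihood and check that the fitted parameter scores higher than arbitrary alternatives:

```python
import math

def geometric_loglik(counts, p):
    """Log-likelihood of synonym counts under a geometric model on
    {1, 2, ...}: P(X = x) = (1 - p)**(x - 1) * p."""
    return sum(math.log(p) + (x - 1) * math.log(1.0 - p) for x in counts)

def fit_geometric(counts):
    """Maximum-likelihood estimate for the geometric on {1, 2, ...}:
    p_hat = 1 / mean(counts)."""
    return len(counts) / sum(counts)

# Toy synonyms-per-headword counts (illustrative only, not the paper's data):
counts = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13]
p_hat = fit_geometric(counts)          # = 10/42, about 0.238
print(geometric_loglik(counts, p_hat))
```

A model-comparison figure like panel F plots the distribution implied by each fitted model against the empirical histogram; the "orders of magnitude" gap reported in the caption is a difference in (marginal) log-likelihood between model families fit this way.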