Draft

Adversarial classification

2017-07-24 — 2025-06-07

classification
classification and society
collective knowledge
confidentiality
culture
ethics
game theory
how do science
incentive mechanisms
sociology
statistics
wonk

Content warning:

Discussion of hot-button contentious issues, such as gender identity and Israel-Palestine, on which I conspicuously avoid taking a position while analysing the semantics of the public debate around them. Even so, this risks being read as favouring a side. But since the topic is the weaponisation of meaning, I see no alternative to discussing weaponised meanings.

Figure 1

1 Case study: A Chair

Figure 2
Figure 3: A chair
Figure 4: A chair

2 When categories have value

Case study on gender

Figure 6
Figure 7: Chick the Cherub (Baum and Neill 1906), a non-binary children’s book character from 1906.
Figure 8
Figure 9
Figure 10

3 When categories are teams

Islamophobia, antisemitism, genocide, war crimes

Misunderstanding antisemitism in America

Now, I am no ethnographer or political scientist, but as best I can tell, the simplest group-membership structure that explains the current situation looks something like this (my apologies in advance to those whom I have inevitably, crudely simplified away):

Code
import pandas as pd
from upsetplot import UpSet, from_memberships
import matplotlib.pyplot as plt

# Define archetypal memberships.
archetypes_definitions = [
    {"groups": ["Israeli Citizen", "Jew", "Likud"], "count": 15},
    {"groups": ["Israeli Citizen", "Jew", "WB Resident"], "count": 7},
    {"groups": ["Israeli Citizen", "Jew"], "count": 25},
    {"groups": ["Israeli Citizen", "Arab", "Muslim", "Sunni"], "count": 12},
    {"groups": ["Israeli Citizen", "Druze", "Arab"], "count": 3},
    {"groups": ["Israeli Citizen", "Druze"], "count": 2},
    {"groups": ["Israeli Citizen", "Arab", "Bedouin", "Muslim", "Sunni"], "count": 4},
    {"groups": ["Israeli Citizen", "Arab", "Christian"], "count": 3},
    {
        "groups": [
            "Palestinian",
            "WB Resident",
            "Arab",
            "Muslim",
            "Sunni",
            "PA Affiliate",
        ],
        "count": 18,
    },
    {
        "groups": ["Palestinian", "Gaza Resident", "Arab", "Muslim", "Sunni", "Hamas"],
        "count": 12,
    },
    {"groups": ["Palestinian", "WB Resident", "Arab", "Christian"], "count": 2},
    {
        "groups": ["Palestinian", "Gaza Resident", "Arab", "Muslim", "Sunni"],
        "count": 25,
    },
    {"groups": ["Palestinian", "WB Resident", "Arab", "Muslim", "Sunni"], "count": 20},
    {"groups": ["Hezbollah", "Muslim", "Shiite", "Arab"], "count": 8},
    {"groups": ["Houthi", "Muslim", "Shiite", "Arab"], "count": 6},
]

memberships_data = []
for archetype in archetypes_definitions:
    current_groups = list(archetype["groups"])
    if "WB Resident" in current_groups or "Gaza Resident" in current_groups:
        if "OT Resident" not in current_groups:
            current_groups.append("OT Resident")
    if "Muslim" in current_groups and not (
        "Sunni" in current_groups or "Shiite" in current_groups
    ):
        if not any(
            group_name in current_groups for group_name in ["Hezbollah", "Houthi"]
        ):
            current_groups.append("Sunni")
    for _ in range(archetype["count"]):
        memberships_data.append(current_groups)

# Create the data structure for UpSet plot
# from_memberships with this input type will return a pandas Series
# where the index is a MultiIndex of booleans (sets), and values are 1s.
# The index will NOT be unique if archetypes are repeated.
upset_data_series = from_memberships(memberships_data)

final_data_for_plot = upset_data_series

try:
    # Create the plot
    # Set subset_size='sum' to aggregate the 'ones' from from_memberships.
    upset_plot_object = UpSet(
        final_data_for_plot,
        subset_size="sum",
        min_subset_size=1,
        show_counts=True,
        sort_by="cardinality",  # Sorts intersection bars
        sort_categories_by="cardinality",  # Sorts the sets on the left
    )

    # Styling subsets: 'present' should use names that are in upset_data_series.index.names
    if "OT Resident" in upset_data_series.index.names:
        upset_plot_object.style_subsets(
            present=["OT Resident"],
            facecolor="lightblue",
            label="In Occupied Territories",
        )
    if "Hamas" in upset_data_series.index.names:
        upset_plot_object.style_subsets(
            present=["Hamas"],
            edgecolor="red",
            hatch="xx",
            linewidth=2,
            label="Hamas Affiliation",
        )
    if "Likud" in upset_data_series.index.names:
        upset_plot_object.style_subsets(
            present=["Likud"],
            edgecolor="blue",
            hatch="//",
            linewidth=2,
            label="Likud Affiliation",
        )

    fig = plt.figure()
    upset_plot_object.plot(fig=fig)

    # plt.suptitle(
    #     "Conceptual Group Memberships In and Around Israel-Palestine",
    #     fontsize=16,
    # )
    plt.figtext(
        0.5,
        0.01,
        "NOTE: Intersection sizes are based on defined archetypes, not actual population data. This is a conceptual model.",
        ha="center",
        fontsize=8,
        style="italic",
    )
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    # plt.savefig("israel_palestine_upset_plot.png", dpi=300)


except Exception as e:
    print(f"An error occurred during plot generation: {e}")
    print(
        f"Data passed to UpSet (head):\n {final_data_for_plot.head() if isinstance(final_data_for_plot, pd.Series) else 'Data not a Series'}"
    )
    if isinstance(final_data_for_plot, pd.Series) and isinstance(
        final_data_for_plot.index, pd.MultiIndex
    ):
        print(f"Index names of data: {final_data_for_plot.index.names}")

Conceptual Group Memberships In and Around Israel-Palestine

4 Arguing the boundaries of categories

TODO: likelihood principle, compressions, M-open…

4.1 Decision theory

Motte and Bailey. P-hack thyself.

5 Recommender systems and collective culture

See recommender dynamics.
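The dynamic at stake can be caricatured in a few lines. The following is a toy sketch of my own devising, not taken from the recommender-dynamics notebook or any cited model: a recommender that mostly exploits whichever of two equally appealing content categories has accumulated more engagement will lock in whichever category got lucky early.

```python
import random

random.seed(1)


def simulate(steps=2000, explore=0.05):
    """Toy recommender feedback loop over two content categories, A and B.

    The recommender usually serves the category with the higher
    engagement count so far, and engagement with what is served is
    the only thing that gets counted. Early random fluctuations are
    thereby amplified rather than averaged away.
    """
    counts = {"A": 0, "B": 0}
    for _ in range(steps):
        if random.random() < explore:
            shown = random.choice(["A", "B"])  # occasional exploration
        else:
            shown = max(counts, key=counts.get)  # exploit current winner
        # The user engages with either category at the same base rate,
        # so any asymmetry in the final counts is manufactured by the
        # feedback loop, not by an underlying preference.
        if random.random() < 0.5:
            counts[shown] += 1
    return counts


result = simulate()
print(result)
```

Run it a few times with different seeds: which category wins is arbitrary, but the lopsidedness of the final tallies is not.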

6 Incoming

To read: Sen and Wasow (2016).

Jon Stokes, Google’s Colosseum

This map is contentious precisely because of its role in our red vs. blue power struggle, as a way of elevating some voices and silencing others. As such, it’s a remarkable example of the main point I’m trying to make in this post: the act of extracting a limited feature set from a natural paradigm, and then representing those higher-value features in a cultural product of some kind, is always about power on some level.

See also Affirming the Consequent and Tribal thermodynamics.

Figure 11

Henry Farrell and Marion Fourcade, The Moral Economy of High-Tech Modernism

While people in and around the tech industry debate whether algorithms are political at all, social scientists take the politics as a given, asking instead how this politics unfolds: how algorithms concretely govern. What we call “high-tech modernism”—the application of machine learning algorithms to organize our social, economic, and political life—has a dual logic. On the one hand, like traditional bureaucracy, it is an engine of classification, even if it categorizes people and things very differently. On the other, like the market, it provides a means of self-adjusting allocation, though its feedback loops work differently from the price system. Perhaps the most important consequence of high-tech modernism for the contemporary moral political economy is how it weaves hierarchy and data-gathering into the warp and woof of everyday life, replacing visible feedback loops with invisible ones, and suggesting that highly mediated outcomes are in fact the unmediated expression of people’s own true wishes.

  • When Does Instagram Decide a Nipple Becomes Female?

  • The Egg Yolk Principle: Human Sexuality Will Always Outsmart Prudish Algorithms and Hateful Politicians

    Patreon’s changes to its terms also threw the “adult baby/diaper lover” community into chaos, in a perfect illustration of my point: A lot of participants inside that fandom insist it’s not sexual. A lot of people outside find it obscene. Who’s correct?

    As part of answering that question for this article, I tried to find examples of content that’s arousing but not actually pornographic, like the egg yolks. This, as it happens, is a very “I know it when I see it” type of thing. Foot pottery? Obviously intended to arouse, but not explicitly pornographic. This account of AI-generated ripped women? Yep, and there’s a link to “18+” content in the account’s bio. Farting and spitting are too obviously kinky to successfully toe the line, but a woman chugging milk as part of a lactose intolerance experiment then recording herself suffering (including closeups of her face while farting) fits the bill, according to my entirely arbitrary terms. Confirming my not-porn-but-still-horny assessment, the original video—made by user toot_queen on TikTok—was reposted to Instagram by the lactose supplement company Dairy Joy. Fleece straightjackets, and especially tickle sessions in them, are too recognizably BDSM. This guy making biscuits on a blankie? I guess, man. Context matters: Eating cereal out of a woman’s armpit is way too literal to my eye, but it’d apparently fly on Patreon no problem.

“If the Russians want to slow down negotiations they demand we agree upon a taxonomy.”

7 References

Barocas, and Selbst. 2016. “Big Data’s Disparate Impact.” SSRN Scholarly Paper ID 2477899.
Baum, and Neill. 1906. John Dough and the Cherub.
Borrás Pérez. n.d. “Facebook Doesn’t Like Sexual Health or Sexual Pleasure: Big Tech’s Ambiguous Content Moderation Policies and Their Impact on the Sexual and Reproductive Health of the Youth.” International Journal of Sexual Health.
Burrell. 2016. “How the Machine ‘Thinks’: Understanding Opacity in Machine Learning Algorithms.” Big Data & Society.
Che, Zhang, Sohl-Dickstein, et al. 2020. “Your GAN Is Secretly an Energy-Based Model and You Should Use Discriminator Driven Latent Sampling.” arXiv:2003.06060 [cs, stat].
Dean, and Morgenstern. 2022. “Preference Dynamics Under Personalized Recommendations.”
Dressel, and Farid. 2018. “The Accuracy, Fairness, and Limits of Predicting Recidivism.” Science Advances.
Dutta, Wei, Yueksel, et al. 2020. “Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing.” In Proceedings of the 37th International Conference on Machine Learning.
Dwork, Hardt, Pitassi, et al. 2012. “Fairness Through Awareness.” In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ITCS ’12.
Farrell, and Fourcade. 2023. “The Moral Economy of High-Tech Modernism.” Daedalus.
Feldman, Friedler, Moeller, et al. 2015. “Certifying and Removing Disparate Impact.” In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’15.
Gornet, and Viard. 2023. “Queer Identities and Machine Learning.”
Gozli. 2023. “Principles of Categorization: A Synthesis.” Seeds of Science.
Ho, Kastner, and Wong. 1978. “Teams, Signaling, and Information Theory.” IEEE Transactions on Automatic Control.
Kleinberg, and Raghavan. 2021. “Algorithmic Monoculture and Social Welfare.” Proceedings of the National Academy of Sciences.
Laufer. 2020. “Compounding Injustice: History and Prediction in Carceral Decision-Making.” arXiv:2005.13404 [cs, stat].
Lee, and Skrentny. 2010. “Race Categorization and the Regulation of Business and Science.” Law & Society Review.
Leqi, Hadfield-Menell, and Lipton. 2021. “When Curation Becomes Creation: Algorithms, Microcontent, and the Vanishing Distinction Between Platforms and Creators.” Queue.
Menon, and Williamson. 2018. “The Cost of Fairness in Binary Classification.” In Proceedings of the 1st Conference on Fairness, Accountability and Transparency.
O’Neil. 2017. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.
Pleiss, Raghavan, Wu, et al. 2017. “On Fairness and Calibration.” In Advances in Neural Information Processing Systems.
Raghavan. 2021. “The Societal Impacts of Algorithmic Decision-Making.”
Saperstein, Penner, and Light. 2013. “Racial Formation in Perspective: Connecting Individuals, Institutions, and Power Relations.” Annual Review of Sociology.
Sen, and Wasow. 2016. “Race as a Bundle of Sticks: Designs That Estimate Effects of Seemingly Immutable Characteristics.” Annual Review of Political Science.
Stray, Halevy, Assar, et al. 2022. “Building Human Values into Recommender Systems: An Interdisciplinary Synthesis.”
Venkatasubramanian, Scheidegger, Friedler, et al. 2021. “Fairness in Networks: Social Capital, Information Access, and Interventions.” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. KDD ’21.
Verma, and Rubin. 2018. “Fairness Definitions Explained.” In Proceedings of the International Workshop on Software Fairness. FairWare ’18.
Wu, and Zhang. 2016. “Automated Inference on Criminality Using Face Images.” arXiv:1611.04135 [cs].
Xu, and Dean. 2023. “Decision-Aid or Controller? Steering Human Decision Makers with Algorithms.”