A large chunk of the Fediverse was scraped; your posts are being “released”
@email@example.com @tastytea sorry that i misinterpreted the dataset and paper and posted rashly urging caution, lol. i'm just used to people on here going off on stuff like this before checking the set
anyway it might be productive to send a letter to the UMilan IRB too
@firstname.lastname@example.org @tastytea okay i'm pretty sure that this is assuming "implicit consent" which is a No No with IRBs. however!!! this may not be the kind of study that "needs" irb review.
it probably violates the GDPR though, and definitely some kind of institutional data ethics policy. i do work collecting behavioral data (with consent!) and we have to be really really careful with it even though it's not under IRB review, so they could _definitely_ get in trouble if even one person whose stuff got in the scrape were to state that they didn't consent
@email@example.com @tastytea okay, useful links:
research ethics committee of uni milan https://www.unimi.it/en/research/research-lastatale/policies-and-principles/ricerca-e-innovazione-responsabile-rri/research-ethics
page with more about it and contact links (all italian) https://www.unimi.it/en/node/449
2019 code of ethics (it) https://www.unimi.it/sites/default/files/2019-05/Codice%20etico%202019.pdf
2019 code of ethics (en) https://www.unimi.it/sites/default/files/2019-07/Code%20of%20Ethics%20and%20for%20Research%20Integrity%20%28EN%29.pdf
@firstname.lastname@example.org @tastytea useful stuff:
art 16.1, Feasibility, and social and environmental impact: "Should the project likely produce a significant impact on the objects of the research or, in general, on society, the environment or the biosphere, researchers shall responsibly examine the potential impact, providing details of these assessments in the appropriate documentation". which they sort of did, but not really
25.1, Protection of persons involved in research: "Researchers shall pay due respect to all persons involved in their research, without compromising their health, the wellbeing of the community, and the safety and healthiness of the environment in which they work."
and here's the fucking kicker:
26.1, Informed consent: "Without prejudice to the principle of due respect for human dignity and autonomy, should the research entail the involvement of recruited participants, the research leader ensures that applicable norms on informed consent are respected, with special regard to incompetent subjects or, in any event, to individuals unable to give consent."
which they did not do! so they grossly violated UMilan's ethics code and they _absolutely_ can get in trouble
okay, i condensed everything the scrapers at UMilan violated, plus some grievance policy documentation, into a doc. have fun and don't trash my doc!
@bgcarlisle i feel like you might be interested in this, particularly the fact that 10k posts from scholar.social made it in (as noted in the doc one post upthread) in contravention of your use policy
@er1n thank you for writing this, it's really helpful!
@melodicake this is a bunch of people at this point! but thank you <3
We counted around half a dozen "free speech" instances and none of the major ones are there, so they're basically targeting left-wingers here, which means mostly members of marginalized groups and therefore people who are more likely to be victims of state-sponsored violence.
@bstacey They say they have it from https://instances.social/api/1.0/instances/list?count=0. Page 3 on the left in the PDF I linked in the original post.
1. I wrote an introduction to summarize the incident.
2. I added a description of what local and federated timelines are to the "Mistaken classification of post privacy" section, in order to help distinguish what the authors conflate.
3. In "Failure to de-identify data", I shortened the big paragraph, since some of what it talked about didn't seem applicable to scraping by way of public TLs. Also, I added a quick paragraph about the risk of de-anonymization.
4. I snarked about their not disclosing their funding sources.
5. I added the sentence, "We express our gratitude to the administrators of Harvard Dataverse for acting promptly to deaccession the dataset."
re: Responding to Fediverse scraping