Facultad de Derecho

August 28, 2023

ACADEMIC RESEARCH AND BIG DATA: A LEGAL PERSPECTIVE

By: Daniel Peña Valenzuela

INTRODUCTION

The sources of scientific research are changing rapidly. Documentary and bibliographical searches, as well as laboratory experimentation, are now complemented by data that can be collected from both people and objects in almost unlimited quantity and scale.

Data make it possible to analyze, draw inferences, and build premises on more objective grounds than traditional sources allow. In the past, many scientific investigations were carried out without empirical information, or with gaps that were filled by argument or working hypotheses.

The shift towards data as a research source raises several problems. One is intellectual property in data analysis models and in the results or conclusions drawn from the data. Another is the consent of the individuals whose data is used in an investigation, and the extent of the rights that those research subjects should have regarding the final results of the research.

This article describes, in a preliminary way, the trajectory of scientific research towards data as its main, though not its only, source. It also considers how this research can drive legal changes in the protection of, and access to, the sources and results of scientific research.

 

SOCIAL SCIENCE AND DIGITALIZATION

The relationship between social sciences and digitalization is multifaceted and has significantly impacted the way social scientists conduct research, analyze data, and understand human behavior in contemporary society. Digitalization refers to the integration of digital technologies and data into various aspects of human life, and this transformation has had several effects on social sciences:

  • Data Collection and Analysis: Digitalization has led to the generation of vast amounts of digital data from various sources, such as social media, online surveys, digital archives, and administrative records. Social scientists now have access to large and diverse datasets, enabling them to study social phenomena in more depth and detail.
  • New Research Methods: Digital technologies have introduced innovative research methods and analytical tools in social sciences. Researchers use data mining, natural language processing, sentiment analysis, and machine learning to analyze digital data and gain insights into social behavior, opinions, and trends.
  • Interdisciplinary Collaboration: Digitalization has encouraged collaboration between social scientists and experts in fields like computer science, information technology, and data science. This interdisciplinary approach fosters novel research projects that leverage digital tools and methods to address complex social issues.
  • Studying Online Behavior and Communities: Social scientists can now study online communities, social networks, and virtual interactions. These digital spaces provide unique opportunities to understand how individuals form social connections, communicate, and influence each other in the digital world.
  • Real-time Data and Timely Analysis: Digital data collection allows social scientists to access real-time information on social events, public sentiment, and trends. This timely analysis helps in understanding and responding to societal developments more effectively.
  • Ethical and Privacy Concerns: Digitalization raises ethical concerns regarding data privacy, informed consent, and the responsible use of data. Social scientists must navigate these ethical considerations while conducting research using digital data.
  • Impact on Society and Social Dynamics: Digitalization has transformed how people interact, communicate, and participate in society. Social scientists study the impact of digital technologies on social dynamics, cultural changes, and political behavior.
  • Policy and Governance: Social scientists play a crucial role in shaping policies and governance related to digital technologies. They investigate the societal implications of digitalization and contribute to the development of regulations and guidelines that balance technological advancements with ethical and social considerations.
  • Democratization of Research: Digitalization has facilitated the democratization of research and dissemination of findings. Online platforms, open-access journals, and social media allow researchers to share their work with a broader audience, promoting transparency and collaboration.

Thus, the relationship between social sciences and digitalization is mutually influential. Social sciences inform the understanding and responsible development of digital technologies, while digitalization provides new opportunities and challenges for social science research and its impact on society. As technology continues to evolve, social scientists will need to adapt their methods and theories to keep pace with the ever-changing digital landscape.

In Colombia, the social sciences have a special role in society: to explain critically phenomena such as violence, corruption, and the influence that coca cultivation and international drug trafficking have had on our country. Quantitative analysis tools that enable new analyses and conclusions are therefore important.

 

DIGITAL HUMANITIES AS A NEW PARADIGM

A point of convergence between data sciences and social sciences is the digital humanities. Whenever the human being has been placed at the center of the world, whether in Ancient Greece or in the Renaissance, humanism has driven scientific experimentation, abstract thought and ethics.

The genesis and historic development of Digital Humanities can be traced back to the mid-20th century. The field emerged from the intersection of humanities disciplines, computing technology, and a growing interest in using computers to enhance humanistic research. Here is an overview of the key milestones in the development of Digital Humanities:

  • Early Computing and Humanities: The roots of Digital Humanities can be found in the early efforts to apply computing technology to humanities research. In the 1940s and 1950s, scholars and researchers explored ways to use computers for linguistic analysis, text encoding, and machine translation.
  • The Emergence of Text Analysis: In the 1960s and 1970s, computational linguistics and text analysis gained momentum. Scholars like Roberto Busa worked on large-scale projects, such as the Index Thomisticus, which involved encoding the works of Thomas Aquinas into a machine-readable format.
  • The Advent of Digital Archives: In the 1970s and 1980s, efforts were made to digitize and create digital archives of cultural heritage materials, such as manuscripts, texts, and artworks. These digital collections facilitated access and preservation while opening up new avenues for research.
  • Hypertext and Hypermedia: The development of hypertext and hypermedia in the 1980s brought new possibilities for structuring and linking textual information. Ted Nelson’s concept of hypertext and the invention of the World Wide Web by Tim Berners-Lee in 1989 further accelerated the integration of digital technologies with humanities research.
  • Growth of Digital Humanities Centers: In the 1990s, Digital Humanities centers began to emerge at universities and research institutions, fostering collaboration among scholars and encouraging the use of computational methods in humanities research.
  • Expansion of Digital Humanities Projects: During the late 1990s and early 2000s, there was a significant expansion of Digital Humanities projects in various disciplines. Scholars began to use digital tools for data visualization, text mining, GIS, and network analysis to study literature, history, linguistics, and other humanities subjects.
  • The Influence of Open Access and Open Data: The Open Access and Open Data movements gained momentum in the early 2000s, promoting the sharing of research findings, data, and tools in the Digital Humanities community, leading to increased collaboration and transparency.
  • Big Data and Data-Driven Research: With the rise of big data and advanced data analytics, Digital Humanities embraced data-driven approaches, allowing researchers to analyze large datasets to explore patterns, trends, and cultural phenomena.
  • Digital Humanities and Social Media: The popularity of social media platforms in the late 2000s and 2010s opened new opportunities for studying human behavior, discourse, and cultural expressions in digital spaces.
  • Interdisciplinary Collaboration: Digital Humanities increasingly embraced interdisciplinary collaboration, integrating expertise from computer science, information science, data science, and other fields to tackle complex research questions.

Digital Humanities continues to evolve and diversify, with scholars exploring innovative methods, technologies, and theories to understand human culture, history, and expression in the digital age. The field has grown to include a wide range of topics, from data visualization and computational linguistics to cultural analytics and digital storytelling. As technology continues to advance, Digital Humanities is likely to play an increasingly significant role in shaping humanistic research and understanding in the years to come.

In Colombia, the Colombian Network of Digital Humanities has operated as an open community since September 2016. It was founded as an initiative of a group of academics and entities interested in promoting and supporting the field of Digital Humanities in the country. Its objectives are:

  • Build bridges between people, projects and institutions that work at the intersection between the humanities, digital culture and digital technology in Colombia.
  • Promote spaces for dialogue, experimentation, collaboration, research and dissemination to strengthen the field of Digital Humanities.
  • Articulate Digital Humanities initiatives with cultural and memory institutions, universities, the media and civil society organizations.
  • Design and support projects and initiatives of the Digital Humanities assuming a critical and interdisciplinary view of digital technologies.
  • Contribute to the consolidation of Digital Humanities made from Latin America.

 

OPEN DATA AS A NEW MODEL FOR RESEARCH IN THE ACADEMIC LANDSCAPE

The analysis of open data has emerged as a new paradigm of academic research that holds significant promise for advancing scientific understanding, promoting collaboration, and fostering transparency. Open data refers to datasets that are made available to the public, often with minimal or no restrictions on access, usage, and redistribution. This shift towards open data has several implications for academic research:

 

  • Transparency and Reproducibility: Open data allows researchers to provide transparent documentation of their findings, methodologies, and datasets. This transparency enables other researchers to reproduce and validate the results, which is essential for the advancement of knowledge and the prevention of scientific misconduct.

 

  • Collaboration and Interdisciplinary Research: Open data encourages collaboration across different disciplines and research groups. Researchers from various fields can access and utilize the same datasets, leading to interdisciplinary research that can uncover novel insights and address complex, cross-disciplinary questions.

 

  • Efficiency and Resource Optimization: By sharing data openly, researchers can avoid duplicating efforts and resources. This efficiency is especially important for large-scale projects where data collection can be time-consuming and expensive. Open data can also lead to more efficient use of public funding.

 

  • Innovation and New Discoveries: Open data can serve as a fertile ground for innovation, as it allows researchers to explore new research questions and test novel hypotheses using existing datasets. This can lead to unexpected discoveries and insights that might not have been possible with limited access to data.

 

  • Data Reusability: Open datasets are often available long after the original research is published. This reusability enables researchers to explore new questions using the same data, compare findings across different studies, and conduct meta-analyses to draw more robust conclusions.

 

  • Citizen Science and Public Engagement: Open data can involve the general public in scientific research through citizen science projects. This engagement not only increases the amount of data available but also fosters public understanding of science and a sense of ownership over research outcomes.

 

  • Challenges and Considerations: While open data has numerous advantages, there are challenges to address. These include ensuring data privacy and security, dealing with issues related to data quality and standardization, and establishing mechanisms for proper data citation to give credit to the original data creators.

 

  • Cultural Shift in Academia: The adoption of open data requires a cultural shift in academia, where researchers and institutions need to recognize the value of sharing data and embrace open science practices. This shift may involve overcoming concerns about data ownership, competition, and career incentives.

 

  • Data Management and Infrastructure: Institutions and researchers need to invest in proper data management practices and infrastructure to ensure data is well-documented, curated, and accessible in the long term. This includes considerations for data storage, metadata creation, and data sharing platforms.

In conclusion, the analysis of open data represents a new and transformative approach to academic research. It has the potential to accelerate scientific progress, promote collaboration, and increase the overall quality and reliability of research findings. As open data practices continue to evolve, researchers and institutions must actively engage with the challenges and opportunities presented by this paradigm shift.

 

ACADEMIC RESEARCH AND BIG DATA

As the use of big data in academic research continues to grow, several challenges arise that researchers need to address. Some of the main challenges include:

 

  • Data Collection and Quality: Gathering and curating large-scale data sets can be a time-consuming and resource-intensive process. Ensuring the quality, completeness, and reliability of the data is crucial, as errors or biases in the data can lead to misleading or incorrect research conclusions.

 

  • Data Storage and Management: Storing and managing massive volumes of data can strain existing IT infrastructure and require sophisticated data management solutions. Researchers must find efficient ways to store, organize, and retrieve data while maintaining its integrity and security.

 

  • Data Privacy and Ethics: Big data often contains sensitive information, raising concerns about privacy and data protection. Researchers must navigate legal and ethical considerations when handling personal or sensitive data, ensuring that proper consent and anonymization protocols are followed.

 

  • Data Integration and Interoperability: Combining data from different sources with varying formats and structures can be complex. Researchers may encounter difficulties in integrating heterogeneous data sets, which can hinder data analysis and interpretation.

 

  • Analytical Challenges: Analyzing big data requires specialized skills and tools. Traditional statistical methods may not be sufficient to extract meaningful insights from large and complex datasets. Researchers need to develop and adopt advanced data analytics techniques, such as machine learning and data mining.

 

  • Scalability: As data continues to grow at an exponential rate, researchers must ensure that their analytical methods and computational infrastructure can handle the increasing volume, velocity, and variety of data.

 

  • Interpretation and Generalization: With big data, it is possible to find statistically significant correlations that might not necessarily imply causation. Researchers must be cautious when interpreting findings and avoid making overgeneralized conclusions based solely on the size of the data.

 

  • Reproducibility and Transparency: The size and complexity of big data research can make it challenging to reproduce studies and validate findings independently. Ensuring transparency in data handling and analysis is crucial to promoting scientific rigor and replicability.

 

  • Cost and Resource Allocation: Accessing and processing large datasets may require significant financial resources and computational power. Academic institutions and researchers must consider the costs associated with big data research and allocate resources accordingly.

 

  • Knowledge Gap and Training: Not all researchers may possess the necessary skills and expertise to work with big data effectively. Bridging the knowledge gap and providing training in big data analytics and methodologies are essential to fostering data-driven academic research.

 

Addressing these challenges requires collaboration among researchers, institutions, and policymakers to develop guidelines, best practices, and infrastructure support that promote responsible and impactful big data research.
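The interpretation challenge noted above can be made concrete with a small simulation. The following is a minimal sketch, not a method taken from any particular study: it generates pure random noise (an assumption made for illustration) and shows that when many candidate variables are screened against a single outcome, some will appear strongly correlated with it by chance alone. The function names, the sample sizes and the seed are hypothetical choices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_observations, n_variables, seed=0):
    """Largest |r| between a random outcome and unrelated random variables.

    Every series is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcome = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_observations)]
        best = max(best, abs(pearson(outcome, noise)))
    return best


# Hypothetical sizes: 20 observations, screened against 1 or 500 variables.
few = strongest_spurious_correlation(n_observations=20, n_variables=1)
many = strongest_spurious_correlation(n_observations=20, n_variables=500)
print(f"1 candidate variable: strongest |r| = {few:.2f}")
print(f"500 candidate variables: strongest |r| = {many:.2f}")
```

The more variables a dataset offers, the stronger the best chance correlation tends to be, which is why a striking association found by searching a very large dataset is not, on its own, evidence of a causal link.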

 

CHALLENGES TO INTELLECTUAL PROPERTY AS A RESULT OF ACADEMIC RESEARCH BASED ON BIG DATA

Intellectual property (IP) practice is changing because of the nature of data sharing, collaboration, and ownership in the digital age. The key challenges researchers face in navigating IP issues related to big data in academia are the following:

 

  • Data Ownership and Access Control: Determining who owns the data and who has the right to control access to it can be difficult, especially in collaborative research efforts. Big data often involves multiple sources, making it challenging to identify the original creators and their rights.

 

  • Data Privacy and Consent: Big data often contains personal or sensitive information. Researchers must navigate privacy laws and ethical considerations when using and sharing such data. Obtaining proper informed consent and ensuring data anonymization can be complex, as re-identification risks increase with the volume and variety of data.

 

  • Data Licensing and Usage Rights: Clarifying the terms under which data is shared and used is crucial. Researchers need to understand the licensing agreements associated with the data they use, as well as any restrictions on its reuse, redistribution, and commercialization.

 

  • Derivative Works and Transformative Use: Analyzing big data might involve creating derivative works or making transformative use of existing data. Deciding when these new works are separate and protectable under IP law can be challenging, as the line between original work and derivative work can be blurred.

 

  • Public vs. Private Research: Many big data projects involve collaborations between academia, industry, and government. Researchers must consider how different stakeholders’ interests align and how IP rights might be shared or managed between public and private entities.

 

  • Publication and Preemption: The tension between open sharing of research findings and the need to secure IP rights can arise. Publishing data and findings can potentially limit the ability to patent or protect inventions derived from that data.

 

  • International Considerations: Big data research often involves global collaborations, which can lead to conflicts between different jurisdictions’ IP laws and regulations. Researchers must navigate these complexities to ensure compliance and protection.

 

  • Data Integration and Aggregation: Combining data from multiple sources to create a valuable dataset can raise IP issues when dealing with data contributed by different parties. Questions about who owns the integrated dataset and how it can be used may arise.

 

  • IP Rights in Algorithms and Models: Big data analysis often involves developing algorithms, models, and software tools. Researchers must address issues related to the protection and ownership of these intellectual property assets.

 

  • Cultural Shift and Collaboration: The academic culture of open sharing can clash with the need to protect IP. Researchers need to balance the benefits of collaboration and open science with the desire to protect their inventions and discoveries.

 

  • Enforcement and Litigation: In cases of IP disputes, enforcement and litigation can be complex, especially when data ownership and usage rights are not clearly defined. This can lead to lengthy legal battles that hinder research progress.

 

Addressing these challenges requires a combination of legal expertise, ethical considerations, and clear communication among researchers, institutions, data providers, and legal advisors. Developing best practices for data sharing, collaboration agreements, and IP protection can help researchers navigate the IP landscape while fostering innovation and advancing knowledge in the realm of big data research.
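The point made above about re-identification risk growing with the variety of data can be illustrated with k-anonymity, one common way of measuring that risk (it is a technique chosen here for illustration, not one the article prescribes). A dataset is k-anonymous when every combination of indirect identifiers is shared by at least k records. The sketch below uses invented records, and the field names and the helper `k_anonymity` are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    for the given quasi-identifier fields (0 for an empty dataset)."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Invented survey rows: no names, but several indirect identifiers.
records = [
    {"age_band": "20-29", "city": "Bogotá", "occupation": "teacher"},
    {"age_band": "20-29", "city": "Bogotá", "occupation": "teacher"},
    {"age_band": "20-29", "city": "Bogotá", "occupation": "nurse"},
    {"age_band": "30-39", "city": "Cali", "occupation": "lawyer"},
    {"age_band": "30-39", "city": "Cali", "occupation": "lawyer"},
    {"age_band": "30-39", "city": "Cali", "occupation": "engineer"},
]

k_two_fields = k_anonymity(records, ["age_band", "city"])
k_three_fields = k_anonymity(records, ["age_band", "city", "occupation"])
print(k_two_fields, k_three_fields)
```

With two fields, each person is hidden among three records; adding a third field leaves some records unique, so a dataset with no names can still single out an individual. This is why removing direct identifiers does not by itself guarantee anonymization when many attributes are combined.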

 

CONCLUDING REMARKS

In conclusion, the social sciences are moving towards combining traditional theoretical studies with the empirical review of data and information on a previously unseen scale.

The change in the object of analysis becomes a challenge for the way in which research is carried out in practice as well as for the governance of the results.

Intellectual property is one of the aspects to take into account when research involves data. It also offers a new paradigm for how the information underlying research is shared and how the results are disseminated.

In Colombia, it is important that the new generation of social scientists can increasingly use quantitative analytical tools that allow them to delve into issues fundamental to our society, such as violence, peace, democracy, the fight against poverty, the creation of public policies and the energy transition, among others.
