Topic 7: FLOSS futures

As computational techniques develop, what do you see changing in the community you are contributing to? What future challenges and what ethical issues should you consider for a future to come?

Tags: Open Source collaboration

Free/libre and open-source software, also known as FLOSS, is software that anyone is free to use, study, and modify. In essence, FLOSS is the foundation of Open Source collaboration because it emphasises transparency and community-driven development, which allows users to access and contribute to projects.

In this Open Source Unit, I actively contributed to a machine learning (ML) open-source project called MindsDB (READ MINDSDB CONTRIBUTION DOCUMENTATION HERE). Although I knew that machine learning is a fast-paced industry, it was only when I started contributing to MindsDB and saw how often a single model needs to be updated within a week that I realised just how fast the industry moves. Through this experience I realised that the landscape for ML-based projects will most likely change in the near future.

Hence, this blog will cover some of the changes I think we will see in the future for ML-based open-source projects:

An increase in Automated Machine Learning (AutoML) projects
An established standard of inter-operability among open-source machine-learning projects
Established (or emerging) guidelines for responsible ML practices within open-source machine-learning projects

1. Increase in Automated Machine Learning (AutoML) projects

In late 2022, Fortune Business Insights published a report highlighting how much the machine learning industry is expected to grow. The report estimates that the global machine learning (ML) market will grow from $21.17 billion in 2022 to $209.91 billion by 2029 (Fortune, 2022). Given this rapidly increasing demand, there is an opportunity to make ML more accessible through automation (AutoML), both in integrating ML into existing practices and in building ML models (Qamar, 2023).

Possible AutoML projects would attempt to automate complex tasks such as model selection, hyperparameter tuning, and feature engineering, allowing non-experts to leverage machine learning techniques effectively.
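To make the idea of automated model selection and hyperparameter tuning more concrete, here is a minimal sketch in plain Python. It is not how MindsDB or any particular AutoML library works; the function names (`fit_ridge_1d`, `auto_select`), the candidate models, and the toy data are all assumptions made for illustration. The sketch fits a few candidate models, scores each on held-out validation data, and returns the best one without the user choosing anything by hand.

```python
from statistics import mean


def fit_mean_baseline(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    y_bar = mean(ys)
    return lambda x: y_bar


def fit_ridge_1d(xs, ys, alpha):
    """Candidate 2: fit y = w*x + b with an L2 penalty `alpha` on w."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = sxy / (sxx + alpha)  # closed-form ridge slope (intercept unpenalised)
    b = y_bar - w * x_bar
    return lambda x: w * x + b


def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def auto_select(train, valid, alphas=(0.0, 1.0, 10.0)):
    """Try every candidate and hyperparameter; keep the lowest validation error."""
    train_x, train_y = train
    valid_x, valid_y = valid

    # Each candidate is (name, hyperparameters, fitting function).
    candidates = [("mean_baseline", {}, fit_mean_baseline)]
    for a in alphas:
        candidates.append(
            ("ridge", {"alpha": a}, lambda xs, ys, a=a: fit_ridge_1d(xs, ys, a))
        )

    best = None
    for name, params, fit in candidates:
        model = fit(train_x, train_y)
        error = mse(model, valid_x, valid_y)
        if best is None or error < best["error"]:
            best = {"name": name, "params": params, "error": error, "model": model}
    return best


# Hypothetical toy data following y = 2x + 1.
train = ([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
valid = ([6.0, 7.0], [13.0, 15.0])

best = auto_select(train, valid)
print(best["name"], best["params"], best["error"])
```

Real AutoML systems search far larger spaces and also automate feature engineering, but the loop of "fit, score on held-out data, keep the best" is the core idea a non-expert no longer has to carry out manually.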

As an example, one of the open-source projects that I've contributed to, MindsDB, has focused its efforts on automating machine learning models for cloud database software. Currently, the project team manually develops API handlers to establish connections between the databases and the automated ML model functions.

While MindsDB's approach to automating ML models is considered innovative and advanced, the team has recognized the need to further enhance efficiency. They are actively exploring the automation of API handler creation, aiming to make the entire process of building an ML model truly automated and highly efficient in the future.

2. Established standard of inter-operability among open-source machine-learning projects

As more open-source projects start to incorporate ML into their workflows, it can also be presumed that there will be a need for interoperability between those projects (Koch, 2022). Essentially, this may entail outlining "Best Practice" for integrating ML into development processes, or a guideline on how ML models from one project can be reused in another. I believe establishing a standard of inter-operability between projects that involve ML would help different projects function seamlessly together, hence preserving the collaborative nature of open-source development.

There are many areas in which a standard of interoperability could be established. However, in my opinion, the open-source community would benefit most from standardising data formats, model representations, and interfaces. This would make integration and collaboration across frameworks and platforms easier, fostering a more seamless and efficient development process among open-source communities.
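As a rough illustration of what a shared model representation could look like, the sketch below exports a simple linear model to JSON in one "project" and loads it in another. The schema (the `format_version`, `model_type`, `feature_names`, `coefficients`, and `intercept` fields) and the function names are entirely hypothetical and chosen only for this example; existing interchange formats such as ONNX address the same problem in far more depth.

```python
import json

SCHEMA_VERSION = "0.1"  # hypothetical version tag for this illustrative format


def export_linear_model(feature_names, coefficients, intercept):
    """Project A: serialise a linear model into the shared JSON representation."""
    if len(feature_names) != len(coefficients):
        raise ValueError("each feature needs exactly one coefficient")
    spec = {
        "format_version": SCHEMA_VERSION,
        "model_type": "linear",
        "feature_names": list(feature_names),
        "coefficients": list(coefficients),
        "intercept": intercept,
    }
    return json.dumps(spec, sort_keys=True)


def load_linear_model(payload):
    """Project B: validate the shared representation and rebuild a predictor."""
    spec = json.loads(payload)
    if spec.get("format_version") != SCHEMA_VERSION:
        raise ValueError("unsupported format version")
    if spec.get("model_type") != "linear":
        raise ValueError("unsupported model type")

    names = spec["feature_names"]
    weights = spec["coefficients"]
    bias = spec["intercept"]

    def predict(row):
        # `row` maps feature names to values, so column order does not matter.
        return bias + sum(w * row[name] for name, w in zip(names, weights))

    return predict


# Project A publishes the model; Project B reuses it without sharing any code.
payload = export_linear_model(["age", "income"], [0.5, 2.0], 1.0)
predict = load_linear_model(payload)
print(predict({"age": 2.0, "income": 3.0}))
```

The point of the design is that the two sides only need to agree on the documented fields, not on a framework or programming language, which is what makes reuse across projects possible.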

3. Guideline for Responsible ML Practices

With the growth of ML, it has become crucial for the open-source community to establish clear guidelines that define the ethical considerations and responsible applications of ML in open-source projects (Harder, 2023). These guidelines should provide direction for the development and utilisation of machine learning software while ensuring ethical standards are upheld.

As discussed in the final lecture on "The Future of Licenses", creating an open-source project that uses machine learning raises ethical considerations. Because open-source software is by nature "free to use", there needs to be a clearly stated, and possibly legally enforceable, mechanism that stops users from using "free to use" projects to cause harm. Currently, common open-source licenses (such as the GNU GPL or the MIT License) do not fully cover the safe utilisation of ML-based projects (Moran, 2021). Hence, in the future, either these licenses must be updated to reflect today's ML landscape, or a new license and guideline needs to be established.

Specifically, projects and communities will need to actively work on developing guidelines to address bias, transparency, fairness, and accountability in machine learning systems. I believe establishing responsible guidelines for using machine learning in open-source projects will help uphold the open-source community's core values of safety, transparency, and inclusivity.

In conclusion...

I believe the future of open-source machine learning projects will see an increase in Automated Machine Learning (AutoML) efforts for enhanced accessibility. Establishing interoperability standards and guidelines for responsible ML practices will foster collaboration and uphold ethical values. Considering the fast-paced progress of machine learning, I also believe that some of these predictions could materialise sooner than expected.

References

Fortune, F. (2022) Machine Learning (ML) Market Size, Share & COVID-19 Impact Analysis, by Enterprise Type (Small & Mid-sized Enterprises (SMEs) and Large Enterprises), by Deployment (Cloud and On-premise), by End-use Industry (Healthcare, Retail, IT and Telecommunication, BFSI, Automotive and Transportation, Advertising and Media, Manufacturing, and Others), and Regional Forecast, 2023-2030, Machine Learning Market Size, Share, Growth | Trends [2030]. Available at: https://www.fortunebusinessinsights.com/machine-learning-market-102226 (Accessed: 03 June 2023).

Harder, H. de (2023) Ethical considerations in Machine Learning Projects, Medium. Available at: https://towardsdatascience.com/ethical-considerations-in-machine-learning-projects-e17cb283e072 (Accessed: 03 June 2023).

Koch, R. (2022) Machine Learning and Interoperability, clickworker.com. Available at: https://www.clickworker.com/customer-blog/interoperability-and-the-future-of-machine-learning/ (Accessed: 03 June 2023).

Moran, C. (2021) Machine Learning, ethics, and Open Source Licensing (Part I/II), The Gradient. Available at: https://thegradient.pub/machine-learning-ethics-and-open-source-licensing/ (Accessed: 03 June 2023).

Qamar, S. (2023) The Future of Machine Learning: Automl, Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2023/01/the-future-of-machine-learning-automl/ (Accessed: 03 June 2023).

HNY Open Source Contributions

Friday, May 19, 2023