Code property graph: Difference between revisions

From Wikipedia, the free encyclopedia
Citation bot (talk | contribs)
Alter: pages. Add: issue, volume. | Use this bot. Report bugs. | Suggested by Corvus florensis | #UCB_webform 507/3500
Citation bot (talk | contribs)
Alter: title, template type. Add: chapter. Removed parameters. | Use this bot. Report bugs. | #UCB_CommandLine
Line 1:
{{Short description|Representation of a computer program}}


In [[computer science]], a '''code property graph''' (CPG) is a [[computer program]] representation that captures [[Abstract syntax tree|syntactic structure]], [[Control-flow graph|control flow]], and [[data dependencies]] in a [[Graph database|property graph]]. The concept was originally introduced to identify security vulnerabilities in [[C (programming language)|C]] and [[C++]] system code,<ref>{{cite journal |last1=Yamaguchi |first1=Fabian |last2=Golde |first2=Nico |last3=Arp |first3=Daniel |last4=Rieck |first4=Konrad |title=Modeling and Discovering Vulnerabilities with Code Property Graphs |journal=2014 IEEE Symposium on Security and Privacy |date=May 2014 |pages=590–604 |doi=10.1109/SP.2014.44|isbn=978-1-4799-4686-0 |s2cid=2231082 }}</ref> but has since been employed to analyze [[web application]]s,<ref>{{cite journal |last1=Backes |first1=Michael |last2=Rieck |first2=Konrad |last3=Skoruppa |first3=Malte |last4=Stock |first4=Ben |last5=Yamaguchi |first5=Fabian |title=Efficient and Flexible Discovery of PHP Application Vulnerabilities |journal=2017 IEEE European Symposium on Security and Privacy (EuroS&P) |date=April 2017 |pages=334–349 |doi=10.1109/EuroSP.2017.14|isbn=978-1-5090-5762-7 |s2cid=206649536 }}</ref><ref>{{cite book |last1=Li |first1=Song |last2=Kang |first2=Mingqing |last3=Hou |first3=Jianwei |last4=Cao |first4=Yinzhi |title=Mining Node.js Vulnerabilities via Object Dependence Graph and Query |date=2022 |pages=143–160 |isbn=9781939133311 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/li-song |language=en}}</ref><ref>{{cite journal |last1=Brito |first1=Tiago |last2=Lopes |first2=Pedro |last3=Santos |first3=Nuno |last4=Santos |first4=José Fragoso |title=Wasmati: An efficient static vulnerability scanner for WebAssembly |journal=Computers & Security |date=1 July 2022 |volume=118 |pages=102745 |doi=10.1016/j.cose.2022.102745|arxiv=2204.12575 |s2cid=248405811 }}</ref><ref>{{cite book |last1=Khodayari |first1=Soheil 
|last2=Pellegrino |first2=Giancarlo |title=JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals |date=2021 |pages=2525–2542 |isbn=9781939133243 |url=https://www.usenix.org/conference/usenixsecurity21/presentation/khodayari |language=en}}</ref> cloud deployments,<ref>{{cite journal |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |journal=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014|arxiv=2206.06938 |isbn=978-1-6654-0060-2 |s2cid=243946828 }}</ref> and smart contracts.<ref>{{cite journal |last1=Giesen |first1=Jens-Rene |last2=Andreina |first2=Sebastien |last3=Rodler |first3=Michael |last4=Karame |first4=Ghassan |last5=Davi |first5=Lucas |title=Practical Mitigation of Smart Contract Bugs {{!}} TeraFlow |website=www.teraflow-h2020.eu |url=https://www.teraflow-h2020.eu/publications/practical-mitigation-smart-contract-bugs}}</ref> Beyond vulnerability discovery, code property graphs find applications in code clone detection,<ref>{{cite journal |last1=Wi |first1=Seongil |last2=Woo |first2=Sijae |last3=Whang |first3=Joyce Jiyoung |last4=Son |first4=Sooel |title=HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs |journal=Proceedings of the ACM Web Conference 2022 |date=25 April 2022 |pages=755–766 |doi=10.1145/3485447.3512235|isbn=9781450390965 |s2cid=248367462 }}</ref><ref>{{cite journal |last1=Bowman |first1=Benjamin |last2=Huang |first2=H. 
Howie |title=VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets |journal=2020 IEEE European Symposium on Security and Privacy (EuroS&P) |date=September 2020 |pages=53–69 |doi=10.1109/EuroSP48549.2020.00012|isbn=978-1-7281-5087-1 |s2cid=226268429 }}</ref> attack-surface detection,<ref>{{cite journal |last1=Du |first1=Xiaoning |last2=Chen |first2=Bihuan |last3=Li |first3=Yuekang |last4=Guo |first4=Jianmin |last5=Zhou |first5=Yaqin |last6=Liu |first6=Yang |last7=Jiang |first7=Yu |title=LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics |journal=2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) |date=May 2019 |pages=60–71 |doi=10.1109/ICSE.2019.00024|arxiv=1901.11479 |isbn=978-1-7281-0869-8 |s2cid=59523689 }}</ref> exploit generation,<ref>{{cite book |last1=Alhuzali |first1=Abeer |last2=Gjomemo |first2=Rigel |last3=Eshete |first3=Birhanu |last4=Venkatakrishnan |first4=V. N. |title=NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications |date=2018 |pages=377–392 |isbn=9781939133045 |url=https://www.usenix.org/conference/usenixsecurity18/presentation/alhuzali |language=en}}</ref> measuring code testability,<ref>{{cite journal |last1=Al Kassar |first1=Feras |last2=Clerici |first2=Giulia |last3=Compagna |first3=Luca |last4=Balzarotti |first4=Davide |last5=Yamaguchi |first5=Fabian |title=Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications – NDSS Symposium |journal=NDSS Symposium |url=https://www.ndss-symposium.org/ndss-paper/auto-draft-206/}}</ref> and backporting of security patches.<ref>{{cite book |last1=Shi |first1=Youkun |last2=Zhang |first2=Yuan |last3=Luo |first3=Tianhan |last4=Mao |first4=Xiangyu |last5=Cao |first5=Yinzhi |last6=Wang |first6=Ziwen |last7=Zhao |first7=Yudi |last8=Huang |first8=Zongan |last9=Yang |first9=Min |title=Backporting Security Patches of Web Applications: A Prototype Design and 
Implementation on Injection Vulnerability Patches |date=2022 |pages=1993–2010 |isbn=9781939133311 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/shi |language=en}}</ref>
In [[computer science]], a '''code property graph''' (CPG) is a [[computer program]] representation that captures [[Abstract syntax tree|syntactic structure]], [[Control-flow graph|control flow]], and [[data dependencies]] in a [[Graph database|property graph]]. The concept was originally introduced to identify security vulnerabilities in [[C (programming language)|C]] and [[C++]] system code,<ref>{{cite book |last1=Yamaguchi |first1=Fabian |last2=Golde |first2=Nico |last3=Arp |first3=Daniel |last4=Rieck |first4=Konrad |title=2014 IEEE Symposium on Security and Privacy |chapter=Modeling and Discovering Vulnerabilities with Code Property Graphs |date=May 2014 |pages=590–604 |doi=10.1109/SP.2014.44|isbn=978-1-4799-4686-0 |s2cid=2231082 }}</ref> but has since been employed to analyze [[web application]]s,<ref>{{cite book |last1=Backes |first1=Michael |last2=Rieck |first2=Konrad |last3=Skoruppa |first3=Malte |last4=Stock |first4=Ben |last5=Yamaguchi |first5=Fabian |title=2017 IEEE European Symposium on Security and Privacy (EuroS&P) |chapter=Efficient and Flexible Discovery of PHP Application Vulnerabilities |date=April 2017 |pages=334–349 |doi=10.1109/EuroSP.2017.14|isbn=978-1-5090-5762-7 |s2cid=206649536 }}</ref><ref>{{cite book |last1=Li |first1=Song |last2=Kang |first2=Mingqing |last3=Hou |first3=Jianwei |last4=Cao |first4=Yinzhi |title=Mining Node.js Vulnerabilities via Object Dependence Graph and Query |date=2022 |pages=143–160 |isbn=9781939133311 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/li-song |language=en}}</ref><ref>{{cite journal |last1=Brito |first1=Tiago |last2=Lopes |first2=Pedro |last3=Santos |first3=Nuno |last4=Santos |first4=José Fragoso |title=Wasmati: An efficient static vulnerability scanner for WebAssembly |journal=Computers & Security |date=1 July 2022 |volume=118 |pages=102745 |doi=10.1016/j.cose.2022.102745|arxiv=2204.12575 |s2cid=248405811 }}</ref><ref>{{cite book |last1=Khodayari |first1=Soheil |last2=Pellegrino 
|first2=Giancarlo |title=JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals |date=2021 |pages=2525–2542 |isbn=9781939133243 |url=https://www.usenix.org/conference/usenixsecurity21/presentation/khodayari |language=en}}</ref> cloud deployments,<ref>{{cite book |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |chapter=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014|arxiv=2206.06938 |isbn=978-1-6654-0060-2 |s2cid=243946828 }}</ref> and smart contracts.<ref>{{cite journal |last1=Giesen |first1=Jens-Rene |last2=Andreina |first2=Sebastien |last3=Rodler |first3=Michael |last4=Karame |first4=Ghassan |last5=Davi |first5=Lucas |title=Practical Mitigation of Smart Contract Bugs {{!}} TeraFlow |website=www.teraflow-h2020.eu |url=https://www.teraflow-h2020.eu/publications/practical-mitigation-smart-contract-bugs}}</ref> Beyond vulnerability discovery, code property graphs find applications in code clone detection,<ref>{{cite journal |last1=Wi |first1=Seongil |last2=Woo |first2=Sijae |last3=Whang |first3=Joyce Jiyoung |last4=Son |first4=Sooel |title=HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs |journal=Proceedings of the ACM Web Conference 2022 |date=25 April 2022 |pages=755–766 |doi=10.1145/3485447.3512235|isbn=9781450390965 |s2cid=248367462 }}</ref><ref>{{cite book |last1=Bowman |first1=Benjamin |last2=Huang |first2=H. 
Howie |title=2020 IEEE European Symposium on Security and Privacy (EuroS&P) |chapter=VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets |date=September 2020 |pages=53–69 |doi=10.1109/EuroSP48549.2020.00012|isbn=978-1-7281-5087-1 |s2cid=226268429 }}</ref> attack-surface detection,<ref>{{cite book |last1=Du |first1=Xiaoning |last2=Chen |first2=Bihuan |last3=Li |first3=Yuekang |last4=Guo |first4=Jianmin |last5=Zhou |first5=Yaqin |last6=Liu |first6=Yang |last7=Jiang |first7=Yu |title=2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) |chapter=LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics |date=May 2019 |pages=60–71 |doi=10.1109/ICSE.2019.00024|arxiv=1901.11479 |isbn=978-1-7281-0869-8 |s2cid=59523689 }}</ref> exploit generation,<ref>{{cite book |last1=Alhuzali |first1=Abeer |last2=Gjomemo |first2=Rigel |last3=Eshete |first3=Birhanu |last4=Venkatakrishnan |first4=V. N. |title=NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications |date=2018 |pages=377–392 |isbn=9781939133045 |url=https://www.usenix.org/conference/usenixsecurity18/presentation/alhuzali |language=en}}</ref> measuring code testability,<ref>{{cite journal |last1=Al Kassar |first1=Feras |last2=Clerici |first2=Giulia |last3=Compagna |first3=Luca |last4=Balzarotti |first4=Davide |last5=Yamaguchi |first5=Fabian |title=Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications – NDSS Symposium |journal=NDSS Symposium |url=https://www.ndss-symposium.org/ndss-paper/auto-draft-206/}}</ref> and backporting of security patches.<ref>{{cite book |last1=Shi |first1=Youkun |last2=Zhang |first2=Yuan |last3=Luo |first3=Tianhan |last4=Mao |first4=Xiangyu |last5=Cao |first5=Yinzhi |last6=Wang |first6=Ziwen |last7=Zhao |first7=Yudi |last8=Huang |first8=Zongan |last9=Yang |first9=Min |title=Backporting Security Patches of Web Applications: A Prototype Design and 
Implementation on Injection Vulnerability Patches |date=2022 |pages=1993–2010 |isbn=9781939133311 |url=https://www.usenix.org/conference/usenixsecurity22/presentation/shi |language=en}}</ref>


== Definition ==
Line 29:
'''Plume CPG.''' Developed at [[Stellenbosch University]] in 2020 and sponsored by Amazon Science, the open-source Plume<ref>{{cite web |title=Plume |url=https://plume-oss.github.io/plume-docs/ |website=plume-oss.github.io}}</ref> project provides a code property graph for Java bytecode compatible with the code property graph specification provided by the Joern project. The two projects merged in 2021.


'''Fraunhofer AISEC CPG.''' The {{ill|Fraunhofer Institute for Applied and Integrated Security|de|Fraunhofer-Institut für Angewandte und Integrierte Sicherheit}} provides open-source code property graph generators for C/C++, Java, Golang, and Python,<ref>{{cite web |title=Code Property Graph |url=https://github.com/Fraunhofer-AISEC/cpg |publisher=Fraunhofer AISEC |date=31 August 2022}}</ref> albeit without a formal schema specification. It also provides the Cloud Property Graph,<ref>{{cite journal |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |journal=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014|arxiv=2206.06938 |isbn=978-1-6654-0060-2 |s2cid=243946828 }}</ref> an extension of the code property graph concept that models details of cloud deployments.
'''Fraunhofer AISEC CPG.''' The {{ill|Fraunhofer Institute for Applied and Integrated Security|de|Fraunhofer-Institut für Angewandte und Integrierte Sicherheit}} provides open-source code property graph generators for C/C++, Java, Golang, and Python,<ref>{{cite web |title=Code Property Graph |url=https://github.com/Fraunhofer-AISEC/cpg |publisher=Fraunhofer AISEC |date=31 August 2022}}</ref> albeit without a formal schema specification. It also provides the Cloud Property Graph,<ref>{{cite book |last1=Banse |first1=Christian |last2=Kunz |first2=Immanuel |last3=Schneider |first3=Angelika |last4=Weiss |first4=Konrad |title=2021 IEEE 14th International Conference on Cloud Computing (CLOUD) |chapter=Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis |date=September 2021 |pages=13–19 |doi=10.1109/CLOUD53861.2021.00014|arxiv=2206.06938 |isbn=978-1-6654-0060-2 |s2cid=243946828 }}</ref> an extension of the code property graph concept that models details of cloud deployments.


'''Galois’ CPG for LLVM.''' Galois Inc. provides a code property graph based on the [[LLVM]] compiler.<ref>{{cite web |title=The Code Property Graph — MATE 0.1.0.0 documentation |url=https://galoisinc.github.io/MATE/cpg.html |website=galoisinc.github.io}}</ref> The graph represents code at different stages of the compilation and a mapping between these representations. It follows a custom schema that is defined in its documentation.


== Machine learning on code property graphs ==
Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, [[graph neural network]]s (GNN) have been employed to derive vulnerability detectors.<ref>{{cite journal |last1=Zhou |first1=Yaqin |last2=Liu |first2=Shangqing |last3=Siow |first3=Jingkai |last4=Du |first4=Xiaoning |last5=Liu |first5=Yang |title=Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks |journal=Proceedings of the 33rd International Conference on Neural Information Processing Systems |date=8 December 2019 |pages=10197–10207 |url=https://dl.acm.org/doi/10.5555/3454287.3455202 |publisher=Curran Associates Inc.|arxiv=1909.03496 }}</ref><ref>{{cite journal |last1=Haojie |first1=Zhang |last2=Yujun |first2=Li |last3=Yiwei |first3=Liu |last4=Nanxin |first4=Zhou |title=Vulmg: A Static Detection Solution For Source Code Vulnerabilities Based On Code Property Graph and Graph Attention Network |journal=2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) |date=December 2021 |pages=250–255 |doi=10.1109/ICCWAMTIP53232.2021.9674145|isbn=978-1-6654-1364-0 |s2cid=246039350 }}</ref><ref>{{cite journal |last1=Zheng |first1=Weining |last2=Jiang |first2=Yuan |last3=Su |first3=Xiaohong |title=Vu1SPG: Vulnerability detection based on slice property graph representation learning |journal=2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) |date=October 2021 |pages=457–467 |doi=10.1109/ISSRE52982.2021.00054|isbn=978-1-6654-2587-2 |s2cid=246751595 }}</ref><ref>{{cite journal |last1=Chakraborty |first1=Saikat |last2=Krishna |first2=Rahul |last3=Ding |first3=Yangruibo |last4=Ray |first4=Baishakhi |title=Deep Learning based Vulnerability Detection: Are We There Yet |journal=IEEE Transactions on Software Engineering |date=2021 |volume=48 |issue=9 |pages=3280–3296 
|doi=10.1109/TSE.2021.3087402|arxiv=2009.07235 |s2cid=221703797 }}</ref><ref>{{cite journal |last1=Zhou |first1=Li |last2=Huang |first2=Minhuan |last3=Li |first3=Yujun |last4=Nie |first4=Yuanping |last5=Li |first5=Jin |last6=Liu |first6=Yiwei |title=GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network |journal=2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC) |date=October 2021 |pages=381–388 |doi=10.1109/DSC53577.2021.00060|arxiv=2202.02501 |isbn=978-1-6654-1815-7 |s2cid=246634824 }}</ref><ref>{{cite journal |last1=Ganz |first1=Tom |last2=Härterich |first2=Martin |last3=Warnecke |first3=Alexander |last4=Rieck |first4=Konrad |title=Explaining Graph Neural Networks for Vulnerability Discovery |journal=Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security |date=15 November 2021 |pages=145–156 |doi=10.1145/3474369.3486866|isbn=9781450386579 |s2cid=240001850 |doi-access=free }}</ref><ref>{{cite journal |last1=Duan |first1=Xu |last2=Wu |first2=Jingzheng |last3=Ji |first3=Shouling |last4=Rui |first4=Zhiqing |last5=Luo |first5=Tianyue |last6=Yang |first6=Mutian |last7=Wu |first7=Yanjun |title=VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities |journal=Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence |date=August 2019 |pages=4665–4671 |doi=10.24963/ijcai.2019/648|isbn=978-0-9992411-4-1 |s2cid=199466292 |doi-access=free }}</ref>
Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, [[graph neural network]]s (GNN) have been employed to derive vulnerability detectors.<ref>{{cite journal |last1=Zhou |first1=Yaqin |last2=Liu |first2=Shangqing |last3=Siow |first3=Jingkai |last4=Du |first4=Xiaoning |last5=Liu |first5=Yang |title=Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks |journal=Proceedings of the 33rd International Conference on Neural Information Processing Systems |date=8 December 2019 |pages=10197–10207 |url=https://dl.acm.org/doi/10.5555/3454287.3455202 |publisher=Curran Associates Inc.|arxiv=1909.03496 }}</ref><ref>{{cite book |last1=Haojie |first1=Zhang |last2=Yujun |first2=Li |last3=Yiwei |first3=Liu |last4=Nanxin |first4=Zhou |title=2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) |chapter=Vulmg: A Static Detection Solution for Source Code Vulnerabilities Based on Code Property Graph and Graph Attention Network |date=December 2021 |pages=250–255 |doi=10.1109/ICCWAMTIP53232.2021.9674145|isbn=978-1-6654-1364-0 |s2cid=246039350 }}</ref><ref>{{cite book |last1=Zheng |first1=Weining |last2=Jiang |first2=Yuan |last3=Su |first3=Xiaohong |title=2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) |chapter=Vu1SPG: Vulnerability detection based on slice property graph representation learning |date=October 2021 |pages=457–467 |doi=10.1109/ISSRE52982.2021.00054|isbn=978-1-6654-2587-2 |s2cid=246751595 }}</ref><ref>{{cite journal |last1=Chakraborty |first1=Saikat |last2=Krishna |first2=Rahul |last3=Ding |first3=Yangruibo |last4=Ray |first4=Baishakhi |title=Deep Learning based Vulnerability Detection: Are We There Yet |journal=IEEE Transactions on Software Engineering |date=2021 |volume=48 |issue=9 |pages=3280–3296 
|doi=10.1109/TSE.2021.3087402|arxiv=2009.07235 |s2cid=221703797 }}</ref><ref>{{cite book |last1=Zhou |first1=Li |last2=Huang |first2=Minhuan |last3=Li |first3=Yujun |last4=Nie |first4=Yuanping |last5=Li |first5=Jin |last6=Liu |first6=Yiwei |title=2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC) |chapter=GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network |date=October 2021 |pages=381–388 |doi=10.1109/DSC53577.2021.00060|arxiv=2202.02501 |isbn=978-1-6654-1815-7 |s2cid=246634824 }}</ref><ref>{{cite journal |last1=Ganz |first1=Tom |last2=Härterich |first2=Martin |last3=Warnecke |first3=Alexander |last4=Rieck |first4=Konrad |title=Explaining Graph Neural Networks for Vulnerability Discovery |journal=Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security |date=15 November 2021 |pages=145–156 |doi=10.1145/3474369.3486866|isbn=9781450386579 |s2cid=240001850 |doi-access=free }}</ref><ref>{{cite journal |last1=Duan |first1=Xu |last2=Wu |first2=Jingzheng |last3=Ji |first3=Shouling |last4=Rui |first4=Zhiqing |last5=Luo |first5=Tianyue |last6=Yang |first6=Mutian |last7=Wu |first7=Yanjun |title=VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities |journal=Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence |date=August 2019 |pages=4665–4671 |doi=10.24963/ijcai.2019/648|isbn=978-0-9992411-4-1 |s2cid=199466292 |doi-access=free }}</ref>


== See also ==

Revision as of 22:47, 21 July 2023

In computer science, a code property graph (CPG) is a computer program representation that captures syntactic structure, control flow, and data dependencies in a property graph. The concept was originally introduced to identify security vulnerabilities in C and C++ system code,[1] but has since been employed to analyze web applications,[2][3][4][5] cloud deployments,[6] and smart contracts.[7] Beyond vulnerability discovery, code property graphs find applications in code clone detection,[8][9] attack-surface detection,[10] exploit generation,[11] measuring code testability,[12] and backporting of security patches.[13]

Definition

A code property graph of a program is a graph representation of the program obtained by merging its abstract syntax trees (AST), control-flow graphs (CFG), and program dependence graphs (PDG) at statement and predicate nodes. The resulting graph is a property graph, the underlying graph model of graph databases such as Neo4j, JanusGraph, and OrientDB, in which data is stored on nodes and edges as key-value pairs. As a result, code property graphs can be stored in graph databases and queried using graph query languages.
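The merge described above can be illustrated with a minimal Python sketch. It is not taken from any of the implementations discussed in this article: the `PropertyGraph` class, the `merge` function, the node identifiers, and the property keys `code`, `type`, and `label` are all hypothetical names chosen for illustration. The sketch assumes that the three input graphs already share identifiers for their statement and predicate nodes, so that merging reduces to taking the union of the nodes and tagging each edge with the representation it came from.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """A tiny in-memory property graph (hypothetical, for illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> key-value properties
    edges: list = field(default_factory=list)  # (source id, target id, key-value properties)

    def add_node(self, node_id, **props):
        # A node seen in several input graphs is stored once; properties are merged.
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, src, dst, **props):
        self.edges.append((src, dst, props))

    def out(self, node_id, label):
        # Targets of all outgoing edges of node_id that carry the given label.
        return [dst for src, dst, props in self.edges
                if src == node_id and props.get("label") == label]


def merge(ast, cfg, pdg):
    """Merge three graphs, each given as (nodes, edges), on their shared node ids."""
    cpg = PropertyGraph()
    for label, (nodes, edges) in (("AST", ast), ("CFG", cfg), ("PDG", pdg)):
        for node_id, props in nodes.items():
            cpg.add_node(node_id, **props)
        for src, dst in edges:
            cpg.add_edge(src, dst, label=label)
    return cpg


# One statement (s1), one of its AST children (a1), and one predicate (p1).
ast = ({"s1": {"type": "statement", "code": "int x = source()"},
        "a1": {"type": "call", "code": "source()"}},
       [("s1", "a1")])
cfg = ({"s1": {}, "p1": {"type": "predicate", "code": "x < MAX"}},
       [("s1", "p1")])
pdg = ({"s1": {}, "p1": {}},
       [("s1", "p1")])

cpg = merge(ast, cfg, pdg)
```

In the merged graph, the statement node `s1` carries outgoing edges of all three kinds, so a single traversal can mix syntactic, control-flow, and dependence information.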

Example

Consider the following function of a C program:

void foo() {
  int x = source();
  if (x < MAX) {
    int y = 2 * x;
    sink(y);
  }
}

The code property graph of the function is obtained by merging its abstract syntax tree, control-flow graph, and program dependence graph at statement and predicate nodes, as shown in the following figure:

Code property graph of a sample C code snippet
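As a textual complement to the figure, the following Python sketch encodes the statement-level part of this graph by hand and runs one simple query on it. The node identifiers (`n1` to `n4`), the edge labels `CFG`, `DDG` (data dependence), and `CDG` (control dependence), and the helper functions are hypothetical and do not follow the schema of any particular tool. AST edges below the statement level are omitted for brevity. The query asks whether a value returned by `source()` can reach the argument of `sink()` along data-dependence edges.

```python
from collections import deque

# Statement and predicate nodes of foo(), keyed by hypothetical ids.
NODES = {
    "n1": {"type": "statement", "code": "int x = source()"},
    "n2": {"type": "predicate", "code": "x < MAX"},
    "n3": {"type": "statement", "code": "int y = 2 * x"},
    "n4": {"type": "statement", "code": "sink(y)"},
}

# Edges as (source, target, properties).
EDGES = [
    ("n1", "n2", {"label": "CFG"}),
    ("n2", "n3", {"label": "CFG", "branch": "true"}),
    ("n3", "n4", {"label": "CFG"}),
    ("n1", "n2", {"label": "DDG", "var": "x"}),  # x is defined in n1, used in n2
    ("n1", "n3", {"label": "DDG", "var": "x"}),  # x is defined in n1, used in n3
    ("n3", "n4", {"label": "DDG", "var": "y"}),  # y is defined in n3, used in n4
    ("n2", "n3", {"label": "CDG"}),              # n3 executes only if the predicate holds
    ("n2", "n4", {"label": "CDG"}),              # n4 executes only if the predicate holds
]


def find_nodes(fragment):
    """Return the ids of all nodes whose code contains the given fragment."""
    return [nid for nid, props in NODES.items() if fragment in props["code"]]


def data_flow_path(start, goal):
    """Breadth-first search along DDG edges; returns a node path or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for src, dst, props in EDGES:
            if src == current and props["label"] == "DDG" and dst not in parents:
                parents[dst] = current
                queue.append(dst)
    return None


source_node = find_nodes("source(")[0]
sink_node = find_nodes("sink(")[0]
path = data_flow_path(source_node, sink_node)
print([NODES[n]["code"] for n in path])
```

The search finds the path from `int x = source()` through `int y = 2 * x` to `sink(y)`. A fuller query could additionally consult the control-dependence edges to check which predicate, here `x < MAX`, guards the sink.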

Implementations

Joern CPG. The original code property graph was implemented for C/C++ in 2013 at the University of Göttingen as part of the open-source code analysis tool Joern.[14] This original version has been discontinued and superseded by the open-source Joern Project,[15] which provides a formal code property graph specification[16] applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the Ghidra disassembler).

Plume CPG. Developed at Stellenbosch University in 2020 and sponsored by Amazon Science, the open-source Plume[17] project provides a code property graph for Java bytecode compatible with the code property graph specification provided by the Joern project. The two projects merged in 2021.

Fraunhofer AISEC CPG. The Fraunhofer Institute for Applied and Integrated Security [de] provides open-source code property graph generators for C/C++, Java, Golang, and Python,[18] albeit without a formal schema specification. It also provides the Cloud Property Graph,[19] an extension of the code property graph concept that models details of cloud deployments.

Galois’ CPG for LLVM. Galois Inc. provides a code property graph based on the LLVM compiler.[20] The graph represents code at different stages of the compilation and a mapping between these representations. It follows a custom schema that is defined in its documentation.

Machine learning on code property graphs

Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, graph neural networks (GNN) have been employed to derive vulnerability detectors.[21][22][23][24][25][26][27]
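The core operation of a graph neural network, message passing along graph edges, can be sketched in a few lines of Python. The sketch below is a deliberately simplified illustration and does not reproduce the method of any cited work: it uses unweighted mean aggregation, whereas practical models apply learned weight matrices and non-linear activations, and typically distinguish edge types. The function names, node identifiers, and feature vectors are hypothetical.

```python
def message_passing_round(features, edges):
    """One round of mean aggregation over incoming edges.

    features: node id -> list of floats (all of equal length)
    edges:    list of (source id, target id) pairs
    Each node's new vector is the mean of its own vector and the vectors
    of the nodes that have an edge pointing to it.
    """
    incoming = {node: [] for node in features}
    for src, dst in edges:
        incoming[dst].append(src)
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[src] for src in incoming[node]]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


def readout(features):
    """Graph-level embedding: the mean of all node vectors."""
    vectors = list(features.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy two-dimensional node features for a three-node chain n1 -> n2 -> n3.
features = {"n1": [1.0, 0.0], "n2": [0.0, 1.0], "n3": [0.0, 0.0]}
edges = [("n1", "n2"), ("n2", "n3")]

updated = message_passing_round(features, edges)
graph_vector = readout(updated)
```

After one round, each node's vector reflects its direct predecessors; repeating the round propagates information along longer paths. In a vulnerability detector, a graph-level vector of this kind would be passed to a classifier.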

See also

References

  1. ^ Yamaguchi, Fabian; Golde, Nico; Arp, Daniel; Rieck, Konrad (May 2014). "Modeling and Discovering Vulnerabilities with Code Property Graphs". 2014 IEEE Symposium on Security and Privacy. pp. 590–604. doi:10.1109/SP.2014.44. ISBN 978-1-4799-4686-0. S2CID 2231082.
  2. ^ Backes, Michael; Rieck, Konrad; Skoruppa, Malte; Stock, Ben; Yamaguchi, Fabian (April 2017). "Efficient and Flexible Discovery of PHP Application Vulnerabilities". 2017 IEEE European Symposium on Security and Privacy (EuroS&P). pp. 334–349. doi:10.1109/EuroSP.2017.14. ISBN 978-1-5090-5762-7. S2CID 206649536.
  3. ^ Li, Song; Kang, Mingqing; Hou, Jianwei; Cao, Yinzhi (2022). Mining Node.js Vulnerabilities via Object Dependence Graph and Query. pp. 143–160. ISBN 9781939133311.
  4. ^ Brito, Tiago; Lopes, Pedro; Santos, Nuno; Santos, José Fragoso (1 July 2022). "Wasmati: An efficient static vulnerability scanner for WebAssembly". Computers & Security. 118: 102745. arXiv:2204.12575. doi:10.1016/j.cose.2022.102745. S2CID 248405811.
  5. ^ Khodayari, Soheil; Pellegrino, Giancarlo (2021). JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals. pp. 2525–2542. ISBN 9781939133243.
  6. ^ Banse, Christian; Kunz, Immanuel; Schneider, Angelika; Weiss, Konrad (September 2021). "Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis". 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). pp. 13–19. arXiv:2206.06938. doi:10.1109/CLOUD53861.2021.00014. ISBN 978-1-6654-0060-2. S2CID 243946828.
  7. ^ Giesen, Jens-Rene; Andreina, Sebastien; Rodler, Michael; Karame, Ghassan; Davi, Lucas. "Practical Mitigation of Smart Contract Bugs | TeraFlow". www.teraflow-h2020.eu.
  8. ^ Wi, Seongil; Woo, Sijae; Whang, Joyce Jiyoung; Son, Sooel (25 April 2022). "HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs". Proceedings of the ACM Web Conference 2022: 755–766. doi:10.1145/3485447.3512235. ISBN 9781450390965. S2CID 248367462.
  9. ^ Bowman, Benjamin; Huang, H. Howie (September 2020). "VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets". 2020 IEEE European Symposium on Security and Privacy (EuroS&P). pp. 53–69. doi:10.1109/EuroSP48549.2020.00012. ISBN 978-1-7281-5087-1. S2CID 226268429.
  10. ^ Du, Xiaoning; Chen, Bihuan; Li, Yuekang; Guo, Jianmin; Zhou, Yaqin; Liu, Yang; Jiang, Yu (May 2019). "LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment Through Program Metrics". 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). pp. 60–71. arXiv:1901.11479. doi:10.1109/ICSE.2019.00024. ISBN 978-1-7281-0869-8. S2CID 59523689.
  11. ^ Alhuzali, Abeer; Gjomemo, Rigel; Eshete, Birhanu; Venkatakrishnan, V. N. (2018). NAVEX: Precise and Scalable Exploit Generation for Dynamic Web Applications. pp. 377–392. ISBN 9781939133045.
  12. ^ Al Kassar, Feras; Clerici, Giulia; Compagna, Luca; Balzarotti, Davide; Yamaguchi, Fabian. "Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications – NDSS Symposium". NDSS Symposium.
  13. ^ Shi, Youkun; Zhang, Yuan; Luo, Tianhan; Mao, Xiangyu; Cao, Yinzhi; Wang, Ziwen; Zhao, Yudi; Huang, Zongan; Yang, Min (2022). Backporting Security Patches of Web Applications: A Prototype Design and Implementation on Injection Vulnerability Patches. pp. 1993–2010. ISBN 9781939133311.
  14. ^ "Joern - A Robust Code Analysis Platform for C/C++". www.mlsec.org.
  15. ^ "Joern - The Bug Hunter's Workbench". Joern - The Bug Hunter's Workbench.
  16. ^ "Code Property Graph Specification". cpg-spec.github.io.
  17. ^ "Plume". plume-oss.github.io.
  18. ^ "Code Property Graph". Fraunhofer AISEC. 31 August 2022.
  19. ^ Banse, Christian; Kunz, Immanuel; Schneider, Angelika; Weiss, Konrad (September 2021). "Cloud Property Graph: Connecting Cloud Security Assessments with Static Code Analysis". 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). pp. 13–19. arXiv:2206.06938. doi:10.1109/CLOUD53861.2021.00014. ISBN 978-1-6654-0060-2. S2CID 243946828.
  20. ^ "The Code Property Graph — MATE 0.1.0.0 documentation". galoisinc.github.io.
  21. ^ Zhou, Yaqin; Liu, Shangqing; Siow, Jingkai; Du, Xiaoning; Liu, Yang (8 December 2019). "Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks". Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc.: 10197–10207. arXiv:1909.03496.
  22. ^ Haojie, Zhang; Yujun, Li; Yiwei, Liu; Nanxin, Zhou (December 2021). "Vulmg: A Static Detection Solution for Source Code Vulnerabilities Based on Code Property Graph and Graph Attention Network". 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). pp. 250–255. doi:10.1109/ICCWAMTIP53232.2021.9674145. ISBN 978-1-6654-1364-0. S2CID 246039350.
  23. ^ Zheng, Weining; Jiang, Yuan; Su, Xiaohong (October 2021). "Vu1SPG: Vulnerability detection based on slice property graph representation learning". 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). pp. 457–467. doi:10.1109/ISSRE52982.2021.00054. ISBN 978-1-6654-2587-2. S2CID 246751595.
  24. ^ Chakraborty, Saikat; Krishna, Rahul; Ding, Yangruibo; Ray, Baishakhi (2021). "Deep Learning based Vulnerability Detection: Are We There Yet". IEEE Transactions on Software Engineering. 48 (9): 3280–3296. arXiv:2009.07235. doi:10.1109/TSE.2021.3087402. S2CID 221703797.
  25. ^ Zhou, Li; Huang, Minhuan; Li, Yujun; Nie, Yuanping; Li, Jin; Liu, Yiwei (October 2021). "GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network". 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC). pp. 381–388. arXiv:2202.02501. doi:10.1109/DSC53577.2021.00060. ISBN 978-1-6654-1815-7. S2CID 246634824.
  26. ^ Ganz, Tom; Härterich, Martin; Warnecke, Alexander; Rieck, Konrad (15 November 2021). "Explaining Graph Neural Networks for Vulnerability Discovery". Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security: 145–156. doi:10.1145/3474369.3486866. ISBN 9781450386579. S2CID 240001850.
  27. ^ Duan, Xu; Wu, Jingzheng; Ji, Shouling; Rui, Zhiqing; Luo, Tianyue; Yang, Mutian; Wu, Yanjun (August 2019). "VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities". Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence: 4665–4671. doi:10.24963/ijcai.2019/648. ISBN 978-0-9992411-4-1. S2CID 199466292.