Thursday, December 19, 2024

Apache Ranger, Solr & Trino

What is Apache Ranger

Apache Ranger is an open-source data security and governance framework designed for managing and enforcing policies for accessing and protecting sensitive data within a Hadoop ecosystem. It provides a centralized platform to define, administer, and audit access control policies across various Hadoop services, ensuring robust data governance and compliance.

Key Features of Apache Ranger:

  • Centralized Policy Management:
    • Allows administrators to define fine-grained access control policies for multiple Hadoop components such as HDFS, Hive, HBase, Kafka, and others.
  • Audit and Reporting:
    • Captures and logs all user access attempts, including successful and failed ones, and provides detailed audit trails.
    • Integrates with tools like Solr, Elasticsearch, and Kibana for analyzing and visualizing audit logs.

Core Components of Apache Ranger:

  • Ranger Admin:
    • A centralized web interface to create, update, and manage security policies.
    • Administrators can define policies for various Hadoop services.
  • Ranger Plugins:
    • Lightweight components installed on individual Hadoop services (e.g., Hive, HDFS, HBase, Kafka, Trino/Presto).
    • Enforce security policies in real time by intercepting access requests and checking them against defined policies.
  • User Sync Service:
    • Synchronizes users and groups from external directories like LDAP or Active Directory into Apache Ranger.
  • Audit Framework:
    • Collects and stores detailed audit logs of access events, policy evaluations, and security violations.

 

Installing Apache Ranger

Apache Ranger Quick Install Guide 

These steps run on the host where you want to install Apache Ranger (Ubuntu 20.04 in my case).

You may have to install git, Java (JDK 8), Maven (mvn), and python3 if they are not already installed on your system.
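Before starting the build, it can help to check which of these prerequisites are actually missing. The helper below is only a small sketch; the function name missing_tools is my own and not part of Ranger.

```shell
# Print each required tool that is not found on PATH, one per line.
# (missing_tools is a hypothetical helper name, not something Ranger ships.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call for the prerequisites listed above:
# missing_tools git java mvn python3
```

If anything is reported, on Ubuntu 20.04 the usual packages are git, openjdk-8-jdk, maven, and python3 (for example: sudo apt-get install -y git openjdk-8-jdk maven python3).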
Create two Unix users, ranger and solr,
e.g.
sudo adduser ranger
sudo adduser solr
## su to ranger account
## Download Ranger repo and build
su - ranger
mkdir ranger 
cd ranger
git clone https://gitbox.apache.org/repos/asf/ranger.git
## git clone creates a ranger folder; run the build from inside it
cd ranger
## set JAVA_HOME as per your location
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

mvn -Pall clean 
mvn -Pall -DskipTests=false clean compile package install
After the build commands above finish, you should see many tar.gz files in the target folder, and in the base folder as well.
In my case, I see ranger-3.0.0-SNAPSHOT-admin.tar.gz; the version may differ depending on the latest version at the time you build.
I renamed ranger-3.0.0-SNAPSHOT-admin.tar.gz to ranger-3.0.0-admin.tar.gz.
The Ranger Admin Tool component (ranger-%version-number%-admin.tar.gz) should be installed on the host where the Policy Admin Tool web application will run, on port 6080 by default.
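Because the version in the file name changes from build to build, a small helper can locate the admin tarball instead of hard-coding the name. This is a sketch only; find_admin_tarball is a hypothetical name, and it assumes exactly one file matching ranger-*-admin.tar.gz exists in the folder you pass in.

```shell
# Print the path of the Ranger Admin tarball found in the given directory.
# Returns non-zero if there is not exactly one match.
# (find_admin_tarball is a hypothetical helper name.)
find_admin_tarball() {
    target_dir=$1
    # Expand the glob into the function's positional parameters.
    set -- "$target_dir"/ranger-*-admin.tar.gz
    [ $# -eq 1 ] && [ -f "$1" ] || return 1
    printf '%s\n' "$1"
}

# Example call from the base folder of the build:
# find_admin_tarball target
```

The printed path can then be used in the copy step, e.g. cp "$(find_admin_tarball target)" /usr/lib/ranger/.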
mkdir -p /usr/lib/ranger/
## copy ranger-3.0.0-admin.tar.gz to /usr/lib/ranger/
cp ranger-3.0.0-admin.tar.gz /usr/lib/ranger/
## Before we install Apache Ranger Admin, install Solr
## In the base folder where you ran the mvn build, go to the folder
## security-admin/contrib/solr_for_audit_setup and edit the install.properties file (this is for the Solr install)
## edit install.properties
SOLR_USER=solr
SOLR_GROUP=solr
SOLR_DOWNLOAD_URL=http://archive.apache.org/dist/lucene/solr/8.9.0/solr-8.9.0.tgz
SOLR_INSTALL=true
## save & exit install.properties
## execute ./setup.sh to install Solr
./setup.sh
## The following step works around an error in Solr. The error appears on the Dashboard
## when you open the Solr URL.
## ranger_audits: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent
cp -p /opt/solr/example/files/conf/elevate.xml /opt/solr/ranger_audit_server/ranger_audits/conf/
##
## Start Solr
/opt/solr/ranger_audit_server/scripts/start_solr.sh
## Test that the Solr URL is working
## http://host-name:6083/solr
## To stop Solr
/opt/solr/ranger_audit_server/scripts/stop_solr.sh
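The elevate.xml workaround above can be made safe to run more than once. The sketch below copies the file only when the core's conf folder does not already have it; ensure_elevate_xml is a hypothetical helper name, and the paths in the example call are the ones from my install.

```shell
# Copy elevate.xml into a Solr core's conf directory unless it is already there.
# Arguments: <source elevate.xml> <core conf directory>
# (ensure_elevate_xml is a hypothetical helper name.)
ensure_elevate_xml() {
    src=$1
    conf_dir=$2
    if [ -f "$conf_dir/elevate.xml" ]; then
        echo "elevate.xml already present in $conf_dir"
        return 0
    fi
    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }
    cp -p "$src" "$conf_dir/" && echo "copied elevate.xml to $conf_dir"
}

# Example call with the paths used in this post:
# ensure_elevate_xml /opt/solr/example/files/conf/elevate.xml \
#     /opt/solr/ranger_audit_server/ranger_audits/conf
```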
Solr Dashboard

## Install Apache Ranger Admin
## Database pre-req
## Before you can install Apache Ranger Admin, you need either an existing MySQL/Postgres
## database or a new one to hold the metadata for Apache Ranger
## In my case, I have a MySQL DB running on the same host
## On the MySQL database, create an empty database ranger and also create the user rangeradmin
## MySQL version 8.0.x
## I was getting issues with the default character set utf8mb4, so I chose utf8mb3 for the
## Ranger Admin install script to work
CREATE DATABASE `ranger` DEFAULT CHARACTER SET utf8mb3;
CREATE USER `rangeradmin`@`%` IDENTIFIED WITH mysql_native_password BY 'YourPassword';
GRANT ALL ON ranger.* TO `rangeradmin`@`%`;
## Database pre-req end
cd /usr/lib/ranger/
tar -xzvf ranger-3.0.0-admin.tar.gz
## tar will create the ranger-3.0.0-admin folder under /usr/lib/ranger/
cd /usr/lib/ranger/ranger-3.0.0-admin
## You need to edit install.properties file

## Change following lines based on your setting
## You need database 
DB_FLAVOR=MYSQL
## MySQL JDBC connector, you may need to download this 
## You can download MySQL connector from here
## create a symlink 
## cd /usr/share/java
## ln -s /usr/share/java/mysql-connector-j-8.0.33.jar mysql-connector-java.jar
SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

db_root_user=root
db_root_password=******
db_host=localhost
#
# DB UserId used for the Ranger schema
# We created the ranger database and the rangeradmin user earlier in the pre-req
db_name=ranger
db_user=rangeradmin
db_password=****
##
# * audit_store is solr
audit_store=solr
audit_solr_urls=http://localhost:6083/solr/ranger_audits
audit_solr_user=solr
## save & exit install.properties file

Execute ./setup.sh

setup.sh may fail at the following step:

2024-12-16 20:23:56,315 [I] --------- Verifying Ranger DB connection ---------
2024-12-16 20:23:56,315 [I] Checking connection..
2024-12-16 20:23:56,315 [JISQL] /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/lib/ranger/ranger-3.0.0-admin/jisql/lib/* org.apache.util.sql.Jisql -driver mysqlconj -cstring jdbc:mysql://localhost/ranger?useSSL=false -u 'rangeradmin' -p '********' -noheader -trim -c \; -query "select 1;"
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2024-12-16 20:23:56,756 [I] Checking connection failed.

This happens because allowPublicKeyRetrieval=true needs to be added to the JDBC connection string. To fix it, edit the db_setup.py file in the same folder and change the following line (adding allowPublicKeyRetrieval=true), then run ./setup.sh again:

if "useSSL" not in db_name:
    db_ssl_param="useSSL=false"

To

if "useSSL" not in db_name:
    db_ssl_param="?allowPublicKeyRetrieval=true&useSSL=false"

##########
### setup.sh will install ranger-admin, you will see /usr/bin/ranger-admin
### To start/stop Ranger Admin
/usr/bin/ranger-admin start/stop
## Check the Ranger Admin URL
## http://your-ranger-host:6080/
## The Ranger Admin default user is admin and the password is also admin
## You have successfully installed Ranger Admin :)
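The db_setup.py edit described above can also be scripted, which is handy if you unpack the tarball again later. This is a sketch under assumptions: patch_db_setup is a hypothetical helper name, and the exact text of the line in db_setup.py may differ between Ranger versions (the pattern accepts the value with or without a leading ?). A backup is kept as db_setup.py.bak.

```shell
# Add allowPublicKeyRetrieval=true to the non-SSL JDBC parameters in db_setup.py.
# Argument: <path to db_setup.py>. The original file is saved as <file>.bak.
# (patch_db_setup is a hypothetical helper name; the matched line is an
# assumption based on the version of db_setup.py I had.)
patch_db_setup() {
    file=$1
    if grep -q 'allowPublicKeyRetrieval=true' "$file"; then
        echo "already patched: $file"
        return 0
    fi
    # In a basic regex "?" is literal; \{0,1\} makes the leading "?" optional.
    sed -i.bak \
        's/db_ssl_param="?\{0,1\}useSSL=false"/db_ssl_param="?allowPublicKeyRetrieval=true\&useSSL=false"/' \
        "$file"
    # Report failure if the expected line was not found.
    grep -q 'allowPublicKeyRetrieval=true' "$file" || {
        echo "expected db_ssl_param line not found in $file" >&2
        return 1
    }
}

# Example call from the ranger-3.0.0-admin folder:
# patch_db_setup db_setup.py
```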

Enabling the Apache Ranger Plug-in for Trino
You will see Trino in the Apache Ranger Service Manager; click on the + sign to add Trino. You need to have your Trino server JDBC URL handy.
## My existing Trino setup doesn't have any auth enabled
## We will enable auditing for every query that is run







Under the Ranger Admin Dashboard, you will see Resource Policies; click there and you will see Trino as well.
You can edit policies here. For example, I added extra users; by default only admin was there. I added {USER} so that any user can impersonate.
## Apache Ranger has users, groups, roles, etc., and it also integrates with AD, LDAP, etc.
## To keep things simple, you can create users manually instead of using only admin.
## e.g. in Trino
trino --user=sanjay





Trino Server-Side Configuration

Learn about Trino


I used Trino version 466 or 467. See this blog. Trino now has its own plug-in for Apache Ranger, and it is no longer dependent on the Apache Ranger Java version.

Configure access-control.properties on the coordinator host of Trino

cat access-control.properties


access-control.name=ranger
ranger.service.name=trino
ranger.plugin.config.resource=etc/ranger-trino-security.xml,etc/ranger-trino-audit.xml

####

cat ranger-trino-audit.xml


<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>xasecure.audit.is.enabled</name>
<value>true</value>
<description>Boolean flag to specify if the plugin should generate access audit logs. Default: true</description>
</property>


<property>
<name>xasecure.audit.solr.is.enabled</name>
<value>true</value>
<description>Boolean flag to specify if audit logs should be stored in Solr. Default: false</description>
</property>
<property>
<name>xasecure.audit.solr.solr_url</name>
<value>http://your-host-where-solr-installed:6083/solr/ranger_audits</value>
<description>URL to Solr deployment where the plugin should send access audits to</description>
</property>

</configuration>
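access-control.properties names two XML files in ranger.plugin.config.resource, so both need to exist on the coordinator before the restart. The sketch below reads that property and reports any listed file that is missing. check_ranger_resources is a hypothetical helper name, and it assumes the listed paths are relative to the folder you pass as the second argument (the Trino installation folder in my setup).

```shell
# Read ranger.plugin.config.resource from access-control.properties and report
# every listed file that does not exist under the given base directory.
# Arguments: <access-control.properties> <base directory for relative paths>
# (check_ranger_resources is a hypothetical helper name.)
check_ranger_resources() {
    props=$1
    base=$2
    resources=$(sed -n 's/^ranger\.plugin\.config\.resource=//p' "$props")
    [ -n "$resources" ] || {
        echo "ranger.plugin.config.resource not set in $props" >&2
        return 1
    }
    status=0
    old_ifs=$IFS
    IFS=,    # split the comma-separated list of files
    for res in $resources; do
        [ -f "$base/$res" ] || { echo "missing: $base/$res"; status=1; }
    done
    IFS=$old_ifs
    return $status
}

# Example call from the Trino installation folder:
# check_ranger_resources etc/access-control.properties .
```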

### Restart Trino Server

### To see the audit logs, open the Audit tab on the Ranger Dashboard; entries will start appearing there
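Besides the Ranger Dashboard, the audit store can be checked directly in Solr. The sketch below builds a standard Solr select query against the ranger_audits collection that returns only the document count. The function names are hypothetical, and the default port 6083 is the one used by the Solr install in this post.

```shell
# Build the Solr query URL that counts documents in the ranger_audits collection.
# Arguments: <solr host> [port, default 6083]
# (audit_count_url and fetch_audit_count are hypothetical helper names.)
audit_count_url() {
    printf 'http://%s:%s/solr/ranger_audits/select?q=*:*&rows=0\n' "$1" "${2:-6083}"
}

# Fetch the count query; the JSON reply reports the total as "numFound".
fetch_audit_count() {
    curl -fsS "$(audit_count_url "$@")"
}

# Example call:
# fetch_audit_count your-host-where-solr-installed
```

If numFound grows after you run a query through Trino, the plug-in is writing audits to Solr.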

