Use RESTful API
The REST APIs supported by Kylin 4.0 and their usage are listed on the Apache Kylin Wiki:
Build Cube with API
1. Authentication
Currently, Kylin uses basic authentication.
Add an Authorization header to the first request to authenticate,
or send a dedicated POST request to
http://localhost:7070/kylin/api/user/authentication
Once authenticated, the client can send subsequent requests with the session cookies.
POST http://localhost:7070/kylin/api/user/authentication
Authorization:Basic xxxxJD124xxxGFxxxSDF
Content-Type: application/json;charset=UTF-8
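A minimal Python sketch of this step, using only the standard library: it builds the Basic header (Base64 of "username:password") and sends the POST above through an opener that stores the returned session cookie. The helper names, the KYLIN_API base address and the credentials in the usage line are assumptions for illustration.

```python
import base64
import http.cookiejar
import urllib.request

# Assumed server address, matching the requests in this section.
KYLIN_API = "http://localhost:7070/kylin/api"


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an 'Authorization: Basic ...' header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authenticate(username: str, password: str) -> urllib.request.OpenerDirector:
    """POST to /user/authentication and return an opener that keeps the
    session cookie, so later requests can reuse it (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    request = urllib.request.Request(
        f"{KYLIN_API}/user/authentication",
        data=b"",  # no body needed; the credentials travel in the header
        method="POST",
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json;charset=UTF-8",
        },
    )
    with opener.open(request):
        pass  # a non-2xx status raises urllib.error.HTTPError
    return opener
```

For example, opener = authenticate("ADMIN", "KYLIN") (placeholder credentials) returns an opener whose later opener.open(...) calls send the session cookie automatically.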
2. Get the details of a cube.
GET http://localhost:7070/kylin/api/cubes?cubeName={cube_name}&limit=15&offset=0
The client can find the cube's segment date ranges in the returned cube details.
GET http://localhost:7070/kylin/api/cubes?cubeName=test_kylin_cube_with_slr&limit=15&offset=0
Authorization:Basic xxxxJD124xxxGFxxxSDF
Content-Type: application/json;charset=UTF-8
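The query string above can be assembled with urllib.parse.urlencode. In this sketch, cube_details_url and get_cube_details are hypothetical helpers, and opener is assumed to be a urllib opener that already carries the session cookie from the authentication step. The structure of the returned JSON is not assumed here, so inspect it to locate the segment date ranges for your version.

```python
import json
import urllib.request
from urllib.parse import urlencode

KYLIN_API = "http://localhost:7070/kylin/api"  # assumed server address


def cube_details_url(cube_name: str, limit: int = 15, offset: int = 0) -> str:
    """Build GET /cubes with the cubeName, limit and offset query parameters."""
    query = urlencode({"cubeName": cube_name, "limit": limit, "offset": offset})
    return f"{KYLIN_API}/cubes?{query}"


def get_cube_details(opener: urllib.request.OpenerDirector, cube_name: str):
    """Fetch and decode the cube details with an already authenticated opener."""
    request = urllib.request.Request(
        cube_details_url(cube_name),
        headers={"Content-Type": "application/json;charset=UTF-8"},
    )
    with opener.open(request) as response:
        return json.loads(response.read().decode("utf-8"))
```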
3. Then submit a build job of the cube.
PUT http://localhost:7070/kylin/api/cubes/{cube_name}/rebuild
For details of the PUT request body, please refer to Build Cube API.
- startTime and endTime should be UTC timestamps in milliseconds.
- buildType can be BUILD, MERGE or REFRESH. BUILD builds a new segment, REFRESH refreshes an existing segment, and MERGE merges multiple existing segments into one bigger segment.
This method returns the newly created job instance, whose uuid is the unique ID of the job, used to track its status.
PUT http://localhost:7070/kylin/api/cubes/test_kylin_cube_with_slr/rebuild
Authorization:Basic xxxxJD124xxxGFxxxSDF
Content-Type: application/json;charset=UTF-8
{
"startTime": 0,
"endTime": 1388563200000,
"buildType": "BUILD"
}
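Because startTime and endTime are UTC millisecond timestamps, local-time conversions are an easy source of mistakes. This sketch converts timezone-aware datetimes exactly and builds (without sending) the PUT request; rebuild_request, to_utc_millis, EPOCH and KYLIN_API are illustrative names, not part of Kylin.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

KYLIN_API = "http://localhost:7070/kylin/api"  # assumed server address
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
BUILD_TYPES = {"BUILD", "MERGE", "REFRESH"}


def to_utc_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to a UTC timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time offsets")
    # timedelta // timedelta yields an exact integer, with no float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def rebuild_request(cube_name: str, start: datetime, end: datetime,
                    build_type: str = "BUILD") -> urllib.request.Request:
    """Build (but do not send) the PUT /cubes/{cube_name}/rebuild request."""
    if build_type not in BUILD_TYPES:
        raise ValueError(f"buildType must be one of {sorted(BUILD_TYPES)}")
    body = {
        "startTime": to_utc_millis(start),
        "endTime": to_utc_millis(end),
        "buildType": build_type,
    }
    return urllib.request.Request(
        f"{KYLIN_API}/cubes/{cube_name}/rebuild",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json;charset=UTF-8"},
    )
```

For instance, the body shown above corresponds to start=EPOCH and end=datetime(2014, 1, 1, 8, tzinfo=timezone.utc), that is, 2014-01-01 08:00:00 UTC. Send the request with an authenticated opener and read the job's uuid from the JSON response.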
4. Track job status.
GET http://localhost:7070/kylin/api/jobs/{job_uuid}
The job_status field in the response represents the current status of the job.
5. If the job fails with errors, you can resume it.
PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume
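Steps 4 and 5 can be combined into a polling loop. This is a sketch under assumptions: the set of terminal job_status values (FINISHED, ERROR, DISCARDED) is assumed rather than taken from this document, and fetch_job is any callable you supply that performs GET /jobs/{job_uuid} and returns the decoded JSON. Injecting it keeps the loop independent of the HTTP client.

```python
import time
from typing import Callable, Dict

KYLIN_API = "http://localhost:7070/kylin/api"  # assumed server address

# Assumed terminal values of job_status; verify them against your Kylin version.
TERMINAL_STATES = {"FINISHED", "ERROR", "DISCARDED"}


def job_url(job_uuid: str) -> str:
    return f"{KYLIN_API}/jobs/{job_uuid}"


def resume_url(job_uuid: str) -> str:
    return f"{KYLIN_API}/jobs/{job_uuid}/resume"


def wait_for_job(fetch_job: Callable[[str], Dict], job_uuid: str,
                 interval_s: float = 30.0, max_polls: int = 240) -> str:
    """Call fetch_job (which should GET job_url(job_uuid) and decode the JSON)
    until job_status reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        status = fetch_job(job_uuid)["job_status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_uuid} not finished after {max_polls} polls")
```

If wait_for_job returns "ERROR", send a PUT to resume_url(job_uuid) and call wait_for_job again.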
6. Adjust the cuboid list of a cube and trigger an optimize segment job.
PUT http://localhost:7070/kylin/api/cubes/{cube_name}/optimize2
Backup Metadata
Kylin organizes all of its metadata (including cube descriptions and instances, projects, inverted index descriptions and instances, jobs, tables and dictionaries) as a hierarchical file system. However, Kylin stores it in MySQL rather than in a normal file system. If you check your Kylin configuration file (kylin.properties), you will find a line like this:
## The metadata store in mysql
kylin.metadata.url=kylin_metadata@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://localhost:3306/kylin_database,username=,password=
This indicates that the metadata will be saved in a table called kylin_metadata in the MySQL database kylin_database.
Metadata directory
The Kylin metastore uses resource root path + resource name + resource suffix as the key under which metadata is stored. Refer to the following table when using ./bin/metastore.sh.
Resource root path   Resource name                    Resource suffix
/cube                /cube name                       .json
/cube_desc           /cube name                       .json
/cube_statistics     /cube name/uuid                  .seq
/model_desc          /model name                      .json
/project             /project name                    .json
/table               /DATABASE.TABLE--project name    .json
/table_exd           /DATABASE.TABLE--project name    .json
/execute             /job id
/execute_output      /job id-step index
/user                /user name
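The key layout in the table can be expressed as a small helper. This sketch joins root path, resource name and suffix, and wraps ./bin/metastore.sh cat with subprocess; the function names and the example cube and uuid values are hypothetical.

```python
import subprocess
from typing import List


def metastore_key(root_path: str, resource_name: str, suffix: str = "") -> str:
    """Join resource root path + '/' + resource name + suffix into a metastore key."""
    return f"{root_path.rstrip('/')}/{resource_name.strip('/')}{suffix}"


def metastore_cat_command(key: str) -> List[str]:
    """argv for viewing one entity with ./bin/metastore.sh cat."""
    return ["./bin/metastore.sh", "cat", key]


def cat_metadata(key: str, kylin_home: str) -> str:
    """Run metastore.sh cat from KYLIN_HOME and return what it prints."""
    result = subprocess.run(metastore_cat_command(key), cwd=kylin_home,
                            capture_output=True, text=True, check=True)
    return result.stdout
```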
View metadata
If you want to view some metadata, you can run:
./bin/metastore.sh list /path/to/store/metadata
to list the entities stored in the specified directory, and then run:
./bin/metastore.sh cat /path/to/store/entity/metadata
to view the metadata of a single entity.
Backup metadata with binary package
Sometimes you need to back up Kylin's metadata store from MySQL to your local file system.
In such cases, assuming you're on the Hadoop CLI (or sandbox) where you deployed Kylin, go to KYLIN_HOME and run:
./bin/metastore.sh backup
to dump your metadata to a local folder under KYLIN_HOME/meta_backups; the folder is named after the current time with the syntax:
KYLIN_HOME/meta_backups/meta_year_month_day_hour_minute_second
In addition, you can run:
./bin/metastore.sh fetch /path/to/store/metadata
to dump metadata selectively. For example, run ./bin/metastore.sh fetch /cube_desc/ to get all cube desc metadata, or run ./bin/metastore.sh fetch /cube_desc/kylin_sales_cube.json to get the desc metadata of a single cube.
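Because backup folders follow the meta_year_month_day_hour_minute_second pattern with zero-padded fields, sorting the folder names also sorts them by time. This sketch (hypothetical helper names) reproduces the name format and finds the newest backup under KYLIN_HOME/meta_backups, which is handy when choosing a folder to restore. Whether metastore.sh uses local time or UTC for the name is not stated here, so the sketch does not convert time zones.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_folder_name(moment: datetime) -> str:
    """Folder name in the meta_year_month_day_hour_minute_second syntax."""
    return moment.strftime("meta_%Y_%m_%d_%H_%M_%S")


def latest_backup(kylin_home: Path) -> Optional[Path]:
    """Return the newest folder under KYLIN_HOME/meta_backups, or None.
    Zero-padded fields make lexicographic order match chronological order."""
    backup_root = Path(kylin_home) / "meta_backups"
    if not backup_root.is_dir():
        return None
    folders = sorted(p for p in backup_root.glob("meta_*") if p.is_dir())
    return folders[-1] if folders else None
```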
Restore metadata with binary package
If your metadata store is messed up and you want to restore a previous backup:
First, reset the metadata store (this cleans everything in Kylin's metadata store in MySQL, so make sure you have a backup):
./bin/metastore.sh reset
Then upload the backup metadata to Kylin’s metadata store:
./bin/metastore.sh restore $KYLIN_HOME/meta_backups/meta_xxxx_xx_xx_xx_xx_xx
Restore metadata selectively (Recommended)
If only a couple of metadata files have changed, the administrator can pick just these files to restore, without having to overwrite all the metadata. Compared to a full recovery, this approach is more efficient and safer, so it is recommended.
Create a new empty directory, and then create subdirectories in it according to the location
of the metadata files to restore; for example, to restore a Cube instance, you should create a
“cube” subdirectory:
mkdir /path/to/restore_new
mkdir /path/to/restore_new/cube
Copy the metadata file to be restored to this new directory:
cp meta_backups/meta_2016_06_10_20_24_50/cube/kylin_sales_cube.json /path/to/restore_new/cube/
At this point, you can modify/fix the metadata manually.
Restore from this directory:
cd $KYLIN_HOME
./bin/metastore.sh restore /path/to/restore_new
Only the files in this folder will be uploaded to the Kylin metastore. After the recovery is finished, click the Reload Metadata button on the Web UI to flush the cache.
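The manual mkdir/cp steps above can be scripted. This is a sketch with a hypothetical helper: it copies the chosen files (given as metastore keys such as /cube/kylin_sales_cube.json) from a full backup folder into a new, empty directory while recreating their subdirectories, after which you run ./bin/metastore.sh restore on that directory as described above.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_selective_restore(backup_dir: Path, restore_dir: Path,
                            keys: Iterable[str]) -> List[Path]:
    """Copy selected metadata files from a full backup into a new, empty
    restore directory, recreating each file's subdirectory (e.g. cube/)."""
    backup_dir, restore_dir = Path(backup_dir), Path(restore_dir)
    restore_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse a directory
    copied = []
    for key in keys:
        relative = Path(key.lstrip("/"))  # "/cube/x.json" -> "cube/x.json"
        target = restore_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup_dir / relative, target)
        copied.append(target)
    return copied
```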
Backup/restore metadata in development env
When developing/debugging Kylin, you typically have a dev machine with an IDE and a backend sandbox. Usually you'll write code and run test cases on the dev machine. It would be troublesome to always have to put a binary package in the sandbox to check the metadata. A helper class called SandboxMetastoreCLI lets you download/upload metadata locally on your dev machine. Follow its usage information and run it in your IDE.
Cleanup Storage
Wiki: https://cwiki.apache.org/confluence/display/KYLIN/How+to+clean+up+storage+in+Kylin+4
Optimize Build and Query
Kylin 4 is a major architectural upgrade: both the cube building engine and the query engine use Spark as the computation engine, and cube data is stored in Parquet files instead of HBase. As a result, build/query performance tuning is very different from tuning Kylin 3.
For build/query performance tuning of Apache Kylin 4.0, please refer to:
How to improve cube building and query performance of Apache Kylin4.0.
You can also refer to a Kylin 4.0 user's blog post on optimization practice:
why did Youzan choose Kylin4
Config different spark Pool for different types of SQL
Please check the document: Use different spark pool for different query
Upgrade From Old Versions
Compared with Kylin 3.x and previous versions, Kylin 4.0's storage engine has changed from HBase to Parquet. Therefore, when upgrading from Kylin 3.x or earlier to Kylin 4.0, the already built cuboid data cannot be upgraded; only the metadata can be upgraded.
Please refer to: How to migrate metadata to Kylin 4
Use Utility CLIs
Kylin provides some client utility tools. This document introduces the following classes: KylinConfigCLI.java, CubeMetaExtractor.java, CubeMetaIngester.java, CubeMigrationCLI.java and CubeMigrationCheckCLI.java. Before using these tools, you have to switch to the KYLIN_HOME directory.
KylinConfigCLI.java
Function
KylinConfigCLI.java outputs the value of Kylin properties.
How to use
After the class name, you can pass only one parameter, conf_name, which is the name of the property whose value you want to know.
./bin/kylin.sh org.apache.kylin.tool.KylinConfigCLI <conf_name>
For example:
./bin/kylin.sh org.apache.kylin.tool.KylinConfigCLI kylin.server.mode
Result:
all
If you do not know the full parameter name, pass a prefix ending with a dot instead; all parameters starting with this prefix will be listed:
./bin/kylin.sh org.apache.kylin.tool.KylinConfigCLI <prefix>.
For example:
./bin/kylin.sh org.apache.kylin.tool.KylinConfigCLI kylin.job.
Result:
max-concurrent-jobs=10
retry=3
sampling-percentage=100
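When scripting against KylinConfigCLI, the prefix listing can be turned into a dictionary. In this sketch the helper names are hypothetical. The parser re-adds the queried prefix because the sample output above prints keys without it, and it skips lines without "=", so for the single-value mode (which prints just the value, like all) use the raw output instead. If kylin.sh prints extra log lines containing "=" on your installation, filter them first.

```python
import subprocess
from typing import Dict


def parse_config_listing(output: str, prefix: str = "") -> Dict[str, str]:
    """Turn 'key=value' lines into a dict, prepending the queried prefix."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            props[prefix + key.strip()] = value.strip()
    return props


def kylin_config(name_or_prefix: str, kylin_home: str) -> str:
    """Run KylinConfigCLI from KYLIN_HOME and return its standard output."""
    result = subprocess.run(
        ["./bin/kylin.sh", "org.apache.kylin.tool.KylinConfigCLI", name_or_prefix],
        cwd=kylin_home, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, parse_config_listing(kylin_config("kylin.job.", "/path/to/kylin"), "kylin.job.") would yield keys such as kylin.job.retry.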
CubeMetaExtractor.java
Function
CubeMetaExtractor.java extracts cube-related information for debugging or distribution purposes.
How to use
At least two parameters must be specified:
./bin/kylin.sh org.apache.kylin.tool.CubeMetaExtractor -<conf_name> <conf_value> -destDir <your_dest_dir>
For example:
./bin/kylin.sh org.apache.kylin.tool.CubeMetaExtractor -cube kylin_sales_cube -destDir /tmp/kylin_sales_cube
Result:
After the command is executed, the cube, project or hybrid you want to extract will be
dumped in the specified path.
All supported parameters are listed below:
Parameter               Description
allProjects             Specify realizations in all projects to extract.
compress                Specify whether to compress the output with zip. Default true.
cube                    Specify which cube to extract.
destDir                 (Required) Specify the destination directory to save the related information.
hybrid                  Specify which hybrid to extract.
includeJobs             Set this to true to extract job info/outputs too. Default false.
includeSegmentDetails   Set this to true to extract segment details too, such as dict and tablesnapshot. Default false.
includeSegments         Set this to true to extract the segments info. Default true.
onlyOutput              When including jobs, extract only the job output. Default true.
packagetype             Specify the package type.
project                 Specify which project to extract.
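For repeatable extractions, the command line can be generated. This sketch is hypothetical: it assumes, based on the "Set this to true" wording in the table, that boolean options take an explicit true/false value after the flag. The example project name in the test is also an assumption.

```python
from typing import Dict, List, Optional

EXTRACTOR = "org.apache.kylin.tool.CubeMetaExtractor"
TARGETS = ("cube", "project", "hybrid")


def extractor_command(target: str, name: str, dest_dir: str,
                      flags: Optional[Dict[str, bool]] = None) -> List[str]:
    """argv for ./bin/kylin.sh CubeMetaExtractor, to be run from KYLIN_HOME."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}")
    argv = ["./bin/kylin.sh", EXTRACTOR, f"-{target}", name, "-destDir", dest_dir]
    for option, enabled in (flags or {}).items():
        # Assumption: boolean options are passed as "-option true|false".
        argv += [f"-{option}", "true" if enabled else "false"]
    return argv
```

The list can be passed directly to subprocess.run(..., cwd=<KYLIN_HOME>, check=True).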
CubeMetaIngester.java
Function
CubeMetaIngester.java ingests extracted cube metadata into another metadata store. Currently it only supports ingesting cubes.
How to use
At least two parameters must be specified. Please make sure the cube you want to ingest does not already exist in the target project.
Note: the zip file must contain only one directory after it has been decompressed.
./bin/kylin.sh org.apache.kylin.tool.CubeMetaIngester -project <target_project> -srcPath <your_src_dir>
For example:
./bin/kylin.sh org.apache.kylin.tool.CubeMetaIngester -project querytest -srcPath /tmp/newconfigdir1/cubes.zip
Result:
After the command is successfully executed, the cube you want to ingest will exist in the target project.
All supported parameters are listed below:
Parameter         Description
forceIngest       Skip the target cube, model and table checks and ingest by force. Use with caution, because it might break existing cubes! Back up the metadata store first. Default false.
overwriteTables   If table metadata conflicts, overwrite the one in the metadata store with the one in srcPath. Use with caution, because it might break existing cubes! Back up the metadata store first. Default false.
project           (Required) Specify the target project for the new cubes.
srcPath           (Required) Specify the path to the extracted cube metadata zip file.
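Since CubeMetaIngester requires the zip to unpack into exactly one directory, it can help to check the archive before running the tool. This sketch uses Python's zipfile module; the helper names and the archive layout used in the test are assumptions.

```python
import zipfile
from typing import List

INGESTER = "org.apache.kylin.tool.CubeMetaIngester"


def has_single_top_directory(zip_path: str) -> bool:
    """True if every entry in the zip lives under one common top-level directory."""
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if n.strip("/")]
    if not names or not all("/" in n for n in names):
        return False  # empty archive, or a file sits at the top level
    return len({n.split("/", 1)[0] for n in names}) == 1


def ingester_command(project: str, src_path: str) -> List[str]:
    """argv for ./bin/kylin.sh CubeMetaIngester after checking the archive."""
    if not has_single_top_directory(src_path):
        raise ValueError(f"{src_path} must unpack into exactly one directory")
    return ["./bin/kylin.sh", INGESTER, "-project", project, "-srcPath", src_path]
```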
CubeMigrationCLI.java
Function
Since version 2.0, Apache Kylin has provided a migration tool to support migrating metadata across different clusters. We have recently refined the CubeMigration tool and added new capabilities to it. The enhanced functions are listed below:
- Support migrating all cubes in the source cluster
- Support migrating a whole project in the source cluster
- Support migrating and upgrading metadata from older versions to Kylin 4
How to use
Please check: How to migrate metadata to Kylin4
Secure with LDAP and SSO
Enable LDAP authentication
Kylin supports LDAP authentication for enterprise or production deployments. It is implemented with the Spring Security framework. Before enabling LDAP, please contact your LDAP administrator to get the necessary information, such as the LDAP server URL, username/password and search patterns.
Configure LDAP server info
First, provide the LDAP URL, and the username/password if the LDAP server is secured. The password in kylin.properties needs to be encrypted; run the following command to get the encrypted value:
cd $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib
java -classpath kylin-server-base-<version>.jar:kylin-core-common-<version>.jar:spring-beans-4.3.10.RELEASE.jar:spring-core-4.3.10.RELEASE.jar:commons-codec-1.7.jar \
  org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES <your_password>
Configure them in conf/kylin.properties. When you use a customized CA certificate library for LDAPS-based user authentication, you also need to configure ‘kylin.security.ldap.connection-truststore’; its value will be added to the JVM parameter javax.net.ssl.trustStore:
kylin.security.ldap.connection-server=ldap://<your_ldap_host>:<port>
kylin.security.ldap.connection-username=<your_user_name>
kylin.security.ldap.connection-password=<your_password_encrypted>
kylin.security.ldap.connection-truststore=<your_customized_CA_certificate_library>
Second, provide the user search patterns. These depend on your LDAP design; here is just a sample:
kylin.security.ldap.user-search-base=OU=UserAccounts,DC=mycompany,DC=com
kylin.security.ldap.user-search-pattern=(&(cn={0})(memberOf=CN=MYCOMPANY-USERS,DC=mycompany,DC=com))
kylin.security.ldap.user-group-search-base=OU=Group,DC=mycompany,DC=com
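At login, the {0} in kylin.security.ldap.user-search-pattern is replaced with the user name. The sketch below illustrates that substitution, escaping the user-supplied value per RFC 4515 so that characters such as * or ( cannot change the filter. It illustrates the mechanism with hypothetical function names; it is not Kylin's or Spring Security's code.

```python
# RFC 4515 escapes for characters that are special inside an LDAP filter value.
FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a user-supplied string before placing it in an LDAP filter."""
    return "".join(FILTER_ESCAPES.get(ch, ch) for ch in value)


def user_search_filter(pattern: str, username: str) -> str:
    """Fill the {0} placeholder of kylin.security.ldap.user-search-pattern."""
    return pattern.replace("{0}", escape_filter_value(username))
```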
If you have service accounts (e.g., for system integration) that also need to be authenticated, configure them in kylin.security.ldap.service-.*; otherwise, leave them empty.
Configure the administrator group
To map an LDAP group to the admin group in Kylin, set “kylin.security.acl.admin-role” to the LDAP group name (keeping its original case); the users in this group will be global admins in Kylin.
For example, if the LDAP group “KYLIN-ADMIN-GROUP” holds the list of administrators, set it as:
kylin.security.acl.admin-role=KYLIN-ADMIN-GROUP
Attention: When upgrading from a version earlier than Kylin 2.3 to 2.3 or later, remove the “ROLE_” prefix from this setting (it was required in versions before 2.3) and keep the group name in its original case. In addition, kylin.security.acl.default-role is deprecated.
Enable LDAP
Set “kylin.security.profile=ldap” in conf/kylin.properties, then restart Kylin server.
Enable SSO authentication
From v1.5, Kylin provides SSO with SAML. The implementation is based on the Spring Security SAML Extension. You can read this reference to get an overall understanding.
Before trying this, you should have successfully enabled LDAP and managed users with it: the SSO server may only perform authentication, so Kylin needs to search LDAP to get the user's detailed information.
Generate IDP metadata xml
Contact your IDP (identity provider) and ask it to generate the SSO metadata file. Usually you need to provide three pieces of information:
1. Partner entity ID, which is a unique ID of your app, e.g.: https://host-name/kylin/saml/metadata
2. App callback endpoint, to which the SAML assertion is posted; it needs to be: https://host-name/kylin/saml/SSO
3. Public certificate of the Kylin server; the SSO server will encrypt messages with it.
Generate JKS keystore for Kylin
As Kylin needs to send encrypted messages (signed with Kylin's private key) to the SSO server, a keystore (JKS) needs to be provided. There are a couple of ways to generate the keystore; below is a sample.
Assume kylin.crt is the public certificate file and kylin.key is the private key file. First create a PKCS#12 file with openssl, then convert it to JKS with keytool:
$ openssl pkcs12 -export -in kylin.crt -inkey kylin.key -out kylin.p12
Enter Export Password: <export_pwd>
Verifying - Enter Export Password: <export_pwd>
$ keytool -importkeystore -srckeystore kylin.p12 -srcstoretype PKCS12 \
  -srcstorepass <export_pwd> -alias 1 -destkeystore samlKeystore.jks \
  -destalias kylin -destkeypass changeit
Enter destination keystore password: changeit
Re-enter new password: changeit
This puts the keys into “samlKeystore.jks” with the alias “kylin”.
Enable Higher Ciphers
Make sure your environment can handle higher-level crypto keys. You may need to download the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files and copy local_policy.jar and US_export_policy.jar to $JAVA_HOME/jre/lib/security.
Deploy IDP xml file and keystore to Kylin
The IDP metadata and keystore files need to be deployed in the Kylin web app's classpath, in
$KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/classes
1. Rename the IDP file to sso_metadata.xml and copy it to Kylin's classpath;
2. Name the keystore “samlKeystore.jks” and copy it to Kylin's classpath;
3. If you use another alias or password, remember to update kylinSecurity.xml accordingly:
<!-- Central storage of cryptographic keys -->
<bean id="keyManager"
class="org.springframework.security.saml.key.JKSKeyManager">
<constructor-arg value="classpath:samlKeystore.jks"/>
<constructor-arg type="java.lang.String" value="changeit"/>
<constructor-arg>
<map>
<entry key="kylin" value="changeit"/>
</map>
</constructor-arg>
<constructor-arg type="java.lang.String" value="kylin"/>
</bean>
Other configurations
In conf/kylin.properties, add the following properties with your server information:
saml.metadata.entityBaseURL=https://host-name/kylin
saml.context.scheme=https
saml.context.serverName=host-name
saml.context.serverPort=443
saml.context.contextPath=/kylin
Please note that Kylin assumes the SAML message contains an “email” attribute representing the login user, and the name before the @ is used to search LDAP.
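The values you give the IDP and the saml.* properties are all derived from the same host name and context path. This sketch generates them consistently and mimics the email-to-LDAP-name rule described above. The helper names are hypothetical, and including a non-443 port in the base URL is an assumption about your deployment.

```python
from typing import Dict, List


def base_url(host: str, port: int = 443, context_path: str = "/kylin") -> str:
    """Public HTTPS base URL of Kylin; the port is omitted when it is 443."""
    authority = host if port == 443 else f"{host}:{port}"
    return f"https://{authority}{context_path}"


def idp_registration_info(host: str, port: int = 443,
                          context_path: str = "/kylin") -> Dict[str, str]:
    """Partner entity ID and callback endpoint to hand to the IDP."""
    base = base_url(host, port, context_path)
    return {"partner_entity_id": f"{base}/saml/metadata",
            "callback_endpoint": f"{base}/saml/SSO"}


def saml_properties(host: str, port: int = 443, context_path: str = "/kylin") -> List[str]:
    """kylin.properties lines for the SAML settings listed above."""
    return [
        f"saml.metadata.entityBaseURL={base_url(host, port, context_path)}",
        "saml.context.scheme=https",
        f"saml.context.serverName={host}",
        f"saml.context.serverPort={port}",
        f"saml.context.contextPath={context_path}",
    ]


def ldap_name_from_email(email: str) -> str:
    """Return the part before '@', which is used to search LDAP."""
    local, sep, _domain = email.partition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {email!r}")
    return local
```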
Enable SSO
Set “kylin.security.profile=saml” in conf/kylin.properties, then restart the Kylin server. After that, visiting a URL like “/kylin” or “/kylin/cubes” will redirect to SSO for login and return after authorization. Login with LDAP is still available: visit “/kylin/login” to use the original way. The REST API (/kylin/api/*) still uses LDAP + basic authentication and is not affected.
Install Ranger Plugin
Please refer to https://cwiki.apache.org/confluence/display/RANGER/Kylin+Plugin.
Enable Zookeeper ACL
Edit $KYLIN_HOME/conf/kylin.properties to add the following configuration items:
Add “kylin.env.zookeeper.zk-auth”, which specifies the ZooKeeper authentication information. Its format is “scheme:id”. The schemes ZooKeeper supports are “world”, “auth”, “digest”, “ip” and “super”. The “id” is the authentication information for the scheme. For example:
kylin.env.zookeeper.zk-auth=digest:ADMIN:KYLIN
Here the scheme is “digest” and the id is “ADMIN:KYLIN”, which represents “username:password”.
Add “kylin.env.zookeeper.zk-acl”, which sets the access permissions. Its format is “scheme:id:permissions”. The permissions ZooKeeper supports are “READ”, “WRITE”, “CREATE”, “DELETE” and “ADMIN”, written in the value by their initials r, w, c, d and a. For example, to give everyone all permissions:
kylin.env.zookeeper.zk-acl=world:anyone:rwcda
Here the scheme is “world”, the id is “anyone” and the permissions are “rwcda”.
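To validate these two settings before restarting Kylin, they can be parsed as follows. This sketch uses hypothetical helper names. The id part may itself contain colons (as in digest:ADMIN:KYLIN), so zk-auth is split at the first colon and zk-acl takes its permissions after the last one.

```python
from typing import FrozenSet, NamedTuple, Tuple

# ZooKeeper permission letters as used in values such as "rwcda".
PERMISSIONS = {"r": "READ", "w": "WRITE", "c": "CREATE", "d": "DELETE", "a": "ADMIN"}


class ZkAcl(NamedTuple):
    scheme: str
    ident: str
    permissions: FrozenSet[str]


def parse_zk_auth(value: str) -> Tuple[str, str]:
    """Split kylin.env.zookeeper.zk-auth ("scheme:id") at the first colon."""
    scheme, sep, ident = value.partition(":")
    if not sep or not scheme or not ident:
        raise ValueError(f"bad zk-auth value: {value!r}")
    return scheme, ident


def parse_zk_acl(value: str) -> ZkAcl:
    """Split kylin.env.zookeeper.zk-acl ("scheme:id:permissions")."""
    scheme, sep, rest = value.partition(":")
    ident, sep2, letters = rest.rpartition(":")
    if not sep or not sep2 or not scheme or not letters:
        raise ValueError(f"bad zk-acl value: {value!r}")
    unknown = set(letters) - set(PERMISSIONS)
    if unknown:
        raise ValueError(f"unknown permission letters: {sorted(unknown)}")
    return ZkAcl(scheme, ident, frozenset(PERMISSIONS[ch] for ch in letters))
```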